
Topic: Phoenix 2 beta discussion - page 4. (Read 58046 times)

sr. member
Activity: 435
Merit: 250
February 13, 2012, 11:21:46 PM
Code:
[04:20:30] [Cypress 0] Error: Device returned hash with difficulty < 1

Unusable for P2Pool, then?
newbie
Activity: 46
Merit: 0
February 13, 2012, 07:58:21 AM
This is a good point, might as well just use .s0 - .sf
sr. member
Activity: 378
Merit: 250
February 13, 2012, 05:26:10 AM
#99
Supposing we did fit two found nonce values into a single nonce, could the miner output them both from a uint2?  And the v.s0 is the same as v.x for uint2 or uint4 as well.  When you get into uint8 or above, they become explicitly v.s0 etc.  They're really just vector locations.
But would the miner know to cut the nonce apart or would it expect a single uint?

Yes there is a way to output directly 4 values

and again it can't be a uint, it has to be a float

With float4, v.s0 and v.x are not the same, at least I don't think so, since it's still a 4-component variable (float4)

Why should .s0 and .x be not the same? For 4-component vectors it's possible to use .x .y .z .w or s0 s1 s2 s3 as per OpenCL spec.
To output 2 nonce values in one variable we can use ulong, that's what's currently used in DiaKGCN or my last phatk_dia (I did change that though in my last internal version). To output 4 values this could be done via vstore() I guess.

Dia
Dia, didn't you use a conditional statement that directly output the nonce to the miner instead of storing them and then outputting them at the end?
Why should .s0 and .x be not the same? For 4-component vectors it's possible to use .x .y .z .w or s0 s1 s2 s3 as per OpenCL spec.
To output 2 nonce values in one variable we can use ulong, that's what's currently used in DiaKGCN or my last phatk_dia (I did change that though in my last internal version). To output 4 values this could be done via vstore() I guess.

Dia

My mistake, it is the same. Just be careful when using .lo or .hi, since on a float2 .lo can mean .x or .s0; of course you can always use .lo.x if you want to be extra specific.

Sure you can use ulong, and it takes an extra 10+ instructions to output the nonces while you upsample it, instead of outputting all nonces at once with just one output function.

A vstore() writes the vector data to memory, and a vload() reads it from memory. Probably with Async copies and Prefetch it can be done
In 2-component vectors, .x is the same as .s0, .even and .lo.  However, the only reason to use the last two is if you have code designed to handle multiple different vector types and you need to output the even or lower half of the results and don't want to rewrite the entire code for each vector type.
I tend to just bother with the .s# version as it's easier and the other appears to be becoming legacy (matter of opinion of course).  Just remember that the component after .s9 is .sa, as they're numbered in hexadecimal.
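To make the naming above concrete, here is a small host-side sketch in Python. It is not OpenCL, only a model of the index arithmetic described in this post, and the helper names (`s_index`, `xyzw_index`, `halves`) are made up for illustration. It assumes vector widths of 2, 4, 8 or 16.

```python
# Hypothetical model of OpenCL vector component naming (not real OpenCL code).
XYZW = "xyzw"


def s_index(name):
    """Map a numeric component name such as 's0' or 'sa' to its index.

    The part after 's' is a single hexadecimal digit, so 'sa' is index 10.
    """
    return int(name[1:], 16)


def xyzw_index(name):
    """Map 'x', 'y', 'z' or 'w' to its index (only defined up to 4 components)."""
    return XYZW.index(name)


def halves(vec):
    """Return the .lo, .hi, .even and .odd views of a vector as tuples."""
    n = len(vec)
    return {
        "lo": tuple(vec[: n // 2]),
        "hi": tuple(vec[n // 2:]),
        "even": tuple(vec[0::2]),
        "odd": tuple(vec[1::2]),
    }


v2 = (7, 8)
v4 = (10, 11, 12, 13)
views2 = halves(v2)
views4 = halves(v4)
```

For the 2-component case `views2["lo"]` and `views2["even"]` both hold only the first element, which is the ".x is the same as .s0, .even and .lo" point; for 4 components they differ.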
newbie
Activity: 46
Merit: 0
February 13, 2012, 03:00:47 AM
#98
Why should .s0 and .x be not the same? For 4-component vectors it's possible to use .x .y .z .w or s0 s1 s2 s3 as per OpenCL spec.
To output 2 nonce values in one variable we can use ulong, that's what's currently used in DiaKGCN or my last phatk_dia (I did change that though in my last internal version). To output 4 values this could be done via vstore() I guess.

Dia

My mistake, it is the same. Just be careful when using .lo or .hi, since on a float2 .lo can mean .x or .s0; of course you can always use .lo.x if you want to be extra specific.

Sure you can use ulong, and it takes an extra 10+ instructions to output the nonces while you upsample it, instead of outputting all nonces at once with just one output function.

A vstore() writes the vector data to memory, and a vload() reads it from memory. Probably with Async copies and Prefetch it can be done
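For reference, the ulong idea mentioned above boils down to plain bit packing. The following is a minimal Python sketch of how a host program might split such a value again. Which nonce goes in which half is an assumption made here for illustration, not the layout DiaKGCN or phatk_dia actually uses.

```python
MASK32 = 0xFFFFFFFF  # a nonce is a 32-bit unsigned value


def pack_nonces(first, second):
    """Pack two 32-bit nonces into one 64-bit value (first in the high half).

    The high/low assignment is an assumption for this sketch.
    """
    return ((first & MASK32) << 32) | (second & MASK32)


def unpack_nonces(packed):
    """Split a 64-bit value back into its two 32-bit halves."""
    return (packed >> 32) & MASK32, packed & MASK32


packed = pack_nonces(0xDEADBEEF, 0x00000042)
first, second = unpack_nonces(packed)
```

One caveat with any scheme like this: a half that is 0 cannot be told apart from "no nonce found" unless 0 is reserved for that purpose.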
hero member
Activity: 772
Merit: 500
February 12, 2012, 04:00:41 PM
#97
Supposing we did fit two found nonce values into a single nonce, could the miner output them both from a uint2?  And the v.s0 is the same as v.x for uint2 or uint4 as well.  When you get into uint8 or above, they become explicitly v.s0 etc.  They're really just vector locations.
But would the miner know to cut the nonce apart or would it expect a single uint?

Yes there is a way to output directly 4 values

and again it can't be a uint, it has to be a float

With float4, v.s0 and v.x are not the same, at least I don't think so, since it's still a 4-component variable (float4)

Why should .s0 and .x be not the same? For 4-component vectors it's possible to use .x .y .z .w or s0 s1 s2 s3 as per OpenCL spec.
To output 2 nonce values in one variable we can use ulong, that's what's currently used in DiaKGCN or my last phatk_dia (I did change that though in my last internal version). To output 4 values this could be done via vstore() I guess.

Dia
newbie
Activity: 46
Merit: 0
February 12, 2012, 03:47:14 PM
#96
Supposing we did fit two found nonce values into a single nonce, could the miner output them both from a uint2?  And the v.s0 is the same as v.x for uint2 or uint4 as well.  When you get into uint8 or above, they become explicitly v.s0 etc.  They're really just vector locations.
But would the miner know to cut the nonce apart or would it expect a single uint?

Yes there is a way to output directly 4 values

and again it can't be a uint, it has to be a float

With float4, v.s0 and v.x are not the same, at least I don't think so, since it's still a 4-component variable (float4)
full member
Activity: 219
Merit: 120
February 11, 2012, 02:42:35 PM
#95
Edited: my bad, don't forget "agression" != "aggression" :)

Hash rate still seems a bit lower than phoenix 1.7.5 on an OC'd 6850


Is there any possibility to select the pool from command line?
Either passing the full url, or a configuration key name would be great.


There are a few ways to do this.

First, the only argument Phoenix 2 accepts on the command line is the path to its config file. If none is supplied, it defaults to phoenix.cfg in the current working directory. Using this method you could have several config files (one per pool) and simply switch between them.

The second method is to use Phoenix 2's RPC interface to switch pools at runtime. The following code is an example of how to do this in Python:
Code:
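import jsonrpc  # JSON-RPC client library that provides ServiceProxy
# Connect to the local Phoenix 2 RPC interface (credentials are part of the URL)
sp = jsonrpc.ServiceProxy('http://x:phoenix@localhost:7780')
# Point the "backend" option in the [general] section at the new pool
sp.setconfig('general', 'backend', 'http://user:[email protected]:8332')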

This will cause Phoenix to switch to the new pool immediately.
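If the jsonrpc module is not available, the same call can probably be built with the standard library alone. This is only a sketch: it assumes the RPC interface accepts an ordinary JSON-RPC POST with HTTP basic auth (which the ServiceProxy usage suggests, but which is not confirmed here), and both the helper name `build_setconfig_request` and the pool URL are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_setconfig_request(rpc_url, user, password, section, key, value):
    """Build (but do not send) a JSON-RPC POST that calls setconfig.

    Assumes a plain JSON-RPC body and HTTP basic auth; neither is
    confirmed for Phoenix 2, so treat this as a starting point.
    """
    payload = {"method": "setconfig", "params": [section, key, value], "id": 1}
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        rpc_url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_setconfig_request(
    "http://localhost:7780", "x", "phoenix",
    "general", "backend",
    "http://user:pass@pool.example:8332",  # hypothetical pool URL
)
# urllib.request.urlopen(req) would actually send it to a running miner.
```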
newbie
Activity: 25
Merit: 0
February 11, 2012, 10:28:00 AM
#94
Edited: my bad, don't forget "agression" != "aggression" :)

Hash rate still seems a bit lower than phoenix 1.7.5 on an OC'd 6850


Is there any possibility to select the pool from command line?
Either passing the full url, or a configuration key name would be great.
sr. member
Activity: 378
Merit: 250
February 11, 2012, 04:40:15 AM
#93
I don't think that autoconfig actually sets Vector and worksize properly anyway.  When I used it all it set was the phatk2 kernel usage and no other parameters.

Speaking of Phatk2, I think something needs to be done with the nonce determination at the end of the program.  You see, it compares the different vectors of v and g for equivalence and then sets nonce to the corresponding value of W[3]'s vector.  Unfortunately, nonce is a uint and can contain only one vector's value.  This makes the assumption that only a single vector will be okay to use.  And, should more than one vector be proper to utilize, nonce is set to the highest vector's value.  So, one of two things needs to be done:
1)  Only check vectors as long as nonce remains 0.  Any remaining checks will be wasted.
2)  Expand nonce to the size of the vectors used and allow the use of multiple nonces, if found, to increase efficiency.

In any case, the nonce determination code needs to be modified.  I suggest doing a 64-bit atom_and of v and g and then pulling the values of W[3] out based upon each vector in v having all bits equal to 1.  This still has the problem of multiple nonces being possible, though.  But it will only be comparing bits in v to 1.  This also means that we can skip the nonce check altogether, in the case that we have more results without a nonce than with, by testing if v=0.  If v=0, don't bother pulling out the nonce values, because they're not there.  If every bit in a vector of v is not 1, don't pull out the corresponding value in W[3].  It saves a heck of a lot of time.

I made a miscalculation here.  Even if we AND the vectors, some bits will still probably match which means it still needs to pull the vectors apart to check each one against the constant of all 1 bits.  Too bad we can't AND or XOR an entire vector to equal 1 or 0 that I know of.

You can AND or XOR whatever you want if it's set up to do so
T atom_and (Q T*p, T val) | Read, Store (*p & val)
T atom_xor (Q T*p, T val) | Read, Store(*p ^ val)

The reason you can't expand the nonce is because the base is a uint (as you pointed out), only with floats or double can you do that

If you have float4 v; corresponding values are (v.x, v.s0), (v.y, v.s1), (v.z, v.s2), (v.w, v.s3)

Then just nest them together into v.xyzw or v.s0123
or v.lo (v.s01, v.xy) or v.hi (v.s23, v.zw) or v.odd(v.s13, v.yw) or v.even(v.s02, v.xz)
or whatever combination you want

There is a way to directly test sign bits btw,
intn signbit(floatn)

There is also a separate function to test if all bits are 1 compared to a constant

You can test for finite values, +infinity or -infinity, NaN, do a bitselect or a select function for vector types

Unfortunately the __init__.py does not have any of it set up properly; the way the buffer is created, stored, and read is very slow compared to what is possible now

Also, there is a way to create an offset function natively as well plus many more options

Supposing we did fit two found nonce values into a single nonce, could the miner output them both from a uint2?  And the v.s0 is the same as v.x for uint2 or uint4 as well.  When you get into uint8 or above, they become explicitly v.s0 etc.  They're really just vector locations.
But would the miner know to cut the nonce apart or would it expect a single uint?
newbie
Activity: 46
Merit: 0
February 10, 2012, 09:58:15 PM
#92
I don't think that autoconfig actually sets Vector and worksize properly anyway.  When I used it all it set was the phatk2 kernel usage and no other parameters.

Speaking of Phatk2, I think something needs to be done with the nonce determination at the end of the program.  You see, it compares the different vectors of v and g for equivalence and then sets nonce to the corresponding value of W[3]'s vector.  Unfortunately, nonce is a uint and can contain only one vector's value.  This makes the assumption that only a single vector will be okay to use.  And, should more than one vector be proper to utilize, nonce is set to the highest vector's value.  So, one of two things needs to be done:
1)  Only check vectors as long as nonce remains 0.  Any remaining checks will be wasted.
2)  Expand nonce to the size of the vectors used and allow the use of multiple nonces, if found, to increase efficiency.

In any case, the nonce determination code needs to be modified.  I suggest doing a 64-bit atom_and of v and g and then pulling the values of W[3] out based upon each vector in v having all bits equal to 1.  This still has the problem of multiple nonces being possible, though.  But it will only be comparing bits in v to 1.  This also means that we can skip the nonce check altogether, in the case that we have more results without a nonce than with, by testing if v=0.  If v=0, don't bother pulling out the nonce values, because they're not there.  If every bit in a vector of v is not 1, don't pull out the corresponding value in W[3].  It saves a heck of a lot of time.

I made a miscalculation here.  Even if we AND the vectors, some bits will still probably match which means it still needs to pull the vectors apart to check each one against the constant of all 1 bits.  Too bad we can't AND or XOR an entire vector to equal 1 or 0 that I know of.

You can AND or XOR whatever you want if it's set up to do so
T atom_and (Q T*p, T val) | Read, Store (*p & val)
T atom_xor (Q T*p, T val) | Read, Store(*p ^ val)

The reason you can't expand the nonce is because the base is a uint (as you pointed out), only with floats or double can you do that

If you have float4 v; corresponding values are (v.x, v.s0), (v.y, v.s1), (v.z, v.s2), (v.w, v.s3)

Then just nest them together into v.xyzw or v.s0123
or v.lo (v.s01, v.xy) or v.hi (v.s23, v.zw) or v.odd(v.s13, v.yw) or v.even(v.s02, v.xz)
or whatever combination you want

There is a way to directly test sign bits btw,
intn signbit(floatn)

There is also a separate function to test if all bits are 1 compared to a constant

You can test for finite values, +infinity or -infinity, NaN, do a bitselect or a select function for vector types

Unfortunately the __init__.py does not have any of it set up properly; the way the buffer is created, stored, and read is very slow compared to what is possible now

Also, there is a way to create an offset function natively as well plus many more options
sr. member
Activity: 378
Merit: 250
February 10, 2012, 01:06:59 AM
#91
6550D:


7950:


I set an affinity to only one of my 4 CPU cores and utilisation jumps between 100% and 50% for that single core.
The CPU device was disabled via:
Code:
[cl:0:2]
disabled = true

Utilisation for the GPUs looks very weird...

Autoconfiguration for both GPUs leads to a higher hashrate, but the 7970 only reaches 75% GPU load at most. I guess there is a problem with setting different worksizes or vector widths for different GPUs here, and autoconfig sets the same values for both devices, which is not optimal either and only makes the problem less obvious.

Dia
I don't think that autoconfig actually sets Vector and worksize properly anyway.  When I used it all it set was the phatk2 kernel usage and no other parameters.

Speaking of Phatk2, I think something needs to be done with the nonce determination at the end of the program.  You see, it compares the different vectors of v and g for equivalence and then sets nonce to the corresponding value of W[3]'s vector.  Unfortunately, nonce is a uint and can contain only one vector's value.  This makes the assumption that only a single vector will be okay to use.  And, should more than one vector be proper to utilize, nonce is set to the highest vector's value.  So, one of two things needs to be done:
1)  Only check vectors as long as nonce remains 0.  Any remaining checks will be wasted.
2)  Expand nonce to the size of the vectors used and allow the use of multiple nonces, if found, to increase efficiency.

In any case, the nonce determination code needs to be modified.  I suggest doing a 64-bit atom_and of v and g and then pulling the values of W[3] out based upon each vector in v having all bits equal to 1.  This still has the problem of multiple nonces being possible, though.  But it will only be comparing bits in v to 1.  This also means that we can skip the nonce check altogether, in the case that we have more results without a nonce than with, by testing if v=0.  If v=0, don't bother pulling out the nonce values, because they're not there.  If every bit in a vector of v is not 1, don't pull out the corresponding value in W[3].  It saves a heck of a lot of time.

I made a miscalculation here.  Even if we AND the vectors, some bits will still probably match which means it still needs to pull the vectors apart to check each one against the constant of all 1 bits.  Too bad we can't AND or XOR an entire vector to equal 1 or 0 that I know of.
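To illustrate the difference between the behaviour described above and the two proposed fixes, here is a rough Python model. It is not the phatk2 kernel code; `v`, `g` and `w3` are plain tuples standing in for the vector components, and the three function names are invented for this sketch.

```python
def last_match_wins(v, g, w3):
    """Model of the described behaviour: every lane is checked and a later
    matching lane overwrites the nonce found by an earlier one."""
    nonce = 0
    for lane in range(len(v)):
        if v[lane] == g[lane]:
            nonce = w3[lane]
    return nonce


def first_match_only(v, g, w3):
    """Option 1: stop checking as soon as one lane has produced a nonce."""
    for lane in range(len(v)):
        if v[lane] == g[lane]:
            return w3[lane]
    return 0


def all_matches(v, g, w3):
    """Option 2: keep the nonce of every matching lane."""
    return [w3[lane] for lane in range(len(v)) if v[lane] == g[lane]]


# Lanes 0 and 2 match in this made-up example.
v = (5, 7, 9, 1)
g = (5, 0, 9, 2)
w3 = (100, 101, 102, 103)
```

With two matching lanes, the first function reports only the nonce from lane 2, the second only the one from lane 0, and the third reports both.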
hero member
Activity: 772
Merit: 500
February 09, 2012, 03:28:41 AM
#90
These pictures are not phatk, but DiaKGCN!

6550D:


7950:


I set an affinity to only one of my 4 CPU cores and utilisation jumps between 100% and 50% for that single core.
The CPU device was disabled via:
Code:
[cl:0:2]
disabled = true

Utilisation for the GPUs looks very weird...

Autoconfiguration for both GPUs leads to a higher hashrate, but the 7970 only reaches 75% GPU load at most. I guess there is a problem with setting different worksizes or vector widths for different GPUs here, and autoconfig sets the same values for both devices, which is not optimal either and only makes the problem less obvious.

Dia
full member
Activity: 219
Merit: 120
February 09, 2012, 02:48:56 AM
#89
Hashrate display is fixed on latest build, good job!

Another problem I observed with my own kernel (DiaKGCN): if I use a 7970 (Tahiti - GCN) on its own everything is okay, if I use a 6550D (BeaverCreek - VLIW5) on its own everything is okay, but if I try to use both of them together there seems to be a problem.

Code:
[general]
autodetect = +cl -cpu
backend = http://XYZ:[email protected]
ratesamples = 100
verbose = true

[cl:0:0]
disabled = false
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

[cl:0:1]
disabled = false
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = false
vectors4 = false
vectors8 = true
worksize = 128

Hashrate for each device is ~540 MH/s + 60 MH/s, which should lead to ~600 MH/s for the above config. Real displayed rate is only 114 MH/s, so it seems the kernel does not use the supplied settings for each device but perhaps uses only the last supplied parameters (here for [cl:0:1]). Could well be a problem of my init / kernel, but could also be a general problem. Any ideas?

Dia

I'm going to need more info to figure this one out. Multiple kernels is working fine on one of my rigs with a 5870 + 5830.

Things to check:
Can you try this using opencl and/or phatk2 kernels for both devices?
What load % are you getting on each GPU?
Is the CPU load % being maxed out? (perhaps a bug in the CPU detection code you submitted?) "autodetect = +cl -cpu" would use the CPU if the detection code doesn't work.
hero member
Activity: 772
Merit: 500
February 09, 2012, 02:35:21 AM
#88
Hashrate display is fixed on latest build, good job!

Another problem I observed with my own kernel (DiaKGCN): if I use a 7970 (Tahiti - GCN) on its own everything is okay, if I use a 6550D (BeaverCreek - VLIW5) on its own everything is okay, but if I try to use both of them together there seems to be a problem.

Code:
[general]
autodetect = +cl -cpu
backend = http://XYZ:[email protected]
ratesamples = 100
verbose = true

[cl:0:0]
disabled = false
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = true
vectors4 = false
vectors8 = false
worksize = 256

[cl:0:1]
disabled = false
kernel = diakgcn
aggression = 12
goffset = true
vectors2 = false
vectors4 = false
vectors8 = true
worksize = 128

Hashrate for each device is ~540 MH/s + 60 MH/s, which should lead to ~600 MH/s for the above config. Real displayed rate is only 114 MH/s, so it seems the kernel does not use the supplied settings for each device but perhaps uses only the last supplied parameters (here for [cl:0:1]). Could well be a problem of my init / kernel, but could also be a general problem. Any ideas?

Dia
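One way to narrow this down outside the miner is to check what a generic INI parser makes of the per-device sections. The sketch below uses Python's configparser on a shortened copy of the config above (backend line left out). Phoenix 2 may well parse its config differently, so this only shows what each [cl:x:y] section should contain, not what the kernel actually receives; `device_settings` is a made-up helper name.

```python
import configparser

# Shortened copy of the config from this post, without the backend line.
SAMPLE = """
[general]
autodetect = +cl -cpu
verbose = true

[cl:0:0]
kernel = diakgcn
vectors2 = true
worksize = 256

[cl:0:1]
kernel = diakgcn
vectors8 = true
worksize = 128
"""


def device_settings(text):
    """Return {section name: options} for every [cl:...] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: dict(parser[name])
        for name in parser.sections()
        if name.startswith("cl:")
    }


settings = device_settings(SAMPLE)
for name, options in settings.items():
    print(name, options)
```

If both sections come out with their own worksize and vector settings here, the config file itself is fine and the mix-up would have to happen later, when the settings are handed to the kernel.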
legendary
Activity: 1344
Merit: 1004
February 08, 2012, 11:47:29 PM
#87

CFSWorks, could you please verify that the CRC of the current Windows build of phoenix.exe is 89B9736A? I keep getting told that the Windows build was updated, but I'm just not seeing it: the "new" and "old" CRC32s are identical, so I don't believe anything was updated/changed.

I just checked - the zipfile itself is identical. I'm going to go badger jedi95 to compile a new one.

Thanks! :)
member
Activity: 63
Merit: 10
February 08, 2012, 11:01:49 PM
#86
-- snip -- (accidentally quoted the main post)
member
Activity: 63
Merit: 10
February 08, 2012, 10:49:59 PM
#85

CFSWorks, could you please verify that the CRC of the current Windows build of phoenix.exe is 89B9736A? I keep getting told that the Windows build was updated, but I'm just not seeing it: the "new" and "old" CRC32s are identical, so I don't believe anything was updated/changed.

I just checked - the zipfile itself is identical. I'm going to go badger jedi95 to compile a new one.
member
Activity: 63
Merit: 10
February 08, 2012, 10:43:57 PM
#84
What am I doing wrong? Ubuntu:

I'm getting what looks to be the same issue as mich, I'm on Debian 6:

Sorry about that! Apparently I don't know how to use setuptools correctly. :D

I just pushed a fix for that problem into the Github repository. You can try that if you like.

Because o has to come before p, opencl before phatk2. Maybe you should rename it 00-opencl. ;)

A better fix for that problem is to use a function which loads the kernel if it hasn't already been imported. I'm working on that now...
legendary
Activity: 1344
Merit: 1004
February 08, 2012, 10:32:01 PM
#83
Instead of the program creating a blank config file and aborting, that would be a good opportunity to run a little wizard questionnaire to set up the program:

The plan is actually to tell the user to fire up a web browser and have PhoenixWeb run the first-time setup wizard. The only trouble is that PhoenixWeb still needs a lot of work before it's ready to be included in the standard Phoenix download. (Right now it only has enough there for a "real" web developer to pick it up and work on it...)

CFSWorks, could you please verify that the CRC of the current Windows build of phoenix.exe is 89B9736A? I keep getting told that the Windows build was updated, but I'm just not seeing it: the "new" and "old" CRC32s are identical, so I don't believe anything was updated/changed.
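For anyone who wants to check a download the same way, a CRC32 can be computed with Python's zlib module. This is just a small sketch; the function names are made up and the path you pass in is whatever file you downloaded.

```python
import zlib


def crc32_hex(data):
    """CRC32 of a bytes object as 8 upper-case hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def file_crc32_hex(path, chunk_size=1 << 16):
    """CRC32 of a file, read in chunks so large files need little memory."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08X")


# Standard CRC32 check value for the ASCII string "123456789".
check = crc32_hex(b"123456789")
```

Calling `file_crc32_hex` on the downloaded phoenix.exe gives a value that can be compared directly with the one quoted above.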
member
Activity: 63
Merit: 10
February 08, 2012, 10:27:43 PM
#82
Instead of the program creating a blank config file and aborting, that would be a good opportunity to run a little wizard questionnaire to set up the program:

The plan is actually to tell the user to fire up a web browser and have PhoenixWeb run the first-time setup wizard. The only trouble is that PhoenixWeb still needs a lot of work before it's ready to be included in the standard Phoenix download. (Right now it only has enough there for a "real" web developer to pick it up and work on it...)