
Topic: Phoenix - Efficient, fast, modular miner - page 5. (Read 760690 times)

full member
Activity: 180
Merit: 100
January 12, 2012, 10:13:12 PM
Nice Upgrade.. Excellent Work..

I finally took the time to upgrade from 1.6.2 to 1.7.3..  Nice results..

Just for example, my test rig has a single 5850 and went from 398 -> 438 MH/s.  10% !!

Enigma

hero member
Activity: 769
Merit: 500
January 11, 2012, 11:52:28 AM
It really bothers me: the kernel you linked is one I know. It's an earlier version, which uses a different approach to compute the bases used in the kernel. Phateus introduced another method in his 2.x kernel, which I took and refined, but this seems to break uint3 support for my current version. The only other big difference I can think of is the use of a ulong as the output buffer in my version, where your linked one still has a uint ... weird and it sucks ^^.

By the way, what have you added in 1.7.3? I downloaded it and it works so far ... thanks.

Dia

I mainly intended to eliminate pyopencl as the source of the problem.

1.7.3 has a few fixes:
 - Updated phatk and poclbm to use bitselect() with BFI_INT disabled. This should help for GCN based GPUs (7970)
 - Modified WORKSIZE checking for poclbm, phatk, phatk2.
 - Disallowed -q 0 since it won't work correctly.

From what I have read bitselect() will properly compile to BFI_INT on GCN based GPUs without the crazy binary patching method required for VLIW5/VLIW4.

I'm pretty sure your change to bitselect() was not needed, because it should have worked before, too ;). I was the one who wrote it that way.
But thanks for your hard work and the exchange we have, it's fun!

Dia
full member
Activity: 221
Merit: 100
January 11, 2012, 10:17:20 AM
It really bothers me: the kernel you linked is one I know. It's an earlier version, which uses a different approach to compute the bases used in the kernel. Phateus introduced another method in his 2.x kernel, which I took and refined, but this seems to break uint3 support for my current version. The only other big difference I can think of is the use of a ulong as the output buffer in my version, where your linked one still has a uint ... weird and it sucks ^^.

By the way, what have you added in 1.7.3? I downloaded it and it works so far ... thanks.

Dia

I mainly intended to eliminate pyopencl as the source of the problem.

1.7.3 has a few fixes:
 - Updated phatk and poclbm to use bitselect() with BFI_INT disabled. This should help for GCN based GPUs (7970)
 - Modified WORKSIZE checking for poclbm, phatk, phatk2.
 - Disallowed -q 0 since it won't work correctly.

From what I have read bitselect() will properly compile to BFI_INT on GCN based GPUs without the crazy binary patching method required for VLIW5/VLIW4.

Crashes for me on Win7 64-bit.
full member
Activity: 219
Merit: 120
January 10, 2012, 12:33:33 AM
It really bothers me: the kernel you linked is one I know. It's an earlier version, which uses a different approach to compute the bases used in the kernel. Phateus introduced another method in his 2.x kernel, which I took and refined, but this seems to break uint3 support for my current version. The only other big difference I can think of is the use of a ulong as the output buffer in my version, where your linked one still has a uint ... weird and it sucks ^^.

By the way, what have you added in 1.7.3? I downloaded it and it works so far ... thanks.

Dia

I mainly intended to eliminate pyopencl as the source of the problem.

1.7.3 has a few fixes:
 - Updated phatk and poclbm to use bitselect() with BFI_INT disabled. This should help for GCN based GPUs (7970)
 - Modified WORKSIZE checking for poclbm, phatk, phatk2.
 - Disallowed -q 0 since it won't work correctly.

From what I have read bitselect() will properly compile to BFI_INT on GCN based GPUs without the crazy binary patching method required for VLIW5/VLIW4.
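To illustrate the equivalence (a sketch in Python with made-up sample words, not miner code): SHA-256's Ch function picks bits from f where e is set and from g elsewhere, which is exactly what bitselect(g, f, e) computes, and what BFI_INT does in one instruction on AMD hardware.

```python
MASK32 = 0xFFFFFFFF

def ch(e, f, g):
    # SHA-256 "choose" function: (e AND f) XOR ((NOT e) AND g)
    return ((e & f) ^ (~e & g)) & MASK32

def bitselect(a, b, c):
    # OpenCL bitselect(a, b, c): each result bit comes from b where the
    # corresponding bit of c is 1, and from a where it is 0
    return ((b & c) | (a & ~c)) & MASK32

# Ch(e, f, g) == bitselect(g, f, e): pick from f where e=1, else from g
e, f, g = 0xDEADBEEF, 0x12345678, 0x9ABCDEF0
assert ch(e, f, g) == bitselect(g, f, e)
```

Since the two masked terms of Ch have disjoint bits, the XOR is the same as an OR, which is why the compiler can map the whole expression to a single bitfield-insert instruction.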
hero member
Activity: 769
Merit: 500
January 09, 2012, 04:24:26 PM
It really bothers me: the kernel you linked is one I know. It's an earlier version, which uses a different approach to compute the bases used in the kernel. Phateus introduced another method in his 2.x kernel, which I took and refined, but this seems to break uint3 support for my current version. The only other big difference I can think of is the use of a ulong as the output buffer in my version, where your linked one still has a uint ... weird and it sucks ^^.

By the way, what have you added in 1.7.3? I downloaded it and it works so far ... thanks.

Dia
full member
Activity: 219
Merit: 120
January 08, 2012, 07:51:39 PM
Using Phoenix 1.72 with 12.1 Preview drivers and 2.6 SDK - poclbm Aggression 7 - Worksize 64 - I'm getting ~433 MHash on my HD 6970 at stock settings and voltage... I'm delighted with those results, esp. considering that it's not using 100% of a CPU core now either. :)

The poclbm kernel is at least 10 MH/s slower than my phatk version on my system with a 6950, strange.

@jedi: Any reason not to use the latest pyOpenCL version for Phoenix? Perhaps that's where the vec3 problem lies? I read a bit in its documentation and it seems a few commands Phoenix uses are now deprecated.

Dia

I know for a fact that uint3 vectors work under this version of pyOpenCL. I have been able to get the stock phatk to run with uint3 vectors enabled. The problem only occurs when I try to adjust the rateDivisor in the Python portion of the kernel. Dividing 2^N by 3 always has a remainder, so you end up sending bad work to the kernel.

This should at least give you a starting point:
https://github.com/downloads/jedi95/Phoenix-Miner/kernel.cl

The only change I made to the Python portion of the kernel is setting it to pass -DVECTORS3 to the compiler. Also note that without changing the rateDivisor, your reported hashrate is going to be 2/3 of the actual value. Obviously this won't work well for mining as-is, but at least it demonstrates that uint3 vectors work.
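The remainder problem described above is easy to demonstrate with plain Python arithmetic (illustrative only, not Phoenix's actual range-splitting code):

```python
NONCE_SPACE = 2 ** 32

# With 2- or 4-wide vectors the nonce space divides evenly...
for rate_divisor in (1, 2, 4):
    assert NONCE_SPACE % rate_divisor == 0

# ...but 3-wide vectors always leave a remainder, so slicing the range
# into thirds either drops nonces or runs past the end of the WorkUnit.
print(NONCE_SPACE % 3)   # 1 nonce left over
print(NONCE_SPACE // 3)  # truncated per-lane range: 1431655765
```

The same holds for any power of two: 2^N mod 3 is never 0, which is why a rateDivisor of 3 can never tile the WorkUnit exactly.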
sr. member
Activity: 1246
Merit: 274
January 08, 2012, 07:11:58 PM
Using Phoenix 1.72 with 12.1 Preview drivers and 2.6 SDK - poclbm Aggression 7 - Worksize 64 - I'm getting ~433 MHash on my HD 6970 at stock settings and voltage... I'm delighted with those results, esp. considering that it's not using 100% of a CPU core now either. :)

The poclbm kernel is at least 10 MH/s slower than my phatk version on my system with a 6950, strange.



I was surprised too, as it seems that most people are using phatk or phatk2. I haven't mined with that particular card (it's my gaming machine) in about 3 months, but I never got anything over 390-395 before, using GUIMiner. I decided to mine again with it and was only getting ~366 MHash, so I tried out Phoenix 1.72 and after a little tinkering and testing had my jaw drop when I saw it cranking out 433 or so MHash... going strong for over 24 hrs now too. :)
hero member
Activity: 769
Merit: 500
January 08, 2012, 05:08:35 PM
Using Phoenix 1.72 with 12.1 Preview drivers and 2.6 SDK - poclbm Aggression 7 - Worksize 64 - I'm getting ~433 MHash on my HD 6970 at stock settings and voltage... I'm delighted with those results, esp. considering that it's not using 100% of a CPU core now either. :)

The poclbm kernel is at least 10 MH/s slower than my phatk version on my system with a 6950, strange.

@jedi: Any reason not to use the latest pyOpenCL version for Phoenix? Perhaps that's where the vec3 problem lies? I read a bit in its documentation and it seems a few commands Phoenix uses are now deprecated.

Dia
sr. member
Activity: 1246
Merit: 274
January 08, 2012, 03:13:05 PM
Using Phoenix 1.72 with 12.1 Preview drivers and 2.6 SDK - poclbm Aggression 7 - Worksize 64 - I'm getting ~433 MHash on my HD 6970 at stock settings and voltage... I'm delighted with those results, esp. considering that it's not using 100% of a CPU core now either. :)
full member
Activity: 219
Merit: 120
January 08, 2012, 12:27:24 PM
You can also use -q 0 to disable the queue entirely.

Hmmm... at -q 0 Phoenix (current git running on Windows 7 x64, 64-bit Python 2.7.2) just keeps complaining that the queue is empty and never starts any actual mining.

The queue issue seems to be the reason why I get relatively high amounts of rejects when running against a local proxy; the proxy adds another layer of pre-fetching which of course makes the issue worse (and which you, jedi95, aren't responsible for at all).

I can confirm this, -q 0 is a no-go currently.

Dia

Sorry about that, I forgot that -q 0 only works in my current development version. It's an ugly workaround so I can test some things for rollntime.

The other solution is to use an askrate, which will reduce the problem.

Askrate can be specified by using this format for HTTP connections:
http://login:[email protected]:8332/;askrate=20
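For readers wondering how the ;askrate suffix fits into the URL: it rides along in the path's parameter section. A hypothetical parser could pull it out as below; the function name, the default value of 10, and the placeholder credentials are my own for illustration, not Phoenix's actual code.

```python
from urllib.parse import urlsplit

def parse_askrate(url, default=10):
    """Extract an ;askrate=N suffix from a miner URL, if present.

    Hypothetical helper: the default of 10 is an assumption,
    not Phoenix's documented behaviour.
    """
    # urlsplit keeps ";askrate=20" as part of the path component
    path = urlsplit(url).path
    if ";" not in path:
        return default
    for param in path.split(";")[1:]:
        key, _, value = param.partition("=")
        if key == "askrate" and value.isdigit():
            return int(value)
    return default

print(parse_askrate("http://user:pass@host:8332/;askrate=20"))  # 20
```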
sr. member
Activity: 274
Merit: 250
January 08, 2012, 11:30:55 AM
@jedi95
Sorry for my error, I misspelled it. Of course I meant VECTORS.
hero member
Activity: 769
Merit: 500
January 08, 2012, 09:37:41 AM
You can also use -q 0 to disable the queue entirely.

Hmmm... at -q 0 Phoenix (current git running on Windows 7 x64, 64-bit Python 2.7.2) just keeps complaining that the queue is empty and never starts any actual mining.

The queue issue seems to be the reason why I get relatively high amounts of rejects when running against a local proxy; the proxy adds another layer of pre-fetching which of course makes the issue worse (and which you, jedi95, aren't responsible for at all).

I can confirm this, -q 0 is a no-go currently.

Dia
member
Activity: 78
Merit: 10
January 08, 2012, 08:42:42 AM
You can also use -q 0 to disable the queue entirely.

Hmmm... at -q 0 Phoenix (current git running on Windows 7 x64, 64-bit Python 2.7.2) just keeps complaining that the queue is empty and never starts any actual mining.

The queue issue seems to be the reason why I get relatively high amounts of rejects when running against a local proxy; the proxy adds another layer of pre-fetching which of course makes the issue worse (and which you, jedi95, aren't responsible for at all).
hero member
Activity: 769
Merit: 500
January 08, 2012, 08:24:52 AM
If NVIDIA has OpenCL 1.1 drivers, they should support uint3 ... but as I observed too, Phoenix crashes in
Code:
self.kernel.search(...)
. Perhaps a certain parameter is not valid there. I had
Code:
self.size = (nonceRange.size / rateDivisor) / self.iterations
in my mind, because for VECTORS3 rateDivisor is 3, and so self.size is not correct, because nonceRange.size is perhaps not divisible by 3?

Dia
full member
Activity: 219
Merit: 120
January 08, 2012, 07:50:11 AM
Can anyone tell me where I can find the fastest kernel to use with a 5870?
Currently I'm using stock 1.7 phatk2 from github with these params:
VACTORS AGGRESSION=8 BFI_INT FASTLOOP=false WORKSIZE=256

As far as I know, phatk2 is the fastest kernel for a 5870 on Phoenix when using SDK 2.4 or 2.5. I have only used 2.4 myself, so there might be a better option for SDK 2.5.

The settings to use depend on your memory clock. If you are running stock or near stock (1200MHz) then WORKSIZE=128 will give better performance. If you have downclocked your memory significantly (to 300MHz, for example) then WORKSIZE=256 will be faster.

If that 5870 isn't used for anything besides mining, I would recommend a bit higher AGGRESSION (10 or so). Otherwise, I recommend that you don't disable FASTLOOP, since it is beneficial at AGGRESSION 8 on a 5870.

Finally, it's VECTORS, not VACTORS.



Thanks for this explanation, so I was on the right track. Are you trying to implement a solution for this currently, or is there anything we / I can do to assist you? Have you got an idea why the vec3 stuff isn't working?

I'm now trying AGGRESSION=10, and will report back after Phoenix has run a bit with that setting.

Edit 1: That's another error and has nothing to do with the discussed problem, right?
Code:
[08/01/2012 12:45:35] TypeError in RPC sendResult callback
[08/01/2012 12:45:35] Result 000000001fa9ed72... rejected

Thanks,
Dia

I am working on the problem, but I want to take the time to do it correctly instead of putting together a quick hacked-together fix. Thanks for the offer to help, but I think I have a pretty good idea of how I want to implement this.

I had a look at your VECTORS3 code, but I couldn't find any issues at first glance. Testing it is somewhat problematic because the computer I am working with only has a single GTX 580. All my 5870s are in dedicated mining systems which are not suitable for in-depth kernel debugging.

I can run VECTORS, VECTORS4, and no vectors on my GTX 580, but VECTORS3 gives me this:
Code:
---  ---
  File "C:\Program Files (x86)\Python\lib\site-packages\twisted\python\threadpool.py", line 207, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "C:\Program Files (x86)\Python\lib\site-packages\twisted\python\context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "C:\Program Files (x86)\Python\lib\site-packages\twisted\python\context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "kernels\phatk\__init__.py", line 442, in mineThread
    self.output_buf)
  File "C:\Program Files (x86)\Python\lib\site-packages\pyopencl\__init__.py", line 204, in kernel_call
    self.set_args(*args)
  File "C:\Program Files (x86)\Python\lib\site-packages\pyopencl\__init__.py", line 245, in kernel_set_args
    % (i+1, str(e)))
pyopencl.LogicError: when processing argument #16 (1-based): clSetKernelArg failed: invalid arg size
hero member
Activity: 769
Merit: 500
January 08, 2012, 07:43:15 AM
I'm using 1.7.2 with my own kernel version, and 1/3 of the shares for my 6550D (A8-3850 APU) are rejected with:
Code:
[07/01/2012 16:15:34] Reject reason: unknown-work
[07/01/2012 16:15:34] Result 00000000ab246f85... rejected

What does this mean? Aggression too high? Is the valid nonce invalid because processing took too long?
My command line is:
Code:
-a 50 -k phatk AGGRESSION=12 DEVICE=1 FASTLOOP=false VECTORS2 WORKSIZE=64

Code:
[65.86 Mhash/sec] [103 Accepted] [33 Rejected] [RPC (+LP)]

Thanks,
Dia

Btw.: Is anyone here able to debug why I can't seem to get 3-component vectors working? (see: http://www.mediafire.com/?r3n2m5s2y2b32d9 and use VECTORS3)

It looks like you have run into the same fundamental problem I am trying to address before I can implement rollntime.

Essentially the server is rejecting your work because it is too old, not because kernel processing took too long. At 65 MHash/sec, it will take just over a minute to process an entire WorkUnit of 2^32 nonces. Now, this work is not necessarily "stale" because it's from the current block and it meets the target specified by the server. This is made worse by the fact that Phoenix maintains a second WorkUnit in the queue by default. On a 5870 that can check the entire 2^32 space in 10-11 seconds, this is beneficial because it allows the miner to continue running during momentary connection disruptions or server load spikes. However, on hardware that takes over a minute to process 2^32 nonces, this can cause shares to be submitted from WorkUnits that are over 2 minutes old. This won't cause problems when the block changes, because all work from the previous block will be discarded immediately.

Rollntime introduces the same problems, since even very fast miners can now function for 30-60+ seconds on a single WorkUnit. The queue behavior needs to be adjusted to account for this. Most likely the solution will involve monitoring how much time is left until the miner runs out of work, and fetching more work when it falls below some value.

The message you are getting is sent by the pool server using x-reject-reason. I'm going to guess that the pool you are connecting to "forgets" about work it has assigned after some amount of time. (probably to save resources) I recommend trying other pools to see if you get the same behavior.

That said, aggression 12 on hardware doing only 65 MHash/sec is probably not a good idea. Your kernel execution times are going to be something like ~4 seconds. This is problematic because you can't reasonably interrupt kernel executions to switch work (e.g. when the block changes). On average this is going to mean 2 seconds of wasted time per block, which amounts to a 0.33% loss. Try aggression 10 or so, which should limit this effect without much of a hashrate drop.

You can also use -q 0 to disable the queue entirely. This cuts your work age in half compared to the default of 1, but you might see the miner idling once per minute when it needs new work. At aggression 10 (kernel execution time 1 second) this shouldn't be a problem under normal cases. This allows for enough time to fetch new work before the miner runs out completely. However, this doesn't work at low aggression where the kernel execution time is a small fraction of a second.

Thanks for this explanation, so I was on the right track. Are you trying to implement a solution for this currently, or is there anything we / I can do to assist you? Have you got an idea why the vec3 stuff isn't working?

I'm now trying AGGRESSION=10, and will report back after Phoenix has run a bit with that setting.

Edit 1: That's another error and has nothing to do with the discussed problem, right?
Code:
[08/01/2012 12:45:35] TypeError in RPC sendResult callback
[08/01/2012 12:45:35] Result 000000001fa9ed72... rejected

Thanks,
Dia
sr. member
Activity: 274
Merit: 250
January 08, 2012, 02:33:58 AM
Can anyone tell me where I can find the fastest kernel to use with a 5870?
Currently I'm using stock 1.7 phatk2 from github with these params:
VACTORS AGGRESSION=8 BFI_INT FASTLOOP=false WORKSIZE=256
full member
Activity: 219
Merit: 120
January 07, 2012, 04:48:39 PM
I'm using 1.7.2 with my own kernel version, and 1/3 of the shares for my 6550D (A8-3850 APU) are rejected with:
Code:
[07/01/2012 16:15:34] Reject reason: unknown-work
[07/01/2012 16:15:34] Result 00000000ab246f85... rejected

What does this mean? Aggression too high? Is the valid nonce invalid because processing took too long?
My command line is:
Code:
-a 50 -k phatk AGGRESSION=12 DEVICE=1 FASTLOOP=false VECTORS2 WORKSIZE=64

Code:
[65.86 Mhash/sec] [103 Accepted] [33 Rejected] [RPC (+LP)]

Thanks,
Dia

Btw.: Is anyone here able to debug why I can't seem to get 3-component vectors working? (see: http://www.mediafire.com/?r3n2m5s2y2b32d9 and use VECTORS3)

It looks like you have run into the same fundamental problem I am trying to address before I can implement rollntime.

Essentially the server is rejecting your work because it is too old, not because kernel processing took too long. At 65 MHash/sec, it will take just over a minute to process an entire WorkUnit of 2^32 nonces. Now, this work is not necessarily "stale" because it's from the current block and it meets the target specified by the server. This is made worse by the fact that Phoenix maintains a second WorkUnit in the queue by default. On a 5870 that can check the entire 2^32 space in 10-11 seconds, this is beneficial because it allows the miner to continue running during momentary connection disruptions or server load spikes. However, on hardware that takes over a minute to process 2^32 nonces, this can cause shares to be submitted from WorkUnits that are over 2 minutes old. This won't cause problems when the block changes, because all work from the previous block will be discarded immediately.

Rollntime introduces the same problems, since even very fast miners can now function for 30-60+ seconds on a single WorkUnit. The queue behavior needs to be adjusted to account for this. Most likely the solution will involve monitoring how much time is left until the miner runs out of work, and fetching more work when it falls below some value.

The message you are getting is sent by the pool server using x-reject-reason. I'm going to guess that the pool you are connecting to "forgets" about work it has assigned after some amount of time. (probably to save resources) I recommend trying other pools to see if you get the same behavior.

That said, aggression 12 on hardware doing only 65 MHash/sec is probably not a good idea. Your kernel execution times are going to be something like ~4 seconds. This is problematic because you can't reasonably interrupt kernel executions to switch work (e.g. when the block changes). On average this is going to mean 2 seconds of wasted time per block, which amounts to a 0.33% loss. Try aggression 10 or so, which should limit this effect without much of a hashrate drop.

You can also use -q 0 to disable the queue entirely. This cuts your work age in half compared to the default of 1, but you might see the miner idling once per minute when it needs new work. At aggression 10 (kernel execution time 1 second) this shouldn't be a problem under normal cases. This allows for enough time to fetch new work before the miner runs out completely. However, this doesn't work at low aggression where the kernel execution time is a small fraction of a second.
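The numbers in that explanation follow from simple arithmetic; here is a back-of-the-envelope sketch in Python (the 600-second block interval and the half-a-kernel-execution average loss are standard assumptions, not measurements from Phoenix):

```python
NONCE_SPACE = 2 ** 32

def seconds_per_workunit(mhash_per_sec):
    # Time to sweep the full 2^32 nonce space at a given hashrate
    return NONCE_SPACE / (mhash_per_sec * 1e6)

# 65 MHash/sec takes just over a minute per WorkUnit, so with one extra
# WorkUnit queued, submitted shares can be over 2 minutes old
print(round(seconds_per_workunit(65)))      # 66 seconds

# A 5870 at ~400 MHash/sec sweeps the same space in 10-11 seconds
print(round(seconds_per_workunit(400), 1))  # 10.7 seconds

# At aggression 12 a slow GPU runs ~4 s kernel executions; a block change
# lands mid-execution on average, wasting about half an execution per block
kernel_time = 4.0
avg_block_interval = 600.0  # seconds, the Bitcoin target
loss = (kernel_time / 2) / avg_block_interval
print(f"{loss:.2%}")                        # 0.33%
```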
hero member
Activity: 769
Merit: 500
January 07, 2012, 11:22:18 AM
I'm using 1.7.2 with my own kernel version, and 1/3 of the shares for my 6550D (A8-3850 APU) are rejected with:
Code:
[07/01/2012 16:15:34] Reject reason: unknown-work
[07/01/2012 16:15:34] Result 00000000ab246f85... rejected

What does this mean? Aggression too high? Is the valid nonce invalid because processing took too long?
My command line is:
Code:
-a 50 -k phatk AGGRESSION=12 DEVICE=1 FASTLOOP=false VECTORS2 WORKSIZE=64

Code:
[65.86 Mhash/sec] [103 Accepted] [33 Rejected] [RPC (+LP)]

Thanks,
Dia

Btw.: Is anyone here able to debug why I can't seem to get 3-component vectors working? (see: http://www.mediafire.com/?r3n2m5s2y2b32d9 and use VECTORS3)
full member
Activity: 221
Merit: 100
January 01, 2012, 06:21:45 PM

It's the other way around: Phoenix counts the callback share as a reject;
Deepbit counted them as accepted.

I can't seem to make this happen running directly from source or with the compiled binaries.

This is what I get:
Code:
C:\phoenix-1.7.2>phoenix.exe -u http://[email protected]_0:[email protected]:8332 -k poclbm PLATFORM=1 DEVICE=0 VECTORS WORKSIZE=256 AGGRESSION=2
[01/01/2012 14:37:13] Phoenix v1.7.2 starting...
[01/01/2012 14:37:13] Connected to server
[01/01/2012 14:37:17] Result: 577a1849 accepted
[01/01/2012 14:37:54] Result: d7086c70 accepted
[01/01/2012 14:38:09] LP: New work pushed
[01/01/2012 14:40:13] Result: 6c9dac94 accepted
[01/01/2012 14:40:52] Result: 8597e915 accepted
[01/01/2012 14:41:29] Result: b471f52b accepted
[01/01/2012 14:43:05] Result: 7b804d65 accepted
[01/01/2012 14:43:20] Result: 6fa8cd29 accepted
[01/01/2012 14:43:41] Result: 0d44ad94 accepted
[01/01/2012 14:43:44] Result: d3257977 accepted
[01/01/2012 14:43:58] Result: ff050303 accepted
[01/01/2012 14:44:30] Result: 532bb58a accepted
[01/01/2012 14:44:48] Result: 601ae9a6 accepted
[01/01/2012 14:45:11] Result: 14ec7578 accepted
[01/01/2012 14:45:24] Result: 625a3fa9 accepted
[01/01/2012 14:46:47] Result: 2799a017 accepted
[01/01/2012 14:47:12] Result: 0cb8b48c accepted
[01/01/2012 14:47:23] Result: e51ebf99 accepted
[01/01/2012 14:47:31] Result: c70a50bb accepted
[01/01/2012 14:47:34] Result: 73235812 accepted
[01/01/2012 14:48:18] Result: c82089b9 accepted
[01/01/2012 14:48:42] Result: e85f85a8 accepted
[01/01/2012 14:49:23] Result: 2122b1b1 accepted
[01/01/2012 14:49:31] Result: 531f1a43 accepted
[01/01/2012 14:50:06] Result: baed89e0 accepted
[01/01/2012 14:50:09] Result: 5875f08c accepted
[01/01/2012 14:51:20] Result: 5db1e47a accepted
[01/01/2012 14:51:43] Result: a93c95dc accepted
[01/01/2012 14:52:23] Result: cb70f3bb accepted
[01/01/2012 14:52:28] Result: 71fa528c accepted
[138.48 Mhash/sec] [29 Accepted] [0 Rejected] [RPC (+LP)]

That's just a quick example, but I have other miners that have been running for a couple of days on Deepbit without displaying this problem either.

Don't know what to tell you.

Maybe it's nrollover shit.

Needless to say, you & Deepbit need to get it fixed.