
Topic: Modified Kernel for Phoenix 1.5 - page 6. (Read 96725 times)

donator
Activity: 2352
Merit: 1060
between a rock and a block!
August 05, 2011, 01:53:01 PM
I am getting rejects and hardware-problem errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my two 5830s @ 1000/350 at stock voltage. I switched back to the 7-11 version last night, and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%) on the other. I get 320 MH/s with the 7-11 version and was getting 324 MH/s with your phatk 2.1. The number of stales/rejects shown on BTCGuild and in the phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like, I can let one card run each version to compare the difference, and I can also provide a phoenix log if needed. I'm not certain about this, but I believe I was only getting the "hardware problem?" error from Diapolo's versions after 7-11 and your phatk 2.0; with phatk 2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard, as Diapolo mentioned earlier in his thread. I have no idea why it affects some and not others.

I had to reduce the OC by 10 and kept the same memory settings on my 5830s using 2.1.
sr. member
Activity: 362
Merit: 250
August 05, 2011, 01:50:17 PM
DELETED for privacy
sr. member
Activity: 476
Merit: 250
moOo
August 04, 2011, 10:28:59 PM
Well, crap, Phateus, I guess I owe you an apology. 300 shares, no rejects. It must have been a bad day on the pools I was on.
member
Activity: 77
Merit: 10
August 04, 2011, 10:00:23 PM
I like the VECTORS4 feature; it gives me an extra 5 MH/s using SDK 2.5.
sr. member
Activity: 476
Merit: 250
moOo
August 04, 2011, 09:39:40 PM
I'll give it another try and use the verbose flag to see what is going on.
Right now I have 2 rejects over 360 shares on Diablo's newest 8-4 version.
3 different pools, both rejects at the same pool, all 3 with over 100 shares.
30 shares with your 2.1 and no rejects, which looks good so far. I'll let you know when I get up over 300; maybe it was a fluke, as some of my pools had connection issues.
legendary
Activity: 1512
Merit: 1036
August 04, 2011, 09:15:00 PM
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, Catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830s using these extra flags in GUIMiner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14

Unless you use the -v flag for verbose logging in phoenix, set your console window to keep a scrollback buffer of thousands of lines, and look for the "Result didn't meet full difficulty, not sending" error message, you won't see any difference.
legendary
Activity: 1344
Merit: 1004
August 04, 2011, 08:51:10 PM
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, Catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830s using these extra flags in GUIMiner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14
newbie
Activity: 52
Merit: 0
August 04, 2011, 05:15:06 PM
Quote
The output that I pastebinned is the standard console output of phoenix in -v verbose mode
Oh, thanks, I didn't even know you could do that... I'll do some testing with that

I think I'm going to have to download the source code for Phoenix and see what is actually happening...

Quote
I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

I agree totally, go with what works.  I am just trying to figure all this out.  Thanks for all your help.

-Phateus
member
Activity: 77
Merit: 10
August 04, 2011, 04:14:48 PM
Do you think VLIW4 is a step backward from VLIW5?

VLIW4 is slower than VLIW5 in many computational tasks
legendary
Activity: 1512
Merit: 1036
August 04, 2011, 04:04:00 PM
@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then I'm pretty sure they will never be 00000..., because the rejection comes from a mismatch in state data, so the hash comes out different on the server than on your client, right?

If they are from the client, are you running a modified version of phoenix?  I don't think stock phoenix logs that information.  If you are, could you post details so I can look into the bug?

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?


The output that I pastebinned is the standard console output of phoenix in -v verbose mode, I just highlighted the screen output on my console (with a 3000 line buffer) and copy-pasted it. It includes the first eight bytes of the hash in the results as you can see.

Actually, when I said I was running unmodified phoenix, I lied: I had forgotten I made this modification at line 236 in KernelInterface.py (because of a difficulty bug in a namecoin pool I was previously using):

Original:
        if self.checkTarget(hash, nr.unit.target):
            formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
            d = self.miner.connection.sendResult(formattedResult)
            def callback(accepted):
                self.miner.logger.reportFound(hash, accepted)
            d.addCallback(callback)
            return True
        else:
            self.miner.logger.reportDebug("Result didn't meet full "
                   "difficulty, not sending")
            return False

Mine:
        formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
        d = self.miner.connection.sendResult(formattedResult)
        def callback(accepted):
            self.miner.logger.reportFound(hash, accepted)
        d.addCallback(callback)
        return True


All I've done is remove the second difficulty check in phoenix and trust that the kernel returns only valid difficulty-1 shares. Now, instead of spitting out the error "Result didn't meet full difficulty, not sending", phoenix forwards every result returned by the kernel to the pool. Without this mod, logs of your kernel would just show the "didn't meet full difficulty" error message instead of rejects from the pool, which would still be a problem (but the helpful hash value wouldn't be printed for debugging). We can see from the hash value that the bad results are nowhere near a valid share.
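For context, the check being bypassed boils down to a share-validity test. A minimal sketch of the difficulty-1 part of that test (hypothetical function name, not phoenix's API; checkTarget additionally compares against the pool's full target, which this omits):

```python
def looks_like_valid_share(hash_bytes: bytes) -> bool:
    # A difficulty-1 share has the top 32 bits of the big-endian hash equal
    # to zero; since the digest is handled little-endian here, that means
    # the LAST four bytes of the 32-byte digest must be zero. This mirrors
    # the kernel-side "H == 0" test the miners perform.
    return hash_bytes[-4:] == bytes(4)

# A wild hash leaked by bad kernel math fails the check:
assert not looks_like_valid_share(bytes(28) + b"\xde\xad\xbe\xef")
# An all-zero digest trivially passes:
assert looks_like_valid_share(bytes(32))
```

With the mod above removed, hashes that fail this test go to the pool anyway and show up as rejects, which is exactly what makes the bad results visible in the pool logs.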

This code mod only exposes a problem in the kernel optimization code: sometimes wild hashes are returned by the kernel from some bad math (or the kernel code is vulnerable to some overclocking glitch that no other kernel activates). Are these just "extra" hashes that are leaking through, or is the number of valid shares being returned by the kernel lower too? Hard to tell without a very long statistics run.

I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

My python lib versions are documented here.

Joulesbeef:
I don't like the word 'stales' for rejected shares unless it specifically refers to shares rejected at a block change because they were obsolete when submitted to a pool, as logged by pushpool. The results I have above are not stale work; they are invalid hashes.
newbie
Activity: 52
Merit: 0
August 04, 2011, 03:30:41 PM
@joulesbeef

Hmm... this might be a really hard bug to find.  If anyone has any ideas...
At first I was thinking it was because I compare the nonce to 0, but that would only give false negatives (1 in every 4 billion nonces will not be found).
The main difference between my init file and diapolo's is that I pack 2 bases together and send them to the kernel.  I may try to get rid of the Base variable altogether and just use the offset parameter of the EnqueueKernel() command (I think you can do that in pyopencl)...
Basically just thinking out loud... Undecided
If I didn't love low level programming so much, I think I would shoot myself  Tongue
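The nonce-vs-0 concern above can be shown in miniature (hypothetical names; the real phatk output-buffer layout may differ): if the kernel writes 0 to mean "no nonce found", the one genuinely winning nonce equal to 0 is silently dropped, which is a false negative, never a false positive, so it cannot explain rejected shares:

```python
SENTINEL = 0  # hypothetical: output slot holds 0 when no nonce was found

def extract_nonce(slot: int):
    # A real winning nonce of exactly 0 is indistinguishable from the
    # sentinel, so it gets discarded: at worst a lost share for 1 in
    # 2**32 nonces, never an extra (invalid) share sent to the pool.
    return None if slot == SENTINEL else slot

assert extract_nonce(0) is None               # valid nonce 0 is lost
assert extract_nonce(0x1234ABCD) == 0x1234ABCD
```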

@Diapolo

Yeah, the unreleased version I am working on uses 20 registers (It performs about the same as a configuration which uses 19 but has 2 more ALU OPs)
Also, are you getting an increased number of stales now that you have implemented some of the optimizations from phatk?
sr. member
Activity: 476
Merit: 250
moOo
August 04, 2011, 02:57:53 PM
Yeah, Phateus, I hate to say it, but I am having similar issues to deepceleron.

I started to notice an uptick in stales; I thought it was due to our proxy, as we had problems before and we update it a lot.
About 3-5% across the board.

I reverted back to Dia's 7-17 for the past 10 hours, and I have less than 1% stales, which is normal for me.
Using a 5830, SDK 2.4, Catalyst 11.6, Win7 32-bit.

phoenix, guiminer: VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2
hero member
Activity: 772
Merit: 500
August 04, 2011, 02:43:07 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s becomes more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...

In terms of efficiency, one has to consider whether a higher RAM frequency is worth it, because the card draws much more power with a higher mem clock :-/. The sweet spot for my 5870 and 5830 seems to be @ 350 MHz mem.

Hope you get a new card soon Smiley!

Dia
legendary
Activity: 1855
Merit: 1016
August 04, 2011, 01:17:14 PM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?
Only trial & error will tell.
My sweet spot for the 6870 is mem clk = (core clk/3) + 14.
I haven't tested the sweet spot for the 6970 yet, since my motherboard has been in repair for the past 7 days, and when I was mining, Linux didn't allow underclocking more than core clock minus 125 MHz.
I hope 11.8 on Windows will give the correct sweet spot for the 6970, which I'll know once I get my motherboard back.
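The rules of thumb traded in this exchange can be jotted down as a quick calculator (these ratios are forum folklore from this thread, not vendor specifications; `sweet_spot_mem` is an illustrative name):

```python
def sweet_spot_mem(core_mhz: int, card: str = "5870") -> int:
    # Rules of thumb quoted above:
    #   5870 (and roughly 5830/5850): mem = core / 3
    #   6870:                         mem = core / 3 + 14
    if card == "6870":
        return core_mhz // 3 + 14
    return core_mhz // 3

assert sweet_spot_mem(975) == 325          # the 975/3 = 325 example above
assert sweet_spot_mem(900, "6870") == 314  # 900/3 + 14
```

As the posters note, whether these ratios transfer to other cards (5850, 6970) is only settled by trial and error.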
newbie
Activity: 52
Merit: 0
August 04, 2011, 12:33:39 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s becomes more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...
newbie
Activity: 52
Merit: 0
August 04, 2011, 12:18:10 PM
@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then I'm pretty sure they will never be 00000..., because the rejection comes from a mismatch in state data, so the hash comes out different on the server than on your client, right?
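The state-mismatch point is easy to demonstrate: if the header bytes the server reconstructs differ in even one byte from what the client hashed, the double-SHA256 digests disagree completely, and the share is rejected. A toy illustration with made-up header bytes (not real block data):

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # Bitcoin hashes block headers with two rounds of SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

client_header = b"\x01" * 76 + b"\x2a\x00\x00\x00"             # data + nonce
server_header = b"\x02" + b"\x01" * 75 + b"\x2a\x00\x00\x00"   # one byte off

# Same nonce, mismatched state data: completely different digests, so the
# server never sees the leading zeros the client's kernel found.
assert double_sha256(client_header) != double_sha256(server_header)
```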

If they are from the client, are you running a modified version of phoenix?  I don't think stock phoenix logs that information.  If you are, could you post details so I can look into the bug?

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?
full member
Activity: 182
Merit: 100
August 04, 2011, 11:58:13 AM
Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about the same issue I had...
Which thread are you talking about? I know I made a thread the other day in support about Afterburner freezing when I hit apply for my 5850s, but I can't remember what your thread was. If you're talking about the freezing, I haven't had a chance to test any further with the 5850s, because I literally spent from the time I got out of work until about 1 am working on a skeleton case and trying to figure out why the PSU was making a clicking noise.

Also, I know you got the ncixus 5850s as well; what's your top speed on those so far? I can get up to 395ish with stock voltage.
donator
Activity: 2352
Merit: 1060
between a rock and a block!
August 04, 2011, 11:35:46 AM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830s; I get my best hashrate at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.

Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about the same issue I had...
full member
Activity: 182
Merit: 100
August 04, 2011, 11:33:01 AM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830s; I get my best hashrate at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.
donator
Activity: 2352
Merit: 1060
between a rock and a block!
August 04, 2011, 11:16:57 AM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?