
Topic: Modified Kernel for Phoenix 1.5 - page 6. (Read 96725 times)

donator
Activity: 2352
Merit: 1060
between a rock and a block!
August 05, 2011, 01:53:01 PM
I am getting rejects and hardware-problem errors since the modification of __init__.py. Diapolo's 7-11 version is the last stable version for me on my two 5830s @ 1000/350 at stock voltage. I switched back to the 7-11 version last night, and today on BTCGuild I am back to showing 4500 (31, 0.68%) on one card and 4324 (27, 0.62%) on the other. I get 320 MH/s with the 7-11 version and was getting 324 MH/s with your phatk 2.1. The number of stales/rejects shown on BTCGuild and in the phoenix log after a short period of mining was up to and over 3% on both cards yesterday. If you would like, I can let one card run each version to compare the difference, and I can also provide a phoenix log if needed. I'm not certain about this, but I believe I was only getting the "hardware problem?" error from Diapolo's versions after 7-11 and your phatk 2.0; with phatk 2.1 I saw the introduction of the rejected shares. I can provide any info if needed. I assume the kernel is pushing the card too hard, as Diapolo mentioned earlier in his thread. I have no idea why it affects some and not others.

I had to reduce the OC by 10 and kept the same memory settings on my 5830s using 2.1.
sr. member
Activity: 362
Merit: 250
August 05, 2011, 01:50:17 PM
DELETED for privacy
sr. member
Activity: 476
Merit: 250
moOo
August 04, 2011, 10:28:59 PM
Well, crap, Phateus, I guess I owe you an apology. 300 shares, no rejects. It must have been a bad day on the pools I was on.
member
Activity: 77
Merit: 10
August 04, 2011, 10:00:23 PM
I like the VECTORS4 feature; it gives me an extra 5 MH/s using SDK 2.5.
sr. member
Activity: 476
Merit: 250
moOo
August 04, 2011, 09:39:40 PM
I'll give it another try and use the verbose flag to see what is going on.
Right now I have 2 rejects over 360 shares on Diablo's newest 8-4 version.
3 different pools, both rejects at the same pool, all 3 with over 100 shares.
30 shares with your 2.1 and no rejects, which looks good so far. I'll let you know when I get up over 300; maybe it was a fluke, as some of my pools had connection issues.
legendary
Activity: 1512
Merit: 1036
August 04, 2011, 09:15:00 PM
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, Catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830s using these extra flags in GUIMiner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14

Unless you use the -v flag for verbose logging in phoenix, set your console window to keep a scrollback buffer of thousands of lines, and look for the "Result didn't meet full difficulty, not sending" error message, you won't see any difference.
legendary
Activity: 1344
Merit: 1004
August 04, 2011, 08:51:10 PM
If it's relevant, I have not had any increase in stales. GUIMiner v2011-07-01, built-in phoenix, phatk 2.1, Catalyst 11.7, 2.5 SDK, 1 5870 + 5 5830s using these extra flags in GUIMiner: -k phatk VECTORS BFI_INT WORKSIZE=256 FASTLOOP=false AGGRESSION=14
newbie
Activity: 52
Merit: 0
August 04, 2011, 05:15:06 PM
Quote
The output that I pastebinned is the standard console output of phoenix in -v verbose mode
Oh, thanks, I didn't even know you could do that... I'll do some testing with that

I think I'm going to have to download the source code for Phoenix and see what is actually happening...

Quote
I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

I agree totally, go with what works.  I am just trying to figure all this out.  Thanks for all your help.

-Phateus
member
Activity: 77
Merit: 10
August 04, 2011, 04:14:48 PM
Do you think VLIW4 is a step backward from VLIW5?

VLIW4 is slower than VLIW5 in many computational tasks
legendary
Activity: 1512
Merit: 1036
August 04, 2011, 04:04:00 PM
@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then I'm pretty sure they will never be 00000..., because the rejection comes from a mismatch in state data, so the hash comes out different on the server than on your client, right?

If they are from the client, are you running a modified version of phoenix?  I don't think stock phoenix logs that information.  If you are, could you post details so I can look into the bug?

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?


The output that I pastebinned is the standard console output of phoenix in -v verbose mode, I just highlighted the screen output on my console (with a 3000 line buffer) and copy-pasted it. It includes the first eight bytes of the hash in the results as you can see.

Actually, when I said I was running unmodified phoenix, I lied: I had forgotten I made this modification at line 236 in KernelInterface.py (because of a difficulty bug in a namecoin pool I was previously using):

Original:
        if self.checkTarget(hash, nr.unit.target):
            formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
            d = self.miner.connection.sendResult(formattedResult)
            def callback(accepted):
                self.miner.logger.reportFound(hash, accepted)
            d.addCallback(callback)
            return True
        else:
            self.miner.logger.reportDebug("Result didn't meet full "
                   "difficulty, not sending")
            return False

Mine:
        formattedResult = pack('<76sI', nr.unit.data[:76], nonce)
        d = self.miner.connection.sendResult(formattedResult)
        def callback(accepted):
            self.miner.logger.reportFound(hash, accepted)
        d.addCallback(callback)
        return True


All I've done is remove the second difficulty check in phoenix and trust that the kernel returns only valid difficulty-1 shares. Now, instead of spitting out the error "Result didn't meet full difficulty, not sending", phoenix forwards every result returned by the kernel to the pool. Without this mod, logs of your kernel would just show the "didn't meet full difficulty" error message instead of rejects from the pool, which would still be a problem (but the helpful hash value wouldn't be printed for debugging). We can see from the hash value that the bad results are nowhere near a valid share.
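For context, the check being bypassed boils down to a share-validity test. A minimal sketch of the difficulty-1 part of that test (hypothetical function name, not phoenix's API; checkTarget additionally compares against the pool's full target, which this omits):

```python
def looks_like_valid_share(hash_bytes: bytes) -> bool:
    # A difficulty-1 share has the top 32 bits of the big-endian hash equal
    # to zero; since the digest is handled little-endian here, that means
    # the LAST four bytes of the 32-byte digest must be zero. This mirrors
    # the kernel-side "H == 0" test the miners perform.
    return hash_bytes[-4:] == bytes(4)

# A wild hash leaked by bad kernel math fails the check:
assert not looks_like_valid_share(bytes(28) + b"\xde\xad\xbe\xef")
# An all-zero digest trivially passes:
assert looks_like_valid_share(bytes(32))
```

With the mod above removed, hashes that fail this test go to the pool anyway and show up as rejects, which is exactly what makes the bad results visible in the pool logs.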

This code mod only exposes a problem in the kernel optimization code: sometimes wild hashes are returned by the kernel from some bad math (or the kernel code is vulnerable to some overclocking glitch that no other kernel activates). Are these just "extra" hashes that are leaking through, or is the number of valid shares being returned by the kernel lower too? Hard to tell without a very long statistics run.

I am running two miners per GPU not for some random reason, but because it works. With the right card/overclock/OS/etc, I seem to get a measurable improvement in total hashrate vs one miner (documented by using the phoenix -a average flag with a very long time period and letting the miners run days). The only way my results could not be true would be if the time-slicing between two miners messes up the hashrate calculation displayed, but if this was true, such a bug would present with multiple-gpu systems running one phoenix per gpu too.

With only a 1% improvement from a kernel that works for me, reducing the aggression or running one miner would put the phatk 2.1 performance below what I already had. Putting back the diapolo kernel, I'm back to below 2500/100 on my miners.

My python lib versions are documented here.

Joulesbeef:
I don't like the word 'stales' for rejected shares unless it specifically refers to shares rejected at a block change because they were obsolete when submitted to a pool, as logged by pushpool. The results I have above are not stale work; they are invalid hashes.
newbie
Activity: 52
Merit: 0
August 04, 2011, 03:30:41 PM
@joulesbeef

Hmm... this might be a really hard bug to find.  If anyone has any ideas...
At first I was thinking it was because I compare the nonce to 0, but that would only give false negatives (1 in every 4 billion nonces will not be found).
The main difference between my init file and diapolo's is that I pack 2 bases together and send them to the kernel.  I may try to get rid of the Base variable altogether and just use the offset parameter of the EnqueueKernel() command (I think you can do that in pyopencl)...
Basically just thinking out loud... Undecided
If I didn't love low level programming so much, I think I would shoot myself  Tongue
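The nonce-vs-0 concern above can be shown in miniature (hypothetical names; the real phatk output-buffer layout may differ): if the kernel writes 0 to mean "no nonce found", the one genuinely winning nonce equal to 0 is silently dropped, which is a false negative, never a false positive, so it cannot explain rejected shares:

```python
SENTINEL = 0  # hypothetical: output slot holds 0 when no nonce was found

def extract_nonce(slot: int):
    # A real winning nonce of exactly 0 is indistinguishable from the
    # sentinel, so it gets discarded: at worst a lost share for 1 in
    # 2**32 nonces, never an extra (invalid) share sent to the pool.
    return None if slot == SENTINEL else slot

assert extract_nonce(0) is None               # valid nonce 0 is lost
assert extract_nonce(0x1234ABCD) == 0x1234ABCD
```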

@Diapolo

Yeah, the unreleased version I am working on uses 20 registers (It performs about the same as a configuration which uses 19 but has 2 more ALU OPs)
Also, are you getting an increased number of stales now that you have implemented some of the optimizations from phatk?
sr. member
Activity: 476
Merit: 250
moOo
August 04, 2011, 02:57:53 PM
Yeah, Phateus, I hate to say it, but I am having similar issues to deepceleron.

I started to notice an uptick in stales; I thought it was due to our proxy, as we had problems before and we update it a lot.
About 3-5% across the board.

I reverted back to Dia's 7-17 for the past 10 hours, and I have less than 1% stales, which is normal for me.
Using a 5830, SDK 2.4, Catalyst 11.6, Win7 32-bit.

phoenix, guiminer: VECTORS BFI_INT -k phatk FASTLOOP=false WORKSIZE=256 AGGRESSION=12 -q2
hero member
Activity: 772
Merit: 500
August 04, 2011, 02:43:07 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s becomes more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...

In terms of efficiency, one has to consider whether a higher RAM frequency is worth it, because the card draws much more power with a higher mem clock :-/. The sweet spot for my 5870 and 5830 seems to be @ 350 MHz mem.

Hope you get a new card soon Smiley!

Dia
legendary
Activity: 1855
Merit: 1016
August 04, 2011, 01:17:14 PM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?
Only trial & error will tell.
My sweet spot for the 6870 is mem clk = (core clk/3) + 14.
I haven't tested the sweet spot for the 6970 yet, since my motherboard has been in repair for the past 7 days, and when I was mining, Linux didn't allow underclocking more than core clock minus 125 MHz.
I hope 11.8 on Windows will give the correct sweet spot for the 6970, which I'll know once I get my motherboard back.
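The rules of thumb traded in this exchange can be jotted down as a quick calculator (these ratios are forum folklore from this thread, not vendor specifications; `sweet_spot_mem` is an illustrative name):

```python
def sweet_spot_mem(core_mhz: int, card: str = "5870") -> int:
    # Rules of thumb quoted above:
    #   5870 (and roughly 5830/5850): mem = core / 3
    #   6870:                         mem = core / 3 + 14
    if card == "6870":
        return core_mhz // 3 + 14
    return core_mhz // 3

assert sweet_spot_mem(975) == 325          # the 975/3 = 325 example above
assert sweet_spot_mem(900, "6870") == 314  # 900/3 + 14
```

As the posters note, whether these ratios transfer to other cards (5850, 6970) is only settled by trial and error.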
newbie
Activity: 52
Merit: 0
August 04, 2011, 12:33:39 PM
@Phat:

What is your experience with SDK 2.5 so far? It seems to behave somewhat odd in terms of expected vs. real performance (KernelAnalyzer vs. Phoenix).
Do you use the CAL 11.7 profile for ALU OP usage information or an earlier version?

Dia

Yeah... not sure... I wasn't paying that much attention in 2.4, but what I've noticed is that using fewer registers improves performance.  Mainly, I've noticed that with heavy register usage (or high WORKSIZE numbers), the MH/s becomes more and more dependent on memory speed (probably the main reason why VECTORS4 performs terribly at low memory speeds).

But, overall I didn't even know I had 2.5, so clearly I didn't really notice a difference, lol.

As of the newest edit on the first page, I am using CAL 11.7.


On a side note, my card exploded (well, the fan died) yesterday, so my work may be slowed a little.  I'm not saying this has anything to do with joulesbeef's cat sicking up on the carpet, but cats are a crafty bunch...
newbie
Activity: 52
Merit: 0
August 04, 2011, 12:18:10 PM
@deepceleron

That is completely baffling to me... I use the EXACT same code to determine which hashes to send as diapolo (from the poclbm kernel).

How are you getting the hash results?

If they are from the server, then I'm pretty sure they will never be 00000..., because the rejection comes from a mismatch in state data, so the hash comes out different on the server than on your client, right?
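The state-mismatch point is easy to demonstrate: if the header bytes the server reconstructs differ in even one byte from what the client hashed, the double-SHA256 digests disagree completely, and the share is rejected. A toy illustration with made-up header bytes (not real block data):

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    # Bitcoin hashes block headers with two rounds of SHA-256.
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

client_header = b"\x01" * 76 + b"\x2a\x00\x00\x00"             # data + nonce
server_header = b"\x02" + b"\x01" * 75 + b"\x2a\x00\x00\x00"   # one byte off

# Same nonce, mismatched state data: completely different digests, so the
# server never sees the leading zeros the client's kernel found.
assert double_sha256(client_header) != double_sha256(server_header)
```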

If they are from the client, are you running a modified version of phoenix?  I don't think stock phoenix logs that information.  If you are, could you post details so I can look into the bug?

Quote
Two miner instances per GPU.
Why are you running 2 instances per GPU?  That seems like it would just increase overhead and double the amount of stales.  Try only running 1 instance per GPU and perhaps decreasing the AGGRESSION from 13 to 12.  If that doesn't fix it, I'm not sure what else I can do without further information.

Anyone else getting this bug?
full member
Activity: 182
Merit: 100
August 04, 2011, 11:58:13 AM
Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about the same issue I had...
Which thread are you talking about? I know I made a thread the other day in support about Afterburner freezing when I hit apply for my 5850s, but I can't remember what your thread was. If you're talking about the freezing, I haven't had a chance to test any further with the 5850s, because I literally spent from the time I got out of work until about 1 am working on a skeleton case and trying to figure out why the PSU was making a clicking noise.

Also, I know you got the ncixus 5850s as well; what's your top speed on those so far? I can get up to 395ish with stock voltage.
donator
Activity: 2352
Merit: 1060
between a rock and a block!
August 04, 2011, 11:35:46 AM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830s; I get my best hashrate at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.

Mike, did you ever figure out that problem you had with MSI Afterburner?  I think you posted on my thread as well about the same issue I had...
full member
Activity: 182
Merit: 100
August 04, 2011, 11:33:01 AM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?
I'd say it holds fairly true for my 5830s; I get my best hashrate at 1030/350.

I've only done minor tests with my 5850's but I have a really hard time getting the mem clock down because it starts to crash my system when I do that.
donator
Activity: 2352
Merit: 1060
between a rock and a block!
August 04, 2011, 11:16:57 AM
My sweet spot for the 5870 is always: memory clock equal to core clock divided by three.
mem = core/3 = 975/3 = 325

Would this hold true for 5850 and/or 5830s?