Topic: cgminer - CPU/GPU miner in C for linux/windows - page 9. (Read 81956 times)

sr. member
Activity: 256
Merit: 250
Sorry for the rude OT question, but was it you who maintained the -ck tree? :)

Quote
Update tree:

I did incorporate that change into my kernel. It turns out that even though my hardware reports 4 as the preferred vector width, it's faster with 2. I assume many people have experienced the same. So I've made the default to be 2 when the hardware says its preferred vector width is anything larger than 1.

It's due to the high GPR usage; it is high enough to offset the poorer ALUPacking that comes from using uint2 rather than uint4 vectors. In fact I found 3-component vectors to work best, and they should be supported by the OpenCL 1.1 standard, but the OpenCL compiler is buggy and generates bad code with uint3. Interlacing uint2 and uint works though :)
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Updated tree:

I've imported the phatk kernel into minerd. The maximum possible throughput is slightly faster on machines that support amd media ops which is nice. However, even nicer is that on sane intensity levels (including the default value of 4), the throughput is significantly faster now as well. The phatk kernel unfortunately doesn't even work on hardware that doesn't have amd media ops (radeon 4x cards and nvidia) so for now it defaults back to the poclbm kernel.

I've also updated the cpu mining component. Now it tries to keep its work sizes within the log update interval instead of the scan interval so that the hash rate doesn't fluctuate all over the place. It is also possible now to set number of gpu threads to 0 to run minerd as just a cpu miner again.

TODO:
-I want to find ways of allowing even larger settings for intensity that would only be suitable for headless boxes. Currently the code ends up racing too much (with all the parallel processing) and generates far too many rejected blocks when the intensity is set to >10. Making the cl code synchronous would avoid that but it also slows it down, thereby making it pointless to push it further.
-Store binary versions of the kernels that could be loaded faster when restarting the app.
-Any bugfixing remaining.
-Profit.
newbie
Activity: 42
Merit: 0
It appears it's not possible to use minerd as a pure CPU miner anymore - setting GPU threads to zero doesn't work.
member
Activity: 98
Merit: 10
- one poclbm with phatk kernel for each card: 2*308MH/s = 616MH/s
- minerd with 2 threads for each card, gives me 605MH/s

Just for knowledge...  what performance do you get with 1 thread per card?



About 586MH/s, which means 293MH/s per card.

EDIT: Considering minerd uses poclbm kernel (which is slower for me than phatk), minerd might be already on par (with twice the number of threads).
legendary
Activity: 1596
Merit: 1100
- one poclbm with phatk kernel for each card: 2*308MH/s = 616MH/s
- minerd with 2 threads for each card, gives me 605MH/s

Just for knowledge...  what performance do you get with 1 thread per card?

member
Activity: 98
Merit: 10
Current status for my dual 5830 setup:

- one poclbm with phatk kernel for each card: 2*308MH/s = 616MH/s
- minerd with 2 threads for each card, gives me 605MH/s

so there is still some room for improvement :)
hero member
Activity: 658
Merit: 500
Update tree:

I did incorporate that change into my kernel. It turns out that even though my hardware reports 4 as the preferred vector width, it's faster with 2.
yeah, same thing in poclbm, window size 128, vectors 2 is the fastest setting for me
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Update tree:

I did incorporate that change into my kernel. It turns out that even though my hardware reports 4 as the preferred vector width, it's faster with 2. I assume many people have experienced the same. So I've made the default to be 2 when the hardware says its preferred vector width is anything larger than 1.
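The default described above can be sketched as a tiny helper. This is only an illustration: `default_vector_width` is a hypothetical name, not a function from the miner, and the input is assumed to be whatever the device reports for its preferred width (for example via `CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT`).

```c
/* Hypothetical helper, not taken from the miner's source.
 * 'preferred' is the vector width the hardware reports as preferred. */
unsigned int default_vector_width(unsigned int preferred)
{
    /* Anything wider than scalar defaults to 2; otherwise stay scalar. */
    return preferred > 1 ? 2u : 1u;
}
```

So a card that reports 4 would get 2 by default, and a card that reports 1 stays at 1.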

I found a little buglet that also would repeat some blocks, thereby artificially raising the hash rate, so the overall rate has dropped slightly (about the same amount it's increased with the other code!).

As for the daily builds, I assume the requester meant windows builds? Most people who have linux will likely be able to build it. It's not building on windows yet, but will in the near future I hope. If you really do want linux binaries, just say the word.

The problem with repeated blocks was my pool not sending me out longpoll information reliably.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
@ckolivas, could you share daily builds of this miner?

linux only at this stage, sure I could do that.
newbie
Activity: 22
Merit: 0
@ckolivas, could you share daily builds of this miner?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Maybe it's just my pool. They're having a funky time so that would explain it.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
I don't doubt it, and no one else is reporting this issue. On the other machine I've tried it on, it does give a speedup (with minerd), but the one 6770 I'm using it on here reliably spits out tons of rejects when I make this change. It's not a heat issue; the card is at 64 degrees.
hero member
Activity: 658
Merit: 500
With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.
are you sure?

I can keep trying it on and off to see, but every time so far it has happened. It could well be my pool, as they're experiencing technical difficulties, but the rejects have always started at the same time I enable it.

[2011-06-27 20:22:46] [173.08 | 191.67 Mhash/s] [81 Accepted] [40 Rejected]

Look at that reject rate. Normally it's <5%

I'm running GUIMiner with this change and I see no difference other than a slight speed increase.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.
are you sure?

I can keep trying it on and off to see, but every time so far it has happened. It could well be my pool, as they're experiencing technical difficulties, but the rejects have always started at the same time I enable it.

[2011-06-27 20:22:46] [173.08 | 191.67 Mhash/s] [81 Accepted] [40 Rejected]

Look at that reject rate. Normally it's <5%
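For anyone who wants to put a number on it, here is a rough sketch that pulls the two counters out of a status line like the one above and works out the reject percentage. The function name `reject_percent` is made up for this example, and the line layout is assumed to match the log line quoted in this post.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns rejected shares as a percentage of all
 * submitted shares, or -1.0 if the line does not look like a status line. */
double reject_percent(const char *line)
{
    unsigned int accepted, rejected;

    /* Skip the timestamp and hash rates; anchor on the end of the rate field. */
    const char *p = strstr(line, "Mhash/s] [");
    if (p == NULL)
        return -1.0;

    if (sscanf(p, "Mhash/s] [%u Accepted] [%u Rejected]",
               &accepted, &rejected) != 2)
        return -1.0;

    if (accepted + rejected == 0)
        return 0.0;

    return 100.0 * rejected / (accepted + rejected);
}
```

With 81 accepted and 40 rejected that is 40 out of 121 shares, roughly 33%, against the usual <5%.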
hero member
Activity: 658
Merit: 500
With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.
are you sure?
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
This small mod to the poclbm OpenCL kernel gives about 3% more performance if BFI_INT is used:
https://forum.bitcoin.org/index.php?topic=22965.0;topicseen
Quote
  #define Ma(x, y, z) amd_bytealign((y), (x | z), (z & x))
and change it to this line
  #define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) )

For some reason that's greatly increased my reject rate.

Actually that was sheer coincidence. I'll test this change some more, thanks!
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
This small mod to the poclbm OpenCL kernel gives about 3% more performance if BFI_INT is used:
https://forum.bitcoin.org/index.php?topic=22965.0;topicseen
Quote
  #define Ma(x, y, z) amd_bytealign((y), (x | z), (z & x))
and change it to this line
  #define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) )

For some reason that's greatly increased my reject rate.
newbie
Activity: 42
Merit: 0
This small mod to the poclbm OpenCL kernel gives about 3% more performance if BFI_INT is used:
https://forum.bitcoin.org/index.php?topic=22965.0;topicseen
Quote
  #define Ma(x, y, z) amd_bytealign((y), (x | z), (z & x))
and change it to this line
  #define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) )
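
A quick way to convince yourself the two macros compute the same thing is to model them in plain C. This sketch assumes that `amd_bytealign` ends up patched to BFI_INT in the binary, and that BFI_INT acts as a per-bit select (take the second operand where the first has a 1, otherwise the third). The names `bfi_int`, `maj`, `ma_old`, `ma_new` and `check_ma_forms` are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed BFI_INT behaviour: per bit, pick b where a is 1, else c. */
uint32_t bfi_int(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (~a & c);
}

/* Reference SHA-256 majority function. */
uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* The original macro: needs an OR and an AND before the select. */
uint32_t ma_old(uint32_t x, uint32_t y, uint32_t z)
{
    return bfi_int(y, x | z, z & x);
}

/* The modified macro: needs only one XOR before the select. */
uint32_t ma_new(uint32_t x, uint32_t y, uint32_t z)
{
    return bfi_int(z ^ x, y, x);
}

/* These three words contain every one of the eight (x, y, z) bit
 * combinations, so one comparison covers the whole truth table. */
void check_ma_forms(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

    assert(ma_old(x, y, z) == maj(x, y, z));
    assert(ma_new(x, y, z) == maj(x, y, z));
}
```

If that model is right, both forms give the majority function and the new one simply saves an operation per call, which would fit a gain of a few percent.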
-ck
legendary
Activity: 4088
Merit: 1631
Ruu \o/
Updated tree.

32-bit support and BFI_INT patching are now working. For some reason, on 64-bit the OpenCL compiler builds an ELF file within an ELF file, but not on 32-bit. Go figure.

STILL TODO:
Testing and performance evaluation.
Windows builds (I'm getting help from someone there; hopefully it works).