cgminer - CPU/GPU miner in C for linux/windows - page 9.

gat3way

sr. member

Activity: 256

Merit: 250

Sorry for the rude OT question but was that you that maintained the -ck tree?

Quote

Update tree:

I did incorporate that change into my kernel. It turns out that even though my hardware reports 4 as the preferred vector width, it's faster with 2. I assume many people have experienced the same. So I've made the default to be 2 when the hardware says its preferred vector width is anything larger than 1.

It's due to the high GPR usage: it is high enough to offset the poorer ALUPacking that comes from using uint2 rather than uint4 vectors. In fact I found 3-component vectors to work best, and they should be supported by the OpenCL 1.1 standard, but the OpenCL compiler is buggy and generates bad code with uint3. Interlacing uint2 and uint works, though.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Updated tree:

I've imported the phatk kernel into minerd. The maximum possible throughput is slightly faster on machines that support amd media ops which is nice. However, even nicer is that on sane intensity levels (including the default value of 4), the throughput is significantly faster now as well. The phatk kernel unfortunately doesn't even work on hardware that doesn't have amd media ops (radeon 4x cards and nvidia) so for now it defaults back to the poclbm kernel.

I've also updated the cpu mining component. Now it tries to keep its work sizes within the log update interval instead of the scan interval so that the hash rate doesn't fluctuate all over the place. It is also possible now to set number of gpu threads to 0 to run minerd as just a cpu miner again.
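A rough sketch of the work-sizing idea, not the actual minerd code: the helper name scan_budget and its parameters are made up for illustration. It assumes the thread already has a measured hash rate and simply sizes the next batch of nonces so it should finish within one log update interval.

```c
#include <stdint.h>

/* Hypothetical helper (not from the minerd source): number of nonces a CPU
 * thread may scan so that, at its measured hash rate, the batch finishes
 * within one log update interval. */
uint32_t scan_budget(double hashes_per_sec, unsigned log_interval_sec)
{
    double budget = hashes_per_sec * (double)log_interval_sec;

    if (budget < 1.0)
        return 1;               /* always scan at least one nonce */
    if (budget >= 4294967295.0)
        return UINT32_MAX;      /* the nonce space is only 32 bits wide */
    return (uint32_t)budget;
}
```

For example, a thread measured at 2 Mhash/s with a 5 second log interval would get a budget of 10,000,000 nonces per batch.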

TODO:
-I want to find ways of allowing even larger settings for intensity that would only be suitable for headless boxes. Currently the code ends up racing too much (with all the parallel processing) and generates far too many rejected blocks when the intensity is set to >10. Making the cl code synchronous would avoid that but it also slows it down, thereby making it pointless to push it further.
-Store binary versions of the kernels that could be loaded faster when restarting the app.
-Any bugfixing remaining.
-Profit.

figvam

newbie

Activity: 42

Merit: 0

It appears it's not possible to use minerd as a pure CPU miner anymore - setting GPU threads to zero doesn't work.

burp

member

Activity: 98

Merit: 10

Quote from: jgarzik on June 27, 2011, 01:04:28 PM

Quote from: burp on June 27, 2011, 12:37:05 PM

- one poclbm with phatk kernel for each card: 2*308MH/s = 616MH/s
- minerd with 2 threads for each card, gives me 605MH/s

Just for knowledge... what performance do you get with 1 thread per card?

About 586MH/s, which means 293MH/s per card.

EDIT: Considering minerd uses poclbm kernel (which is slower for me than phatk), minerd might be already on par (with twice the number of threads).

jgarzik

legendary

Activity: 1596

Merit: 1100

Quote from: burp on June 27, 2011, 12:37:05 PM

- one poclbm with phatk kernel for each card: 2*308MH/s = 616MH/s
- minerd with 2 threads for each card, gives me 605MH/s

Just for knowledge... what performance do you get with 1 thread per card?

burp

member

Activity: 98

Merit: 10

Current status for my dual 5830 setup:

- one poclbm with phatk kernel for each card: 2*308MH/s = 616MH/s
- minerd with 2 threads for each card, gives me 605MH/s

so there is still some room for improvements
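To put the gap in numbers, here is a small sketch; the function names are invented for this example and nothing here comes from either miner.

```c
/* Hypothetical helpers for comparing miner throughput figures. */

/* Average rate per card; returns 0 if no cards are given. */
double per_card_rate(double total_mhs, unsigned cards)
{
    return cards ? total_mhs / (double)cards : 0.0;
}

/* Candidate throughput as a fraction of the baseline throughput. */
double relative_throughput(double candidate_mhs, double baseline_mhs)
{
    return candidate_mhs / baseline_mhs;
}
```

With the figures in this thread, 605 against 616 comes to roughly 98% of the poclbm/phatk rate, and 586 against 616 to roughly 95%.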

iopq

hero member

Activity: 658

Merit: 500

Quote from: -ck on June 27, 2011, 09:37:09 AM

Update tree:

I did incorporate that change into my kernel. It turns out that even though my hardware reports 4 as the preferred vector width, it's faster with 2.

yeah, same thing in poclbm, window size 128, vectors 2 is the fastest setting for me

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Update tree:

I did incorporate that change into my kernel. It turns out that even though my hardware reports 4 as the preferred vector width, it's faster with 2. I assume many people have experienced the same. So I've made the default to be 2 when the hardware says its preferred vector width is anything larger than 1.
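A minimal sketch of that default, assuming the preferred width has already been queried from the device. The function name is hypothetical, not the one in the tree.

```c
/* Hypothetical helper: pick the default vector width from the width the
 * hardware reports as preferred. Anything larger than 1 is treated as 2,
 * since 2 tested faster than 4 even on hardware that reports 4. */
unsigned default_vector_width(unsigned preferred)
{
    return preferred > 1 ? 2 : 1;
}
```

So a device reporting 4 ends up with 2 by default, and a device reporting 1 stays at 1.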

I found a little buglet that also would repeat some blocks, thereby artificially raising the hash rate, so the overall rate has dropped slightly (about the same amount it's increased with the other code!).

As for the daily builds, I assume the requester meant windows builds? Most people who have linux will likely be able to build it. It's not building on windows yet, but will in the near future I hope. If you really do want linux binaries, just say the word.

The problem with repeated blocks was my pool not sending me out longpoll information reliably.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: Naven on June 27, 2011, 05:48:58 AM

@ckolivas, could you share daily builds of this miner?

linux only at this stage, sure I could do that.

Naven

newbie

Activity: 22

Merit: 0

@ckolivas, could you share daily builds of this miner?

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Maybe it's just my pool. They're having a funky time so that would explain it.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

I don't doubt it, and no one else is reporting this issue. On the other machine I've tried, it does give a speedup (with minerd), but the one 6770 I'm using it on here reliably spits out tons of rejects when I make this change. It's not a heat issue; the card is at 64 degrees.

iopq

hero member

Activity: 658

Merit: 500

Quote from: -ck on June 27, 2011, 05:26:00 AM

Quote from: iopq on June 27, 2011, 05:22:05 AM

Quote from: -ck on June 27, 2011, 05:20:00 AM

With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.

are you sure?

I can keep trying it on and off to see, but every time so far it has happened. It could well be my pool as they're experiencing technical difficulties, but it's always been the same time I enable it that I get the rejects.

[2011-06-27 20:22:46] [173.08 | 191.67 Mhash/s] [81 Accepted] [40 Rejected]

Look at that reject rate. Normally it's <5%

I'm running GUIMiner with this change and I see no difference other than slight speed increase

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: iopq on June 27, 2011, 05:22:05 AM

Quote from: -ck on June 27, 2011, 05:20:00 AM

With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.

are you sure?

I can keep trying it on and off to see, but every time so far it has happened. It could well be my pool as they're experiencing technical difficulties, but it's always been the same time I enable it that I get the rejects.

[2011-06-27 20:22:46] [173.08 | 191.67 Mhash/s] [81 Accepted] [40 Rejected]

Look at that reject rate. Normally it's <5%
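For reference, the reject rate in that status line can be worked out as rejected / (accepted + rejected). A small sketch follows; the function name is made up for illustration.

```c
/* Hypothetical helper: fraction of submitted shares that were rejected.
 * Returns 0 when nothing has been submitted yet. */
double reject_rate(unsigned accepted, unsigned rejected)
{
    unsigned total = accepted + rejected;

    return total ? (double)rejected / (double)total : 0.0;
}
```

With 81 accepted and 40 rejected that is 40/121, or about 33%, against the usual figure of under 5%.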

iopq

hero member

Activity: 658

Merit: 500

Quote from: -ck on June 27, 2011, 05:20:00 AM

With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.

are you sure?

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

With 4 vectors, this change actually slows down the hash rate. With 2 vectors it speeds it up, but then I get runs of rejected shares. Not sure why but this is consistent now so I'm reluctant to include it at this stage.

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: -ck on June 27, 2011, 02:36:41 AM

Quote from: figvam on June 27, 2011, 02:04:44 AM

This small mod to the poclbm OpenCL kernel gives about 3% more performance if BFI_INT is used:
https://forum.bitcoin.org/index.php?topic=22965.0;topicseen

Quote

#define Ma(x, y, z) amd_bytealign((y), (x | z), (z & x))
and change it to this line
#define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) )

For some reason that's greatly increased my reject rate.

Actually that was sheer coincidence. I'll test this change some more, thanks!

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Quote from: figvam on June 27, 2011, 02:04:44 AM

This small mod to the poclbm OpenCL kernel gives about 3% more performance if BFI_INT is used:
https://forum.bitcoin.org/index.php?topic=22965.0;topicseen

Quote

#define Ma(x, y, z) amd_bytealign((y), (x | z), (z & x))
and change it to this line
#define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) )

For some reason that's greatly increased my reject rate.

figvam

newbie

Activity: 42

Merit: 0

This small mod to the poclbm OpenCL kernel gives about 3% more performance if BFI_INT is used:
https://forum.bitcoin.org/index.php?topic=22965.0;topicseen

Quote

#define Ma(x, y, z) amd_bytealign((y), (x | z), (z & x))
and change it to this line
#define Ma(x, y, z) amd_bytealign( (z^x), (y), (x) )
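A quick way to see that the two forms compute the same thing is to model them on the CPU. This is only a sketch under one assumption: once patched to BFI_INT, amd_bytealign(a, b, c) acts as a bit select, (a & b) | (~a & c). The function names below are invented for the check and are not part of the kernel.

```c
#include <stdint.h>

/* Assumed BFI_INT behaviour: take bits from b where mask is 1, else from c. */
uint32_t bfi_int(uint32_t mask, uint32_t b, uint32_t c)
{
    return (mask & b) | (~mask & c);
}

/* Textbook SHA-256 majority function, used as the reference. */
uint32_t maj_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Original macro body: three operations feeding BFI_INT. */
uint32_t ma_old(uint32_t x, uint32_t y, uint32_t z)
{
    return bfi_int(y, x | z, z & x);
}

/* Modified macro body: a single XOR feeding BFI_INT. */
uint32_t ma_new(uint32_t x, uint32_t y, uint32_t z)
{
    return bfi_int(z ^ x, y, x);
}

/* All operations are bitwise, so checking the 8 combinations of
 * all-zero and all-one words covers every possible input bit pattern.
 * Returns 1 if both forms match the reference everywhere, else 0. */
int ma_forms_agree(void)
{
    for (unsigned i = 0; i < 8; i++) {
        uint32_t x = (i & 1) ? 0xFFFFFFFFu : 0u;
        uint32_t y = (i & 2) ? 0xFFFFFFFFu : 0u;
        uint32_t z = (i & 4) ? 0xFFFFFFFFu : 0u;

        if (ma_old(x, y, z) != maj_ref(x, y, z) ||
            ma_new(x, y, z) != maj_ref(x, y, z))
            return 0;
    }
    return 1;
}
```

Under that assumption the new form picks y wherever x and z differ and x wherever they agree, which is the majority of the three bits, while needing one fewer operation before the BFI_INT.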

-ck

legendary

Activity: 4088

Merit: 1631

Ruu \o/

Updated tree.

32-bit builds and BFI_INT patching are now working. For some reason the OpenCL compiler builds an ELF file within an ELF file on 64-bit, but not on 32-bit. Go figure.

STILL TODO:
Testing and performance evaluation.
Windows builds (getting help from someone there; hopefully it works).

Topic: cgminer - CPU/GPU miner in C for linux/windows - page 9. (Read 81956 times)