
Topic: Would it be smart to unload some of the work onto the cpu?--read this PDF (Read 1707 times)

legendary
Activity: 1050
Merit: 1000
You are WRONG!
Quote
Please define "various values". If it's the midstate you are talking about, it was an obvious optimization.
https://bitcointalksearch.org/topic/theoretical-limit-on-hashing-speed-33817

They list out lots of the constant and semi-constant optimizations in that thread.
Still, the CPU cannot help. All of these are optimizations in the algorithm for calculating the SHA-256 hashes.
sr. member
Activity: 256
Merit: 250
What they do is overlap kernel execution and host-device transfers. It is generally possible, but troublesome, with OpenCL and the AMD APP SDK. It is done by using asynchronous kernel invocations and buffer reads (supplying CL_FALSE as the "blocking" function argument, together with callback functions, and supplying CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE when creating the queue). This requires DMA support, which works with 2.4 only. Keep in mind that it depends very much on the task being performed: the way all Bitcoin kernels are written would not allow this.

It would be possible if there were no "reduction", i.e. if all work-items wrote their output to their own part of the grid. For example, for an NDRange of 1,000,000 nonces, you would have an output buffer of 1,000,000 uints. This way you could overlap transfers and kernel executions.

It wouldn't be much faster (in fact, I wouldn't be surprised if it turned out to be slower). Also, people who underclock memory will definitely suffer.

P.S. These might be helpful:

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=144609
http://forums.amd.com/devforum/messageview.cfm?FTVAR_FORUMVIEWTMP=Linear&catid=390&threadid=134450
hero member
Activity: 560
Merit: 517
Quote
Please define "various values". If it's the midstate you are talking about, it was an obvious optimization.
https://bitcointalksearch.org/topic/theoretical-limit-on-hashing-speed-33817

They list out lots of the constant and semi-constant optimizations in that thread.
legendary
Activity: 1050
Merit: 1000
You are WRONG!
Quote
And a lot of the recent improvements to the GPU mining algorithms have come from pre-computing, on the CPU, various values.
Please define "various values". If it's the midstate you are talking about, it was an obvious optimization.
hero member
Activity: 560
Merit: 517
Quote
Cool, I knew I was an idiot, but I just wanted a clearer explanation of why.
Your question was definitely not stupid. I don't like to see perfectly viable ideas get shot down for no reason :P

Quote
The thing the paper suggests is that we put the CPU to work while the GPU works.
It's more complicated than that. The slides are a bit scatterbrained, but I'll just reference the "Profit!" slide, because I think that is what the OP is referring to. What it suggests is moving to the CPU the two components of the algorithm that GPUs perform poorly at but that the CPU can process quickly, while the GPU does the more intensive and suitable work. It's like hiring someone at your restaurant to peel potatoes and take out the trash, so your cooks can do what they do best.

In some sense, this has already been applied to mining. bitcoind actually does part of the work for you: it "pre-hashes" the first 64 bytes of the 80-byte block header and returns that intermediate state along with the work (this is where midstate comes from). Hence, some of the work has been offloaded to the CPU-based bitcoind.
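
To make the midstate idea concrete, here is a minimal Python sketch, not how any actual miner is written. It assumes an 80-byte block header whose last 4 bytes are the nonce. The header bytes are made up, `hash_with_midstate` is a hypothetical name, and `hashlib`'s `copy()` stands in for the raw midstate, since `hashlib` does not expose the internal SHA-256 state directly.

```python
import hashlib
import struct

def double_sha256(data: bytes) -> bytes:
    """Plain double SHA-256 over the full header, with no reuse."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Hypothetical 76-byte header prefix (everything except the nonce); not real block data.
HEADER_PREFIX = bytes(range(76))

def hash_with_midstate(prefix: bytes, nonces):
    """Yield (nonce, hash) pairs, hashing the constant first 64 bytes only once."""
    # The first 64 bytes do not change while the nonce is scanned,
    # so feed them in once and reuse the resulting hash state.
    midstate = hashlib.sha256(prefix[:64])
    tail = prefix[64:76]  # the 12 remaining bytes that precede the nonce
    for nonce in nonces:
        h = midstate.copy()                        # start from the saved state
        h.update(tail + struct.pack("<I", nonce))  # only the last 16 bytes vary
        yield nonce, hashlib.sha256(h.digest()).digest()

results = dict(hash_with_midstate(HEADER_PREFIX, range(4)))
```

Each entry in `results` should equal `double_sha256(HEADER_PREFIX + struct.pack("<I", nonce))`; the only difference is that the first 64 bytes are fed in once instead of once per nonce.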

And a lot of the recent improvements to the GPU mining algorithms have come from pre-computing, on the CPU, various values.

And we also only generate Difficulty-1 shares from the GPU, because anything else would slow the GPU down. The CPU is responsible for determining if that share is actually useful in generating a block.
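
The split described above (a cheap test on the GPU, the full test on the CPU) might look roughly like this in Python. This is a sketch under assumptions: it reads the double-SHA-256 digest as a little-endian 256-bit integer, and it approximates the Difficulty-1 test as "the top 32 bits are zero", which is slightly looser than the exact Difficulty-1 target. The function names and sample digests are hypothetical.

```python
DIFF1_TARGET = 0xFFFF << 208  # conventional difficulty-1 target

def gpu_style_check(digest: bytes) -> bool:
    """Cheap test: the top 32 bits of the little-endian hash value are zero."""
    return digest[28:32] == b"\x00\x00\x00\x00"

def target_for_difficulty(difficulty: int) -> int:
    """Target for a given integer difficulty (higher difficulty, smaller target)."""
    return DIFF1_TARGET // difficulty

def cpu_full_check(digest: bytes, target: int) -> bool:
    """Full test done on the host: the whole 256-bit value against the target."""
    return int.from_bytes(digest, "little") <= target

# Made-up digests for illustration only.
easy = bytes([0xAB] * 28) + b"\x00" * 4  # passes the cheap test only
hard = b"\x01" + b"\x00" * 31            # tiny value, passes both tests

for name, d in (("easy", easy), ("hard", hard)):
    print(name, gpu_style_check(d), cpu_full_check(d, target_for_difficulty(1000)))
```

The point of the split is that the cheap test is a single 32-bit comparison per hash, while the full 256-bit comparison runs only on the rare candidates that pass it.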

However, those are only once-per-unit-of-work optimizations. Unlike in the password recovery slides, there is no repeated computation that the CPU could help with in mining, because no part of the mining algorithm is better suited to the CPU: there is no complicated generation or verification. Since the CPU isn't better at any part of it, it's best to leave the whole algorithm on the GPU.

Quote
But there is nothing for the CPU to do while the GPU works.
That's not the case at all. The CPU can be hashing as well, and on most modern CPUs that would give you an extra 3 or 4 MH/s ... roughly the same increment you see in each new revision of the GPU code.

The real reason why we don't utilize the CPU in this manner is not because you can't. You most certainly can run a CPU miner alongside your GPU miners (with a bit of effort). People don't usually do it, though, because the CPUs have terrible MH per Watt performance. i.e. you're wasting electricity. It's far more suitable to buy a low power, cheap CPU and have it just twiddling its thumbs so it can snap into action when it's time to give the GPU new work. Like the squeegee boys on porn shoots.
sr. member
Activity: 476
Merit: 250
moOo
Cool, I knew I was an idiot, but I just wanted a clearer explanation of why.

Thanks for the reply.
legendary
Activity: 1050
Merit: 1000
You are WRONG!
Quote
While I admit I know nothing about programming miners, that is a bit definitive, considering the kernel mods which have resulted in increased speed.
It was optimization of the GPU kernel code that resulted in the speedup.
 
Quote
Not that I doubt you, but did you look at the pdf?
No, I did not have to, but I'm going to now.
There is nothing mining-related that a CPU can do that a GPU cannot.

The only things that separate the GPU and the CPU are:
GPUs have a lot of cores and are therefore good for heavy computation.
CPUs have a lot of system capability, like paging, privilege separation, task switching and other nasty stuff.

All this stuff makes CPUs more complicated, heavier and more robust.

I have now read the paper. There is still nothing to speed up.
The thing the paper suggests is that we put the CPU to work while the GPU works.
But there is nothing for the CPU to do while the GPU works.
sr. member
Activity: 476
Merit: 250
moOo
Quote
No, there is nothing to speed up.


While I admit I know nothing about programming miners, that is a bit definitive, considering the kernel mods which have resulted in increased speed.

Not that I doubt you, but did you look at the pdf?

legendary
Activity: 1050
Merit: 1000
You are WRONG!
No, there is nothing to speed up.
full member
Activity: 168
Merit: 100
Live long and prosper. \\//,
What's more interesting to me in this PDF is that this experienced team of GPGPU programmers writes their programs not in OpenCL but in Intermediate Language (IL)...
full member
Activity: 196
Merit: 100
Interesting find. This could prove to be useful.
legendary
Activity: 1764
Merit: 1006
I read somewhere that people sell their GPU time for password cracking.

It has been done for quite a while, I think?
sr. member
Activity: 476
Merit: 250
moOo
This is a PDF about using AMD GPUs for password recovery, and they are mainly talking about the upcoming SDK 2.5.
If you look at page 10 onward, you see they suggest that offloading some of the work onto the CPU speeds up their password recovery greatly.

Now, I'm not a programmer. I probably should have said that first, but you probably wouldn't have read this far. So I don't know if this will help you guys at all, but I figure solving these hashes is similar. I don't think we have to do the trial password thing, just the hashes and keys, but surely we validate the keys.


Eh, if it doesn't help, just ignore me. I hate it when people who don't know what they are talking about try to diagnose crap: cars, computers, etc.

But if it helps y'all then it helps me, so I figured it was worth the large chance of looking like a fool to share.