Topic: GPU mining strategies (algorithm) (Read 1530 times)

newbie
Activity: 2
Merit: 0
July 11, 2014, 09:21:26 AM
#3
Thanks for the feedback.

Do you then return the nonce from the kernel and do a full validity check on the host? On a GPU, is it even possible to return early from a calculation? Since the threads run in lockstep, I do not think it is.

At the moment I am simply using a full sha256 function from a library, which of course is slower, and I don't do midstate pre-hashing yet. However, I am only getting 150 khash/s, so I somewhat doubt that this is the main issue. Even with the sha256 function removed from the kernel, I only get about 3000 khash/s in theory, which is still much lower than the ~10 Mhash/s I can get using other miners.

It might actually be down to the rather experimental OpenCL framework I am using; I will investigate a bit further.
hero member
Activity: 675
Merit: 513
July 10, 2014, 12:27:18 PM
#2
Usually in the nonce loop you want to return a result as soon as you find a diff 1 hash (4 bytes of zeroes).
If you don't have 4 bytes of zeroes then you can continue.
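A rough Python sketch of that early-exit idea, run on the CPU rather than in an OpenCL kernel. The names `is_candidate` and `scan`, the all-zero `header76` and the `zero_bytes` parameter are made up for illustration. The demo asks for only 1 zero byte so it finishes quickly, where a diff 1 filter would ask for 4. Bitcoin reads the hash as a little-endian number, so the zero bytes in question are the last bytes of the SHA-256 digest.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def is_candidate(digest: bytes, zero_bytes: int = 4) -> bool:
    """Cheap filter: are the last `zero_bytes` bytes of the digest all zero?

    These are the high-order bytes once the digest is read as a
    little-endian 256-bit number.
    """
    return digest[-zero_bytes:] == bytes(zero_bytes)


def scan(header76: bytes, start: int, count: int, zero_bytes: int = 4):
    """Try nonces in [start, start + count) and stop at the first candidate."""
    for nonce in range(start, start + count):
        digest = double_sha256(header76 + struct.pack("<I", nonce))
        if is_candidate(digest, zero_bytes):
            return nonce  # early exit: hand this nonce back for a full check
    return None  # nothing found in this range


# Hypothetical input: an all-zero 76-byte header prefix (everything but the nonce).
header76 = bytes(76)
nonce = scan(header76, 0, 100_000, zero_bytes=1)
print("first candidate nonce:", nonce)
```

The filter only says "worth a closer look". Comparing the full hash against the actual target is a separate step.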
Some things don't change if you have a different nonce. These can be precomputed outside of the nonce loop.
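One way to picture the precomputation, as a minimal sketch: the 80-byte header is hashed in 64-byte blocks, and the nonce sits in the last 4 bytes, so the first block never changes between nonces. Here `hashlib`'s `copy()` stands in for passing a midstate to the kernel. The helper names `make_midstate`, `hash_with_midstate` and `hash_naive`, and the dummy header bytes, are assumptions for illustration. This shows that the two paths give the same result, not how a real kernel is written.

```python
import hashlib
import struct


def make_midstate(header76: bytes):
    """Absorb the first 64 header bytes once; they do not depend on the nonce."""
    assert len(header76) == 76
    mid = hashlib.sha256()
    mid.update(header76[:64])
    return mid, header76[64:]  # the 12 remaining bytes come right before the nonce


def hash_with_midstate(mid, tail12: bytes, nonce: int) -> bytes:
    """Finish the first SHA-256 from a copy of the saved state, then hash again."""
    h = mid.copy()  # clone the state so the saved one can be reused
    h.update(tail12 + struct.pack("<I", nonce))
    return hashlib.sha256(h.digest()).digest()


def hash_naive(header76: bytes, nonce: int) -> bytes:
    """Reference path: hash all 80 bytes from scratch for every nonce."""
    header80 = header76 + struct.pack("<I", nonce)
    return hashlib.sha256(hashlib.sha256(header80).digest()).digest()


# Hypothetical header prefix: 76 arbitrary bytes, not real block data.
header76 = bytes(range(76))
mid, tail12 = make_midstate(header76)  # done once, outside the nonce loop

for nonce in (0, 1, 12345, 0xFFFFFFFF):
    assert hash_with_midstate(mid, tail12, nonce) == hash_naive(header76, nonce)
```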
Did you use the unrolled version of sha256? Or do you have a for loop with 64 rounds?
newbie
Activity: 2
Merit: 0
July 09, 2014, 11:07:10 AM
#1
Hi,

Sorry if this is the wrong forum.

I am currently in the process of creating a fairly simple bitcoin miner for CPU and GPU (purely for demonstration purposes, not for earning money).

However, I am having a little trouble understanding how GPU miners generally work. I do want a fairly simple version, but I hope to get one that performs decently (hopefully within 10-30% of normal miners, and definitely faster than the CPU version).

In general, I would think the strategy for executing on the GPU is something like the below. I hope someone can tell me whether I am doing something completely wrong and give me some pointers on how this is usually done.

  • Transfer the binary version of the data to hash to the kernel. (I noticed a lot of input arguments to the OpenCL kernels in some of the versions I have seen. I assume this is some sort of data-transfer optimization; it looks like a midstate calculation that is passed to each kernel.)
  • Calculate the double sha256 hash of the data. (Is it generally advisable to have a loop checking multiple nonces, or just one nonce per kernel?)
  • Return a result. What is the best way of doing this? Do I check inside the kernel whether the hash is lower than the desired target, or do I just return any value and let the host check it for validity? How is this generally done? If checking multiple nonces, I assume you should keep track of the best result during the run.
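For reference, the three steps above can be sketched on the CPU like this. It is a plain baseline, not how any particular miner does it. The field values, the easy `target` and the helper names `pack_header`, `hash_value` and `scan` are placeholders chosen for illustration, and the byte-order handling of getwork data is left out.

```python
import hashlib
import struct


def pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce) -> bytes:
    """Step 1: serialize the 80-byte block header (little-endian integers)."""
    return struct.pack("<I32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


def hash_value(header80: bytes) -> int:
    """Step 2: double SHA-256, read as a little-endian 256-bit number."""
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    return int.from_bytes(digest, "little")


def scan(fields, target: int, start: int, count: int):
    """Step 3: return the first nonce whose hash is at or below the target."""
    for nonce in range(start, start + count):
        if hash_value(pack_header(*fields, nonce)) <= target:
            return nonce
    return None


# Placeholder header fields (not a real block): version, previous hash,
# merkle root, timestamp, compact difficulty bits.
fields = (2, bytes(32), bytes(32), 1404896830, 0x1D00FFFF)

# Deliberately easy target so the demo ends quickly: roughly 1 in 256 hashes qualifies.
target = 1 << 248

nonce = scan(fields, target, 0, 100_000)
print("first nonce meeting the easy target:", nonce)
```

This version stops at the first hit instead of tracking a "best" hash, since a share either meets the target or it does not.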

I do have a very basic GPU implementation, but it is currently slower than my CPU implementation. It works more or less as above: each kernel checks several nonces and returns the "best" one (i.e. the one with the most trailing zeros in the hash, using the getwork protocol).