
Topic: From Stable to even Higher GPU Hashing Performance (Read 1090 times)

sr. member
Activity: 840
Merit: 255
SportsIcon - Connect With Your Sports Heroes
I'm using the compiled .bin. It's excellent!

I wonder if I can manage to do the same for the 6950.
newbie
Activity: 23
Merit: 0
I'll put the updates on the litecoin forum (https://forum.litecoin.net/index.php/topic,6020.0.html) since the topic is getting buried here pretty quickly.
hero member
Activity: 532
Merit: 500
bearded, drunk, fat, naked
Your kernel might work for my cards then, I guess? Shouldn't it work for all 7950s (I assume yours are 7950s)?

I also wonder whether those cards are actually lemons or are just overwhelmed by being the primary display card at the same time. On my 3x7950 rig I can swap the cards any way I want: whichever one is the display card will always hash slower and need more voltage to not "get sick".
newbie
Activity: 23
Merit: 0
Hmm... not really, it's a bit complicated. Most of the hashrate gain comes from eliminating loops.

For example, the kernel has a #pragma unroll, which removes the loop and expands its body line by line. If the loop's trip count is not known at compile time, the unroll is not done. Also, if cgminer is not compiled with the -O3 option (most builds are compiled with -O2, I can assure you), the kernel will not be unrolled at all, so it's better for you to do the unrolling manually.
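A minimal sketch in plain C of what "unrolling manually" means. This is not code from scrypt130511.cl; the function names and the fixed size of 4 are hypothetical, chosen only to contrast a loop whose bound arrives at run time with the same work written out line by line.

```c
#include <stdint.h>

/* Loop form: n is only known at run time, so neither the C compiler
   nor an OpenCL "#pragma unroll" can fully unroll this loop. */
void xor_loop(uint32_t *b, const uint32_t *bx, unsigned n)
{
    for (unsigned i = 0; i < n; ++i)
        b[i] ^= bx[i];
}

/* Manually unrolled form for a fixed size of 4 (hypothetical size):
   no loop counter, no compare-and-branch, just straight-line code. */
void xor_unrolled4(uint32_t b[4], const uint32_t bx[4])
{
    b[0] ^= bx[0];
    b[1] ^= bx[1];
    b[2] ^= bx[2];
    b[3] ^= bx[3];
}
```

Both produce the same result; the unrolled one simply gives the compiler nothing to decide, at the cost of only working for that one size.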

EDIT: oh, and my settings are these:

"intensity" : "19",
"vectors" : "1",
"worksize" : "256",
"kernel" : "scrypt",
"lookup-gap" : "2",
"thread-concurrency" : "24000",
"gpu-engine" : "1140,1140,1020",
"gpu-memclock" : "1250"

The last card was a lemon: it doesn't want to go higher than 1020, which is what you see in the stats above (about 633-ish). Still pretty decent for the hash/watt ratio.
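To see what "thread-concurrency" : "24000" and "lookup-gap" : "2" mean in memory terms, here is a small C sketch of the scratchpad arithmetic. It assumes Litecoin's standard scrypt parameters (N = 1024, r = 1, so 128 bytes per scratchpad entry) and that the kernel keeps only every lookup-gap-th entry, rounded up; it is my own arithmetic, not cgminer's code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed Litecoin scrypt parameters: N = 1024, r = 1,
   so each scratchpad entry V[i] is 128 * r = 128 bytes. */
#define SCRYPT_N        1024u
#define SCRYPT_ENTRY_SZ 128u

/* Entries stored per thread when only every lookup_gap-th V[i] is kept,
   rounded up so nothing is lost when N is not divisible by the gap. */
uint32_t entries_per_thread(uint32_t lookup_gap)
{
    return SCRYPT_N / lookup_gap + (SCRYPT_N % lookup_gap != 0);
}

/* Total scratchpad size in bytes for a given thread concurrency.
   64-bit math, since the product easily exceeds 32 bits. */
uint64_t scratchpad_bytes(uint32_t thread_concurrency, uint32_t lookup_gap)
{
    return (uint64_t)thread_concurrency
         * entries_per_thread(lookup_gap)
         * SCRYPT_ENTRY_SZ;
}
```

With 24000 and a gap of 2 this gives 24000 × 512 × 128 = 1,572,864,000 bytes (1500 MiB), roughly half of a 7950's 3 GB. These are also the kind of values that can be worked out once and hardcoded.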
hero member
Activity: 532
Merit: 500
bearded, drunk, fat, naked

  • remove, as much as possible, looping structure
  • minimize calling custom functions and incorporate them into the calling function itself.



These sound like they might not be hardware dependent? Can you write a script that does this automatically for the given thread concurrency (etc.) parameters, i.e. simply improve the kernel-building function of cgminer?

Aren't the lines of code you have to change always the same, just with different numbers? Or do the lines actually change with the graphics card architecture? Are yours 7950 Radeons? What cgminer settings were optimal for your cards before?

For example, mine are
-g 1 --thread-concurrency 24000
for Sapphire OC 7950s.
sr. member
Activity: 350
Merit: 250
Whoooo, a whole 3% increase.
newbie
Activity: 23
Merit: 0
Ah, no interest? It requires no hardware modification. I was trying to compile cgminer for my Galaxy S4 and managed to do it, but the kernel wasn't accepted, so I modified the scrypt kernel code. The good part is what you see above: a higher hashrate (you might even need to tune down, like I did, to reach a good hash/power ratio). The bad part is that the modification is specific to each configuration, so the kernel code I have is specific to mine. Bear in mind that cgminer can do a lot of things and can be configured to suit any GPU settings, but making the scrypt kernel concentrate on one specific configuration will yield higher hashrates. So here's what I did:

  • edit scrypt130511.cl
  • remove, as much as possible, looping structure
  • use hardcoded constants as much as possible
  • minimize arithmetic operations (addition, subtraction, multiplication, and division) by calculating the exact values from hardcoded settings (thread concurrency, worksize, gap)
  • minimize calls to custom functions by inlining them into the calling function itself.
  • save!
  • remove all scrypt130511......bin files, because these are the compiled kernels that cgminer uses for the current settings
  • start cgminer
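A rough C sketch of three of the steps above (hardcoded constants, precomputed arithmetic, inlined helper) applied to a scratchpad-address calculation. The index layout and names are hypothetical and not copied from scrypt130511.cl; ZSIZE = 8 assumes 128-byte entries stored as eight 16-byte vectors, and XSIZE is the thread-concurrency from the settings posted above.

```c
#include <stdint.h>

/* Generic version: sizes arrive as run-time parameters and the
   address math lives in a separate helper function. */
static uint32_t pad_index(uint32_t x, uint32_t y, uint32_t xsize, uint32_t zsize)
{
    return (y * xsize + x) * zsize;
}

uint32_t lookup_generic(uint32_t gid, uint32_t y,
                        uint32_t thread_concurrency, uint32_t zsize)
{
    uint32_t x = gid % thread_concurrency;
    return pad_index(x, y, thread_concurrency, zsize);
}

/* Specialized version for one rig: sizes hardcoded, helper inlined by
   hand, and the constant product XSIZE * ZSIZE computed up front. */
#define XSIZE   24000u            /* thread-concurrency from the config  */
#define ZSIZE   8u                /* assumption: 8 vectors per entry     */
#define YSTRIDE (XSIZE * ZSIZE)   /* 192000, folded at compile time      */

uint32_t lookup_specialized(uint32_t gid, uint32_t y)
{
    /* (y * XSIZE + x) * ZSIZE rewritten as y * YSTRIDE + x * ZSIZE */
    return y * YSTRIDE + (gid % XSIZE) * ZSIZE;
}
```

The specialized function computes the same address with one fewer multiplication, no call, and constant divisors the compiler can optimize; the price is that it is only correct for this one thread-concurrency value.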

The readability and maintainability of the code suffer, of course, but we don't need them here: we want the kernel to be as efficient as possible!
newbie
Activity: 23
Merit: 0
A month ago I was hashing like this:

http://img109.imageshack.us/img109/4765/4o1v.png

You can see that the highest core-engine/memory there is 1145/1250.

Fast forward to now, with lower core-engine/memory settings:
http://img690.imageshack.us/img690/305/j84v.png

Don't mind the Kh/s; I tweaked my BAMT to display it in Kh/s. See the numbers above? That's 700+ on the same MSI 7950, reached with lower engine clocks and all other settings the same! Got you interested now?

Just when you thought you had reached the maximum stable configuration for rigs that have been running for months, here is a guide to getting even higher hashing performance out of your GPU.

(Stay tuned. I'm just excited to share this now after all the nights I've spent on it, so I'll leave you hanging for a while... In the meantime, please post your current scrypt settings below.)