Topic: Minor Scrypt OpenCL optimization (Read 11332 times)

newbie
Activity: 19
Merit: 0
January 28, 2014, 10:02:47 AM
#20
Something I've noticed after testing this on and off for quite a while now.

While it definitely increases raw kh/s and makes the GPU work harder (judging by GPU temperatures at the same clock rate), the reality is that it does nothing for WU.

Long-term WU is the same with and without these tweaks.

I am not knowledgeable enough to explain why but I can pretty much report this as fact.

So if it makes you feel better to see higher kh/s and use more power and run gpu hotter, this helps.

But for actual real-world improvement, this does not seem to do much.

If someone has proof otherwise, please share your WU improvements?
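
For anyone who wants to check this on their own rig, here is a rough sketch of the WU one would expect from a given raw hashrate. It assumes that cgminer's WU is the number of difficulty-1 shares found per minute and that a scrypt difficulty-1 share takes 2^16 = 65536 hashes on average. Both are assumptions that nobody in this thread has confirmed, and `expected_wu` is a made-up helper name.

```c
/* Expected work utility (difficulty-1 shares per minute) for a raw
   hashrate given in kh/s.
   Assumption: one scrypt difficulty-1 share costs 65536 hashes on average. */
double expected_wu(double khs) {
    const double hashes_per_diff1_share = 65536.0; /* assumed, see text above */
    return khs * 1000.0 * 60.0 / hashes_per_diff1_share;
}
```

Under those assumptions, 620 kh/s works out to a WU of roughly 568, or about 0.92 WU per kh/s. A long-term WU well below that ratio would suggest that not all of the displayed kh/s is turning into valid shares.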


legendary
Activity: 1210
Merit: 1024
January 19, 2014, 03:05:29 PM
#19
hero member
Activity: 686
Merit: 500
January 19, 2014, 02:45:24 PM
#18
Hi Kopam, I can send you the .cl file that I use for my 7950, which fluctuates between 680 and 710 kh/s (usually lower, because it is in the computer that I use).
I have one version with the settings built into the file, and one without them, where the settings live in a .conf file.

Just PM me your email address.

Greetings, Maarten
newbie
Activity: 28
Merit: 0
January 16, 2014, 05:37:37 PM
#17
Just tried it on my 4x7950 rig and can confirm it really hits a sweet spot on the GPUs, so they produce more hashpower.
hero member
Activity: 518
Merit: 500
January 16, 2014, 05:26:58 PM
#16
I just found this thread.
I have no idea how to optimize like that, so I would like to ask for help :)
I am waiting for a lot of 7950s to arrive very soon, so a 2-3% difference will be a lot.
If anyone manages to help me, I will make a donation!
full member
Activity: 167
Merit: 100
January 04, 2014, 08:47:32 AM
#15
I actually lost 20% hashing performance on my 7950.
But I'm on an old version of drivers (13.4) and use settings on my 7950 that do not max MHz on mem and gpu (=>lower power). I'd guess it's the driver.
full member
Activity: 140
Merit: 100
January 02, 2014, 06:01:49 AM
#14
Thanks, gave me a 2.5% increase in khash/s with a 7950.
full member
Activity: 140
Merit: 100
January 02, 2014, 05:51:39 AM
#13
I'll give it a shot on my 6970.
newbie
Activity: 19
Merit: 0
January 02, 2014, 04:54:35 AM
#12
Please let me know of the results you were able to get, including your hardware and drivers version.

This definitely worked on my 7950s under BAMT/Ubuntu with catalyst 12.6 (8.98)

But strangely it did not affect my 7970 in the slightest.

Here are the previous attempts to tweak scrypt.cl

https://litecointalk.org/index.php/topic,4082.0.html

https://litecointalk.org/index.php/topic,6020.0.html

which didn't work for me at all - but yours did

Note how they had to make a different version for 13.4+ and pre-13.4

The 7950s went from 610-611 to 620-625 kh/s, so roughly a 2% performance increase.
On rare occasion I've noticed the rate bounce down to 610 but climb back up.
Watching the longer term WU to see what happens.

I'm curious why the 7970 is not affected at all; I only have a single thread running.

If you'd like some dogecoin as a thank you, let me know your address.
sr. member
Activity: 840
Merit: 251
December 23, 2013, 02:32:50 AM
#11
The modifications seem to be working well for me. After some config tweaks, I went from 640 KH/s at best to 670 KH/s w/Sapphire Vapor-X 7950 w/boost. My efficiency and shares/min seem to have noticeably improved as well. Here's my cgminer config for anyone interested:

"api-allow" : "W:127.0.0.1",
"api-listen" : true,
"expiry" : "3",
"log" : "5",
"queue" : "2",
"scan-time" : "1",
"scrypt" : true,
"kernel" : "scrypt",
"auto-fan" : true,
"gpu-threads" : "1",
"gpu-engine" : "1150",
"gpu-memclock" : "1500",
"intensity" : "19",
"temp-target" : "70",
"temp-overheat" : "85",
"temp-cutoff" : "95",
"temp-hysteresis" : "3",
"gpu-powertune" : "20",

"gpu-vddc" : "1.25",
"worksize" : "256",
"lookup-gap" : "2",
"shaders" : "1792",
"vectors" : "1",
"thread-concurrency" : "21712"

Still playing around with the gpu-vddc setting, but something about these GPUs doesn't seem to like anything less than 1.25, regardless of what I set my clocks to (I've set them way lower hoping to be able to lower the voltage, but they always end up crashing). One day when I'm feeling more ambitious I may try to flash the GPU BIOS. I have also played around with the thread-concurrency quite a bit, but 21712 seems to be the sweet spot.
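
One reason thread-concurrency has a sweet spot may be that it decides how much GPU memory the scrypt scratchpad takes. The sketch below reproduces the sizing as I understand cgminer 3.x does it: 128 bytes times 1024/lookup-gap entries per thread, times thread-concurrency. Treat the formula as an assumption and `scrypt_pad_bytes` as a hypothetical helper, not part of cgminer.

```c
#include <stdint.h>

/* Estimated scrypt scratchpad size in bytes.
   Assumed formula: 128 * ceil(1024 / lookup_gap) * thread_concurrency.
   lookup_gap must be at least 1. */
uint64_t scrypt_pad_bytes(unsigned lookup_gap, unsigned thread_concurrency) {
    /* Entries kept per thread, rounded up when 1024 is not divisible. */
    uint64_t entries = 1024 / lookup_gap + (1024 % lookup_gap > 0);
    return 128ULL * entries * thread_concurrency;
}
```

With lookup-gap 2 and thread-concurrency 21712 this gives 1,422,917,632 bytes, i.e. 1357 MiB, which fits inside the 3 GB of a 7950 with room to spare.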

To the OP: Unfortunately I don't have any LTC or BTC atm - I had to sell most of them a couple of months ago to cover other expenses and just recently started mining again. What little funds I have right now are invested in other coins. Happen to have a TAG, WDC or NXT address? I would be happy to send a few over to you if so.
full member
Activity: 149
Merit: 102
December 22, 2013, 07:14:03 PM
#10
Argh... don't #define k if you want to use it as a loop var. :-)

#define k 0

#pragma unroll
   for(uint k=0; k<8; k++);
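
To spell out why the compiler says "expected an identifier": the preprocessor replaces every later `k` token with `0`, so the loop header no longer declares a variable. Here is a minimal plain-C sketch of the fix. `sum_strided` and `demo_sum` are made-up names for illustration, and the array just stands in for fixedW.

```c
#include <stdint.h>

#define k 0   /* as in the post: every later `k` token becomes the literal 0 */

/* With the macro active,  for (uint32_t k = 0; k < 8; k++)
   would expand to         for (uint32_t 0 = 0; 0 < 8; 0++)
   which is where "expected an identifier" comes from. */

#undef k      /* drop the macro (or pick another name) before the loop */

/* Sums every 8th word, using k as an ordinary loop variable. */
uint32_t sum_strided(const uint32_t *w) {
    uint32_t acc = 0;
    for (uint32_t k = 0; k < 8; k++)
        acc += w[8 * k];
    return acc;
}

/* Small driver: fills a 64-word table with 0..63 and sums words 0, 8, ..., 56. */
uint32_t demo_sum(void) {
    uint32_t w[64];
    for (uint32_t i = 0; i < 64; i++)
        w[i] = i;
    return sum_strided(w);
}
```

Either removing the `#define k 0` line or renaming the loop variable gets rid of the error; the `#undef` above is just one way to show it.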
hero member
Activity: 686
Merit: 500
December 22, 2013, 03:43:21 PM
#9
The semicolons at the end of the for statements have nothing to do with the current error.
But you are right, they don't belong there, so I removed them. Thanks.
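
For the record, what a trailing semicolon does to a for loop, as a small plain-C sketch (the function names are made up): the empty statement becomes the whole loop body, so the line that follows runs once instead of eight times.

```c
/* Stray semicolon: the loop body is the empty statement. */
int runs_with_stray_semicolon(void) {
    int runs = 0;
    unsigned i;
    for (i = 0; i < 8; i++);   /* loops 8 times doing nothing */
    runs++;                    /* executes once, after the loop has finished */
    return runs;
}

/* No semicolon: the next statement is the loop body. */
int runs_without_semicolon(void) {
    int runs = 0;
    unsigned i;
    for (i = 0; i < 8; i++)
        runs++;                /* executes on every iteration */
    return runs;
}
```

So the semicolons were not the cause of the compile error, but with them in place each RND line would have run only once.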
full member
Activity: 149
Merit: 102
December 22, 2013, 02:39:40 PM
#8
just a wild guess... remove the semicolon from the for... line?
hero member
Activity: 686
Merit: 500
December 22, 2013, 01:35:46 PM
#7
I tried to optimize the code even further, but I have limited coding skills.

Below is the part that I tried to rewrite so that it can be executed in parallel. But I get the error: line 469: error: expected an identifier
for(uint k=0; k<8; k++);

Does anybody have a clue what goes wrong here?

Code:
void SHA256_fixed(uint4*restrict state0,uint4*restrict state1)
{
uint4 S0 = *state0;
uint4 S1 = *state1;

#define A S0.x
#define B S0.y
#define C S0.z
#define D S0.w
#define E S1.x
#define F S1.y
#define G S1.z
#define H S1.w
#define k 0

#pragma unroll
for(uint k=0; k<8; k++);
RND(A,B,C,D,E,F,G,H, fixedW[(8*k)+0]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(H,A,B,C,D,E,F,G, fixedW[(8*k)+1]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(G,H,A,B,C,D,E,F, fixedW[(8*k)+2]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(F,G,H,A,B,C,D,E, fixedW[(8*k)+3]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(E,F,G,H,A,B,C,D, fixedW[(8*k)+4]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(D,E,F,G,H,A,B,C, fixedW[(8*k)+5]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(C,D,E,F,G,H,A,B, fixedW[(8*k)+6]);

#pragma unroll
for(uint k=0; k<8; k++);
RND(B,C,D,E,F,G,H,A, fixedW[(8*k)+7]);

#undef A
#undef B
#undef C
#undef D
#undef E
#undef F
#undef G
#undef H
#undef k

*state0 += S0;
*state1 += S1;
}
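
A note on the structure of the code above, separate from the compile error: SHA-256 rounds depend on each other, so the word index has to run 0, 1, 2, ... 63 in order. Eight separate loops would process words 0, 8, 16, ... first and change the result. Below is a minimal plain-C sketch of a single loop that keeps the order and is checked against a textbook round loop. The `RND` macro, the helper names and the test data are stand-ins written for this example, not the definitions from scrypt.cl, and `w` is assumed to already contain the round constants the way fixedW appears to.

```c
#include <stdint.h>
#include <string.h>

static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

#define CH(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
#define BSIG0(x) (rotr((x), 2) ^ rotr((x), 13) ^ rotr((x), 22))
#define BSIG1(x) (rotr((x), 6) ^ rotr((x), 11) ^ rotr((x), 25))

/* One round with register renaming instead of shifting all eight values. */
#define RND(a, b, c, d, e, f, g, h, w) do { \
    h += BSIG1(e) + CH(e, f, g) + (w);      \
    d += h;                                 \
    h += BSIG0(a) + MAJ(a, b, c);           \
} while (0)

/* 64 rounds as one loop of 8 iterations; words are consumed in order. */
void rounds_rolled(uint32_t s[8], const uint32_t w[64]) {
    uint32_t A = s[0], B = s[1], C = s[2], D = s[3];
    uint32_t E = s[4], F = s[5], G = s[6], H = s[7];
    for (unsigned k = 0; k < 8; k++) {
        RND(A, B, C, D, E, F, G, H, w[8 * k + 0]);
        RND(H, A, B, C, D, E, F, G, w[8 * k + 1]);
        RND(G, H, A, B, C, D, E, F, w[8 * k + 2]);
        RND(F, G, H, A, B, C, D, E, w[8 * k + 3]);
        RND(E, F, G, H, A, B, C, D, w[8 * k + 4]);
        RND(D, E, F, G, H, A, B, C, w[8 * k + 5]);
        RND(C, D, E, F, G, H, A, B, w[8 * k + 6]);
        RND(B, C, D, E, F, G, H, A, w[8 * k + 7]);
    }
    s[0] = A; s[1] = B; s[2] = C; s[3] = D;
    s[4] = E; s[5] = F; s[6] = G; s[7] = H;
}

/* Textbook form: one round per iteration, values shifted explicitly. */
void rounds_reference(uint32_t s[8], const uint32_t w[64]) {
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    for (unsigned i = 0; i < 64; i++) {
        uint32_t t1 = h + BSIG1(e) + CH(e, f, g) + w[i];
        uint32_t t2 = BSIG0(a) + MAJ(a, b, c);
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    s[0] = a; s[1] = b; s[2] = c; s[3] = d;
    s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/* Runs both versions on arbitrary test data and compares the results. */
int rolled_matches_reference(void) {
    uint32_t w[64], s1[8], s2[8];
    for (unsigned i = 0; i < 64; i++) w[i] = 0x9e3779b9u * (i + 1);
    for (unsigned i = 0; i < 8; i++) s1[i] = s2[i] = 0x6a09e667u + i;
    rounds_rolled(s1, w);
    rounds_reference(s2, w);
    return memcmp(s1, s2, sizeof s1) == 0;
}
```

Because every round reads values the previous round wrote, rolling the code into a loop like this makes it shorter but does not make the rounds run in parallel.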
hero member
Activity: 896
Merit: 1000
December 21, 2013, 08:01:40 AM
#6
There are also optimized scrypt kernel files for the 7950/7970/7990/R9 280x:
https://litecointalk.org/index.php?topic=6058.0;topicseen

It can increase the speed.

If we combine your script with that one, can we increase the speed further?

According to your instructions, we have to delete the existing .bin files.
hero member
Activity: 686
Merit: 500
December 21, 2013, 07:18:18 AM
#5
The code seems to speed up my 7950 by about 2%.
However, when I quit cgminer and restart it, it tends to hang around 12 kh/s and I have to restart the PC to get it working again.

On the topic of improvements: has anyone ever used the uint8 vector type in the code? I'm not a coder, but I read on the net that scrypt would benefit from this...

Gr, Maarten
newbie
Activity: 9
Merit: 0
December 14, 2013, 10:01:21 PM
#4
You can take the programmer out of C but you can't take C out of the programmer. Nice catch. Further improvements are possible though if you don't mind coding for specific GPUs with AMD-specific optimizations. Thanks for sharin' though.
psw
newbie
Activity: 3
Merit: 0
December 13, 2013, 12:15:14 PM
#3
Virus? In text OpenCL file? Are you kidding?
legendary
Activity: 1106
Merit: 1000
December 13, 2013, 12:10:56 PM
#2
Virus warning
psw
newbie
Activity: 3
Merit: 0
December 13, 2013, 11:58:14 AM
#1
Hello,

I spent several days optimizing the Scrypt OpenCL code. It was quite challenging, because my primary work and hobby is low-level code optimization, especially of cryptographic code. SHA-256 is very familiar to me; in particular, I contributed to the SHA-256 assembler optimization in OpenSSL.

The current Scrypt OpenCL code is already very well optimized. My first implementation was 10 times slower than that! Still, I finally achieved a small speed-up. I tested it on a few AMD GPUs under Windows with the latest AMD drivers (13.11), and the results are as follows:
HD 6770 and HD 7950 - 2-3%
HD 7770 - no change
R9 280x - haven't seen any change in -g 2 mode, but there is again a 2% gain in -g 1.

Here is the new OpenCL code: http://www.crark.net/download/scrypt130511.zip

Instructions:
0) Back up your existing scrypt130511.cl file.
1) Unzip the archive and copy the new file (overwriting the old one) into the cgminer folder. If your cgminer uses another filename (like scrypt130302.cl), rename the new file to that name.
All cgminer 3.x versions should be supported.
2) Delete all *.bin files (like scrypt130511Tahitiglg2tc8192w256l4.bin).
3) Restart cgminer and enjoy. Your previous settings, such as lookup-gap and thread-concurrency, do not need to be changed.
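
Step 2 matters because cgminer keeps the compiled kernels as .bin files and would presumably keep loading the old binary otherwise. For rigs with many cards, here is a minimal sketch of that step in C. It assumes a POSIX system (dirent.h, so Linux/BAMT rather than Windows), and the helper names `is_kernel_cache` and `purge_kernel_cache` are invented for this example. Back up the .cl file (step 0) before running anything like it.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file name ends in ".bin" (a cached compiled kernel). */
int is_kernel_cache(const char *name) {
    size_t n = strlen(name);
    return n > 4 && strcmp(name + n - 4, ".bin") == 0;
}

/* Removes every *.bin file directly inside dir.
   Returns the number of files removed, or -1 if dir cannot be opened. */
int purge_kernel_cache(const char *dir) {
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int removed = 0;
    struct dirent *entry;
    char path[4096];
    while ((entry = readdir(d)) != NULL) {
        if (!is_kernel_cache(entry->d_name))
            continue;                       /* leaves .cl and .conf files alone */
        snprintf(path, sizeof path, "%s/%s", dir, entry->d_name);
        if (remove(path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```

Calling `purge_kernel_cache` with the path of the cgminer folder before restarting makes cgminer rebuild the kernel from the new .cl file on the next start.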

Please let me know of the results you were able to get, including your hardware and drivers version.
If you like my work, please donate BTC or LTC.

SY, Pavel Semjanov.