Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 21. (Read 61242 times)

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.

Actually runs faster at I=21 :)
Have not messed with GPU or MEM clocks, just defaults :)
(Powercolor) 280X  18 MH/s 67C-68C
(Powercolor) 7950  16 MH/s 68C-69C
Both cards are volt-modded to lower than stock ...

good, but if you lower the mem clock you will save power and get a higher maximum core clock as well.
hero member
Activity: 630
Merit: 500
I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.

Actually runs faster at I=21 :)
Have not messed with GPU or MEM clocks, just defaults :)
(Powercolor) 280X  18 MH/s 67C-68C
(Powercolor) 7950  16 MH/s 68C-69C
Both cards are volt-modded to lower than stock ...
hero member
Activity: 808
Merit: 1014
Has anybody tried this with 270X cards? What hashrate should I expect?
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22

does it work at I=21? there should be little difference in hashrate.
hero member
Activity: 630
Merit: 500
I suppose I'll have to upgrade the RAM on my miner box from 2G to 4G now ... occasional 14.7RC2 driver crashes at I=22
hero member
Activity: 630
Merit: 500
Sent ya 0.5 DMD for now, will send some more after it runs stable for a day :)
Transaction ID: 37bca0a9872845908b4fc4e223d920b3355b5bbbb54de97a583aee67c7b4605d
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
@pallas
What's your DMD donation address? :) Found 2 blocks in like 15 minutes (LUCK!)

good!
my DMD address is dVrz69vZFrxJRH9AnKyHim7Hd3PhY3w9NQ
hero member
Activity: 630
Merit: 500
@pallas
What's your DMD donation address? :) Found 2 blocks in like 15 minutes (LUCK!)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Again, it's mostly about memory for groestl: optimizing register operations might lead to a negligible gain, but you may lose on memory access.
sr. member
Activity: 266
Merit: 250
Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx

I did a lot of optimizations which looked smart but led to slower code. You can't imagine how many. Some of them I thought of while in bed, and they kept me awake until I could try them. Then the disappointment :-D
I believe it's because of the optimizations the compiler does, but most of all because of local memory and cache access.
If you optimize some instructions but end up with less parallelism in memory access, you'll get slower code.
That's typical for groestl because its speed is limited mostly by memory (otherwise it would be as fast as keccak, blake, etc.).
I know there was a bounty: my code was unfinished and, at the time, it was the only way I could mine without losing money.

I know. I actually think that the compiler is not that clever and that's why sometimes worse code runs faster.
Also, I looked at ASM and some stuff there is just plain not optimal. Perhaps it'll be improved in future versions of AMD drivers.

Also, most ASM code only uses .xy from a register. I tried making it work on ulong2 or ulong8 - only slower.

I wish it was possible to write GPU code in assembler...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Wow that's a nice improvement on hashrate :)  Now tuning for stability on my miners ...
Sending a donation your way next block find :)

Thanks!
Let me know your figures.
I need 280x and 290x hashrates, to put in the op.
hero member
Activity: 630
Merit: 500
Wow that's a nice improvement on hashrate :)  Now tuning for stability on my miners ...
Sending a donation your way next block find :)

Testing on HD 7950 and R9 280X and will report my hashrates when I get it stable :)
Both cards run considerably hotter and at 100% fan ...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx

I did a lot of optimizations which looked smart but led to slower code. You can't imagine how many. Some of them I thought of while in bed, and they kept me awake until I could try them. Then the disappointment :-D
I believe it's because of the optimizations the compiler does, but most of all because of local memory and cache access.
If you optimize some instructions but end up with less parallelism in memory access, you'll get slower code.
That's typical for groestl because its speed is limited mostly by memory (otherwise it would be as fast as keccak, blake, etc.).
I know there was a bounty: my code was unfinished and, at the time, it was the only way I could mine without losing money.
sr. member
Activity: 266
Merit: 250
Hi pallas!

Good stuff!

Have you tried other groestl algorithms? Like byte slicing or bit slicing?

Did you also have strange situations where better looking code actually ran slower?

PS: I wish you had opened the code earlier, Groestlcoin even had a bounty for an optimized miner.

Thanks
srcxxx
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
That's what opensource is about ;-)
I've been a Linux guy for 20 years now, and I remember public domain software from the Commodore era (around 1984).
hero member
Activity: 774
Merit: 554
CEO Diamond Foundation
Pallas, you are Prometheus: spending your time and skills on creating something useful to a lot of people and, in the end, opening it to everyone interested. Kudos :)
legendary
Activity: 3052
Merit: 1053
bit.diamonds | uNiq.diamonds
thx a lot for ur effort
to make the best possible amd based mining open source available for
DMD Diamond

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
A BIT OF HISTORY

The first gpu miner for groestlcoin and similar was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so there are probably dozens of optimized versions around, many of which have been kept private: mining groestlcoin and similar was always unfair for most people, at least for non-devs.
Hopefully this kernel will end that, and it should also level the field between amd and nvidia.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICE

I suggest keeping "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
This will also let you get 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.
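
A minimal sketch of the "good binaries" backup, in Python. The function name and both folder arguments are my own placeholders, not part of the miner; it only assumes that the compiled .bin files sit directly in the miner's main folder, as described in HOW TO USE.

```python
import shutil
from pathlib import Path


def backup_bins(miner_dir, backup_dir):
    """Copy every .bin file found directly in miner_dir into backup_dir.

    Both paths are chosen by the caller (hypothetical layout).
    Returns the list of backup paths that were written.
    """
    miner_dir = Path(miner_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for bin_file in sorted(miner_dir.glob("*.bin")):
        target = backup_dir / bin_file.name
        shutil.copy2(bin_file, target)  # copy2 also keeps the file timestamps
        copied.append(target)
    return copied
```

Run it once you have a binary that hashes well, and copy the file back into the main folder if a later rebuild turns out slower.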

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
**** MYRIAD GROESTL ****

If you are looking for the closed source myriad groestl miner (for DGB, SFR, etc.) look here instead:

https://satoshibox.com/fttcfvpiyhbod7ueidmgdhym

ABOUT

This is my optimized Groestlcoin / Diamond and similar opencl kernel (groestl + groestl algorithm, not myriad-groestl which is groestl + sha, see the top of this post for the latter).
It is based on the sph version originally available on sph-sgminer but is now totally rewritten.
It should be compatible with all sph-sgminer versions and derivatives.

PERFORMANCE

v1 - to be compiled with catalyst 14.6 or 14.7:

R9 290x @1125 MHz: ~26.4 Mh/s
R9 290 @1200 MHz: ~25 Mh/s
R9 280x (stock): ~18 Mh/s
7950 @1200 MHz: ~16 Mh/s
R9 270X: ~9.7 Mh/s

v2 - experimental hawaii only bin:

R9 290x @1125 MHz: ~34.4 Mh/s
R9 290 @1100 MHz: ~30.6 Mh/s

Wolf0's Tahiti binary:

R9 280x: ~25 Mh/s
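
To put the figures above in relative terms, here is a small Python calculation. It only uses the approximate hashrates listed in this post. The helper name is made up for the example, and the 280x comparison is rough because no clock is given for Wolf0's binary.

```python
def speedup_percent(old_mhs, new_mhs):
    """Relative hashrate gain of new over old, in percent."""
    return (new_mhs / old_mhs - 1.0) * 100.0


# Figures copied from the lists above (Mh/s, all approximate).
r9_290x = speedup_percent(26.4, 34.4)  # v1 vs v2, both at 1125 MHz
r9_280x = speedup_percent(18.0, 25.0)  # v1 (stock) vs Wolf0's Tahiti binary

print(f"R9 290x: +{r9_290x:.1f}%")  # prints "R9 290x: +30.3%"
print(f"R9 280x: +{r9_280x:.1f}%")  # prints "R9 280x: +38.9%"
```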

HOW TO USE

- Stop the miner
- Replace groestlcoin.cl, diamond.cl and/or the kernel you want to use with this one (it's inside the "kernel" folder)
- Remove all the .bin files (in the main folder)
- Set worksize to 256 only (-w 256)
- Run and enjoy!
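
The file steps above can be sketched in Python as follows. This is only an illustration: the function name and its arguments are placeholders I made up, and it assumes the layout described in the list (kernels in the "kernel" folder, .bin files in the main folder). It does not stop or start the miner for you.

```python
import shutil
from pathlib import Path


def install_kernel(miner_dir, new_kernel, targets=("groestlcoin.cl", "diamond.cl")):
    """Overwrite the target kernels and delete stale .bin files.

    Stop the miner before calling this. Returns (replaced, removed).
    """
    miner_dir = Path(miner_dir)
    kernel_dir = miner_dir / "kernel"

    replaced = []
    for name in targets:
        destination = kernel_dir / name
        if destination.exists():  # only replace kernels that are present
            shutil.copyfile(new_kernel, destination)
            replaced.append(destination)

    removed = []
    for bin_file in sorted(miner_dir.glob("*.bin")):
        bin_file.unlink()  # forces a rebuild on the next start
        removed.append(bin_file)
    return replaced, removed
```

Since this deletes every .bin in the main folder, back up any binary you want to keep first, then start the miner with -w 256.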

TWEAKING

Set intensity between 20 and 22. Thread concurrency and all the other parameters have no effect.
This kernel doesn't make use of gpu ram, so set the ram clock to THE MINIMUM POSSIBLE VALUE; for example 150 MHz for R9 290.
Now play with the core clock until you find the highest stable value (probably between 1100 and 1200 for the R9 290).
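
As a rough illustration, the settings above can be turned into command-line arguments like this. The helper is hypothetical. -w and -I are the worksize and intensity options mentioned in this thread; --gpu-engine and --gpu-memclock are the clock options as I remember them from sgminer, so treat them as an assumption and check your build's --help (they also need overclocking support in the driver).

```python
def miner_args(intensity, core_mhz, mem_mhz=150, worksize=256):
    """Build the tuning part of an sgminer command line as a list of strings.

    Defaults follow the post: worksize 256 and the lowest ram clock
    (150 MHz is the R9 290 example; other cards may differ).
    """
    if not 20 <= intensity <= 22:
        raise ValueError("this kernel is meant to run at intensity 20 to 22")
    if worksize != 256:
        raise ValueError("this kernel only supports worksize 256")
    return [
        "-w", str(worksize),
        "-I", str(intensity),
        "--gpu-engine", str(core_mhz),      # assumed sgminer option name
        "--gpu-memclock", str(mem_mhz),     # assumed sgminer option name
    ]


print(" ".join(miner_args(intensity=21, core_mhz=1100)))
# prints "-w 256 -I 21 --gpu-engine 1100 --gpu-memclock 150"
```

Start from a conservative core clock and raise it step by step, keeping the ram clock at the minimum the whole time.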

COMPATIBILITY

Tested and stable on R9 290, 280x and 7950. It should work on any recent amd gpu, but performance is not guaranteed to be optimal.
It doesn't work with the cryptohunger optimized pool: use the conventional port or another pool. Also, do not replace the optimized kernel of grs-sgminer, only the normal one.

TROUBLESHOOTING

Try the following:
- Are you sure you set worksize to 256?
- Replace the generated .bin file with this one (64 bit, r9 280(x) and 290(x) only): LINK EXPIRED (diamondHawaiiw256l8.bin), see below for a newer binary file
- Lower the intensity
- Lower the core speed (are you sure you put the ram clock to the lowest possible value?)
- Since it uses more power, it could be a cooling issue too: check the gpu temperature

DONATIONS

This work took me months of coding and testing and many sleepless nights; please show your appreciation (you are making more money by using it!) by donating to:
BTC: DISABLED

DOWNLOAD

Opensource Kernel (v1):
https://app.box.com/s/9vikvemf7acio3uns7taorodqmod42ej

Experimental Hawaii bin (v2):
https://app.box.com/s/zsr29tfgv4tpxs1q7451dayzaw3wnoee

Wolf0's Tahiti bin (https://bitcointalksearch.org/topic/m.11778971):
https://ottrbutt.com/miner/wolf-groestlcoinTahitigw256l4.bin