Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 16. (Read 61242 times)

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome


The problem with my kernel is that, no matter how hard I try, I can't get the best hashrate on the 14.9 drivers (only 20 Mh/s vs 25 Mh/s with 14.6), so it's not enough to just replace diamond.cl in sgminer 4.1 or 5.
That's why I still prefer that people visit this post, with all the info and troubleshooting, for best performance.
The only clean way is to create a fork of sgminer, for Tahiti and Hawaii cards only, with the precompiled binary; some changes are needed so that it always uses the binary and never compiles the .cl sources.
Not sure I like it, but it might work for many... what do you think?
member
Activity: 81
Merit: 1002
It was only the wind.
A BIT OF HISTORY

The first GPU miner for groestlcoin and similar coins was sph-sgminer by phm. Optimizing the original implementation was trivial (almost 3x the speed could be achieved!), so there are probably dozens of optimized versions around, many of which have been kept private: mining groestlcoin and similar coins was always unfair for most people, at least for non-devs.
Hopefully this kernel will end that and also level the field between AMD and NVIDIA.
I believe my version is faster than many of the other kernels because of the time I dedicated to it and the thousands of tests I did.

FINAL ADVICE

I suggest keeping "good binaries": make a backup of the fastest .bin files you have, so you can recover them in case of driver problems.
This will also let you gain 1 or 2 percent more hashrate because of compiler variance (try removing the bin and running 3 or 4 times to see the variance in action).
Or use the provided bin file (see the OP) which should be a good one.

I've experienced lower power usage with catalyst 14.9 compared to 14.6 beta (a bit less compared to 13 but still better). Speaking of optimization, this should be kept in mind: buy a power meter for your miner(s)!
But it looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases. If you are on 14.9, use the provided bin file instead of the kernel source in .cl format.

If you want to fix it for 14.9, remove the naive implementation of the B64_# macros and use swizzle. Worked for me.

Thanks, but I've already tried every combination of bitwise operations and vectors (as_uchar...): I could make it work, but the hashrate is about 20 Mh/s vs 25 Mh/s on 14.6 beta.

Ah, I see - I just saw it go from 7MH/s to... I think 20, on 14.9, so I figured it worked; never mind, then.
legendary
Activity: 3052
Merit: 1053
bit.diamonds | uNiq.diamonds
@pallas could u find the actual state of the art mining software for DMD Groestl and post links in DMD ANN we then will update software on website

it would be great if it include ur performance boost tricks already.....

i think no one from our core team runs AMD cards any longer so ur help would be welcome
member
Activity: 81
Merit: 1002
It was only the wind.
If you want to fix it for 14.9, remove the naive implementation of the B64_# macros and use swizzle. Worked for me.
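
For anyone wondering what the two approaches look like, here is a minimal host-side C sketch, not the kernel code. It assumes the B64_n macros extract byte n of a 64-bit word by shift-and-mask, and it uses a byte view of the word as a stand-in for an OpenCL swizzle such as as_uchar8(x).s3. The names byte_shift and byte_view are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shift-and-mask extraction: byte n (0 = least significant) of x.
   Assumed to mirror what the B64_n macros compute. */
static uint8_t byte_shift(uint64_t x, unsigned n)
{
    assert(n < 8);
    return (uint8_t)((x >> (8 * n)) & 0xFF);
}

/* Byte-view extraction: reinterpret the word as 8 bytes and pick one.
   This is a host-side analogue of a swizzle like as_uchar8(x).s3. */
static uint8_t byte_view(uint64_t x, unsigned n)
{
    uint8_t b[8];
    const uint16_t probe = 1;
    uint8_t first;

    assert(n < 8);
    memcpy(b, &x, sizeof b);
    memcpy(&first, &probe, 1);      /* first == 1 on a little-endian host */
    return first ? b[n] : b[7 - n];
}
```

Both functions return the same byte. Which form the GPU compiler turns into faster code seems to depend on the driver version, as discussed in this thread, so it has to be measured.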
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Thanks, but I've already tried every combination of bitwise operations and vectors (as_uchar...): I could make it work, but the hashrate is about 20 Mh/s vs 25 Mh/s on 14.6 beta.
Ah, I see - I just saw it go from 7MH/s to... I think 20, on 14.9, so I figured it worked; never mind, then.

It's funny how some little changes lead to huge hashrate drops (depending on compiler version); but it's true for memory intensive algos only, as far as I can see.
Maybe your own version doesn't have this problem, then ;-)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Thanks, but I've already tried every combination of bitwise operations and vectors (as_uchar...): I could make it work, but the hashrate is about 20 Mh/s vs 25 Mh/s on 14.6 beta.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Been busy working with miningfield setting up my USA mirror of pools.
Yah, if you can post a modified .cl for ws 128 I'll test it on the 5450.

Made a lot of progress on the USA mirror setup today :) Might have it online by Monday :)

just change the initial part of the main function to:

for (u = get_local_id(0); u < 256; u += get_local_size(0)) {
  T2[u] = ROTL64(T0[u], 16UL);
  T3[u] = ROTL64(T0[u], 24UL);
  T4[u] = ROTL64(T0[u], 32UL);
  T5[u] = ROTL64(T0[u], 40UL);
  T6[u] = ROTL64(T0[u], 48UL);
  T7[u] = ROTL64(T0[u], 56UL);
}

this part was blocking worksize < 256.
as I said previously, it still might not work or be very slow for tuning reasons.
let me know if it works.
thanks!
Thanks, will do as soon as I get time :) The VM server box has a 5450 in it, so I might as well let the host make use of it. I can get ~0.25 MH/s with the normal groestlcoin kernel with ws 128 on it... it's running 24/7/365 anyway :)

Only has 80 shaders LOL it's a dwarf but is air cooled hehe
about on par with intel HD GPU (10 shaders) in G3220 CPU as far as hashrate

just curious... did you manage to make it work? if yes, what hashrate?

EDIT: it has about half the shaders of a nexus 9 :-D
full member
Activity: 151
Merit: 100
Help! What is the bat file to start Diamond? I cannot get it to run on a 280X card. Thanks
hero member
Activity: 630
Merit: 500
Thanks, will do as soon as I get time :) The VM server box has a 5450 in it, so I might as well let the host make use of it. I can get ~0.25 MH/s with the normal groestlcoin kernel with ws 128 on it... it's running 24/7/365 anyway :)

Only has 80 shaders LOL it's a dwarf but is air cooled hehe
about on par with intel HD GPU (10 shaders) in G3220 CPU as far as hashrate

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
just change the initial part of the main function to:

for (u = get_local_id(0); u < 256; u += get_local_size(0)) {
  T2[u] = ROTL64(T0[u], 16UL);
  T3[u] = ROTL64(T0[u], 24UL);
  T4[u] = ROTL64(T0[u], 32UL);
  T5[u] = ROTL64(T0[u], 40UL);
  T6[u] = ROTL64(T0[u], 48UL);
  T7[u] = ROTL64(T0[u], 56UL);
}

this part was blocking worksize < 256.
as I said previously, it still might not work or be very slow for tuning reasons.
let me know if it works.
thanks!
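
To see why the strided loop removes the worksize restriction, here is a small host-side C emulation, a sketch rather than the kernel itself. It assumes T0 and the derived tables are 256-entry arrays indexed by u. The names rotl64 and fill_rotated are illustrative only, with lid standing in for get_local_id(0) and local_size for get_local_size(0).

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256

/* Rotate a 64-bit word left by n bits (0 < n < 64). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    assert(n > 0 && n < 64);
    return (x << n) | (x >> (64 - n));
}

/* Emulate one work-group running the strided loop on the host.
   Each emulated work-item starts at its own id and steps by the
   work-group size. Returns the number of table entries written. */
static unsigned fill_rotated(const uint64_t *t0, uint64_t *out,
                             unsigned shift, unsigned local_size)
{
    unsigned written = 0;

    assert(local_size > 0);
    for (unsigned lid = 0; lid < local_size; lid++) {
        for (unsigned u = lid; u < TABLE_SIZE; u += local_size) {
            out[u] = rotl64(t0[u], shift);
            written++;
        }
    }
    return written;
}
```

With a local size of 128 each work-item handles two entries, with 64 it handles four, and every one of the 256 entries is still written exactly once. Whether the result is fast on a small card is a separate question, as noted above.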
hero member
Activity: 630
Merit: 500
Been busy working with miningfield setting up my USA mirror of pools.
Yah, if you can post a modified .cl for ws 128 I'll test it on the 5450.

Made a lot of progress on the USA mirror setup today :) Might have it online by Monday :)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Any chance of getting a worksize 128 super optimized kernel to try on HD5450? (256 too large)

The changes needed to make it work at 128 are easy, but it probably won't be tuned well for such a card: I've tested on an R9 290 and a 7950 while developing. It might not even work at all.
If you want to try I can send you a file or the changes and if it works well we can post it here.
hero member
Activity: 630
Merit: 500
Any chance of getting a worksize 128 super optimized kernel to try on HD5450? (256 too large)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
It looks like the compiler included in 14.9 drivers produces binaries which run considerably slower than older releases.
The same happens for other algorithms as well, for example X11.
I've tweaked the code a bit but I still can't reach full speed, so I will keep trying or, if need be, wait for a new driver release.
Meanwhile, if you are on 14.9, use the provided bin file instead of the kernel source in .cl format.
full member
Activity: 151
Merit: 100
I cannot replace the bin file. After restarting the miner, the old bin file is there again. How do I replace it? Thank you

if a bin file with the same name already exists, the miner shouldn't replace it.
so best replace the bin file the miner creates with mine, using the same filename.
Thanks!!!
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
if a bin file with the same name already exists, the miner shouldn't replace it.
so best replace the bin file the miner creates with mine, using the same filename.
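
The replacement step can be sketched in C as below. This is only an illustration: the function names and all file paths are hypothetical (the real bin name is whatever the miner generated on your system), and it keeps a backup of the miner's own file first, in line with the advice about keeping good binaries.

```c
#include <assert.h>
#include <stdio.h>

/* Copy src to dst byte by byte. Returns 0 on success, -1 on failure. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in, *out;
    int c, ok;

    assert(src && dst);
    in = fopen(src, "rb");
    if (!in)
        return -1;
    out = fopen(dst, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }
    while ((c = fgetc(in)) != EOF)
        fputc(c, out);
    ok = !ferror(in) && !ferror(out);
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Back up the miner-generated binary, then overwrite it with the
   provided one under the same filename. All paths are hypothetical. */
static int replace_bin(const char *miner_bin, const char *provided_bin,
                       const char *backup_bin)
{
    if (copy_file(miner_bin, backup_bin) != 0)
        return -1;
    return copy_file(provided_bin, miner_bin);
}
```

Doing this by hand in a file manager works just as well; the point is only that the provided file must end up under exactly the name the miner looks for.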
full member
Activity: 151
Merit: 100
I cannot replace the bin file. After restarting the miner, the old bin file is there again. How do I replace it? Thank you
hero member
Activity: 630
Merit: 500
Help set up R9 290 Tri-x! Thank you!
Read page 1, then ask a question if you need help :) (Sorry for being ill-tempered, I just had to block a credit card due to fraudulent charges being made on it; cash only till the new card arrives.)
full member
Activity: 151
Merit: 100
Help set up R9 290 Tri-x! Thank you!