Topic: [ANN] sgminer v5 - optimized X11/X13/NeoScrypt/Lyra2RE/etc. kernel-switch miner - page 106. (Read 877846 times)

hero member
Activity: 935
Merit: 1001
I don't always drink...
@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall

Those cards must be screaming  Grin
How much of a %age improvement did you get?

Sorry, 1600 was a typo, it is 1500.
About 8.6% with the config below.  Note the undervolt to 1000 mV (gpu-vddc 1.00).

Quote
{
  "pools": [
    {
      "name": "FeatherCoin-neo Pool - WemineFTC",
      "nfactor": "10",
      "algorithm": "neoscrypt",
      "url": "stratum+tcp://stratum.wemineftc.com:4444",
      "user": "USER",
      "pass": "x"
    }
  ],
  "api-port": "4028",
  "gpu-engine": "1100",
  "gpu-memclock": "1500",
  "worksize": "256",
  "gpu-threads": "2",
  "api-listen": true,
  "api-allow": "W:127.0.0.1/32",
  "queue": "1",
  "algorithm": "neoscrypt",
  "device": "0,1,2,3",
  "xintensity": "3",
  "thread-concurrency": "8192",
  "gpu-vddc": "1.00",
  "scan-time": "1",
  "gpu-reorder": true,
  "temp-cutoff": "90",
  "temp-overheat": "82",
  "temp-target": "72",
  "gpu-platform": "0",
  "gpu-dyninterval": "7",
  "expiry": "1",
  "no-pool-disable": true,
  "no-client-reconnect": true,
  "log": "5",
  "no-submit-stale": true,
  "scrypt": true,
  "tcp-keepalive": "30",
  "temp-hysteresis": "3",
  "kernel-path": "/usr/local/bin",
  "powertune": "20"
}
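
A config like the one above can be sanity-checked before the miner is started. The following is a minimal sketch in Python, not part of sgminer: `check_temps` is a hypothetical helper, the JSON is only an excerpt of the quoted config, and the rule that the three temperature thresholds should ascend is an assumption based on their names.

```python
import json

# Excerpt of the config quoted above; values are strings, as in the post.
CONFIG_TEXT = """
{
  "gpu-engine": "1100",
  "gpu-memclock": "1500",
  "gpu-vddc": "1.00",
  "temp-target": "72",
  "temp-overheat": "82",
  "temp-cutoff": "90"
}
"""


def check_temps(cfg):
    """Return a list of problems found in the temperature thresholds."""
    target = int(cfg["temp-target"])
    overheat = int(cfg["temp-overheat"])
    cutoff = int(cfg["temp-cutoff"])
    problems = []
    # Assumption: the thresholds are meant to ascend (target < overheat < cutoff).
    if not target < overheat < cutoff:
        problems.append("temp-target, temp-overheat and temp-cutoff are not ascending")
    return problems


cfg = json.loads(CONFIG_TEXT)
problems = check_temps(cfg)
mem_to_core = int(cfg["gpu-memclock"]) / int(cfg["gpu-engine"])
print(problems)
print(f"mem/core clock ratio: {mem_to_core:.2f}")
```

An empty list means the thresholds are ordered as expected. The clock ratio is printed only as a convenient number to compare between rigs.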
newbie
Activity: 57
Merit: 0
@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall

That's not a very good hashrate with the public kernel. 1600 is not a very good memclock; try lowering it to 1500.

I had about 320 kh/s with 1100/1500.
full member
Activity: 279
Merit: 104
@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall

Those cards must be screaming  Grin
How much of a %age improvement did you get?
member
Activity: 81
Merit: 1002
It was only the wind.
hero member
Activity: 935
Merit: 1001
I don't always drink...
@ bobben2, 280x with hynix 302Kh/s 1100/1600 x 4 is 990 watts at the wall
full member
Activity: 279
Merit: 104
Here is a small neoscrypt kernel improvement for free, since I am mostly doing X11 anyway.
It gave me a 5.8% speedup on my reference R9 290 card (with Stilt bios),
from 290.2 to 307 Kh/s at 800/1500 core/mem freq on Ubuntu 12.04 with stock drivers.
I didn't try it on my R9 280x cards, so please post your results if you try this.
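
For anyone posting results, the quoted speedup is simply the relative change between the two hashrates. A small Python sketch of that arithmetic, using the figures from this post (`speedup_percent` is just an illustrative helper name):

```python
def speedup_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Figures from the post: 290.2 Kh/s before the change, 307 Kh/s after.
gain = speedup_percent(290.2, 307.0)
print(f"{gain:.1f}%")  # prints "5.8%"
```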

You will have to mod the kernel as per the code below.
The bottleneck in this kernel is the way it stores the 128 intermediate results of chacha and salsa in global memory.
With the change below, the two halves of each result are no longer read from or written to the same or adjacent memory banks, which reduces stalls and latency.

Change:
void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   ((__global ulong16 *)V)[idx << 1] = ((ulong16 *)X)[0];
   ((__global ulong16 *)V)[(idx << 1) + 1] = ((ulong16 *)X)[1];
}

void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   ((ulong16 *)X)[0] ^= ((__global ulong16 *)V)[idx << 1];
   ((ulong16 *)X)[1] ^= ((__global ulong16 *)V)[(idx << 1) + 1];
}

To:
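void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   /* Low half of entry idx goes to slot idx, high half to slot idx + 128,
      so the two halves are no longer adjacent (idx must be below 128). */
   ((__global ulong16 *)V)[idx] = ((ulong16 *)X)[0];
   ((__global ulong16 *)V)[idx + 128] = ((ulong16 *)X)[1];
}

void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   /* XOR both halves of entry idx back into X, using the same split layout. */
   ((ulong16 *)X)[0] ^= ((__global ulong16 *)V)[idx];
   ((ulong16 *)X)[1] ^= ((__global ulong16 *)V)[idx + 128];
}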
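
To see what the index change does, the two layouts can be modelled on the host. This is a rough Python sketch, not kernel code: the function names are made up for illustration, a "slot" stands for one ulong16 (128 bytes), and the count of 128 entries is taken from the description above. It only checks that both layouts use the same 256 slots with no overlap. It says nothing about the actual speed on a given card.

```python
ENTRIES = 128  # number of scratchpad entries, as stated in the post


def interleaved_slots(idx):
    """Original layout: both halves of entry idx sit in adjacent slots."""
    return (idx << 1, (idx << 1) + 1)


def split_slots(idx):
    """Modified layout: low halves fill slots 0..127, high halves 128..255."""
    return (idx, idx + ENTRIES)


def covers_all_slots(layout):
    """True if the layout maps the 128 entries onto slots 0..255 exactly once."""
    used = [slot for idx in range(ENTRIES) for slot in layout(idx)]
    return sorted(used) == list(range(2 * ENTRIES))


print(interleaved_slots(5), split_slots(5))  # (10, 11) (5, 133)
print(covers_all_slots(interleaved_slots), covers_all_slots(split_slots))  # True True
```

Since both layouts cover the same slots, the scratchpad size does not change. Only the distance between the two halves of an entry does: 1 slot in the original, 128 slots in the modified version.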
member
Activity: 81
Merit: 1002
It was only the wind.
BTW I got another 10%, programming on the phone.

I tuned for 290X - you hit 1.6MH/s yet?

I'm on 290@1000/1250, about 1.1 Mh/s.
But income is so low it's not worth working on it.
Actually, I tried to use your groestl kernel yesterday (at least one part; I wasn't in the mood to unroll everything  Grin), but I didn't see any difference, and it was actually a bit slower for me.

Wolf0, can you post your speed with these core/mem 1030/1250 (can't get my card to work at 1500MHz).
My current average speed with my r9 290x and my latest kernel is 1.3MH/s


1.35 or so, but I don't have your blake fix in.
ok so it isn't that bad  Grin
You won't gain much with the blake fix; blake takes practically no time compared to the rest. It is a lot of pain to put in, and the sgminer5 dev admin doesn't like it  Grin
(I need to change a few things before they agree to update)

Ah, okay. I'll work on it more later. Ran out of drugs - should have more today Cheesy
sr. member
Activity: 539
Merit: 255
legendary
Activity: 1512
Merit: 1000
quarkchain.io
..who doesn't want those wolf0's kernels Smiley
member
Activity: 81
Merit: 1002
It was only the wind.
BTW I got another 10%, programming on the phone.

I tuned for 290X - you hit 1.6MH/s yet?

I'm on 290@1000/1250, about 1.1 Mh/s.
But income is so low it's not worth working on it.
Actually, I tried to use your groestl kernel yesterday (at least one part; I wasn't in the mood to unroll everything  Grin), but I didn't see any difference, and it was actually a bit slower for me.

Wolf0, can you post your speed with these core/mem 1030/1250 (can't get my card to work at 1500MHz).
My current average speed with my r9 290x and my latest kernel is 1.3MH/s


1.35 or so, but I don't have your blake fix in.
legendary
Activity: 1596
Merit: 1000
Hey, so I started testing neoscrypt configs, but I have no idea what I am aiming for.
Can anyone share what is the most you can get from a 7950? I am getting around 220 kh/s. Is that good? Can I get more than that?

Cheers

Looks like you can get a little more..  

http://hw.neoscrypt.tk/index.php

I actually got them up to 260, but I am still wondering if I can get more out of them. I am using sgminer5.1-dev.

I would like to know what is the best someone got out of these cards, or from any cards actually.
I mean any optimized hidden super secret kernel etc Smiley
Just wondering what is the max at this moment.

Around 600kh/s out of 290X.

That's insane! I want your kernel. Wink

Do you have any power figures VS the publicly available kernel that gets me ~315KH/s on 290 & 290X @975/1500?

Also, what clocks are you running? Would you mind sharing your TC and other relevant settings if they can be applied to the public kernel?
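
Power figures are easiest to compare as hashrate per watt. A minimal Python sketch using the only wall-power figure posted earlier in this thread (4 x 280X at 302 Kh/s, 990 W at the wall). It assumes that 302 Kh/s is the per-card rate and that the 990 W covers the whole rig, so the result is a rig-level number and not a per-GPU one. `khs_per_watt` is an illustrative helper name.

```python
def khs_per_watt(khs_per_card, cards, wall_watts):
    """Total hashrate in Kh/s divided by power drawn at the wall in watts."""
    return khs_per_card * cards / wall_watts


# Assumption: 302 Kh/s is per card; 990 W is the whole rig at the wall.
eff = khs_per_watt(302, 4, 990)
print(f"{eff:.2f} Kh/s per watt")  # prints "1.22 Kh/s per watt"
```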
sr. member
Activity: 539
Merit: 255
I don't think elpida memory could account for such a wide discrepancy in the hash or the HW errors.  At the very least a 290x should be equal with a 290, the only difference is the number of shaders at least for a reference card.  If it was a hynix 290x hitting 330+ and an elpida 290x hitting 310, then maybe I could understand.  I've got a hynix 290x in my test rig and I forgot what the other one is but I can test.  I realize you spent hours on your config for your 290 but if you feel like showing it I can start from there.  I've got 14.6 rc2 installed on the test rig and I'm going to install 14.9 tonight and drop the 14.6 in the mining directory (I also use wolf0's builds).  Do you use Stilt's bios?  I couldn't get stilt's bios stable for the X coins but I wonder if it will work for neoscrypt.  Maybe Stilt on neoscrypt will let us find the right ratio of gpu to memory clock (if neoscrypt is anything like scrypt).  Sad thing is this really doesn't make too much of a difference in profit.  

Yeah, right from the get-go tuning the GPU's for neoscrypt (and tuning them for scrypt as well) using the exact same settings would typically get the 290's with Hynix more hash than the 290x's with elpida.  Then once that peaked I split them off in different directions for tuning.

There was a single setting where the 290x got more hash than the 290, but that was back in the 30 to 60 kh/s range and was never repeatable with the newer kernel and drivers, even with the same settings.

It may be more than just the hynix/elpida thing, it could be in the card hardware or bios (never tried stilts).  I never dug deeper than the memory after I figured out why the 290's were outpacing the 290x's mining scrypt.  But, when I was doing scrypt the hashrates were much closer, so it could also be that the bottleneck in the kernel affects the 290x worse.

And you have another good point..  At this time hashrate isn't good for much more than bragging rights unless you have a farm..  I heard via the rumor mill that there's a massive GPU farm getting built.  If that is true, then unless there's some attractive new coins to draw the hash, the GPU mine-able coins are all going to get diluted even further.  It would help if BTC weren't tanking, but there was a big hype bubble to recover from..

My elpida 290x's always outperformed my hynix 290s, especially with Stilt's bios.  Stilt's bios was stable for the 2 hynix 290x on my test rig mining neoscrypt but it didn't seem to make a difference in the max hash I could get.  With either bios I could squeeze out about 330kh/s by overclocking to 1070/1500.  There was no magic ratio that I could find but that may be because I'm running the stock kernel.  I have a feeling stilt's bios may help out with a better kernel, if not for performance then for energy savings.  The core clock doesn't influence hash that much which is something wolf0 and others have said, ie crank up the memory speed and downclock the core for energy savings.

Testing the different drivers didn't make a difference to me, in fact I saw a slight increase in just sticking with 14.6rc2, as opposed to using 14.9 and 14.6 ocl files.  Maybe you play games and 14.9 is better for that but I don't use these for gaming.  Testing different settings didn't really make too much of a difference either, TCs of 8192, 8448, 16384, 22500 (I used that for scrypt-n) and 22528 and different worksizes didn't produce a significant change.  The 290 kernel bottleneck is a problem.  

I wouldn't worry too much about a massive gpu farm.  I don't think it will matter too much, there will always be new farmers and some will also leave.  Now if it's wolf0's farm then maybe that would be something to worry about, Wink  But thanks to his hawaii bin, x11 is much more profitable for me than neoscrypt.

Anyway

My 290x don't like over 1130 clocks and 1450 memory at all, they keep crashing the drivers or the rig.  But the 290's will run 1150 and 1500.  They might run more, but I haven't really pushed them as I had bad fans.  Now they are back with new fans, I will push further when I have the time to pay attention to it.

No gaming on this rig..  It's pretty capable of it with an AMD8350 and 16 gigs of ram, but 4 of the 5 GPU's only have a 1x extender so it would be playing with a single 280x.  I noticed about 5 to 10 mhs difference with the 14.6/14.9, odd that you don't.

Yes, there's a whole lotta blah settings in that bottlenecked area, I tried them all, and many a few times as I was running 4 gpu's and would set all 4 differently during the testing to cover as much ground as possible.
member
Activity: 81
Merit: 1002
It was only the wind.
Sorry man, maybe I'm getting a bit frustrated; I've been here since Page 1 of this thread, meticulously bugged ystarnaud to implement a ton of fucking features and bugfixes and improvements, and have provided him with great feedback and also this entire community on the thread with multiple configs, test cases, driver version feedbacks and a ton of other shit.  Yes, I don't have a build environment so I can just tweak what I want, and badman74 and Elun and Elbandi have provided me with so many versions of sgminer5 by now it's hard to count.  I feel like any of the advice, configs, trial and error I have done and posted results here for, has only really shot me in the foot.  It appears in the mining game there's no reason ever to show anybody else how to get their shit set up right, because if theirs isn't set up right and mine is, then that would mean I'm making more than they are.  I thought it was essential to giving back to the community.  I don't make the miners, the kernels, but sometimes I do find out what makes them faster, and I share it.  And now it's making me regret it, since we end up having to wait for table scraps ("leaked" kernels as opposed to shared) and having to deal with listening to fuckers brag about programming skills when they should just crowdfund it for a price to open source it or something.  I don't know.  As I said, frustrated.  You can see how this happened with neoscrypt.  It was profitable for a few days at best, and now I haven't seen it on westhash as most profitable in the last week.  It got ran into the ground, and we weren't even at its most optimized yet.  The field is way different than when scrypt was here - everybody shared there - it was a common struggle.

I shared Neoscrypt - that's WHY it got ran into the ground.

Shared it as in it's public now?

If you're mining above 100kh/s, you're using my code.
sr. member
Activity: 539
Merit: 255
Hey, so I started testing neoscrypt configs, but I have no idea what I am aiming for.
Can anyone share what is the most you can get from a 7950? I am getting around 220 kh/s. Is that good? Can I get more than that?

Cheers

Looks like you can get a little more.. 

http://hw.neoscrypt.tk/index.php

I actually got them up to 260, but I am still wondering if I can get more out of them. I am using sgminer5.1-dev.

I would like to know what is the best someone got out of these cards, or from any cards actually.
I mean any optimized hidden super secret kernel etc Smiley
Just wondering what is the max at this moment.

Around 600kh/s out of 290X.

You should be able to top that with a 295x2..  Wink
full member
Activity: 169
Merit: 100
I don't think elpida memory could account for such a wide discrepancy in the hash or the HW errors.  At the very least a 290x should be equal with a 290, the only difference is the number of shaders at least for a reference card.  If it was a hynix 290x hitting 330+ and an elpida 290x hitting 310, then maybe I could understand.  I've got a hynix 290x in my test rig and I forgot what the other one is but I can test.  I realize you spent hours on your config for your 290 but if you feel like showing it I can start from there.  I've got 14.6 rc2 installed on the test rig and I'm going to install 14.9 tonight and drop the 14.6 in the mining directory (I also use wolf0's builds).  Do you use Stilt's bios?  I couldn't get stilt's bios stable for the X coins but I wonder if it will work for neoscrypt.  Maybe Stilt on neoscrypt will let us find the right ratio of gpu to memory clock (if neoscrypt is anything like scrypt).  Sad thing is this really doesn't make too much of a difference in profit. 

Yeah, right from the get-go tuning the GPU's for neoscrypt (and tuning them for scrypt as well) using the exact same settings would typically get the 290's with Hynix more hash than the 290x's with elpida.  Then once that peaked I split them off in different directions for tuning.

There was a single setting where the 290x got more hash than the 290, but that was back in the 30 to 60 kh/s range and was never repeatable with the newer kernel and drivers, even with the same settings.

It may be more than just the hynix/elpida thing, it could be in the card hardware or bios (never tried stilts).  I never dug deeper than the memory after I figured out why the 290's were outpacing the 290x's mining scrypt.  But, when I was doing scrypt the hashrates were much closer, so it could also be that the bottleneck in the kernel affects the 290x worse.

And you have another good point..  At this time hashrate isn't good for much more than bragging rights unless you have a farm..  I heard via the rumor mill that there's a massive GPU farm getting built.  If that is true, then unless there's some attractive new coins to draw the hash, the GPU mine-able coins are all going to get diluted even further.  It would help if BTC weren't tanking, but there was a big hype bubble to recover from..

My elpida 290x's always outperformed my hynix 290s, especially with Stilt's bios.  Stilt's bios was stable for the 2 hynix 290x on my test rig mining neoscrypt but it didn't seem to make a difference in the max hash I could get.  With either bios I could squeeze out about 330kh/s by overclocking to 1070/1500.  There was no magic ratio that I could find but that may be because I'm running the stock kernel.  I have a feeling stilt's bios may help out with a better kernel, if not for performance then for energy savings.  The core clock doesn't influence hash that much which is something wolf0 and others have said, ie crank up the memory speed and downclock the core for energy savings.

Testing the different drivers didn't make a difference to me, in fact I saw a slight increase in just sticking with 14.6rc2, as opposed to using 14.9 and 14.6 ocl files.  Maybe you play games and 14.9 is better for that but I don't use these for gaming.  Testing different settings didn't really make too much of a difference either, TCs of 8192, 8448, 16384, 22500 (I used that for scrypt-n) and 22528 and different worksizes didn't produce a significant change.  The 290 kernel bottleneck is a problem. 

I wouldn't worry too much about a massive gpu farm.  I don't think it will matter too much, there will always be new farmers and some will also leave.  Now if it's wolf0's farm then maybe that would be something to worry about, Wink  But thanks to his hawaii bin, x11 is much more profitable for me than neoscrypt.

Anyway
hero member
Activity: 518
Merit: 500
Hey, so I started testing neoscrypt configs, but I have no idea what I am aiming for.
Can anyone share what is the most you can get from a 7950? I am getting around 220 kh/s. Is that good? Can I get more than that?

Cheers

Looks like you can get a little more.. 

http://hw.neoscrypt.tk/index.php

I actually got them up to 260, but I am still wondering if I can get more out of them. I am using sgminer5.1-dev.

I would like to know what is the best someone got out of these cards, or from any cards actually.
I mean any optimized hidden super secret kernel etc Smiley
Just wondering what is the max at this moment.
sr. member
Activity: 539
Merit: 255
Hey, so I started testing neoscrypt configs, but I have no idea what I am aiming for.
Can anyone share what is the most you can get from a 7950? I am getting around 220 kh/s. Is that good? Can I get more than that?

Cheers

Looks like you can get a little more.. 

http://hw.neoscrypt.tk/index.php
hero member
Activity: 518
Merit: 500
Hey, so I started testing neoscrypt configs, but I have no idea what I am aiming for.
Can anyone share what is the most you can get from a 7950? I am getting around 220 kh/s. Is that good? Can I get more than that?

Cheers
sr. member
Activity: 539
Merit: 255
I don't think elpida memory could account for such a wide discrepancy in the hash or the HW errors.  At the very least a 290x should be equal with a 290, the only difference is the number of shaders at least for a reference card.  If it was a hynix 290x hitting 330+ and an elpida 290x hitting 310, then maybe I could understand.  I've got a hynix 290x in my test rig and I forgot what the other one is but I can test.  I realize you spent hours on your config for your 290 but if you feel like showing it I can start from there.  I've got 14.6 rc2 installed on the test rig and I'm going to install 14.9 tonight and drop the 14.6 in the mining directory (I also use wolf0's builds).  Do you use Stilt's bios?  I couldn't get stilt's bios stable for the X coins but I wonder if it will work for neoscrypt.  Maybe Stilt on neoscrypt will let us find the right ratio of gpu to memory clock (if neoscrypt is anything like scrypt).  Sad thing is this really doesn't make too much of a difference in profit. 

Yeah, right from the get-go tuning the GPU's for neoscrypt (and tuning them for scrypt as well) using the exact same settings would typically get the 290's with Hynix more hash than the 290x's with elpida.  Then once that peaked I split them off in different directions for tuning.

There was a single setting where the 290x got more hash than the 290, but that was back in the 30 to 60 kh/s range and was never repeatable with the newer kernel and drivers, even with the same settings.

It may be more than just the hynix/elpida thing, it could be in the card hardware or bios (never tried stilts).  I never dug deeper than the memory after I figured out why the 290's were outpacing the 290x's mining scrypt.  But, when I was doing scrypt the hashrates were much closer, so it could also be that the bottleneck in the kernel affects the 290x worse.

And you have another good point..  At this time hashrate isn't good for much more than bragging rights unless you have a farm..  I heard via the rumor mill that there's a massive GPU farm getting built.  If that is true, then unless there's some attractive new coins to draw the hash, the GPU mine-able coins are all going to get diluted even further.  It would help if BTC weren't tanking, but there was a big hype bubble to recover from..
full member
Activity: 166
Merit: 100
Developer


Here you can observe my AMD Sapphire 7970 working. It is incredibly fast.

Thank you very much.