Author

Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 186. (Read 214431 times)

full member
Activity: 224
Merit: 100
CryptoLearner
It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...

or people could update Smiley

Well, the thing is, with older cards, even the latest drivers always switch back to the "legacy" mode.
Weird, weird...

Ah, I see, so it's not the driver that is updated, just the package then... Lazy AMD, lol: they don't make the newest drivers compatible with old hardware, they just pack different driver versions into one package. No wonder the packages are so big. Nvidia probably does the same, when you see that their package is 350+ MB.
sr. member
Activity: 728
Merit: 304
Miner Developer
It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...

or people could update Smiley

Well, the thing is, with older cards, even the latest drivers always switch back to the "legacy" mode.
Weird, weird...
full member
Activity: 224
Merit: 100
CryptoLearner
xD, well, to be honest there are no pros to keeping old drivers when there is plenty of proof that newer ones work as well or even better. It's not THAT hard to update, even if you have a lot of rigs, as long as you know a little about scripting/dev (if you have that many rigs you must have some knowledge, to have proper monitoring at least). I prefer the dev to be able to focus on one driver (that's what Claymore, for example, does, if I recall); otherwise you spend all your time on compatibility instead of improving performance.
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...

or people could update Smiley
Smart move :-D
full member
Activity: 224
Merit: 100
CryptoLearner
It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...

or people could update Smiley
sr. member
Activity: 728
Merit: 304
Miner Developer
It turned out that the "legacy" AMD drivers require a totally different set of optimizations. This must be the reason why GG was running rather slow on older (GCN1/2) cards. I suppose optimizations for legacy drivers are worth the effort after all...
newbie
Activity: 39
Merit: 0
Well, I guess I did something wrong. With the latest amdgpu-pro drivers I built with make, then ran gatelessgate.py, and I'm getting 10/sec on each 480. Does anyone know where I messed up?  Cry Thanks!
full member
Activity: 491
Merit: 100
#SWGT PRE-SALE IS LIVE
What is the current speed per card?
If someone is using gg miner, please provide some feedback.

EDIT:
Quote
Total 760.7 sol/s [dev0 193.8, dev1 188.2, dev2 190.5, dev3 190.8] 15 shares
Total 760.9 sol/s [dev0 194.9, dev1 188.1, dev2 190.2, dev3 190.7] 15 shares
Total 761.2 sol/s [dev0 196.8, dev1 188.4, dev2 189.6, dev3 190.8] 15 shares
Total 761.0 sol/s [dev0 195.8, dev1 188.8, dev2 188.7, dev3 190.9] 16 shares
Total 761.9 sol/s [dev0 196.1, dev1 189.2, dev2 189.2, dev3 191.5] 17 shares
Total 761.5 sol/s [dev0 196.6, dev1 189.1, dev2 189.1, dev3 193.1] 18 shares
Total 761.1 sol/s [dev0 194.7, dev1 193.1, dev2 189.2, dev3 194.8] 18 shares
Total 761.5 sol/s [dev0 194.2, dev1 193.1, dev2 188.0, dev3 195.7] 18 shares
4x RX480 Nitros 8GB
Niice, Niice ( 840 with Claymore's and -i 2)

That is a good speed. It is gradually catching up with Claymore miner now.
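
For anyone who wants to compare cards from output like the quoted lines, here is a minimal sketch in Python that averages the per-device rates. It assumes the log format is exactly what is quoted above ("Total ... sol/s [dev0 ..., dev1 ...] N shares"); the function names parse_line and per_device_mean are made up for this example and are not part of the miner. It does not assume that the "Total" figure equals the sum of the per-device figures.

```python
import re
from statistics import mean

# Matches lines such as (format taken from the quoted output, an assumption):
#   Total 760.7 sol/s [dev0 193.8, dev1 188.2, dev2 190.5, dev3 190.8] 15 shares
LINE_RE = re.compile(r"Total\s+([\d.]+)\s+sol/s\s+\[(.*?)\]\s+(\d+)\s+shares")
DEV_RE = re.compile(r"dev(\d+)\s+([\d.]+)")


def parse_line(line):
    """Return (total, {device_index: rate}, shares), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    rates = {int(idx): float(rate) for idx, rate in DEV_RE.findall(m.group(2))}
    return float(m.group(1)), rates, int(m.group(3))


def per_device_mean(lines):
    """Average each device's reported rate over all matching lines."""
    samples = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a status line
        for dev, rate in parsed[1].items():
            samples.setdefault(dev, []).append(rate)
    return {dev: mean(vals) for dev, vals in sorted(samples.items())}


if __name__ == "__main__":
    log = [
        "Total 760.7 sol/s [dev0 193.8, dev1 188.2, dev2 190.5, dev3 190.8] 15 shares",
        "Total 761.5 sol/s [dev0 194.2, dev1 193.1, dev2 188.0, dev3 195.7] 18 shares",
    ]
    for dev, rate in per_device_mean(log).items():
        print(f"dev{dev}: {rate:.1f} sol/s")
```

Averaging over many lines smooths out the line-to-line jitter visible in the quote, which makes it easier to spot a card that is consistently slower than the others.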
legendary
Activity: 980
Merit: 1001
aka "whocares"
I appreciate what you are doing and look forward to switching my farm to your miner when it is a bit faster.  A moderate difference in hashrate is too costly with a bunch of miners running, but I will accept a small loss in hashrate just to stop using the closed-source stuff.
sr. member
Activity: 728
Merit: 304
Miner Developer
I will probably stick with the GCN assembly instead of AMD IL because I would rather not deal with another abstraction layer.
It looks like AMDIL is a dead-end anyway.
http://lists.llvm.org/pipermail/llvm-dev/2015-May/085684.html

HSAIL will probably be short-lived since most of the work is now focused on the LLVM amdgpu back-end.  It even supports inline asm, but I'm not sure if it will generate a kernel binary that conforms to AMD's CL2.0 ABI.  With clang/llvm-3.9, I've only got as far as getting it to output GCN assembler from the OpenCL + inline asm code.




Like Wolf said, CLRX is the way to go if you haven't looked into it. I used it in my previous project with great success. I am trying to figure out how to enable GDS on Ellesmere, which has turned out to be rather tricky. It seems that there is no way to enable GDS with the CL2.0 ABI, and you have to fall back to the CL1.2 ABI with the "-legacy" build option. This totally sucks, as I need to redo the optimizations all over again. I have no idea what the engineers at AMD had in mind when they decided to make this design change.
sr. member
Activity: 652
Merit: 266
What is the current speed per card?
If someone is using gg miner, please provide some feedback.

EDIT:
Quote
Total 760.7 sol/s [dev0 193.8, dev1 188.2, dev2 190.5, dev3 190.8] 15 shares
Total 760.9 sol/s [dev0 194.9, dev1 188.1, dev2 190.2, dev3 190.7] 15 shares
Total 761.2 sol/s [dev0 196.8, dev1 188.4, dev2 189.6, dev3 190.8] 15 shares
Total 761.0 sol/s [dev0 195.8, dev1 188.8, dev2 188.7, dev3 190.9] 16 shares
Total 761.9 sol/s [dev0 196.1, dev1 189.2, dev2 189.2, dev3 191.5] 17 shares
Total 761.5 sol/s [dev0 196.6, dev1 189.1, dev2 189.1, dev3 193.1] 18 shares
Total 761.1 sol/s [dev0 194.7, dev1 193.1, dev2 189.2, dev3 194.8] 18 shares
Total 761.5 sol/s [dev0 194.2, dev1 193.1, dev2 188.0, dev3 195.7] 18 shares
4x RX480 Nitros 8GB
Niice, Niice ( 840 with Claymore's and -i 2)

Is that with the latest build, or did you compile it yourself?
It's under Ubuntu 16.04 with the latest amdgpu-pro drivers.
newbie
Activity: 39
Merit: 0
What is the current speed per card?
If someone is using gg miner, please provide some feedback.

EDIT:
Quote
Total 760.7 sol/s [dev0 193.8, dev1 188.2, dev2 190.5, dev3 190.8] 15 shares
Total 760.9 sol/s [dev0 194.9, dev1 188.1, dev2 190.2, dev3 190.7] 15 shares
Total 761.2 sol/s [dev0 196.8, dev1 188.4, dev2 189.6, dev3 190.8] 15 shares
Total 761.0 sol/s [dev0 195.8, dev1 188.8, dev2 188.7, dev3 190.9] 16 shares
Total 761.9 sol/s [dev0 196.1, dev1 189.2, dev2 189.2, dev3 191.5] 17 shares
Total 761.5 sol/s [dev0 196.6, dev1 189.1, dev2 189.1, dev3 193.1] 18 shares
Total 761.1 sol/s [dev0 194.7, dev1 193.1, dev2 189.2, dev3 194.8] 18 shares
Total 761.5 sol/s [dev0 194.2, dev1 193.1, dev2 188.0, dev3 195.7] 18 shares
4x RX480 Nitros 8GB
Niice, Niice ( 840 with Claymore's and -i 2)

Is that with the latest build, or did you compile it yourself?
sr. member
Activity: 652
Merit: 266
What is the current speed per card?
If someone is using gg miner, please provide some feedback.

EDIT:
Quote
Total 760.7 sol/s [dev0 193.8, dev1 188.2, dev2 190.5, dev3 190.8] 15 shares
Total 760.9 sol/s [dev0 194.9, dev1 188.1, dev2 190.2, dev3 190.7] 15 shares
Total 761.2 sol/s [dev0 196.8, dev1 188.4, dev2 189.6, dev3 190.8] 15 shares
Total 761.0 sol/s [dev0 195.8, dev1 188.8, dev2 188.7, dev3 190.9] 16 shares
Total 761.9 sol/s [dev0 196.1, dev1 189.2, dev2 189.2, dev3 191.5] 17 shares
Total 761.5 sol/s [dev0 196.6, dev1 189.1, dev2 189.1, dev3 193.1] 18 shares
Total 761.1 sol/s [dev0 194.7, dev1 193.1, dev2 189.2, dev3 194.8] 18 shares
Total 761.5 sol/s [dev0 194.2, dev1 193.1, dev2 188.0, dev3 195.7] 18 shares
4x RX480 Nitros 8GB
Niice, Niice ( 840 with Claymore's and -i 2)
sr. member
Activity: 588
Merit: 251
I will probably stick with the GCN assembly instead of AMD IL because I would rather not deal with another abstraction layer.
It looks like AMDIL is a dead-end anyway.
http://lists.llvm.org/pipermail/llvm-dev/2015-May/085684.html

HSAIL will probably be short-lived since most of the work is now focused on the LLVM amdgpu back-end.  It even supports inline asm, but I'm not sure if it will generate a kernel binary that conforms to AMD's CL2.0 ABI.  With clang/llvm-3.9, I've only got as far as getting it to output GCN assembler from the OpenCL + inline asm code.


sr. member
Activity: 728
Merit: 304
Miner Developer
Zawawa I wish you a happy new year and all the best luck and health.
Thanks for your efforts.
Thank you! The efforts do not mean much without the results, though.
AMD drivers are so flaky that I am thinking about switching to the GCN assembly sooner rather than later.

Haha, it took you long to realize that :-D

Oh, I knew that from the get-go. I just wanted to make sure I had the fastest OpenCL kernel before getting my hands dirty with the GCN assembly.
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
Zawawa I wish you a happy new year and all the best luck and health.
Thanks for your efforts.
Thank you! The efforts do not mean much without the results, though.
AMD drivers are so flaky that I am thinking about switching to the GCN assembly sooner rather than later.

Haha, it took you long to realize that :-D
sr. member
Activity: 728
Merit: 304
Miner Developer
Zawawa I wish you a happy new year and all the best luck and health.
Thanks for your efforts.
Thank you! The efforts do not mean much without the results, though.
AMD drivers are so flaky that I am thinking about switching to the GCN assembly sooner rather than later.
full member
Activity: 224
Merit: 100
CryptoLearner
So, after my wife and I watched New Year fireworks in San Francisco, I came back home at 1:30 a.m., squeezed my poor brain, and then realized that the size of the slot cache in LDS can be significantly reduced by recycling it. This should be a better way to increase occupancy than splitting rounds. We shall see.

Fireworks in your brain too?  Cool  Grin

Yes, I would like to think I got brilliant sparks of ideas  Wink

Nice one  Grin Happy new year, and keep up the good work  Cool
sr. member
Activity: 574
Merit: 250
Fighting mob law and inquisition in this forum
Zawawa I wish you a happy new year and all the best luck and health.
Thanks for your efforts.
sr. member
Activity: 728
Merit: 304
Miner Developer
So, after my wife and I watched New Year fireworks in San Francisco, I came back home at 1:30 a.m., squeezed my poor brain, and then realized that the size of the slot cache in LDS can be significantly reduced by recycling it. This should be a better way to increase occupancy than splitting rounds. We shall see.

Fireworks in your brain too?  Cool  Grin

Yes, I would like to think I got brilliant sparks of ideas  Wink
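
To put rough numbers on the LDS remark quoted above, here is a back-of-the-envelope sketch in Python. It assumes a GCN compute unit with 64 KiB of LDS, at most 40 resident wavefronts (4 SIMDs with 10 each), and 64 work-items per wavefront, and it treats LDS as the only limiting resource, so register pressure and other per-CU limits are ignored. The slot cache sizes and the work-group size in the example are purely illustrative and are not taken from the miner.

```python
LDS_BYTES_PER_CU = 64 * 1024   # assumed: 64 KiB of LDS per GCN compute unit
MAX_WAVES_PER_CU = 40          # assumed: 4 SIMDs x 10 wavefronts each
WAVEFRONT_SIZE = 64            # work-items per wavefront on GCN


def waves_per_cu(lds_bytes_per_group, group_size):
    """Wavefronts resident per CU when LDS is the only limiting resource."""
    if not 0 < lds_bytes_per_group <= LDS_BYTES_PER_CU:
        raise ValueError("LDS allocation per work-group must fit in one CU")
    if group_size <= 0:
        raise ValueError("work-group size must be positive")
    waves_per_group = -(-group_size // WAVEFRONT_SIZE)   # ceiling division
    groups_per_cu = LDS_BYTES_PER_CU // lds_bytes_per_group
    return min(groups_per_cu * waves_per_group, MAX_WAVES_PER_CU)


if __name__ == "__main__":
    # Hypothetical sizes: a slot cache of 32 KiB per 256-item work-group,
    # versus the same cache recycled down to 16 KiB.
    for lds_kib in (32, 16):
        waves = waves_per_cu(lds_kib * 1024, 256)
        print(f"{lds_kib} KiB LDS per group -> {waves} wavefronts per CU")
```

Under these assumptions, halving the LDS footprint of a work-group doubles the number of work-groups that fit on a CU, which is the sense in which a smaller slot cache raises occupancy.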