
Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 12. (Read 61229 times)

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I built my bins with Wolf0's x64 miner.  Works perfectly.

could you share your bin files please?

Personally, I wouldn't download this.  I'd generate my own.  But here it is.  Use at your own risk!
http://ge.tt/2uga0R82/v/0?c

Thanks, but it's 32 bit, I need 64 bit.
hero member
Activity: 935
Merit: 1001
I don't always drink...
I built my bins with Wolf0's x64 miner.  Works perfectly.

could you share your bin files please?

Personally, I wouldn't download this.  I'd generate my own.  But here it is.  Use at your own risk!
http://ge.tt/2uga0R82/v/0?c
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I built my bins with Wolf0's x64 miner.  Works perfectly.

could you share your bin files please?
hero member
Activity: 935
Merit: 1001
I don't always drink...
I built my bins with Wolf0's x64 miner.  Works perfectly.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.
Mine end in l4.bin ... am I 32 or 64 ... (win 7 x64)

4 * 8 (bits) = 32

it's the size of a long integer.
probably the sgminer build you are using is 32 bit.
question is does hetpas use 32 or 64 bit ... I'd assume 32 bit since it runs ok on my sgminer ...
my sgminer is old 4.1.0 ...

so your main prob is needing hetpas src to run on linux ...

Probably realhet coded it for 32 bit; I don't know what changes, maybe the parameter passing part.
I hope realhet has time to look into this.
I also use version 4.1.
Hetpas can't run on linux: I'll try again with the new version when I can access my workstation and make it boot on windows.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.
Mine end in l4.bin ... am I 32 or 64 ... (win 7 x64)

4 * 8 (bits) = 32

it's the size of a long integer.
probably the sgminer build you are using is 32 bit.
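
As a rough sketch of that arithmetic: the digits after the final "l" in the bin name are read as the size of a long integer in bytes, and multiplying by 8 gives the width in bits. The file names below are made up for illustration, and the pattern only assumes the name ends in l<digits>.bin as described in this thread.

```python
import re

def long_bits_from_bin_name(filename):
    """Return the long-integer width in bits encoded at the end of a bin name,
    or None if the name does not end in l<digits>.bin."""
    match = re.search(r"l(\d+)\.bin$", filename)
    if match is None:
        return None
    # The suffix is a size in bytes; 8 bits per byte.
    return int(match.group(1)) * 8

# Hypothetical file names, only the suffix matters here.
print(long_bits_from_bin_name("examplekernel_l4.bin"))  # 32
print(long_bits_from_bin_name("examplekernel_l8.bin"))  # 64
```

Keep in mind that on Windows a long stays 4 bytes even in 64 bit programs, so an l4 suffix there does not by itself prove the build is 32 bit.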
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.

In fact:

[10:25:27] Internal error: Input OpenCL binary is not for the target!
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )
Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.

On linux yes, but on windows they work. You need to run the x86 build of sgminer.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
@pallas: Thanks for fiddling with Win7! :D What do you mean by 32 bit code? That has no meaning regarding the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless you have that chip in the device you selected. ( var dev:=cl.devices[CLdeviceIndex]; )

Bins generated by sgminer on a 32 bit system will not work on a 64 bit one and vice versa, so I suppose the same is true for your kernels.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
"when Pallas says that R9 280x is 18MH/s he counts it in Groestl hashes."

no my hashrates are taken from sgminer.
hero member
Activity: 630
Merit: 500
14.9 has a piss poor OCL compiler, we've known this for a long time ... Stick with 14.7RC3 for best overall performance over many different algos.

I guess we are stuck with compiling realhet asm on 14.9 but 14.7 does better compiles for OCL.

I am running realhet asm kernel generated with 14.9 on 14.7 catalyst, just a pain in the ass reverting to 14.7 after using 14.9.

My Pallas OCL compile was done with 14.7RC3 and works better than OCL compiled on 14.9.
Pallas ocl compiled with 14.7RC3 will run normal on 14.9, just don't re-compile it with 14.9 ...

Confused yet? hehe

@Realhet
So the gain of Realhet = 1.40x Pallas stands when comparing to properly working Pallas OCL kernel on 14.7
(Same clocks and Intensity running under 14.7 so a fair compare).
Your Pallas reference speed is incorrect in hetpas because 14.9 mangled the OCL badly performance wise.
Take a look at performance hit 14.7 vs 14.9 in Star65 post above.
Unfortunately some of the "gains" you made may have been just repairing 14.9 OCL bugs LOL but obviously improvement was made somewhere in asm kernel.
You need to establish a baseline for your GPU using 14.7 Pallas OCL and see what really made improvements ...
I suggest starting over and using this first round as a learning experience :)  You started with code broken by the 14.9 compiler as a base ...

Pallas 14.7 OCL Bin for 280x 18.5 MHs
https://mega.co.nz/#!kAEnDATC!HeelwXTHDsQNx8WJhTDcwqS-slOmikoBiMqTEK9-DV0
Realhet 14.9 ASM bin for 280x 26.0 MHs
https://mega.co.nz/#!1NlRhYLC!7oLFfr2umL7T2Lc0fX3HY1ddthbpNqt6I_tYdG9OI9g

Another random thought :) Can you set hetpas up to "cross-compile" for diff GCN architectures so all we have to do is DL bin files from u to test them?  I really dislike uninst-inst-uninst-inst to try a new asm version on 14.7 ... For example have it compile Tahiti.elf, hawaii.elf etc.  I understand u can only test on your own card, but having us out here test the other elf files would speed up the process of testing new versions ...

DMD Donations : dJrhv4Pp1FXPrQiEp5njx42QrZiuZrbjQ1

Block found and accepted while solo mining, so your asm kernel appears to be valid :)

I'd like you to have a look see what you can do to further improve wolf0's neoscrypt kernel with asm when you get time.
7950 currently doing 278KHs mining FTC.  PM me for OCL and BIN.
member
Activity: 109
Merit: 13
TVM Pallas and realhet for nice work!

7970/280x 1130/300 W7

Pallas kernel in Cat 14.6  - 17.8MH/s
Pallas kernel in Cat 14.9  - 7.8MH/s   - so 14.9 very bad drivers?!
Realhet kernel in Cat 14.9 - 24.8MH/s - 24.8/7.8=3.18x !!!

We need realhet kernel (bin) with Cat 14.6 or 14.7 (best drivers perhaps). But I do not know how to do it.

newbie
Activity: 32
Merit: 0
When I run Pallas OCL I see 18.5MHs in sgminer.
When I run Realhet asm I see 26.0MHs in sgminer.

Please send me that .cl file and the binary that is compiled by the sgminer, I gotta check it.

For today, Thank You for testing, I gotta sleep now, see you!
hero member
Activity: 630
Merit: 500
When I run Pallas OCL I see 18.5MHs in sgminer.
When I run Realhet asm I see 26.0MHs in sgminer.
newbie
Activity: 32
Merit: 0
There must be some misunderstandings about the MH/s values, so we have to be careful!

On this topic (first post) when Pallas says that R9 280x is 18MH/s he counts it in Groestl hashes.

When my program says "elapsed: 50.686 ms  51.719 MH/s" it counts it also in Groestl hashes. Just as Pallas.

But when you see MH/s inside sgminer then it must be multiplied by 2 because in SG 1 MH/s = 2 MGroestlH/s.

--------------------------
So when you see "51.719 MH/s" in my program
then you must see 26MH/s in SG.

And when you see 18MH/s on the first post on this topic
You must see 9MH/s in SG.

Also when I see 4MH/s in my program
Then I saw 2MH/s in SG.
---------------------------

So the equation is: 2*sgminer Mh/s = Pallas's Mh/s

This is because sgminer counts 2 Groestl hash calculations as 1. But Pallas counts them as 2 hashes, and I just copied Pallas, then later found out how sgminer calculates.
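
The conversion described above can be written down as a small sketch. The function names are made up for illustration, and the factor of 2 is taken from the explanation in this post, not from the sgminer source:

```python
# Factor taken from this post: sgminer shows 1 MH/s for every 2 Groestl MH/s.
GROESTL_HASHES_PER_SGMINER_HASH = 2

def sgminer_to_groestl(sgminer_mhs):
    """Convert a rate shown by sgminer into raw Groestl MH/s."""
    return sgminer_mhs * GROESTL_HASHES_PER_SGMINER_HASH

def groestl_to_sgminer(groestl_mhs):
    """Convert a raw Groestl MH/s figure into what sgminer would show."""
    return groestl_mhs / GROESTL_HASHES_PER_SGMINER_HASH

print(groestl_to_sgminer(51.719))  # about 25.86, the ~26 MH/s seen in sgminer
print(groestl_to_sgminer(18.0))    # 9.0
print(sgminer_to_groestl(2.0))     # 4.0
```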

---------------------------
So the Tahiti 26MH/s in sgminer is correct. Please remove the kernel and let sgminer compile it from OpenCL! If I'm calculating correctly, then you should see 7-8MH/s with the original kernel. Can you check it please?


member
Activity: 109
Merit: 13
I would also test on 7970 & 280x.
hero member
Activity: 630
Merit: 500
The last test run I did grabbed 2 cards so divide in half for an average on Tahiti (280x+7950).

Not the gains I was expecting based on your blog ... 3.4x times 18.5 MHs should net me around 62 MHs vs the 26 MHs I'm getting now ... so not such great gains on Tahiti, but better :)

Short of pulling a card physically I don't know how to disable hetpas running all of them ...
newbie
Activity: 32
Merit: 0
Thanks!

Well this is kinda bad for a Tahiti :/

Also the times of the 4 kernel launches are weird:
On my card it is 3.44x, 3.48x, 3.48x, 3.48x
But on your card this is 3.88x, 3.10x, 3.10x, 3.10x

On my card the first launch is a bit slow because the card was at low MHz when the test started, and after the warmup it became a steady 3.48x.

On your card the speeds are so random. Your card (at 1150) is 3.68x faster than mine, so if everything were OK you should have seen 12.8x gains.
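
The expected figure is just the two ratios from this post multiplied together. A quick check, with variable names chosen only for illustration:

```python
card_speed_ratio = 3.68   # the tester's card relative to the reference card
kernel_gain = 3.48        # steady gain of the new kernel on the reference card

expected_gain = card_speed_ratio * kernel_gain
print(f"{expected_gain:.1f}x")  # 12.8x
```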

Maybe it is a 14.7 issue, I don't know. Everything can change from driver to driver...

What is on my mind is:

1. What if you change WorkCount from the original
    WorkCount := 256*10*512
to WorkCount := 256*10*512*10;  ?
Do the elapsed times become 10x longer?  (The functional test will fail, that's OK, just reset WorkCount to the default value after this test.)

2. Let's see how the original kernel works in HetPas:
  just comment out the  "#define USE_NEW_ASM_KERNEL" and let me see the times please. If the original kernel works well, then gain must be 3.68.


(Thank you for testing so far)

--------------------------------------------------------------------
"elapsed: 50.686 ms  51.719 MH/s   gain:  12.93x"
WOW! THIS IS IT! :D :D :D
Exactly what I expected! Your card is 3.71x faster. What was the error? You accidentally mined while testing, right?
newbie
Activity: 32
Merit: 0
I tested my kernel only in Cat 14.9
I have no info on how it works on 14.7

When you compile in HetPas it will generate a skeleton kernel binary with the help of the OpenCL compiler. And then the new assembly code will be PATCHED into that. So I don't make the binary from scratch and maybe the 14.7 binary is a bit different than the 14.9 binary and I just don't know about that. (Although life would be so much easier if AMD would be so kind and give us an interface to upload binary program code... But that's not going to happen :D)


"Any tweaks you can do with..."

Please let's do the test inside the IDE first. Let's compare the original and the new kernel there, as it is perfect for timing. In sgminer we need to play with Intensity and other factors and wait for minutes to get a correct time anyways.

So please paste here what you see on HetPas on the right pane after you run the program:
I'm interested in this information, and also tell me what card and engine MHz you used:

Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.645 ms  13.750 MH/s   gain:   3.44x
elapsed: 188.281 ms  13.923 MH/s   gain:   3.48x
elapsed: 188.233 ms  13.927 MH/s   gain:   3.48x
elapsed: 188.316 ms  13.920 MH/s   gain:   3.48x

Functional test: RESULT IS OK
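
For anyone who wants to check the numbers in this output by hand, here is a rough sketch of how they fit together. It assumes 2 Groestl hashes per work item and a baseline of about 4.0 MH/s for the original kernel on the same card. Both values are inferred from the figures in this thread, not read from the HetPas source, and the function names are only for illustration.

```python
WORK_COUNT = 256 * 10 * 512      # default WorkCount mentioned in this thread
HASHES_PER_WORK_ITEM = 2         # assumption: inferred from the printed rates
BASELINE_MHS = 4.0               # assumption: original kernel on the same card

def groestl_mhs(elapsed_ms, work_count=WORK_COUNT):
    """Groestl MH/s for one kernel launch that took elapsed_ms milliseconds."""
    hashes = work_count * HASHES_PER_WORK_ITEM
    return hashes / (elapsed_ms / 1000.0) / 1e6

def gain(mhs, baseline=BASELINE_MHS):
    """Speedup of a measured rate over the assumed baseline rate."""
    return mhs / baseline

for ms in (190.645, 188.281, 188.233, 188.316):
    rate = groestl_mhs(ms)
    print(f"elapsed: {ms:.3f} ms  {rate:.3f} MH/s   gain: {gain(rate):6.2f}x")
```

Under those assumptions the loop gives the same MH/s and gain figures as the four lines pasted above, and 50.686 ms works out to 51.719 MH/s with a gain of 12.93x.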

hero member
Activity: 630
Merit: 500
I did not try running kernel under catalyst 14.9, all I wanted was to generate the kernel.elf to run under 14.7 ... because I run multiple algos concurrently under 14.7 that suffer under 14.9 ...

Also note that I am running sgminer 4.1.0