Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 13.

newbie
Activity: 32
Merit: 0
Yes, that must be the same kernel that I copied into the groestl directory next to the groestl_isa.hpas file.

When you compile the original kernel from within the groestl_isa.hpas program, it uses the groestl_original.cl kernel. That is Pallas's kernel, except that I hardcoded the workgroup size in it and made one other very minor change.

I also compared it with the kernel downloaded from the very first post in this topic: it's the same.
hero member
Activity: 630
Merit: 500
Just a silly test: what if you turn the memory clock up to normal speed? Maybe it will change the L1 cache's behaviour? My kernel uses no memory, but it uses the L1 cache extensively.

"Had hell of a time reverting back to 14.7" -> Is there still a tool called "Catalyst Clean Uninstall Utility" nowadays? 2-3 years ago it was useful when downgrading the Catalyst version.
Raising the memory clock has no significant effect other than higher temps...

Use "DDU" to clean out Catalyst drivers, but it's not always 100% effective; sometimes a little manual cleaning is needed too...

BTW, I am using Pallas's kernel as the reference, not the one supplied with stock sgminer...

Any tweaks you can do for 2048 shaders (280x) and 1792 shaders (7950)?
member
Activity: 81
Merit: 1002
It was only the wind.
The compiled bin file should work regardless of catalyst version or operating system, so could you please post a link to the bin file? Thanks.
There is something weird about the sgminer screenshot, are you sure it's working correctly? It shows a single, disabled GPU with id 0, and the share got accepted was from GPU id 1. The diff numbers are also kinda weird.

SG bug.
newbie
Activity: 32
Merit: 0
"Very early results ..."

Great, it runs on your machine!

The speedup is not that impressive, but let me ask you to do a test:

When you stop sgminer, please run groestl_isa.hpas and paste my program's output here, like this:

-----------------------------------------
Using new GCN ASM code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 190.661 ms  13.749 MH/s   gain:   3.44x
elapsed: 188.444 ms  13.911 MH/s   gain:   3.48x
elapsed: 188.218 ms  13.928 MH/s   gain:   3.48x
elapsed: 188.225 ms  13.927 MH/s   gain:   3.48x

Functional test: RESULT IS OK
-----------------------------------------

Then go to around line 23, comment out "#define USE_NEW_ASM_KERNEL" and run it again! This will compile the original OpenCL kernel that I downloaded with sgminer 5.1.

-----------------------------------------
Using original OpenCL code
Kernel binary saved: C:\Work\Groestl\kernel_dump\kernel.elf

elapsed: 657.623 ms  3.986 MH/s   gain:   1.00x
elapsed: 655.396 ms  4.000 MH/s   gain:   1.00x
elapsed: 654.897 ms  4.003 MH/s   gain:   1.00x
elapsed: 655.055 ms  4.002 MH/s   gain:   1.00x

Functional test: RESULT IS OK
-----------------------------------------

As you can see, on my small card the speedup is 3.5x. I'd like to check these results on your 280x as well.
I suspect the problem is simply that your big card doesn't get enough threads, or something similar.
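The two logs can be cross-checked with a little arithmetic. A minimal sketch, assuming each timed run hashes 2,621,440 nonces (10 × 2^18; this batch size is inferred from elapsed time × MH/s in the logs, the thread never states it) and that "gain" means the mean original-kernel time divided by the new time (also an assumption, though it reproduces the printed gain column):

```python
# Assumption: every timed run hashes 2,621,440 nonces (10 * 2**18).
# The thread doesn't state this; it is inferred from elapsed time x MH/s.
HASHES_PER_RUN = 10 * 2**18

def mhs(elapsed_ms: float, hashes: int = HASHES_PER_RUN) -> float:
    """Throughput of one timed run, in millions of hashes per second."""
    return hashes / (elapsed_ms / 1000.0) / 1e6

# Timings copied from the two benchmark outputs above.
original_ms = [657.623, 655.396, 654.897, 655.055]
asm_ms = [190.661, 188.444, 188.218, 188.225]

# Assumed definition of "gain": mean time of the original kernel / new time.
baseline_ms = sum(original_ms) / len(original_ms)

for t in asm_ms:
    print(f"elapsed: {t:.3f} ms  {mhs(t):.3f} MH/s   gain: {baseline_ms / t:.2f}x")
```

Under these assumptions the MH/s and gain values printed for the ASM run match the log (13.749 MH/s at 3.44x, then 3.48x for the rest).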

Just a silly test: what if you turn the memory clock up to normal speed? Maybe it will change the L1 cache's behaviour? My kernel uses no memory, but it uses the L1 cache extensively.

And finally I had an 'accepted', phew...

"Had hell of a time reverting back to 14.7" -> Is there still a tool called "Catalyst Clean Uninstall Utility" nowadays? 2-3 years ago it was useful when downgrading the Catalyst version.
hero member
Activity: 630
Merit: 500
Temporarily upgraded to 14.9 to run HetPas and built for the 280x.
Had hell of a time reverting back to 14.7... several tries later, 14.7 is working again and I have a kernel.elf for the 280x.

Testing now ...

Very early results...
280x I=22 E=1180 M=150 WS=256 (intensity 22, engine 1180 MHz, memory 150 MHz, worksize 256)... 26 MH/s solo. No blocks yet... approx. 1.4x the normal Diamond kernel (18.5 MH/s)

Intensity 22 is the sweet spot for my 280x; now playing with the memory clock...

Raising the memory clock has no significant effect other than higher temps...

Sticking with a low memory clock.
newbie
Activity: 32
Merit: 0
I've updated HetPas and groestl_isa.hpas too. Please download HetPas150111_Groestl.zip.

From now on it starts by listing the cards:
writeln("List of opencl devices:");
for var i:=0 to cl.devices.count-1 do begin
  writeln("Device #",i);
  writeln(cl.devices[i].dump);
end;

It should display something like this:
List of opencl devices:
Device #0
Target: Cayman  Series: 6  Core:880 MHz  CU:24  RAM:2048 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics ...
Device #1
Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 ...

Using device:
Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098
ext: cl_khr_fp64 cl_amd_fp64 ...
* core MHz value is not always accurate, use Catalyst Control Center (or ADL) instead!

For GCN cards, 'Series' must be at least 7. If detection fails on a card that is indeed GCN, then I detected it wrongly; please report it. My first card is Series 6xxx (Northern Islands) hardware, so it can't be used for this kernel.
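To check by hand which entry of such a listing passes the Series rule, the dump lines can be parsed directly. A minimal Python sketch, assuming the text format shown in the example output above (the regex and the helper names `parse_dump` / `is_gcn` are hypothetical, not part of HetPas):

```python
import re
from typing import Tuple

# Matches the "Target: <name>  Series: <n>" part of a device dump line.
DUMP_RE = re.compile(r"Target:\s*(?P<target>\S+)\s+Series:\s*(?P<series>\d+)")

def parse_dump(line: str) -> Tuple[str, int]:
    """Extract (target name, series number) from one dump line."""
    m = DUMP_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized device line: {line!r}")
    return m.group("target"), int(m.group("series"))

def is_gcn(line: str) -> bool:
    """Rule from the post: a GCN card must report Series >= 7."""
    _, series = parse_dump(line)
    return series >= 7

# The two example devices from the listing above.
devices = [
    "Target: Cayman  Series: 6  Core:880 MHz  CU:24  RAM:2048 MB  UID:4098",
    "Target: Capeverde  Series: 7  Core:880 MHz  CU:10  RAM:1024 MB  UID:4098",
]
for i, line in enumerate(devices):
    name, series = parse_dump(line)
    print(f"Device #{i}: {name}, series {series}, usable: {is_gcn(line)}")
```

For the example listing this marks the Cayman (Northern Islands) card as unusable and the Capeverde card as usable.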

@utahjohn: Maybe it works on 14.7 too. I can't tell, but I know it will crash on 13.4 because that driver handles kernel parameters differently.
newbie
Activity: 32
Merit: 0
Thx for testing! So many errors :S But usually that's how it goes.

"No GCN device found" error.

That could be because my code doesn't recognize newer cards.
At the moment it only knows these:
'TAHITI', 'PITCAIRN', 'CAPEVERDE', 'UNKNOWN5');
I'm adding the new names right now.

Meanwhile you can select an OpenCL device by uncommenting this line in the code:
var dev:=cl.devices[0]; //access device by index (must be a GCN one)

The findDevices function can't recognize newer cards. I'll fix it now.
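Why a name table breaks on newer chips, and why a hardcoded index works around it, can be shown in a few lines. A minimal Python sketch of that pattern; `KNOWN_GCN` and `find_gcn_device` are hypothetical stand-ins modelled on the name list quoted above, not the actual findDevices code:

```python
from typing import List, Optional

# Hypothetical name table, modelled on the list quoted above.
KNOWN_GCN = {"TAHITI", "PITCAIRN", "CAPEVERDE"}

def find_gcn_device(targets: List[str], override: Optional[int] = None) -> int:
    """Return the index of the OpenCL device to use.

    An explicit override is trusted as-is, like hardcoding
    `var dev:=cl.devices[0];`. Otherwise the first target whose
    upper-cased name is in the table is chosen.
    """
    if override is not None:
        if not 0 <= override < len(targets):
            raise IndexError(f"no device #{override}")
        return override
    for i, name in enumerate(targets):
        if name.upper() in KNOWN_GCN:
            return i
    raise RuntimeError("No GCN device found")

# Hawaii is a GCN chip, but a table without its name cannot find it.
cards = ["Hawaii", "Hawaii"]
try:
    find_gcn_device(cards)
except RuntimeError as err:
    print(err)  # prints: No GCN device found
print(find_gcn_device(cards, override=0))  # prints: 0
```

The override path skips detection entirely, which is exactly why it must point at a GCN device.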

@pallas: Thanks for fiddling with Win7! Cheesy What do you mean by 32-bit code? That has no meaning for the GCN hardware o.O
But I'm 100% sure that you can't use my Capeverde binary unless the device you selected has that chip. ( var dev:=cl.devices[CLdeviceIndex]; )
   
hero member
Activity: 630
Merit: 500
Runtime error: No GCN device found

I have 2 AMD cards on gpu-platform 1
and 1 Intel GPU on gpu-platform 0

Edit: DOH 14.7RC3 not GCN ...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I'm running it for an hour now and I got a 'rejected'. I'm solo mining GRS. Do I need to worry? Or is it usual? Can it be caused by slow network?

Yes, it can be caused by the network: if the wallet is out of sync, the block may be rejected (or orphaned).
Try with a pool...
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Realhet, thanks for the Capeverde bin; unfortunately I can't use it because it's 32-bit.
I created a bootable win7 stick in order to compile the kernel: it compiles fine but, when run, it says "no target Hawaii" and no bin is created.
newbie
Activity: 32
Merit: 0
Sorry for taking so long.

Here's everything you need to know if you're willing to test: http://realhet.wordpress.com/gcn-asm-groestl-coin-kernel/

Please send me benchmarks and compiled kernels for various cards!

I'm running it for an hour now and I got a 'rejected'. I'm solo mining GRS. Do I need to worry? Or is it usual? Can it be caused by slow network?
hero member
Activity: 610
Merit: 500
I would also like to test it.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
The compiled bin file should work regardless of catalyst version or operating system, so could you please post a link to the bin file? Thanks.
There is something weird about the sgminer screenshot, are you sure it's working correctly? It shows a single, disabled GPU with id 0, and the share got accepted was from GPU id 1. The diff numbers are also kinda weird.
newbie
Activity: 32
Merit: 0
Do I need a better proof than this? Grin
http://x.pgy.hu/~worm/het/my_first_grs.png
I'm the proud owner of my first 19 GRS coins, haha. I guess I was super lucky to get an 'accepted' after only 10 minutes of mining.

The speed increase in sgminer is the same as what I measured in my 'workbench': it went from 2 MH/s to 7 MH/s. (Or, counted in GroestlHash/s, from 4 MH/s to 14 MH/s.)
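The factor of two between the two sets of figures is consistent with Groestlcoin's proof-of-work running Groestl-512 twice per block header, so one coin hash costs two Groestl evaluations. A tiny sketch of that conversion, assuming the factor is exactly 2 (the constant and function names are hypothetical):

```python
# Assumption: one Groestlcoin PoW hash = two Groestl-512 evaluations.
GROESTL_PER_COIN_HASH = 2

def groestl_rate(coin_mhs: float) -> float:
    """Convert a coin-hash rate (as sgminer reports it) to Groestl-512 calls/s."""
    return coin_mhs * GROESTL_PER_COIN_HASH

before, after = 2.0, 7.0  # MH/s in sgminer, from the post above
print(groestl_rate(before), groestl_rate(after))  # prints: 4.0 14.0
print(after / before)                              # prints: 3.5
```

The ratio is the same 3.5x either way, which is why the sgminer and workbench speedups agree.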

If anyone is willing to help me test this, please tell me! You'll need Windows with Catalyst 14.9, and you also have to be brave enough to run my IDE (HetPas.exe) on that system.

I can't wait to see your reports on how fast it is on the big cards. Cheesy
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Well, I decided it was better not to alter sgminer, since I'm totally unfamiliar with it, and instead started making my kernel look exactly the same as groestlcoin.cl from the outside. It will take half a page of additional code to deal with the kernel parameters. It already works with a small dummy kernel, but I'm just too tired to continue now. Cheesy

Well, that's easier for people to use.
Looking forward to seeing your progress! :-)
newbie
Activity: 32
Merit: 0
Well, I decided it was better not to alter sgminer, since I'm totally unfamiliar with it, and instead started making my kernel look exactly the same as groestlcoin.cl from the outside. It will take half a page of additional code to deal with the kernel parameters. It already works with a small dummy kernel, but I'm just too tired to continue now. Cheesy
hero member
Activity: 630
Merit: 500
I see there is very little interest in mining groestl coins with GPU: only a few users (2 or 3) joined the recent discussion,
let alone contributed to the code (2) or donated (2) over the whole life of this thread.
Well, I still prefer GPU mining while the block reward is 1.0, and I'll see what happens to difficulty when the reward drops to 0.1... So count me in on the new kernel: I donated a bit last time you did a new kernel and will donate again for the new super-super ASM kernel Smiley
I expect difficulty will drop remarkably when the reward drops, and solo mining might still be attractive even after...
I have one 280x solo mining DMD (Pallas Diamond) at approx. 18.6 MH/s (2-4 coins per day)
and a 7950 solo mining FTC (neoscrypt) at 278 KH/s (would be sweet if these optimizations could be applied to NeoScrypt too... wolf0, where are you?)

@realhet
It would be great if you could add a kernel setting (perhaps "realhet") that selects your kernel, and supply a Windows x64 build of your sgminer... I'd donate for that Smiley
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I see there is very little interest in mining groestl coins with GPU: only a few users (2 or 3) joined the recent discussion,
let alone contributed to the code (2) or donated (2) over the whole life of this thread.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I'm using your kernel: groestlcoin.cl.

I disassembled a dummy kernel with the appropriate parameters, but I had forgotten about the T buffers: OpenCL uploads them automatically in an extra buffer. I don't even want to know how the driver sends that extra buffer, and, most importantly, I can't make an automatic skeleton kernel that gives me a binary with a placeholder for the constant data, which my program could then patch with the assembler's output.

So the easiest way would be to modify sgminer to handle my kernel. I have found the 'queue_sph_kernel()' function, which I can start from.

I never tested my kernel on cards smaller than Tahiti, and I have no reports of it running on Pitcairn or smaller either: other groestlcoin kernels might be faster in that case.
newbie
Activity: 32
Merit: 0
I'm using your kernel: groestlcoin.cl.

I disassembled a dummy kernel with the appropriate parameters, but I had forgotten about the T buffers: OpenCL uploads them automatically in an extra buffer. I don't even want to know how the driver sends that extra buffer, and, most importantly, I can't make an automatic skeleton kernel that gives me a binary with a placeholder for the constant data, which my program could then patch with the assembler's output.
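The skeleton-and-patch idea can be illustrated generically. A minimal Python sketch, assuming the skeleton binary carries a unique marker at the start of a fixed-size reserved region; the marker, region size and `patch_placeholder` are hypothetical, and a real kernel ELF may need more care than a plain byte overwrite:

```python
# Hypothetical marker that a skeleton kernel would embed at the start of a
# reserved region for constant data. Nothing in the thread defines one.
MARKER = b"@@CONST_DATA@@"

def patch_placeholder(binary: bytes, payload: bytes, region_size: int) -> bytes:
    """Overwrite the reserved region that starts at MARKER with payload.

    The payload is zero-padded to region_size so the binary keeps its
    length and every other offset in it stays valid.
    """
    pos = binary.find(MARKER)
    if pos < 0:
        raise ValueError("placeholder marker not found")
    if binary.find(MARKER, pos + 1) >= 0:
        raise ValueError("placeholder marker is not unique")
    if len(payload) > region_size:
        raise ValueError("payload larger than reserved region")
    if pos + region_size > len(binary):
        raise ValueError("reserved region runs past end of binary")
    padded = payload.ljust(region_size, b"\x00")
    return binary[:pos] + padded + binary[pos + region_size:]

# Toy skeleton: header, 32-byte reserved region (marker + zero fill), trailer.
skeleton = b"HEAD" + MARKER + bytes(32 - len(MARKER)) + b"TAIL"
patched = patch_placeholder(skeleton, b"\x01\x02\x03", region_size=32)
print(len(skeleton) == len(patched), patched[4:7])  # prints: True b'\x01\x02\x03'
```

Keeping the total size fixed is the key design choice: anything else in the binary that refers to offsets stays correct.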

So the easiest way would be to modify sgminer to handle my kernel. I have found the 'queue_sph_kernel()' function, which I can start from.