Topic: [ANN][GRS][DMD][DGB] Pallas optimized groestl opencl kernels - page 4. (Read 61261 times)

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
What led you to expect that? I also tried it, but for a very different reason: giving small devices a chance at more efficient task switching. It wasn't worth it even in that case (surprisingly).

The two algos are very different, and I was hoping for lower register usage (especially for sha, which in addition doesn't use LDS) and, thus, more waves in flight.
That's not the case for Hawaii, but it might be for Tahiti. I'll test it as soon as I have some free time and can access the rig.
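For anyone following the "fewer registers, more waves in flight" reasoning, here is a minimal sketch of the arithmetic. It assumes the commonly documented GCN limits (256 VGPRs per SIMD lane, at most 10 wavefronts per SIMD, VGPRs allocated in granules of 4), none of which come from this thread, and it ignores LDS and SGPR limits. The function name is hypothetical, and no real register counts for these kernels are implied.

```c
#include <assert.h>

/* Assumed GCN limits (not stated in the thread). */
#define VGPR_BUDGET        256u  /* VGPRs available per SIMD lane      */
#define MAX_WAVES_PER_SIMD 10u   /* hardware cap on resident waves     */
#define VGPR_GRANULE       4u    /* VGPRs are allocated in steps of 4  */

/* Wavefronts per SIMD that fit when each work-item needs `vgprs`
   vector registers. Only the VGPR limit is modelled here. */
unsigned waves_per_simd(unsigned vgprs)
{
    assert(vgprs >= 1u && vgprs <= VGPR_BUDGET);

    /* Round the request up to the allocation granule. */
    unsigned allocated = (vgprs + VGPR_GRANULE - 1u) / VGPR_GRANULE * VGPR_GRANULE;

    unsigned waves = VGPR_BUDGET / allocated;
    return waves < MAX_WAVES_PER_SIMD ? waves : MAX_WAVES_PER_SIMD;
}
```

Under those assumptions, a kernel needing 128 VGPRs keeps 2 waves per SIMD resident, while one needing 64 keeps 4. That is the kind of gain a split kernel could give if the sha half turned out much lighter than the groestl half.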
hero member
Activity: 672
Merit: 500
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
What led you to expect that? I also tried it, but for a very different reason: giving small devices a chance at more efficient task switching. It wasn't worth it even in that case (surprisingly).
hero member
Activity: 1498
Merit: 507
Not your Keys, not your Bitcoin
Hi. Will there be something like Wolf0's binary for Cayman (6970) or the 5870?
member
Activity: 81
Merit: 1002
It was only the wind.
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type :)

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
Tables are constant, just prerotated left by 3 bits (the size of one uint2 when used as an index). Well, this stuff needs comments if the kernel is going to be published. The money is in X11 and Monero, and there's not much value in the Whirlpool code. I could just drop it somewhere, but it would give everyone a free boost in X11 :(
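One way to read the quoted macro, sketched on the host side in plain C: the lookup byte is arranged to sit 3 bits higher than usual, so after the shift it is already a byte offset (index * 8), and bitselect merges it into the table's base address without an add. This assumes the table is aligned so that bits 3..10 of its base address are zero (2048-byte alignment), which the thread does not state. The function names are hypothetical, and the sketch only computes addresses and never dereferences them.

```c
#include <assert.h>
#include <stdint.h>

/* Same semantics as OpenCL's bitselect(a, b, c):
   take bits from b where c is 1, otherwise from a. */
uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Address of the 8-byte table entry selected by bits 27..34 of v,
   computed the way the macro does: shift, then merge under mask 0x7F8. */
uint32_t lut_entry_address(uint32_t table_base, uint64_t v)
{
    /* Assumption: bits 3..10 of the base are clear (2048-byte aligned). */
    assert((table_base & 0x7F8u) == 0);
    return bitselect32(table_base, (uint32_t)(v >> 24), 0x7F8u);
}

/* Conventional form for comparison: extract the byte, scale by the
   entry size, add to the base. */
uint32_t lut_entry_address_plain(uint32_t table_base, uint64_t v)
{
    return table_base + (uint32_t)((v >> 27) & 0xFFu) * 8u;
}
```

With an aligned base the two functions return the same address for every v, which is why the zero-based pointer in the macro is not really a NULL dereference: the full address is carried in the "index".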

Whirlpool's not even in X11 - might help a bit with Groestl, though.
hero member
Activity: 610
Merit: 500
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
I can provide access to test
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Myriad-groestl: I tried splitting the kernel into two parts, groestl and sha. I was almost sure it would be an improvement but it is a little slower instead (and it requires a custom miner). It could fix Tahiti slowness, though. I don't know yet because I didn't have such a card ready on the rig to test.
member
Activity: 81
Merit: 1002
It was only the wind.
That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
Yes :) The indexed address is calculated in bitselect. LUT0 and LUT4 indexing is just a single AND operation.
EDIT: Oh, wait, UINT8 is a byte, not an int vector. I probably went too far redefining every type :)

Okay... I'm guessing that you've removed bits from the tables and are regenerating them on the fly, but I can't quite figure out how. Then again, bitwise ops aren't really my best subject...
newbie
Activity: 36
Merit: 0
Hm, the bin from Wolf0 works fine and is a little faster on the 7850 (8 MH/s) and even the 5770 (3.6 MH/s). With kernel v1 compiled on the 14.7 RC3 driver, speeds were 7.2 and 3.2 at the same clocks. Nice work.
legendary
Activity: 1281
Merit: 1003
So I have an R9 270 with driver 15.10

...snip...

Instead of using binaries made for other chips, why not simply compile your own for Pitcairn? Just overwrite diamond.cl, remove the bin files and run.
Let me know how it goes :-)

That's what I did first, without using the bin file, but the speed was 4.6 MH/s.

But I think that may be because the old bin file was in C:\Users\carlo (sgminer uses that one sometimes; strange, but not always).

When I see it doesn't create a bin file in the sgminer directory, I know it's using the one in C:\Users\carlo.

I'm going to try again and remove them both.

So I deleted them and started sgminer. It created a new one, but now it only does 4.6 MH/s.

So the bin makes the difference, not the kernel.

member
Activity: 81
Merit: 1002
It was only the wind.
I was wondering if we (miner developers) should unite to get the best out of it.
A cartel would take all the fun out of the game and possibly destroy the PoW world. On the other hand, the PoS landscape could benefit from some polishing :)

Haha, too true. Also, just going through this for myself, here:
Code:
static const __constant ulong arrPrecalc_post_l27[256] = ...
#define baseL27 ((uint)&arrPrecalc_post_l27[0])
#define TC0off8_l27(off8) (*(const __constant ulong *)&(((const __constant uint8 *)0)[off8]))
#define LUT3_r3(v) as_ulong(TC0off8_l27(bitselect(baseL27, (uint)(as_ulong(v) >> 24), 0x7F8U)))

That's about as far as my parse got before I went, "Is that a fucking NULL pointer dereference?"
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
So I have an R9 270 with driver 15.10

...snip...

Instead of using binaries made for other chips, why not simply compile your own for Pitcairn? Just overwrite diamond.cl, remove the bin files and run.
Let me know how it goes :-)
legendary
Activity: 1281
Merit: 1003
So I have an R9 270 with driver 15.10

sgminer_diamond_v4.1.0

My batch file:

setx GPU_MAX_ALLOC_PERCENT 100
setx GPU_USE_SYNC_OBJECTS 1
"E:\myriadcoin\cgminer skein\sgminer_diamond_v4.1.0\sgminer.exe" -k diamond -o stratum+tcp://eu.miningfield.com:3377 -u carlo0000.r9a -p 0 --difficulty-multiplier 0.0039062500 -w 256 -I 22 -T

So I can only use the kernel.
The bins are for the 290 and 280. I tried renaming and using them, but sgminer crashed.

My bin file name is diamondPitcairnglg2tc10688w256l4.bin


I tried again and it's working now with wolf-groestlcoinTahitigw256l4.
The other one crashes.

I noticed I had diamondPitcairnglg2tc10688w256l4.bin in my user folder, C:\Users\carlo, so I deleted it.

I get 8.7 MH/s :D @ 1025 MHz

Thanks for the help.

So I ran it on my other computer. That one does 2x 9.3 MH/s @ 1040 MHz with driver 15.7, with the display set to 800*600 and no screen attached.

So I set this computer to 1040 MHz too, but it only does 8.8 MH/s. I have a lot of stuff running on it, though, and the display is at 1080p.
So I put it back to 1025 MHz and I'm going to mine MYR skein again on this one. It's less intensive; the screen is really slow when mining diamond @ I22,
but on skein it's faster, with I8 (max): 140 MH/s.

Even with double the hashrate on diamond now, I still earn more from MYR with skein on my R9 270.
But I don't know how much diamond is going to make with PoS, so maybe it's not a big difference in the long term.

Or not: I just did a new calculation, and MYR has still been dropping a lot over the last few days.
Difficulty is higher and the price has gone down a lot. Last week I was at 120000 satoshi a day,

today it's only 78000 sat ??? That's less than mining diamond,
so I'm going with diamond for now.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
diamond.cl is missing from the download.
I only see groestlcoin-v1.cl

Or do we just have to rename it to diamond?

So I renamed it to diamond.cl,
but there's no change in my speed: I get 4.7 MH/s on the R9 270 with sgminer 4.1.0.

I guess groestlcoin-v1.cl is not for diamond; I've got a lot of rejected shares.

Groestlcoin and Diamond use the same block hashing algo, so the same OpenCL kernel applies.
But you must configure the miner to mine the specific coin, because there are differences!
That's why there are two kernels even though the two kernel files are the same.

Please post your conf file and command line so I can help you debug it.
legendary
Activity: 1281
Merit: 1003
diamond.cl is missing from the download.
I only see groestlcoin-v1.cl

Or do we just have to rename it to diamond?

So I renamed it to diamond.cl,
but there's no change in my speed: I get 4.7 MH/s on the R9 270 with sgminer 4.1.0.

I guess groestlcoin-v1.cl is not for diamond; I've got a lot of rejected shares.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Nothing new in the groestl+groestl area, but I've worked a bit on the groestl+sha variant (myr-groestl for myriad, digibyte, saffron, etc.).
Tahiti is a mess, but I could easily push hawaii over 60 Mh/s, keeping the kernel compatible with the old miners.

I could finally get rid of scratch registers on Tahiti: now the 280x is doing 35 Mh/s with moderate overclock :-)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Nothing new in the groestl+groestl area, but I've worked a bit on the groestl+sha variant (myr-groestl for myriad, digibyte, saffron, etc.).
Tahiti is a mess, but I could easily push hawaii over 60 Mh/s, keeping the kernel compatible with the old miners.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Wolf0 created a faster Tahiti binary and posted about it in the groestlcoin thread:

I have a faster Tahiti binary than Pallas' for Groestlcoin - works on DMD, too. The usage is the same as his binary; I should have more info later.

Get it here: https://ottrbutt.com/miner/wolf-groestlcoinTahitigw256l4.bin

it is indeed faster and works flawlessly.
usage: just rename it over the old one and make sure you set worksize 256 for that card; you can get a bit more hashrate by using 2 or 4 threads.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I'm interested in knowing the hashrate of R9 285 and R9 Fury X cards, anybody?
newbie
Activity: 32
Merit: 0
It doesn't seem like they are implementing GCN-specific goodies in the current compiler stack. It's kinda bloated, and AMD_IL has been awaiting its replacement since the 7970 came out. I'm sure the upcoming HSA language will implement many more GCN things (except the separate V and S programming).