Topic: [ANN] [SKC] Skeincoin 0.9.3.1 | Skein-SHA2 - page 56. (Read 161541 times)

full member
Activity: 227
Merit: 260
Casting uint to uint16 compiles on your system? Then I guess something is awfully wrong with it, since that is against the OpenCL spec (and common sense :) ). Anyway, it would be great if you managed to optimize sha256. I only quickly threw together something that worked for me, and I feel somewhat embarrassed now that it is public.

Yeah, it compiles, I even tried a fresh checkout :D
Code:
    if(sha256_res(sha256_res(as_uint16(state))) & 0xf0ffffff)
        return;
Agreed, that's a really weird cast :) And no worries about the code: your kernel works, and thank you very much for developing and publishing it!!!

I've already moved W[62] and all its usages into the kernel's search function and declared it as local, but haven't got any significant speedup. I tried to get rid of all the vectors, but that made it even worse :) Now I will try the original poclbm SHA256D search with reduced rounds (without the 2nd call).

Is skeincoin's SHA256 hash function the same as the one used in normal SHA256 coins? Or is it a variation?

It's the same function, but it is not double SHA256 as in Bitcoin and other coins, so we can't just reuse Bitcoin's well-optimized kernel.
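To make the difference concrete, here is a small Python sketch (not code from the miner). Bitcoin-style coins run SHA-256 twice, while here SHA-256 runs once over the 64-byte Skein-512 output. The standard library has no Skein, so `skein_digest` below is only a stand-in value, and the function names are made up for this illustration.

```python
import hashlib

def sha256_once(data):
    """Single SHA-256, as applied to the Skein-512 output."""
    return hashlib.sha256(data).digest()

def sha256_double(data):
    """SHA-256 applied twice (SHA256D), as in Bitcoin."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

# Stand-in for a 64-byte Skein-512 digest (hypothetical value, not a real hash).
skein_digest = bytes(bytearray(range(64)))

single = sha256_once(skein_digest)
double = sha256_double(skein_digest)
print(len(single), len(double), single != double)
```

Both results are 32 bytes long, but they differ, which is why a kernel tuned for SHA256D cannot be dropped in unchanged.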
sr. member
Activity: 462
Merit: 250
Yes, the compiler moves that W[] array into registers on GCN, but apparently on VLIW it does not, and the array ends up in global memory, which is slow. This can be improved of course (first of all, it does not have to be 62 elements long; 16 elements are enough if you reuse them). I just wonder how you managed to compile sha256_res(sha256_res()): it takes a uint16 vector as a parameter, but returns only one uint.
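The "16 elements are enough" remark can be illustrated outside OpenCL. The Python sketch below keeps only a 16-word circular buffer for the SHA-256 message schedule, overwriting W[t-16] with W[t] as it goes. It is an illustration only, not the sha256_res code from skein.cl; the helper names are made up here, and the round constants are derived from the primes instead of being typed in.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def _primes(n):
    """First n primes by trial division."""
    found, cand = [], 2
    while len(found) < n:
        if all(cand % p for p in found):
            found.append(cand)
        cand += 1
    return found

def _iroot(value, k):
    """Largest integer r with r**k <= value (binary search)."""
    lo, hi = 0, 1 << (value.bit_length() // k + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= value:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Fractional bits of the cube/square roots of the first primes (FIPS 180-4).
K = [_iroot(p << 96, 3) & MASK for p in _primes(64)]
H0 = [_iroot(p << 64, 2) & MASK for p in _primes(8)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def compress(state, block):
    w = list(struct.unpack(">16I", block))  # the whole schedule: 16 words
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        if t >= 16:
            # W[t] replaces W[t-16], which is never read again.
            w15, w2 = w[(t - 15) & 15], w[(t - 2) & 15]
            s0 = rotr(w15, 7) ^ rotr(w15, 18) ^ (w15 >> 3)
            s1 = rotr(w2, 17) ^ rotr(w2, 19) ^ (w2 >> 10)
            w[t & 15] = (w[t & 15] + s0 + w[(t - 7) & 15] + s1) & MASK
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t & 15]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]

def sha256(message):
    """Full SHA-256 built on the 16-word rolling schedule."""
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)

print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

Comparing against hashlib is a quick way to confirm that the rolling buffer changes nothing in the result. Whether the same change helps on VLIW hardware is a separate question that only a benchmark of the kernel can answer.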

I've tried both
Code:
(sha256_res((uint16)sha256_res(as_uint16(skein512_mid_impl(state, msg)))) & 0xf0ffffff)
and
Code:
(sha256_res(sha256_res(as_uint16(skein512_mid_impl(state, msg)))) & 0xf0ffffff)

And it compiles, though it probably gives wrong results. But that is still enough for a test, since sha256_res runs twice, just possibly with the wrong input on the second run :)

Besides, double Skein runs at 780MH/s on a 5870, so SHA256 is the current bottleneck for sure. With a good SHA implementation we will be able to reach even better performance than SHA256D :D


sr. member
Activity: 462
Merit: 250
I also noticed Skein doesn't need a high memory frequency, so you can decrease it almost all the way down without any performance hit. The Skein miner is probably not optimized yet, and that's why it needs less power, I guess.
The kernel does not use VRAM at all (save for a tiny bit to pass results back to the miner), so yes, you can downclock the memory to the minimum. It is optimized, though, but for GCN. Chances are that if you replace the rolhack functions with rotates, replace arrays with variables and unroll the Skein rounds, it will perform better on VLIW architectures.
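For readers unfamiliar with the term, a "rotate" here is a plain 64-bit left rotation, which is the core of the MIX step in Skein's Threefish cipher (add, rotate, xor). The Python sketch below shows only that arithmetic. It is not the kernel's rolhack code, which does not appear in this thread, and the rotation amount in the example call is arbitrary.

```python
MASK64 = (1 << 64) - 1

def rotl64(x, n):
    """Rotate a 64-bit word left by n bits."""
    n &= 63
    return ((x << n) | (x >> (64 - n))) & MASK64

def mix(x0, x1, r):
    """Threefish MIX: y0 = x0 + x1 (mod 2**64), y1 = rotl(x1, r) xor y0."""
    y0 = (x0 + x1) & MASK64
    y1 = rotl64(x1, r) ^ y0
    return y0, y1

# Arbitrary example inputs, chosen only to show the wrap-around.
print(hex(rotl64(0x8000000000000001, 1)))
print(mix(1, 1, 1))
```

In an OpenCL kernel the same operation would normally be a single built-in rotate on a ulong; how that compares with the existing approach on VLIW cards would have to be measured.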

The current sha256_res implementation from skein.cl is too slow. I tested double SHA256 hashing on the current kernel by replacing the Skein call with one more sha256_res call:

Code:
    if(sha256_res((sha256_res(as_uint16(state)))) & 0xf0ffffff)
        return;
    output[OUTPUT_SIZE] = output[nonce & OUTPUT_MASK] = nonce;

And I got 125MH/s on a single 5870. So it's probably the bottleneck of the current Skein-SHA256 OpenCL implementation.

I have never worked with OpenCL before, so I may be missing something (or even everything :) )
full member
Activity: 138
Merit: 100
Thanks for the numbers.
Looks like all your cards are also mining about 3x slower than SHA256 coins running on cgminer.

Not sure what is needed to get that 3x speedboost, cgminer has a lot of functions/tricks to be that fast...
This isn't a pure SHA256 coin, so comparing it to SHA256 cgminer performance is moot, especially given the optimizations missing from one but not the other. I'm not the first to point this out. For what it's worth, my R9 280X overclocked to 1100MHz gets 207MH/s.

Let's compare Skeincoin with Blakecoin instead, running on an AMD 7970/R9 280X at 1170MHz core and 300MHz mem:

Skein (poclbm-skc): 225MH/s
Blake (cgminer): 2800MH/s
SHA256 (cgminer): 700MH/s
Scrypt (cgminer): 700 KH/s (1050/1500MHz)

Moot or not, benchmarking is fun.
full member
Activity: 138
Merit: 100
Yes I can do that on my own... was not what I requested though.

Still looking... what modifications can be made to the existing scripts so that this can be forced to call amdoclskc.dll?

1K SKC still up for grabs.

Reorder had a better suggestion:
"I suggest that somebody steps in and replaces my quickhack dll with pure python implementation from http://pythonhosted.org/pyskein/"

Maybe better to offer the 1k bounty for a proper miner upgrade instead of another dll hack.
hero member
Activity: 630
Merit: 500
No dice on the above... not a valid win32 application.

Dug into the Windows error report and it seems to be something with the amdocl.dll

EDIT: Found out why.

For Win 7 64 you do not use the system32 folder... you need to rename amdocl.dll in the SysWOW64 folder and then add the one linked here.

Working now... time to tweak!

EDIT2: Of course... this has now broken my cgminer for scrypt mining unless I swap the files back... NICE!
Yeah, figured as much: march=nocona targets Intel's first 64-bit (Pentium 4-era) CPUs. I didn't have a DLL issue at all and can switch between the two easily, so it must be a driver/SDK issue.
I'm using driver 13-12_win7_win8_64_dd_ccc_whql, AMD-APP-SDK-v2.9-Windows-641 and cgminer 3.7.1 on an R9 280X, Win7 64 Basic.


Well I am all set now... 193MH/s on my 7950 and got it running on the APU Devastator core on my 6800K... another 30MH/s.

Question, if someone knows... what edits can be made to make this run and look for a DIFFERENTLY named amdocl.dll? I would like to make it call something like amdoclskc.dll and rename the custom amdocl64.dll made for this, so I can put my regular one back and not have to swap files just to go back to scrypt mining.

1K SKC to the first functioning answer.

The fastest way would be to create 2 bat files that do the renaming for you.
The first one, for SKC, renames the original amdocl.dll to amdocl.new and the "new" amdocl64.dll to amdocl.dll.
The second one resets the names above.
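Since the miner already needs Python, the same two renaming steps can also be sketched as a Python script instead of bat files. This is only an illustration of the swap described above: the function names are made up, and the dry run at the bottom works on dummy files in a scratch folder rather than on a real system directory.

```python
import os
import tempfile

STOCK = "amdocl.dll"      # name the OpenCL runtime is loaded under
PARKED = "amdocl.new"     # where the stock DLL waits while mining SKC
CUSTOM = "amdocl64.dll"   # the custom build posted in this thread

def _swap(folder, first, second):
    """Apply two renames in order; each is a (source, target) pair."""
    for src, dst in (first, second):
        os.rename(os.path.join(folder, src), os.path.join(folder, dst))

def use_skc_dll(folder):
    """Bat file 1: park the stock DLL, then activate the custom one."""
    _swap(folder, (STOCK, PARKED), (CUSTOM, STOCK))

def use_stock_dll(folder):
    """Bat file 2: undo the renames in reverse order."""
    _swap(folder, (STOCK, CUSTOM), (PARKED, STOCK))

# Dry run on dummy files, so nothing on the real system is touched.
demo = tempfile.mkdtemp()
for name in (STOCK, CUSTOM):
    with open(os.path.join(demo, name), "w") as handle:
        handle.write(name)
use_skc_dll(demo)
print(sorted(os.listdir(demo)))
use_stock_dll(demo)
print(sorted(os.listdir(demo)))
```

On a real machine the folder would be the one holding amdocl.dll, and the script would need administrator rights, just like the bat files. Like them, it is a swap and does not make the miner load a differently named DLL.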

Good luck!

Tip SKC address: SkHZ8rTEehrmjFB6VffWh8K7eKgdppG2bq

member
Activity: 158
Merit: 39
WTS 10k, pm with offer
member
Activity: 104
Merit: 10
My 5870 gets around 110 mhash.
sr. member
Activity: 280
Merit: 250
Are you sure it's optimized? I can't compare the GPU performance, but when it comes to CPU, poclbm-skc hashes around 50% slower than skeincoin-cpuminer on the same CPU. I don't have much time right now, but I can compare it tomorrow.

Maybe we can create some kind of benchmark to see the performance of particular GPUs?
In my case it's:
7950 - 180MH/s
R9 280X - 200MH/s
Overclocking can get you another 5-10MH/s

5870 at stock core (850MHz) - 95MH/s
7870 (Tahiti) at 880MHz - 94MH/s
6950 at 900MHz - 103MH/s
6930 at 860MHz - 93MH/s

Basically all cards give around 90-105MH/s on my setup, and overclocking doesn't change it much. On scrypt the 5870, 7870 and 6930 run at close to 400KH/s and the 6950 at about 450KH/s, so the current Skein-SHA2 implementation has a 230-280 times higher hashrate than scrypt.
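The Skein-to-scrypt ratio can be checked from the per-card figures in the post. A short Python sketch, using only the numbers quoted above (the scrypt rates are the poster's approximate values, and the variable names are made up):

```python
# Skein-SHA2 rates in MH/s and scrypt rates in KH/s, as quoted in the post.
skein_mhs = {"5870": 95.0, "7870": 94.0, "6950": 103.0, "6930": 93.0}
scrypt_khs = {"5870": 400.0, "7870": 400.0, "6950": 450.0, "6930": 400.0}

def ratio(card):
    """Skein hashrate divided by scrypt hashrate, both converted to H/s."""
    return (skein_mhs[card] * 1e6) / (scrypt_khs[card] * 1e3)

for card in sorted(skein_mhs):
    print("%s: %.1fx" % (card, ratio(card)))
```

With these inputs every card lands between roughly 229x and 238x, which is at the lower end of the quoted 230-280 range.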

newbie
Activity: 19
Merit: 0
WTS 10k SKC. PM with offers.
hero member
Activity: 630
Merit: 500
Finally got the miner to mine. ;D
Also needed the setuptools dependency.

Files needed for mining in Windows:

https://github.com/snoopcode/poclbm-skc
Download as zip.

Python 2.7.6 32bit:
http://www.python.org/getit/

Get these in 32bit, python 2.7 format from http://www.lfd.uci.edu/~gohlke/pythonlibs/
-Ipython
-Mako
-Numpy
-pyopencl
-pytools
-setuptools
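Before starting the miner, a quick check like the following can show whether the interpreter is the 32-bit build and whether the packages above can actually be imported. This is a sketch: the import names in `REQUIRED` are assumed from the usual distributions of those packages, and the function names are made up.

```python
import struct
import sys

# Import names assumed for the packages listed above.
REQUIRED = ["IPython", "mako", "numpy", "pyopencl", "pytools", "setuptools"]

def missing_modules(names):
    """Return the names that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except Exception:  # ImportError, or a failed DLL load on Windows
            missing.append(name)
    return missing

def interpreter_bits():
    """Pointer size of the running Python in bits (32 or 64)."""
    return struct.calcsize("P") * 8

print("Python %d.%d, %d-bit" % (sys.version_info[0], sys.version_info[1],
                                interpreter_bits()))
print("missing: %s" % (", ".join(missing_modules(REQUIRED)) or "none"))
```

A package built for the wrong bitness typically fails at import time, so it would show up in the "missing" list here.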

Download amdocl64.dll and copy it to Windows/system32 as amdocl.dll if using a >13.12 driver (rename the original one to amdocl.new to keep it for other miners/games)
http://www.filedropper.com/amdocl64

skeinhash.so for the poclbm folder: http://www.megafileupload.com/en/file/482956/skeinhash-so.html

Thanks to madjihad for the last 2 files.
Thanks to all for the tips, hints and work on the miner.

https://github.com/mjmvisser/adl3
Download as zip and unzip to the miner folder; it is needed for temperature monitoring and for setting --cutoff-temp=90 on 79x0 cards.

Missing: 64 bit support (not needed according to "reorder")

Bat file example previously posted:
C:\python27\python.exe C:\skein\poclbm-skc\poclbm.py http://[username.worker]:[password]@skc.coinmine.pl:6400

/EDIT: Added adl3 temp monitor link and bat file example.

Followed this to the letter and got this on Win 7 64-bit:

It fires up and then a pop-up says that python.exe has stopped working.

Bat file is:

C:\python27\python.exe C:\poclbm-skc\poclbm.py http://worker.1:12345@skc.coinmine.pl:6400 -d 0
I was having the same issue on 64-bit, but it worked once I rebuilt skeinhash.so with -march=nocona; or maybe try the more generic x86_64.
Note: I'm running an Intel CPU.

Mind attaching your skeinhash.so? I am on an AMD APU... but would like to see if yours works for me.