
Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 145. (Read 214410 times)

sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
At any rate, I will have to rethink the financial aspects of open-source miner development.
Sadly, I was making much more money when I was developing private kernels just for myself.

Claymore is making more than $50 000 a month... And you seem to be working for nothing.
sr. member
Activity: 728
Merit: 304
Miner Developer
At any rate, I will have to rethink the financial aspects of open-source miner development.
Sadly, I was making much more money when I was developing private kernels just for myself.
sr. member
Activity: 728
Merit: 304
Miner Developer
It's true that GG's Ethash kernel performs better on Windows.
I just started with optimizations anyway.

So, respectfully, whatever happened to Zcash? We still don't have a first-rate open-source miner. All these other algos already have decent open-source miners available to them. Just sayin'. :)


Oh, I haven't given up on that yet.
You have to understand, though, that the competition among Zcash miners is much more intense than that among other miners, and my resources are quite limited. It seems to me that people assume that first-class open-source miners just automagically happen, but that's simply not the case.
full member
Activity: 150
Merit: 100
It's true that GG's Ethash kernel performs better on Windows.
I just started with optimizations anyway.

So, respectfully, whatever happened to Zcash? We still don't have a first-rate open-source miner. All these other algos already have decent open-source miners available to them. Just sayin'. :)
sr. member
Activity: 728
Merit: 304
Miner Developer
It's true that GG's Ethash kernel performs better on Windows.
I just started with optimizations anyway.
sr. member
Activity: 652
Merit: 266
I have noticed that Claymore has a better hashrate on Linux than on Windows for 470s, and vice versa for 480s... funny. A Gigabyte RX 470 4GB G1 Gaming with Hynix memory does 30 on Linux and 29.3 on Windows, while the same Gigabyte model as a 480 does 30.8 on Windows and 30 on Linux.
member
Activity: 129
Merit: 10
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yea, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries using ethash-new.cl in /binary-kernal by moving it to /kernal, and that bumped the speed up to 29.2, but it's still a bit short of Claymore.
For me it's quite the opposite... using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG, 2 threads).
EDIT: Did some timing mods and now HW errors are acceptable, ~1/2 every minute.


You're running the 4.10 kernel... I'm on the stock 4.4. I think that's where the difference comes from.
https://drive.google.com/drive/folders/0B72yKpOokCMcVnV5LWNMS2ltYmM
I've uploaded some of my kernels. 4.10/4.11 are tested and working fine.
Just remember to update only opencl packages from amdgpu-pro 16.60.
Ditto about the asm ...


Took a rig from 4.04 w/ 16.40 to 4.11 w/ 16.60, no speed difference when it comes to claymore (29.5 hynix and 30+ samsung on rx470 4g).

Will test gg next.


My Samsung cards get 0.5 Mh/s better on Claymore 8.1, but on GG eth I have 6 Hynix cards that outperform them.

GG eth rates from 18 cards after a few hours.

Samsung (1140/2090)
28.54
28.53
28.58
28.57

Hynix (1140/2040)
27.80
28.01
28.03
28.05
28.07
28.20
28.21
28.35
28.56
28.57
28.60
28.63
28.63
28.65

One of the rigs has the 4.11 kernel now. I don't see any noticeable difference in hash rates running the 4.11 kernel.

On one of the rigs I have a kill-a-watt, and Claymore used 30 W less power.

My conclusion... Claymore gets 3-5% better hash rate and uses 3% less power.
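
For readers who want to check the arithmetic behind the per-card figures above, here is a minimal Python sketch. The lists are copied from this post; the names `samsung`, `hynix`, and `percent_gap` are made up for illustration, and any Claymore figure passed to `percent_gap` would have to come from your own measurements.

```python
from statistics import mean

# GG eth rates (Mh/s) as posted above, grouped by memory vendor.
samsung = [28.54, 28.53, 28.58, 28.57]
hynix = [27.80, 28.01, 28.03, 28.05, 28.07, 28.20, 28.21,
         28.35, 28.56, 28.57, 28.60, 28.63, 28.63, 28.65]


def percent_gap(reference, measured):
    """Return how much higher `reference` is than `measured`, in percent."""
    return (reference / measured - 1.0) * 100.0


samsung_avg = mean(samsung)
hynix_avg = mean(hynix)

# Number of Hynix cards that beat the Samsung average.
hynix_above_samsung_avg = sum(1 for rate in hynix if rate > samsung_avg)

print(f"cards: {len(samsung) + len(hynix)}")
print(f"Samsung average: {samsung_avg:.3f} Mh/s")
print(f"Hynix average:   {hynix_avg:.3f} Mh/s")
print(f"Hynix cards above Samsung average: {hynix_above_samsung_avg}")
```

To compare against another miner, call `percent_gap(other_rate, hynix_avg)` with a rate measured on the same rig at the same clocks; figures from different rigs are only a rough guide.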

member
Activity: 129
Merit: 10
Looks like I'm getting 28.8 vs 29.9 on Claymore. Not sure if it's running your ASM version, though.

The ASM version is only for Ellesmere for now. I will prepare binaries for other cards shortly.

Yea, I have Ellesmere cards, but it's loading the normal .bin kernels. I removed the pre-built .bin and built new kernel binaries using ethash-new.cl in /binary-kernal by moving it to /kernal, and that bumped the speed up to 29.2, but it's still a bit short of Claymore.
For me it's quite the opposite... using 2 threads gains an additional +0.3/0.4, but with too many HW errors.
As you can see in the picture, GG does 29.42 (1 thread) vs 29.8 on Claymore vs 30 (GG, 2 threads).
EDIT: Did some timing mods and now HW errors are acceptable, ~1/2 every minute.


You're running the 4.10 kernel... I'm on the stock 4.4. I think that's where the difference comes from.
https://drive.google.com/drive/folders/0B72yKpOokCMcVnV5LWNMS2ltYmM
I've uploaded some of my kernels. 4.10/4.11 are tested and working fine.
Just remember to update only opencl packages from amdgpu-pro 16.60.
Ditto about the asm ...


Took a rig from 4.04 w/ 16.40 to 4.11 w/ 16.60, no speed difference when it comes to claymore (29.5 hynix and 30+ samsung on rx470 4g).

Will test gg next.


newbie
Activity: 50
Merit: 0
zawawa, have you seen my config for eth? What's wrong with it?

I also tried it on nanopool, but I always get an error.

@echo off
@set GPU_FORCE_64BIT_PTR=0
@set GPU_MAX_HEAP_SIZE=100
@set GPU_USE_SYNC_OBJECTS=1
@set GPU_MAX_ALLOC_PERCENT=100
@set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 0 -k ethash-new -o stratum+tcp://etc-eu.pool.sexy:8008 -u 0xeddcea3922f5c605190ca532277fab8fae243f0d.marvy1 -p x --xintensity 4608 --worksize 192 --gpu-threads 2 --no-extranonce
pause

where am I wrong?

I think you should use --gpu-platform 1 for GPU mining.
sr. member
Activity: 652
Merit: 266
How do I use your miner with the new Musicoin on the Ethash algo? It seems no pools will connect, yet Claymore and ethminer work fine.
sgminer/GG has no support for the old stratum and getwork protocols. To connect to those pools, use the eth stratum proxy from dwarfpool.
legendary
Activity: 1820
Merit: 1001
How do I use your miner with the new Musicoin on the Ethash algo? It seems no pools will connect, yet Claymore and ethminer work fine.
hero member
Activity: 798
Merit: 1000
zawawa, have you seen my config for eth? What's wrong with it?

I also tried it on nanopool, but I always get an error.

@echo off
@set GPU_FORCE_64BIT_PTR=0
@set GPU_MAX_HEAP_SIZE=100
@set GPU_USE_SYNC_OBJECTS=1
@set GPU_MAX_ALLOC_PERCENT=100
@set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 0 -k ethash-new -o stratum+tcp://etc-eu.pool.sexy:8008 -u 0xeddcea3922f5c605190ca532277fab8fae243f0d.marvy1 -p x --xintensity 4608 --worksize 192 --gpu-threads 2 --no-extranonce
pause

where am I wrong?
sr. member
Activity: 728
Merit: 304
Miner Developer
I would love to implement dual-mining, but I found sgminer's back-end too limiting for that.
I really would like to implement my own back-end in Clojure one day, but that's a really long shot...
sr. member
Activity: 728
Merit: 304
Miner Developer

See, Claymore is definitely reading this thread... I knew it!
sr. member
Activity: 652
Merit: 266
hero member
Activity: 798
Merit: 1000
I can't start it; it always fails. Could someone post their .bat file configuration? I have tried it on nanopool, and even on pool.sexy, but it keeps giving an error.

@echo off
@set GPU_FORCE_64BIT_PTR=0
@set GPU_MAX_HEAP_SIZE=100
@set GPU_USE_SYNC_OBJECTS=1
@set GPU_MAX_ALLOC_PERCENT=100
@set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 0 -k ethash-new -o stratum+tcp://etc-eu.pool.sexy:8008 -u 0xeddcea3922f5c605190ca532277fab8fae243f0d.marvy1 -p x --xintensity 4608 --worksize 192 --gpu-threads 2 --no-extranonce
pause

where am I wrong?
full member
Activity: 254
Merit: 100
Do these hardware errors have any impact on performance, or can I just ignore them?
sr. member
Activity: 450
Merit: 255
starting 24hr test on pre5
member
Activity: 129
Merit: 10
Anyone tested this yet before I try?

1 instance gatelessgate w/ 6 GPU
or
6 instance gatelessgate w/ 1 GPU per instance
sr. member
Activity: 588
Merit: 251

P.S. There are also some easy optimizations to do with instruction reordering (though they might not make much difference in performance). For example:
Code:
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the flat_load_dwordx4.


That's a good catch. I was actually thinking about automating this kind of instruction reordering.
My compiler driver rewrites the output of LLVM/Clang, so it shouldn't be that difficult.
I really want to combine this feature with register usage analysis.
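
To illustrate the kind of check an automated reordering pass would need, here is a minimal Python sketch that tests whether two instructions can be swapped by looking for register hazards (read-after-write, write-after-read, write-after-write). It is not taken from the Gateless Gate compiler driver: the `Instr` class and the `vregs` and `independent` helpers are hypothetical, and the read/write sets are annotated by hand from the listing quoted above. It ignores `s_waitcnt` counters and memory ordering, which a real pass would also have to respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset   # registers the instruction reads
    writes: frozenset  # registers the instruction writes


def vregs(lo, hi=None):
    """Return the set of vector register names v<lo>..v<hi> (inclusive)."""
    hi = lo if hi is None else hi
    return frozenset(f"v{i}" for i in range(lo, hi + 1))


def independent(a, b):
    """True if a and b share no RAW, WAR or WAW register hazard."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


# Hand-annotated register usage for the quoted instructions (an assumption).
addc = Instr("v_addc_u32 v62, vcc, v60, 0, vcc",
             vregs(60) | {"vcc"}, vregs(62) | {"vcc"})
xor = Instr("v_xor_b32 v63, 50, v49", vregs(49), vregs(63))
load1 = Instr("flat_load_dwordx4 v[64:67], v[61:62] slc glc",
              vregs(61, 62), vregs(64, 67))
load2 = Instr("flat_load_dwordx4 v[59:62], v[59:60] slc glc",
              vregs(59, 60), vregs(59, 62))

# The xor touches neither load's registers, so it can sink below both loads.
print(independent(xor, load1), independent(xor, load2))
# The first load reads v62, which the addc writes, so that pair must keep its order.
print(independent(addc, load1))
```

Under these annotations the xor is independent of both loads, which matches the suggestion above that it can be moved after them so the loads issue earlier.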

One thing I've wanted to do but never had enough free time for is an in-place keccak implementation (see section 2.5).
http://keccak.noekeon.org/Keccak-implementation-3.2.pdf

It won't help much for pure ethash mining performance as memory bandwidth is the limiting factor, but it could free up a lot of VALU time for dual mining.  And for pure ethash, although it won't do much for performance, it should help reduce power consumption.
And speaking of power consumption, I have an idea that could significantly cut power use in memory-hungry algos like ETH, XMR, & ZEC.  In order to have any hope of getting it to work, I have to first figure out how the active CU masks in the driver work...