Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 148.

laik2

Quote from: zawawa on March 31, 2017, 10:29:26 AM

I ran the same experiment on Ubuntu 16.04.1 LTS + Linux kernel 4.10.2 with AMDGPU-Pro 16.60, and the difference was even starker.

The assembly version:

The original ethash-new.cl:

I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.

Please do share your Linux xorg.conf. Also, I see a lot of HW errors for a stock GPU... Do you run gatelessgate as root or as a regular user?
I see GPU temperature/fan monitoring is working for you...

EDIT: Here are the latest tests...
GG ethash-new.cl copied from binary-kernel dir

GG ethash-new.cl copied from kernel dir

GG ethash-new.cl + compiled binaries from binary-kernel dir

Claymore 8.0 Linux default settings

WBF1

I dropped your new files in on top of version pre4 and am also seeing a 1-1.5 MH/s boost on my RX 470s and 480. I did not try it on the R9 390, but that card was already performing as well as Claymore ever had.

zawawa

I ran the same experiment on Ubuntu 16.04.1 LTS + Linux kernel 4.10.2 with AMDGPU-Pro 16.60, and the difference was even starker.

The assembly version:

The original ethash-new.cl:

I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.

zawawa

Quote from: nerdralph on March 31, 2017, 09:08:58 AM

Just cloned the repo to do a Linux build, then noticed the autotools requirement. Ugh. I much prefer it when developers run autoreconf and check the configure script into the repo. Then building is just the usual ./configure;make

I will probably switch to CMake + ninja sooner rather than later. This whole autotools thing is too archaic for my taste.

nerdralph

Just cloned the repo to do a Linux build, then noticed the autotools requirement. Ugh. I much prefer it when developers run autoreconf and check the configure script into the repo. Then building is just the usual ./configure;make

nerdralph

Quote from: zawawa on March 30, 2017, 10:32:22 PM

I just pushed an optimized GCN assembly version of ethash-new.cl for the RX 470/480 to the repo.
Each card should get a 1 MH/s boost from it. If this actually works, I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

I probably wouldn't even bother with Southern Islands, since it has no flat_ instructions. It should be reasonably easy to write a single kernel for Sea Islands and later, with the main differences being the ABI changes for kernel parameter passing.

nerdralph

Quote from: ?? on ??

Quote from: zawawa on March 30, 2017, 10:32:22 PM

I just pushed an optimized GCN assembly version of ethash-new.cl for the RX 470/480 to the repo.
Each card should get a 1 MH/s boost from it. If this actually works, I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

I was expecting just SLC (bypass L2) to help, though I recall Wolf's comments about GLC (bypass L1) actually helping. I'd even expect GLC to hurt performance if you weren't very careful to ensure data was read in 64-byte chunks.
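To make "flipping the SLC and GLC bits" concrete, here is a minimal Python sketch that reads and sets the two cache-policy flags in the first dword of a FLAT instruction. It assumes the GCN3 (Volcanic Islands/Polaris) FLAT layout as we read the ISA manual: GLC at bit 16, SLC at bit 17, opcode in bits 24:18, and the 0b110111 encoding tag in bits 31:26. The flat_load_dwordx4 opcode value 0x17 is likewise an assumption from the GCN3 opcode table, and all helper names are hypothetical; this is plain integer manipulation, not an assembler.

```python
# Sketch: inspect/toggle GLC and SLC in the first dword of a GCN3 FLAT instruction.
# Assumed layout: [31:26]=0b110111 (FLAT tag), [24:18]=OP, [17]=SLC, [16]=GLC.

FLAT_ENCODING = 0b110111
GLC_BIT = 16
SLC_BIT = 17


def is_flat(word: int) -> bool:
    """True if the top six bits carry the FLAT encoding tag."""
    return (word >> 26) & 0x3F == FLAT_ENCODING


def flat_opcode(word: int) -> int:
    """Extract the 7-bit opcode field (bits 24:18)."""
    return (word >> 18) & 0x7F


def cache_flags(word: int) -> dict:
    """Report which cache-policy bits are set."""
    return {"glc": bool((word >> GLC_BIT) & 1), "slc": bool((word >> SLC_BIT) & 1)}


def set_cache_flags(word: int, glc: bool, slc: bool) -> int:
    """Return a copy of word with GLC/SLC forced to the given values."""
    if not is_flat(word):
        raise ValueError(f"{word:#010x} is not a FLAT instruction")
    word &= ~((1 << GLC_BIT) | (1 << SLC_BIT)) & 0xFFFFFFFF  # clear both flags
    return word | (int(glc) << GLC_BIT) | (int(slc) << SLC_BIT)


if __name__ == "__main__":
    # flat_load_dwordx4 with no cache flags (opcode 0x17 assumed for GCN3)
    word = (FLAT_ENCODING << 26) | (0x17 << 18)
    print(hex(word), cache_flags(word))
    flipped = set_cache_flags(word, glc=True, slc=True)
    print(hex(flipped), cache_flags(flipped))
```

Only the two flag bits change; the opcode and encoding fields are left alone, which is why a benchmark difference between the two builds can be attributed to the cache policy alone.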

P.S. There are also some easy optimizations to be had with instruction reordering (though they might not make much difference in performance). For example:

Code:

/*d11c6a3e 01a9013c*/ v_addc_u32 v62, vcc, v60, 0, vcc
/*2a7e62b2 */ v_xor_b32 v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171 */ s_waitcnt vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the flat_load_dwordx4.
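The reorder is legal because the v_xor_b32 shares no registers with the two loads it would hop over, and it still stays ahead of the s_waitcnt. A minimal Python sketch of that check, with the read/write register sets transcribed by hand from the listing (there is no real disassembler here, only VGPRs are modelled, and the helper names are hypothetical):

```python
# Sketch: verify that swapping two instructions creates no register hazards.
from __future__ import annotations

import re


def regs(spec: str) -> set[str]:
    """Expand 'v63' or 'v[64:67]' into a set of individual VGPR names."""
    m = re.fullmatch(r"v\[(\d+):(\d+)\]", spec)
    if m:
        lo, hi = int(m.group(1)), int(m.group(2))
        return {f"v{i}" for i in range(lo, hi + 1)}
    if re.fullmatch(r"v\d+", spec):
        return {spec}
    raise ValueError(f"unsupported operand: {spec}")


def can_swap(a: tuple, b: tuple) -> bool:
    """a and b are (reads, writes). Safe only with no RAW, WAR or WAW overlap."""
    a_reads, a_writes = a
    b_reads, b_writes = b
    return not (a_writes & b_reads or a_reads & b_writes or a_writes & b_writes)


# v_xor_b32 v63, 50, v49: reads v49 (50 is an inline constant), writes v63
xor = (regs("v49"), regs("v63"))
# flat_load_dwordx4 v[64:67], v[61:62]: reads the address pair, writes 4 dwords
load1 = (regs("v[61:62]"), regs("v[64:67]"))
# flat_load_dwordx4 v[59:62], v[59:60]
load2 = (regs("v[59:60]"), regs("v[59:62]"))

if __name__ == "__main__":
    print("xor past both loads:", all(can_swap(xor, ld) for ld in (load1, load2)))
    print("swap the two loads:", can_swap(load1, load2))
```

The second check comes out false: the second load overwrites v61/v62, which the first load uses as its address, so the two loads themselves must keep their order.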

SunStruck

Quote from: zawawa on March 31, 2017, 02:56:02 AM

The .bat file:

Code:

@echo off
rem AMD OpenCL runtime settings; cmd.exe only assigns them in NAME=VALUE form
set GPU_FORCE_64BIT_PTR=0
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 1 -k ethash-new -o stratum+tcp://eu1.ethermine.org:4444 -u 0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate -p x --xintensity 4620 --worksize 192 --gpu-threads 2 --no-extranonce
rem keep the console window open after the miner exits
pause

I ran this experiment on Windows 10 with a stock RX 480 and AMD Crimson Software 16.9.2, as usual.
@laik2 It would be great if you could try the above settings as well. I am puzzled by the results myself...

--gpu-platform 0, no?

I still can't connect to Suprnova ETH, though... very odd.

laik2

I tried the default config, xI 2048, xI 1024, and gpu-threads 2 and 1... Windows crashed every time. I'm sorry, but I can't test it for some unknown reason... I don't like the Windows environment at all.
On Linux it's doing about the same as Claymore.

zawawa

I ran the same experiment one more time and confirmed that ethash-new.cl actually runs faster with the GLC and SLC bits on.

With the GLC and SLC bits on:

With the GLC and SLC bits off:

The .bat file:

Code:

@echo off
rem AMD OpenCL runtime settings; cmd.exe only assigns them in NAME=VALUE form
set GPU_FORCE_64BIT_PTR=0
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 1 -k ethash-new -o stratum+tcp://eu1.ethermine.org:4444 -u 0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate -p x --xintensity 4620 --worksize 192 --gpu-threads 2 --no-extranonce
rem keep the console window open after the miner exits
pause

I ran this experiment on Windows 10 with a stock RX 480 and AMD Crimson Software 16.9.2, as usual.
@laik2 It would be great if you could try the above settings as well. I am puzzled by the results myself...
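For Linux testers who want to reproduce the same settings from a script, here is a minimal Python sketch of a launcher. It copies the environment variables and command-line flags verbatim from the .bat file; the binary name `gatelessgate` and the placeholder `0xYOUR_WALLET.worker` are assumptions, and by default it only builds the command and environment rather than starting the miner.

```python
# Sketch: launch the miner with the same env vars and flags as the .bat file.
from __future__ import annotations

import os
import subprocess

# AMD OpenCL runtime variables, values copied from the .bat file
AMD_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
}


def build_command(wallet_worker: str, binary: str = "gatelessgate") -> list[str]:
    """Assemble argv with the .bat flags; the binary name is an assumption."""
    return [
        binary,
        "--gpu-platform", "1",
        "-k", "ethash-new",
        "-o", "stratum+tcp://eu1.ethermine.org:4444",
        "-u", wallet_worker,
        "-p", "x",
        "--xintensity", "4620",
        "--worksize", "192",
        "--gpu-threads", "2",
        "--no-extranonce",
    ]


def launch(wallet_worker: str, run: bool = False):
    """Return (argv, env) by default; start the miner only when run=True."""
    env = {**os.environ, **AMD_ENV}  # inherit the current environment, override AMD vars
    cmd = build_command(wallet_worker)
    if not run:
        return cmd, env
    return subprocess.run(cmd, env=env, check=False)


if __name__ == "__main__":
    argv, _ = launch("0xYOUR_WALLET.worker")
    print(" ".join(argv))
```

Passing the variables through `env=` keeps them scoped to the miner process instead of the whole shell session.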

zawawa

Quote from: ?? on ??

Quote from: zawawa on March 30, 2017, 10:32:22 PM

I just pushed an optimized GCN assembly version of ethash-new.cl for the RX 470/480 to the repo.
Each card should get a 1 MH/s boost from it. If this actually works, I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

That's what I thought as well, but the miner was actually running faster.
Let me double-check...

laik2

Quote from: zawawa on March 30, 2017, 10:32:22 PM

I just pushed an optimized GCN assembly version of ethash-new.cl for the RX 470/480 to the repo.
Each card should get a 1 MH/s boost from it. If this actually works, I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

Excellent, but what settings are best to achieve those speeds?
I have tried 1 GPU thread, WS=192, xI 1024, but the speed is lower.

zawawa

I just pushed an optimized GCN assembly version of ethash-new.cl for the RX 470/480 to the repo.
Each card should get a 1 MH/s boost from it. If this actually works, I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

Superdawg

I'm getting about 2 MH/s less than Genoil's miner across 4 cards.

Genoil's gives me about 90 MH/s on 4 stock RX 480s with no BIOS mods;
this one says about 88, maybe a bit less.
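Put in per-card terms, the gap reported here is small. A short Python sketch of the arithmetic, using only the 90 vs 88 MH/s totals over 4 cards from this post (the function name is hypothetical):

```python
# Sketch: express a total-hashrate gap per card and as a percentage.


def compare(reference_total: float, candidate_total: float, cards: int):
    """Return (MH/s lost per card, percent slower than the reference)."""
    gap = reference_total - candidate_total
    return gap / cards, 100.0 * gap / reference_total


if __name__ == "__main__":
    per_card, pct = compare(90.0, 88.0, 4)  # Genoil's miner vs Gateless Gate
    print(f"{per_card:.2f} MH/s per card, {pct:.1f}% slower")
```

That works out to 0.5 MH/s per RX 480, roughly 2.2%, which is in the same range as the 1 MH/s per-card boost the assembly kernel is expected to deliver.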

joaocha

Zawawa didn't even touch the kernel, and it is as fast as Claymore on my 390. I'm sure if he gives it a week, it will beat Claymore.

nerdralph

Quote from: WBF1 on March 30, 2017, 02:28:14 PM

Some additional info:

R9 390 with Gateless Gate & ethash-new = 31.9-32 MH/s. xI of 1920, worksize 192, 2 threads.

This is the same as or better than the performance I get from Claymore. I am getting slowly increasing HW errors, but no rejected shares or issues shown on the pool side.

This card has +6.5% core clock and 1550 MHz memclock. Drivers are 17.3.2 on Windows 8.1.

sgminer has always been faster than Claymore on my 290X on Linux/fglrx. It's only on Polaris that it is slightly behind.

cryptominer420

On a 7870XT I am using xI 1512 and -g 2, and it is doing 23 MH/s mining UBIQ, with 1 HW error in the last hour.

joaocha

2 threads don't change much at all... only hardware errors.

Just use xI 512
and gpu-threads 1

to avoid a lot of HW errors.
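To compare the xI values quoted in this thread across different cards, it helps to see how big a single kernel launch gets. A minimal Python sketch, assuming the sgminer convention that the global work size is xintensity multiplied by the card's shader count; the shader counts are the published figures for these cards, and rounding up to whole work-groups of 192 is this sketch's own choice, not a statement of what the miner does with a remainder.

```python
# Sketch: launch size implied by --xintensity, ASSUMING global = xI * shaders.

# Stream-processor counts for cards mentioned in this thread
SHADERS = {"RX 480": 2304, "RX 470": 2048, "R9 390": 2560, "HD 7870 XT": 1536}


def launch_size(card: str, xintensity: int, worksize: int = 192):
    """Return (global_work_size, work_groups), rounding groups up."""
    global_size = xintensity * SHADERS[card]
    groups = (global_size + worksize - 1) // worksize  # integer ceiling division
    return global_size, groups


if __name__ == "__main__":
    for card, xi in [("RX 480", 512), ("RX 480", 4620), ("R9 390", 1920), ("HD 7870 XT", 1512)]:
        g, n = launch_size(card, xi)
        print(f"{card:10} xI {xi:5}: {g:>10,} work-items in {n:,} groups of 192")
```

Under this assumption, xI 4620 on an RX 480 queues about nine times as much work per launch as xI 512, so the two suggestions in this thread differ far more than the numbers alone suggest.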

cryptominer420

Quote from: zawawa on March 30, 2017, 10:55:58 AM

I just uploaded a new pre-release:

https://github.com/zawawawa/gatelessgate/releases/tag/v0.1.3-pre4

There are some performance improvements, and ethash-new.cl was added as an experimental feature.
Let me know how that works.

This one works on ubiqpool.io using my old 7870XT

joaocha

On the latest drivers, eth-new works far better than on 16.x or lower.
