Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 148. (Read 214431 times)

sr. member
Activity: 652
Merit: 266
I ran the same experiment on Ubuntu 16.04.1 LTS + Linux Kernel 4.10.2 with AMDGPU-Pro 16.60 and the difference was even starker.

The assembly version:



The original ethash-new.cl:



I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.
Please share your Linux xorg.conf. I also see a lot of HW errors on a stock GPU... Do you run gatelessgate as root or as a regular user?
I see GPU temperature/fan monitoring is working for you...

EDIT: Here are the latest tests...
GG ethash-new.cl copied from binary-kernel dir

GG ethash-new.cl copied from kernel dir

GG ethash-new.cl + compiled binaries from binary-kernel dir

Claymore 8.0 Linux default settings


sr. member
Activity: 419
Merit: 250
I dropped your new files in over the top of version pre4 and am also seeing a 1-1.5 Mh/s boost on my RX 470s and 480. I did not try it on the R9 390, but that card was already performing as well as Claymore ever had.
sr. member
Activity: 728
Merit: 304
Miner Developer
I ran the same experiment on Ubuntu 16.04.1 LTS + Linux Kernel 4.10.2 with AMDGPU-Pro 16.60 and the difference was even starker.

The assembly version:



The original ethash-new.cl:



I will go ahead and prepare binaries for the other cards.
I will probably include GCN1 as it really does not take too long.
sr. member
Activity: 728
Merit: 304
Miner Developer
Just cloned the repo to do a linux build then noticed the autotools requirement.  Ugh.  I much prefer it when developers run autoreconf, and check the configure script into the repo.  Then to build it's just the usual ./configure;make


I will probably switch to CMake + ninja sooner rather than later. This whole autotools thing is too archaic for my taste.
sr. member
Activity: 588
Merit: 251
Just cloned the repo to do a linux build then noticed the autotools requirement.  Ugh.  I much prefer it when developers run autoreconf, and check the configure script into the repo.  Then to build it's just the usual ./configure;make
sr. member
Activity: 588
Merit: 251
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

I probably wouldn't even bother with Southern Islands; no flat_ instructions.  It should be reasonably easy to write a single kernel for Sea Islands and later, with the main differences being for the ABI changes for kernel parameter passing.
sr. member
Activity: 588
Merit: 251
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

I was expecting just SLC (bypass L2) to help, though I recall Wolf's comments about GLC (bypass L1) actually helping.  I'd even expect GLC to hurt performance if you weren't very careful to ensure data was read in 64-byte chunks.

P.S. There are also some easy optimizations to do with instruction reordering (though they might not make much difference in performance). For example:
Code:
/*d11c6a3e 01a9013c*/ v_addc_u32      v62, vcc, v60, 0, vcc
/*2a7e62b2         */ v_xor_b32       v63, 50, v49
/*dc5c0000 4000003d*/ flat_load_dwordx4 v[64:67], v[61:62] slc glc
/*dc5c0000 3b00003b*/ flat_load_dwordx4 v[59:62], v[59:60] slc glc
/*bf8c0171         */ s_waitcnt       vmcnt(1) & lgkmcnt(1)

The v_xor_b32 can be moved to after the two flat_load_dwordx4 instructions, since neither load reads v63.
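
As a rough sanity check of that reordering claim, here is a minimal Python sketch that compares the VGPRs touched by the v_xor_b32 with those touched by the two loads in the listing above. The operand lists are transcribed by hand from the snippet, and the helper names (regs, vgprs, independent) are made up for this illustration. It only looks at vector registers, so it says nothing about VCC, scalar registers, or timing.

```python
import re

def regs(operand):
    """Expand 'v63' or 'v[64:67]' into a set of VGPR indices; other operands give an empty set."""
    m = re.fullmatch(r"v\[(\d+):(\d+)\]", operand)
    if m:
        return set(range(int(m.group(1)), int(m.group(2)) + 1))
    m = re.fullmatch(r"v(\d+)", operand)
    return {int(m.group(1))} if m else set()

def vgprs(operands):
    """Union of the VGPR indices named by a list of operands."""
    out = set()
    for op in operands:
        out |= regs(op)
    return out

# (mnemonic, destination operands, source operands), copied from the listing.
V_XOR = ("v_xor_b32", ["v63"], ["50", "v49"])
LOADS = [
    ("flat_load_dwordx4", ["v[64:67]"], ["v[61:62]"]),
    ("flat_load_dwordx4", ["v[59:62]"], ["v[59:60]"]),
]

def independent(a, b):
    """True if a and b share no read-after-write, write-after-read or write-after-write VGPR."""
    a_w, a_r = vgprs(a[1]), vgprs(a[2])
    b_w, b_r = vgprs(b[1]), vgprs(b[2])
    return not (a_w & b_r or a_r & b_w or a_w & b_w)

# The xor may be moved past the loads only if it is independent of both.
can_move = all(independent(V_XOR, load) for load in LOADS)
print(can_move)
```

Issuing the loads first lets the memory requests start one instruction earlier; whether that is measurable is a separate question, as noted above.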
sr. member
Activity: 676
Merit: 250

The .bat file:

Code:
@echo off
set GPU_FORCE_64BIT_PTR=0
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 1 -k ethash-new -o stratum+tcp://eu1.ethermine.org:4444 -u 0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate -p x --xintensity 4620 --worksize 192 --gpu-threads 2 --no-extranonce
pause

I ran this experiment on Windows 10 with a stock RX 480 and AMD Crimson Software 16.9.2 as usual.
@laik2 It would be great if you could try the above settings as well. I am puzzled by the results myself...

--gpu-platform 0, no?

Still can't connect to suprnova ETH, though... very odd.
sr. member
Activity: 652
Merit: 266
Tried the default config, xI 2048, xI 1024, gpu-threads 2 and 1... Windows crashed every time. I'm sorry, but I can't test for an unknown reason... I don't like the Windows environment at all.
On Linux it's doing about the same as Claymore.
sr. member
Activity: 728
Merit: 304
Miner Developer
I ran the same experiment one more time and confirmed that ethash-new.cl actually runs faster with the GLC and SLC bits on.

With the GLC and SLC bits on:



With the GLC and SLC bits off:



The .bat file:

Code:
@echo off
set GPU_FORCE_64BIT_PTR=0
set GPU_MAX_HEAP_SIZE=100
set GPU_USE_SYNC_OBJECTS=1
set GPU_MAX_ALLOC_PERCENT=100
set GPU_SINGLE_ALLOC_PERCENT=100
gatelessgate.exe --gpu-platform 1 -k ethash-new -o stratum+tcp://eu1.ethermine.org:4444 -u 0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate -p x --xintensity 4620 --worksize 192 --gpu-threads 2 --no-extranonce
pause

I ran this experiment on Windows 10 with a stock RX 480 and AMD Crimson Software 16.9.2 as usual.
@laik2 It would be great if you could try the above settings as well. I am puzzled by the results myself...
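
For anyone who wants to reproduce the run from a script instead of a .bat file, here is a minimal Python sketch of the same setup. The environment values and command-line arguments are copied from the .bat file above. The names GPU_ENV, MINER_ARGS, build_env and launch are made up for this example, and it assumes gatelessgate.exe is in the current directory or on PATH.

```python
import os
import subprocess

# Environment variables from the .bat file above.
GPU_ENV = {
    "GPU_FORCE_64BIT_PTR": "0",
    "GPU_MAX_HEAP_SIZE": "100",
    "GPU_USE_SYNC_OBJECTS": "1",
    "GPU_MAX_ALLOC_PERCENT": "100",
    "GPU_SINGLE_ALLOC_PERCENT": "100",
}

# Command line from the .bat file above, one list item per argument.
MINER_ARGS = [
    "gatelessgate.exe", "--gpu-platform", "1", "-k", "ethash-new",
    "-o", "stratum+tcp://eu1.ethermine.org:4444",
    "-u", "0x91fa32e00b0f365d629fb625182a83fed61f0642.gatelessgate",
    "-p", "x",
    "--xintensity", "4620", "--worksize", "192",
    "--gpu-threads", "2", "--no-extranonce",
]

def build_env(base=None):
    """Copy the base environment (os.environ by default) and add the GPU_* settings."""
    env = dict(os.environ if base is None else base)
    env.update(GPU_ENV)
    return env

def launch():
    """Start the miner with the GPU_* variables set for that process only."""
    return subprocess.run(MINER_ARGS, env=build_env(), check=False)
```

Calling launch() starts the miner. The variables are passed to the child process only, so nothing is changed system-wide.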
sr. member
Activity: 728
Merit: 304
Miner Developer
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)

That's not optimized - you flipped the SLC and GLC bits, which will likely make it a tad SLOWER; it did when I tried that.

That's what I thought as well, but the miner was actually running faster.
Let me double check...
sr. member
Activity: 652
Merit: 266
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)
Excellent, but what settings are best to achieve those speeds?
I have tried 1 GPU thread, WS=192, xI: 1024, but the speed is lower.
sr. member
Activity: 728
Merit: 304
Miner Developer
I just pushed to the repo an optimized GCN assembly version of ethash-new.cl for RX 470/480.
Each card should get a 1Mh/s boost with it. If this actually works, then I will extend its support to GCN1/GCN3 devices.
(I sold all of my GCN2 cards a while back...)
newbie
Activity: 1
Merit: 0
I'm getting about 2 Mh/s less than Genoil's miner over 4 cards.

Genoil's gives me about 90 Mh/s on 4 stock RX 480s with no BIOS mods;
this one reports about 88, maybe a bit less.
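
To put those totals in per-card terms, here is a small Python sketch. It uses only the approximate figures quoted above (90 and 88 Mh/s over 4 cards), and the helper name per_card is made up for this example.

```python
def per_card(total_mhs, cards):
    """Average hashrate per card, in Mh/s."""
    return total_mhs / cards

CARDS = 4
genoil_total = 90.0        # approximate total reported for Genoil's miner
gatelessgate_total = 88.0  # approximate total reported for Gateless Gate

genoil = per_card(genoil_total, CARDS)
gatelessgate = per_card(gatelessgate_total, CARDS)
# Gap relative to the Genoil figure, in percent.
gap_pct = (genoil_total - gatelessgate_total) / genoil_total * 100

print(f"{genoil:.2f} vs {gatelessgate:.2f} Mh/s per card, gap {gap_pct:.1f}%")
```

That is roughly half a Mh/s per card, so a fairly small gap given how rough the reported numbers are.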
full member
Activity: 254
Merit: 100
Zawawa didn't even touch the kernel, and it is as fast as Claymore on my 390. I'm sure if he gives it a week, it will beat Claymore.
sr. member
Activity: 588
Merit: 251
some additional info:

R9 390 with Gateless Gate & ethash-new = 31.9-32 Mh/s. xI of 1920, worksize 192, 2 threads

This is the same or better performance than I get from Claymore. I am getting slowly increasing HW errors, but no rejected shares or issues shown at the pool side.

This card has +6.5% core clock and 1550 MHz memclock. Drivers are 17.3.2 on Windows 8.1.

Sgminer has always been faster than claymore on my 290x on Linux/fglrx.  It's on Polaris where it is slightly behind.
sr. member
Activity: 450
Merit: 255
On a 7870XT I am using xI 1512 and -g 2, and it is doing 23 Mh/s mining UBIQ, with 1 HW error in the last hour.
full member
Activity: 254
Merit: 100
2 threads don't change much at all, apart from adding hardware errors.

Just use xI 512
and gpu-threads 1

to avoid a lot of HW errors.
sr. member
Activity: 450
Merit: 255
I just uploaded a new pre-release:

https://github.com/zawawawa/gatelessgate/releases/tag/v0.1.3-pre4

There are some performance improvements, and ethash-new.cl was added as an experimental feature.
Let me know how that works.

This one works on ubiqpool.io using my old 7870XT
full member
Activity: 254
Merit: 100
On the latest drivers, ethash-new works far better than on 16.x or lower.