
Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 176. (Read 214410 times)

sr. member
Activity: 588
Merit: 251
I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...

Well, considering I don't give a shit about Windoze ...
full member
Activity: 224
Merit: 100
CryptoLearner
I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...

Not at all... your implementation will work just fine under Linux, and I'm pretty sure the majority of miners (at least people with more than 1 or 2 cards) are on Linux anyway.

I think the people mining with < 10 rigs are mostly on Windows.
sr. member
Activity: 728
Merit: 304
Miner Developer
I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...

Not at all... your implementation will work just fine under Linux, and I'm pretty sure the majority of miners (at least people with more than 1 or 2 cards) are on Linux anyway.

I hope so... We will see.
legendary
Activity: 2156
Merit: 1400
I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...

Not at all... your implementation will work just fine under Linux, and I'm pretty sure the majority of miners (at least people with more than 1 or 2 cards) are on Linux anyway.
sr. member
Activity: 728
Merit: 304
Miner Developer
I knew it! Could you tell engineers at AMD that we need GDS both on Windows and Linux?
This is such a waste of time and energy...
sr. member
Activity: 588
Merit: 251
I triple-checked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.

I think you are right.  I looked at the ROC-K source and there is no PM4 packet type defined for ALLOC_GDS there either.
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/blob/1853014d96c6af2d81d424f98d320810f40391d8/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h
And it looks like the same code is used in the mainline Linux kernel:
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdkfd/kfd_pm4_opcodes.h

Since there is no easy way to use GDS in OpenCL, and probably not in DX12 either, AMD likely doesn't have a test case for their Windows drivers that checks GDS initialization.
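For anyone who wants to repeat the header check described above, here is a minimal sketch of the idea: scan the text of an opcode header for an identifier containing a given name. The function name `find_opcode` and the sample header below are made up for illustration; the sample is NOT the content of kfd_pm4_opcodes.h.

```python
import re

def find_opcode(header_text, name):
    """Return True if any identifier in header_text contains `name` (case-insensitive)."""
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", header_text)
    wanted = name.upper()
    return any(wanted in ident.upper() for ident in identifiers)

# Hypothetical stand-in for a header file. The names and values here are
# invented for the example and do not come from the real driver source.
SAMPLE_HEADER = """
enum example_opcode_type {
    IT_EXAMPLE_A = 0x01,
    IT_EXAMPLE_B = 0x02,
};
"""

if __name__ == "__main__":
    for candidate in ("EXAMPLE_A", "ALLOC_GDS"):
        print(candidate, "defined:", find_opcode(SAMPLE_HEADER, candidate))
```

To check a real checkout, read the header linked above from disk and pass its text in place of `SAMPLE_HEADER`.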
sr. member
Activity: 728
Merit: 304
Miner Developer
GDS counters are finally working!
They still need optimizations, but this is definitely a step forward.
sr. member
Activity: 728
Merit: 304
Miner Developer
So GG is running at 413 sol/s on my old stock 7990 with Crimson 16.9.2.
With the same setup, the assembly version of Claymore's 11.1 Beta yields 522 sol/s.
Let's see how much I can improve GG's performance with the GCN assembly.
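To put the two figures above side by side, a quick back-of-the-envelope calculation. Both rates are taken from this post; the variable and function names are just for the example.

```python
def relative_gap(baseline, other):
    """Fractional amount by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline

gg_sols = 413.0        # GG on the stock 7990, Crimson 16.9.2 (from the post)
claymore_sols = 522.0  # Claymore's 11.1 Beta, assembly version (from the post)

gap = relative_gap(gg_sols, claymore_sols)
share = gg_sols / claymore_sols

if __name__ == "__main__":
    print(f"Claymore's is {gap:.1%} faster than GG")
    print(f"GG reaches {share:.1%} of Claymore's rate")
```

So the assembly version would need to recover roughly a quarter more throughput to draw level.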
sr. member
Activity: 728
Merit: 304
Miner Developer
I am so tired now...
I will work on the assembly version for 7990 tomorrow.
The dummy OpenCL code for the GDS optimizations is ready, so it's a matter of changing several lines.
Hopefully I will bring you guys good news then.

Hmm have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf

It's the ASM bible, specced out for GCN3 (a.k.a. GCN 1.2). Looking at the DS instruction format, bit 16 is the GDS flag, so it looks like on GCN 1.0/1.1 the GDS bit is 17 with bits 18-25 for the opcode, and on GCN 1.2+ the GDS bit is 16 with bits 17-24 for the opcode.

Oh, I downloaded that manual the day it came out, and it has been my bedtime reading for quite some time now. :) Actually, I was one of those who demanded the manual on the AMD forum when GCN3 first came out.

I triple-checked the instruction encoding with CodeXL. Sadly, a driver problem is the most plausible explanation at this point. They must have forgotten to issue a PM4 packet to initialize GDS (ALLOC_GDS). Another AMD driver woe.
legendary
Activity: 2156
Merit: 1400
I am so tired now...
I will work on the assembly version for 7990 tomorrow.
The dummy OpenCL code for the GDS optimizations is ready, so it's a matter of changing several lines.
Hopefully I will bring you guys good news then.

Hmm have you looked at this? https://github.com/olvaffe/gpu-docs/blob/master/amd-open-gpu-docs/AMD_GCN3_Instruction_Set_Architecture.pdf

It's the ASM bible, specced out for GCN3 (a.k.a. GCN 1.2). Looking at the DS instruction format, bit 16 is the GDS flag, so it looks like on GCN 1.0/1.1 the GDS bit is 17 with bits 18-25 for the opcode, and on GCN 1.2+ the GDS bit is 16 with bits 17-24 for the opcode.
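The bit positions described above can be sketched as a small pack/unpack helper. This only covers the two fields named in this post (the GDS flag and the 8-bit opcode); the rest of the DS instruction word (offsets, encoding prefix, register operands) is deliberately left out, and the function names are made up for the example.

```python
def ds_layout(gcn12_or_newer):
    """Return (gds_bit, opcode_shift) for the first dword of a DS instruction."""
    if gcn12_or_newer:
        return 16, 17   # GCN 1.2+: GDS at bit 16, opcode in bits 17-24
    return 17, 18       # GCN 1.0/1.1: GDS at bit 17, opcode in bits 18-25

def pack_ds_fields(opcode, gds, gcn12_or_newer):
    """Place the opcode and GDS flag into an otherwise empty 32-bit word."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("DS opcode must fit in 8 bits")
    gds_bit, opcode_shift = ds_layout(gcn12_or_newer)
    return (opcode << opcode_shift) | (int(bool(gds)) << gds_bit)

def unpack_ds_fields(word, gcn12_or_newer):
    """Recover (opcode, gds) from a word using the same layout."""
    gds_bit, opcode_shift = ds_layout(gcn12_or_newer)
    return (word >> opcode_shift) & 0xFF, bool((word >> gds_bit) & 1)

if __name__ == "__main__":
    arbitrary_opcode = 0x2A  # arbitrary value, not a specific DS instruction
    for newer in (False, True):
        word = pack_ds_fields(arbitrary_opcode, True, newer)
        print("GCN 1.2+" if newer else "GCN 1.0/1.1", hex(word))
```

The practical upshot is that the same 32-bit word decodes differently on the two generations, so a binary patched for one layout will not set the GDS flag on the other.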
sr. member
Activity: 728
Merit: 304
Miner Developer
I am so tired now...
I will work on the assembly version for 7990 tomorrow.
The dummy OpenCL code for the GDS optimizations is ready, so it's a matter of changing several lines.
Hopefully I will bring you guys good news then.
sr. member
Activity: 728
Merit: 304
Miner Developer
Yes, Optiminer uses ASM with the Polaris driver under Linux, so it works there. His Polaris speedup does NOT work under Windows, so it must be a Windows restriction.

That's really good to know. Gotta love AMD for its completely arbitrary decisions...
sr. member
Activity: 728
Merit: 304
Miner Developer
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

I know Optiminer said you had to use his Linux miner to get max hash rate from RX4xx, because on Windows he can't access some of the driver features he needs. On R9 Fury and earlier, he's not having any problems.

Thank you so much for the clarification. That means I have to switch my Ellesmere farm to Linux, though. Oh well.
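For reference, the startup log quoted above can be read mechanically to see which chips get the ASM kernel. This is a minimal sketch: the log text is copied from the quote, while the function name and regular expressions are just one way to do it.

```python
import re

LOG = """\
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6
"""

def algorithms_by_gpu(log):
    """Map GPU index -> (chip name, algorithm string) from the log lines."""
    chips, algos = {}, {}
    for line in log.splitlines():
        chip = re.match(r"GPU #(\d+): (\w+),", line)
        if chip:
            chips[int(chip.group(1))] = chip.group(2)
            continue
        algo = re.match(r"GPU #(\d+) algorithm (\S+),", line)
        if algo:
            algos[int(algo.group(1))] = algo.group(2)
    return {index: (name, algos.get(index)) for index, name in chips.items()}

if __name__ == "__main__":
    for index, (chip, algo) in sorted(algorithms_by_gpu(LOG).items()):
        print(f"GPU #{index}: {chip} -> algorithm {algo}")
```

Both Tahiti cards come out as ASM and the Ellesmere card as algorithm 2, which matches the observation that Claymore's doesn't use ASM for the RX 480.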
full member
Activity: 150
Merit: 100
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

I know Optiminer said you had to use his Linux miner to get max hash rate from RX4xx, because on Windows he can't access some of the driver features he needs. On R9 Fury and earlier, he's not having any problems.
legendary
Activity: 2156
Merit: 1400
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.

Yes, Optiminer uses ASM with the Polaris driver under Linux, so it works there. His Polaris speedup does NOT work under Windows, so it must be a Windows restriction.
sr. member
Activity: 728
Merit: 304
Miner Developer
Claymore's seems to be working fine with Crimson 16.9.2, though.
You are right. Claymore's doesn't use ASM for RX 480.

Quote
GPU #0: Tahiti, 3072 MB available, 32 compute units
GPU #0 recognized as Radeon 280X/380X
GPU #1: Ellesmere, 8192 MB available, 36 compute units
GPU #1 recognized as Radeon RX 480
GPU #2: Tahiti, 3072 MB available, 32 compute units
GPU #2 recognized as Radeon 280X/380X
POOL version
GPU #0 algorithm ASM, intensity 6
GPU #1 algorithm 2, intensity 6
GPU #2 algorithm ASM, intensity 6

I just confirmed that GDS on 7990 was accessible, so the aforementioned restriction must be specific to GCN2+ devices, at least on Windows.
full member
Activity: 254
Merit: 100
Claymore's and Optiminer are having the same problem with GCN and the newer drivers.
sr. member
Activity: 728
Merit: 304
Miner Developer
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of the SGPRs, just as in the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?


Yeah, I read that page like 100 times...
I had to ask the author of that page to add support for GDS to his GCN assembler, so I don't think he has the answer.
I will ask him, though.

Edit: matszpk didn't know how to enable GDS on GCN2/3/4 devices, either. Such a nice guy, though.
https://github.com/CLRX/CLRX-mirror/issues/12
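For anyone who wants to repeat the GPU_FORCE_64BIT_PTR=1 experiment quoted above, here is a minimal sketch of launching a process with that variable set. The variable name comes from the thread; the helper names are made up, and the miner command line is not shown here because it depends on your setup.

```python
import os
import subprocess
import sys

def env_with_64bit_ptr(base=None):
    """Return a copy of the environment with GPU_FORCE_64BIT_PTR set to 1."""
    env = dict(os.environ if base is None else base)
    env["GPU_FORCE_64BIT_PTR"] = "1"
    return env

def run_with_64bit_ptr(cmd):
    """Run `cmd` (a list of arguments) with the variable set, capturing output."""
    return subprocess.run(
        cmd,
        env=env_with_64bit_ptr(),
        capture_output=True,
        text=True,
        check=True,
    )

if __name__ == "__main__":
    # Stand-in child process that just echoes the variable back; replace
    # this list with your own miner command to run the real experiment.
    probe = [sys.executable, "-c",
             "import os; print(os.environ['GPU_FORCE_64BIT_PTR'])"]
    print(run_with_64bit_ptr(probe).stdout.strip())
```

Setting the variable per process like this keeps the rest of the system unaffected, which makes it easy to compare runs with and without it.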
sr. member
Activity: 652
Merit: 266
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of the SGPRs, just as in the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.
https://github.com/CLRX/CLRX-mirror/wiki/GcnInstrsDs ?
sr. member
Activity: 728
Merit: 304
Miner Developer
I suspect the driver initializes M0 when gds_segment_byte_size is set in the kernel configuration.

I assumed that the GDS base/size combination would be stored in one of the SGPRs, just as in the OpenCL 1.2 ABI, but you may be right. I will check it right now.

Nope, no luck. O GDS, where art thou?

I've been meaning to look into why optiminer requires GPU_FORCE_64BIT_PTR=1.  Perhaps some quirk of the driver that GDS only works in 64-bit mode?


That wasn't it either. If you come up with any ideas, please let me know.