
Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 174.

sr. member
Activity: 728
Merit: 304
Miner Developer
WTF. This cannot be right.

sr. member
Activity: 728
Merit: 304
Miner Developer
Now that I think about it, it is no wonder that bank conflicts are a serious problem, given that GDS's 32 banks are shared across all the compute units, unlike LDS. My next plan is to reduce the number of wavefronts to avoid bank conflicts in GDS. We will see.
sr. member
Activity: 728
Merit: 304
Miner Developer
It turned out that I really need to reduce bank conflicts in GDS.
I also need to revisit global syncs with GWS instructions.
I wonder if Claymore has already researched these mostly undocumented features for his other miner programs.
In the meantime, I will upload a preliminary assembly version for GCN1 for testing purposes.
sr. member
Activity: 728
Merit: 304
Miner Developer
I will look into that. The assembly version for 7990 is almost ready...

Have people looked at the returns for Pascal Coin right now? I would forget mining it unless you have almost free electricity and a very high hashrate. http://whattomine.com/coins/172-pasc-pascal



No wonder... If it uses a variant of SHA-256, it wouldn't be within the reach of GPUs anyway.
sr. member
Activity: 728
Merit: 304
Miner Developer
Quote
There are 32 banks mapped to the lowest bits of the dw offset.
https://community.amd.com/thread/167167

Does this mean that GDS accesses always go through bank 0 as long as the offset of the DS instruction is not specified?
This cannot be right... It seems like GDS is as horrendously designed and difficult to use as VGPR indexing...

I think "offset" is not the correct term.  The GDS address (including the offset) is what matters.  So dword addresses 0x0020 and 0x0021 are on different banks, but addresses 0x0020 and 0x0060 are on the same bank.
Quote from GCN ISA: "The GDS is configured with 32 banks, each with 512 entries of 4 bytes each."
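A minimal sketch of the bank mapping described in the reply above, assuming the bank is simply the full dword address (base plus instruction offset) modulo 32, and using the ISA figures of 32 banks with 512 entries of 4 bytes each. The helper names are hypothetical and only illustrate the arithmetic.

```python
# Hypothetical helpers illustrating the GDS layout quoted from the GCN ISA:
# 32 banks, each with 512 entries of 4 bytes.
NUM_BANKS = 32
ENTRIES_PER_BANK = 512
BYTES_PER_ENTRY = 4

def gds_bank(dword_address: int) -> int:
    """Bank chosen by the lowest 5 bits of the dword address (assumed to include the instruction offset)."""
    return dword_address % NUM_BANKS

def gds_capacity_bytes() -> int:
    """Total GDS size implied by the ISA description."""
    return NUM_BANKS * ENTRIES_PER_BANK * BYTES_PER_ENTRY

print(gds_bank(0x0020))      # 0
print(gds_bank(0x0021))      # 1
print(gds_bank(0x0060))      # 0, same bank as 0x0020, so the two accesses conflict
print(gds_capacity_bytes())  # 65536 bytes, i.e. 64KB
```

The 65536-byte total matches the 64KB of GDS that comes up later in the thread.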



That makes perfect sense. I did read that description of GDS, but I got really confused because realhet is usually right about almost anything related to GCN. I must be pretty tired...

In any case, the fact remains that GDS bank conflicts are a serious problem and the row counters must be unpacked. A single-threaded assembly version is already running stably with a major speed boost. Multithreading is trickier because I now have to use the entire 64KB of GDS, which required modifications to the code. I am expecting something in the upper 400s sol/s on the 7990 with this optimization alone.
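A rough model of why packed counters can hurt, not the miner's actual code: it assumes a hypothetical layout where four 8-bit row counters share one dword, versus one uint per row, and counts how many of a batch of simultaneous accesses land on the busiest bank (bank = dword address mod 32). The row indices and the packing factor are made up for illustration.

```python
from collections import Counter

NUM_BANKS = 32
COUNTERS_PER_DWORD_PACKED = 4  # hypothetical: four 8-bit counters per 32-bit dword

def dword_address(row: int, packed: bool) -> int:
    # Packed: several rows share one dword. Unpacked: one uint per row.
    return row // COUNTERS_PER_DWORD_PACKED if packed else row

def worst_bank_load(rows, packed: bool) -> int:
    """Largest number of accesses hitting a single bank (a crude proxy for serialization)."""
    banks = Counter(dword_address(r, packed) % NUM_BANKS for r in rows)
    return max(banks.values())

rows = list(range(64))  # 64 lanes, each touching a consecutive row counter
print(worst_bank_load(rows, packed=True))   # 4: rows collapse onto 16 dwords in 16 banks
print(worst_bank_load(rows, packed=False))  # 2: rows spread evenly over all 32 banks
```

In this model, unpacking costs four times as much counter storage, which is the trade-off that makes packing attractive in the first place.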
sr. member
Activity: 588
Merit: 251
Quote
There are 32 banks mapped to the lowest bits of the dw offset.
https://community.amd.com/thread/167167

Does this mean that GDS accesses always go through bank 0 as long as the offset of the DS instruction is not specified?
This cannot be right... It seems like GDS is as horrendously designed and difficult to use as VGPR indexing...

I think "offset" is not the correct term.  The GDS address (including the offset) is what matters.  So dword addresses 0x0020 and 0x0021 are on different banks, but addresses 0x0020 and 0x0060 are on the same bank.
Quote from GCN ISA: "The GDS is configured with 32 banks, each with 512 entries of 4 bytes each."

sr. member
Activity: 410
Merit: 250
I will look into that. The assembly version for 7990 is almost ready...

Have people looked at the returns for Pascal Coin right now? I would forget mining it unless you have almost free electricity and a very high hashrate. http://whattomine.com/coins/172-pasc-pascal

sr. member
Activity: 728
Merit: 304
Miner Developer
I will look into that. The assembly version for 7990 is almost ready...
sr. member
Activity: 410
Merit: 250
ocminer (suprnova) is working on a pool he should release soon.

In that case, we will need a pool miner, as I think all miners out there are currently solo-based. I also hear there are problems with the CUDA version of the miner.
full member
Activity: 224
Merit: 100
CryptoLearner
ocminer (suprnova) is working on a pool he should release soon.
sr. member
Activity: 449
Merit: 251
@zawawa

If you know SHA-256, you might look at Pascal Coin. There is no pool miner yet, and SHA-256 needs a slight modification for it. I think many people would like to see it, if the Pascal price holds.
The NV solo miner is flawed and the AMD one is solo-only.
ocminer worked on a pool, but nobody is writing a miner because it needs a new pool protocol.
There are no pools yet... and mining profit on it is in the toilet anyway.
legendary
Activity: 1901
Merit: 1024
@zawawa

If you know SHA-256, you might look at Pascal Coin. There is no pool miner yet, and SHA-256 needs a slight modification for it. I think many people would like to see it, if the Pascal price holds.
The NV solo miner is flawed and the AMD one is solo-only.
ocminer worked on a pool, but nobody is writing a miner because it needs a new pool protocol.
sr. member
Activity: 652
Merit: 266
Could be ncurses as I added colors to the UI. I will check it later.
I mean equihash.working1/2
sr. member
Activity: 728
Merit: 304
Miner Developer
Could be ncurses as I added colors to the UI. I will check it later.
sr. member
Activity: 652
Merit: 266
Quote
There are 32 banks mapped to the lowest bits of the dw offset.
https://community.amd.com/thread/167167

Does this mean that GDS accesses always go through bank 0 as long as the offset of the DS instruction is not specified?
This cannot be right... It seems like GDS is as horrendously designed and difficult to use as VGPR indexing...
Good work, but the new kernels segfault on Ubuntu, though...
sr. member
Activity: 728
Merit: 304
Miner Developer
Quote
There are 32 banks mapped to the lowest bits of the dw offset.
https://community.amd.com/thread/167167

Does this mean that GDS accesses always go through bank 0 as long as the offset of the DS instruction is not specified?
This cannot be right... It seems like GDS is as horrendously designed and difficult to use as VGPR indexing...
sr. member
Activity: 728
Merit: 304
Miner Developer
Hmm... GDS counters are not as fast as they should be.
I probably need to unpack them and use uint counters.
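For readers unfamiliar with packed counters, a sketch of the difference, assuming (hypothetically) 8-bit counters packed four to a 32-bit word: an increment must add a shifted 1 and a read must shift and mask, whereas an unpacked uint counter is a plain add. Python lists stand in for GDS words; no real atomics are involved, and the function names are made up.

```python
BITS = 8                   # hypothetical counter width when packed
MASK = (1 << BITS) - 1
PER_WORD = 32 // BITS      # 4 counters per 32-bit word

def packed_increment(words, row):
    # Models an atomic add of (1 << shift) on the shared 32-bit word.
    shift = (row % PER_WORD) * BITS
    old = words[row // PER_WORD]
    words[row // PER_WORD] = (old + (1 << shift)) & 0xFFFFFFFF
    return (old >> shift) & MASK  # previous count, like an atomic's return value

def packed_read(words, row):
    return (words[row // PER_WORD] >> ((row % PER_WORD) * BITS)) & MASK

def unpacked_increment(counters, row):
    # One uint per row: a plain add, no shifting or masking.
    old = counters[row]
    counters[row] = old + 1
    return old

packed = [0] * 2     # 2 words hold 8 packed counters
unpacked = [0] * 8   # 8 separate uint counters
for row in [1, 1, 5, 1]:
    packed_increment(packed, row)
    unpacked_increment(unpacked, row)
print(packed_read(packed, 1), unpacked[1])  # 3 3
print(packed_read(packed, 5), unpacked[5])  # 1 1
```

In this sketch a packed counter that exceeds 255 would carry into its neighbour, and neighbouring rows share a word (and therefore a bank); unpacked uint counters avoid both at the cost of more storage.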
sr. member
Activity: 728
Merit: 304
Miner Developer
Thanks, but I'd rather be done with it sooner rather than later.
Now I'm optimizing the OpenCL kernel again and getting 434 sol/s on a stock 7990.
I have already confirmed that GDS counters do make the miner run faster, and the current assembly version is surprisingly stable given all the crazy stuff that is going on behind the scenes, but I realized I still need to work on the OpenCL kernel.

Can't wait for an optimized Fiji kernel! Thanks for all your effort!

Fiji is GCN3, so the assembly version for the RX 480 should work.
We will see.
sr. member
Activity: 728
Merit: 304
Miner Developer
Alright, I'm done with the OpenCL kernel.
All I have to do now is rewrite the disassembled code of the kernel, which should be straightforward.
I will work on the assembly version for the 7990 first, then I will do the same for the RX 480 on Linux.
I will support both platforms, so no worries.
full member
Activity: 150
Merit: 100
Thanks, but I'd rather be done with it sooner rather than later.
Now I'm optimizing the OpenCL kernel again and getting 434 sol/s on a stock 7990.
I have already confirmed that GDS counters do make the miner run faster, and the current assembly version is surprisingly stable given all the crazy stuff that is going on behind the scenes, but I realized I still need to work on the OpenCL kernel.

Can't wait for an optimized Fiji kernel! Thanks for all your effort!