Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 165.

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

Quote from: joaocha on February 15, 2017, 03:16:56 PM

Where are you, zawawa?

Oh, I'm right here. I was really focused on programming.
This LLVM thing is taking way too long, so I might take a break and work on the previous assembly version instead.

joaocha

full member

Activity: 254

Merit: 100

Where are you, zawawa?

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

I can now see how sloppy the optimizations in AMD's drivers are...
They should be more aggressive with alloca promotion.

Code:

// TODO: Have some sort of hint or other heuristics to guess occupancy based
// on other factors..
unsigned OccupancyHint
  = AMDGPU::getIntegerAttribute(F, "amdgpu-max-waves-per-eu", 0);
if (OccupancyHint == 0)
  OccupancyHint = 7;

Code:

// FIXME: There is no reason why we can't support larger arrays, we
// are just being conservative for now.
if (!AllocaTy ||
    AllocaTy->getElementType()->isVectorTy() ||
    AllocaTy->getNumElements() > 4 ||
    AllocaTy->getNumElements() < 2) {
  DEBUG(dbgs() << " Cannot convert type to vector\n");
  return false;
}
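
To illustrate what the check above lets through, here is a minimal standalone sketch. It is not LLVM code: `ArrayShape` and `passesVectorPromotionCheck` are hypothetical names, and the sketch only mirrors the size and element-type conditions of the quoted snippet (the real pass also rejects anything that is not an array type at all).

```cpp
// Hypothetical stand-in for the quoted condition; not part of LLVM.
struct ArrayShape {
  unsigned numElements;  // e.g. 4 for `float tmp[4]`
  bool elementIsVector;  // true for e.g. an array of float4
};

// Returns true when the array would survive the quoted check, i.e. it has
// scalar elements and between 2 and 4 entries (inclusive).
inline bool passesVectorPromotionCheck(const ArrayShape &a) {
  return !a.elementIsVector && a.numElements >= 2 && a.numElements <= 4;
}
```

Under these assumptions a `float tmp[4]` passes, while a `float tmp[8]` or an array of vectors does not, and so is not turned into a vector by this path.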

laik2

sr. member

Activity: 652

Merit: 266

Use this profile for ETH mining:

Quote

"profiles": [
  {
    "name": "eth",
    "algorithm": "ethash",
    "xintensity": "1024",
    "worksize": "64",
    "gpu-threads": "1"
  }

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

Oh well. At least I don't have to suffer those pretty mysterious register spills any more.
It's quite scary to think about the possibility that this code might be running on hundreds of thousands of devices, though.

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

Quote from: nerdralph on February 13, 2017, 03:55:36 PM

Quote from: zawawa on February 13, 2017, 01:57:25 PM

So it seems that LLVM imposes an arbitrary limitation on the maximum number of VGPRs per wavefront, which presumably results in premature register spills and inefficient VGPR utilization. I wonder if these pieces of code were responsible for AMD's recent low-quality drivers...

Code:

unsigned SIRegisterInfo::getNumVGPRsAllowed(unsigned WaveCount) const {
  switch(WaveCount) {
   case 10: return 24;
   case 9: return 28;
   case 8: return 32;
   case 7: return 36;
   case 6: return 40;
   case 5: return 48;
   case 4: return 64;
   case 3: return 84;
   case 2: return 128;
   default: return 256;
  }
}

The VGPRs are partitioned among the waves. More waves = fewer VGPRs available to each wave.

I know that. The problem here is that "unsigned WaveCount" is always set to 10, which results in an unnecessarily low return value. Please see my previous posts.
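
To make the cost of a fixed wave count concrete, here is a minimal sketch. It is not LLVM code: `numVGPRsAllowed` is a free-function copy of the table quoted above, and `maxWavesFor` is a hypothetical helper that picks the highest wave count whose budget still covers what a kernel needs.

```cpp
// Free-function copy of the quoted table, for illustration only.
inline unsigned numVGPRsAllowed(unsigned waveCount) {
  switch (waveCount) {
  case 10: return 24;
  case 9:  return 28;
  case 8:  return 32;
  case 7:  return 36;
  case 6:  return 40;
  case 5:  return 48;
  case 4:  return 64;
  case 3:  return 84;
  case 2:  return 128;
  default: return 256;
  }
}

// Hypothetical helper: the highest wave count whose VGPR budget is at least
// `needed`. Falls back to 1 wave when nothing from 10 down to 2 fits.
inline unsigned maxWavesFor(unsigned needed) {
  for (unsigned w = 10; w >= 2; --w)
    if (numVGPRsAllowed(w) >= needed)
      return w;
  return 1;
}
```

For example, a kernel that needs 60 VGPRs fits without spilling at 4 waves (`maxWavesFor(60)` returns 4, budget 64), but with the wave count pinned at 10 the budget is only 24.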

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: zawawa on February 13, 2017, 01:57:25 PM

So it seems that LLVM imposes an arbitrary limitation on the maximum number of VGPRs per wavefront, which presumably results in premature register spills and inefficient VGPR utilization. I wonder if these pieces of code were responsible for AMD's recent low-quality drivers...

Code:

unsigned SIRegisterInfo::getNumVGPRsAllowed(unsigned WaveCount) const {
  switch(WaveCount) {
   case 10: return 24;
   case 9: return 28;
   case 8: return 32;
   case 7: return 36;
   case 6: return 40;
   case 5: return 48;
   case 4: return 64;
   case 3: return 84;
   case 2: return 128;
   default: return 256;
  }
}

The VGPRs are partitioned among the waves. More waves = fewer VGPRs available to each wave.
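
The partitioning can be sketched in a few lines. This is an illustration, not LLVM code: the total of 256 registers and the allocation granule of 4 are assumptions inferred from the quoted table, and `vgprsPerWave` is a hypothetical name.

```cpp
// Assumed values, inferred from the quoted table rather than taken from
// any AMD document.
constexpr unsigned kTotalVGPRs = 256;  // the table's default (1 wave) entry
constexpr unsigned kGranule = 4;       // assumed allocation granularity

// Split the register file evenly among `waves` wavefronts, then round the
// share down to a multiple of the granule.
constexpr unsigned vgprsPerWave(unsigned waves) {
  return (kTotalVGPRs / waves) / kGranule * kGranule;
}

// Spot checks against the quoted table.
static_assert(vgprsPerWave(10) == 24, "10 waves");
static_assert(vgprsPerWave(6) == 40, "6 waves");
static_assert(vgprsPerWave(2) == 128, "2 waves");
```

Under these assumptions the formula reproduces every entry of the table for 1 to 10 waves.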

nerdralph

sr. member

Activity: 588

Merit: 251

Quote from: zawawa on February 13, 2017, 01:37:06 PM

Wow, this keeps getting worse. No wonder AMD drivers suck balls...

Code:

unsigned getMaxWavesPerCU() const {
  if (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS)
    return 10;

  // FIXME: Not sure what this is for other subtagets.
  return 8;
}

Meh. Nobody is really interested in maintaining pre-GCN devices.

laik2

sr. member

Activity: 652

Merit: 266

Quote from: zawawa on February 13, 2017, 02:27:53 PM

I was able to turn off VGPR spills by default.
With this change alone, it is totally worth having my own fork of LLVM, methinks.

Admirations

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

I was able to turn off VGPR spills by default.
With this change alone, it is totally worth having my own fork of LLVM, methinks.

lupanar

sr. member

Activity: 1344

Merit: 252

Quote from: zawawa on February 13, 2017, 01:42:11 PM

Quote from: lupanar on February 13, 2017, 01:33:37 PM

Hi. Are there any plans in the near future to do a GG miner for CUDA?

I am planning to revisit optimizations for NVIDIA cards.
If OpenCL would suffice for NVIDIA cards, I won't do CUDA.
(I could, but I just don't want to duplicate my efforts.)
At this point, the only pertinent issue seems to be the long-standing CPU busy-wait bug in NVIDIA's OpenCL drivers, but this issue should be manageable.

Ok, thanks for sharing!

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

So it seems that LLVM imposes an arbitrary limitation on the maximum number of VGPRs per wavefront, which presumably results in premature register spills and inefficient VGPR utilization. I wonder if these pieces of code were responsible for AMD's recent low-quality drivers...

Code:

unsigned SIRegisterInfo::getNumVGPRsAllowed(unsigned WaveCount) const {
  switch(WaveCount) {
   case 10: return 24;
   case 9: return 28;
   case 8: return 32;
   case 7: return 36;
   case 6: return 40;
   case 5: return 48;
   case 4: return 64;
   case 3: return 84;
   case 2: return 128;
   default: return 256;
  }
}

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

Quote from: lupanar on February 13, 2017, 01:33:37 PM

Hi. Are there any plans in the near future to do a GG miner for CUDA?

I am planning to revisit optimizations for NVIDIA cards.
If OpenCL would suffice for NVIDIA cards, I won't do CUDA.
(I could, but I just don't want to duplicate my efforts.)
At this point, the only pertinent issue seems to be the long-standing CPU busy-wait bug in NVIDIA's OpenCL drivers, but this issue should be manageable.

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

Wow, this keeps getting worse. No wonder AMD drivers suck balls...

Code:

unsigned getMaxWavesPerCU() const {
  if (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS)
    return 10;

  // FIXME: Not sure what this is for other subtagets.
  return 8;
}

lupanar

sr. member

Activity: 1344

Merit: 252

Hi. Are there any plans in the near future to do a GG miner for CUDA?

zawawa

sr. member

Activity: 728

Merit: 304

Miner Developer

WTF. Is this what I think it is? What a shoddy job...

Code:

unsigned SIRegisterInfo::getRegPressureSetLimit(const MachineFunction &MF,
                                                unsigned Idx) const {
  const SISubtarget &STI = MF.getSubtarget<SISubtarget>();
  // FIXME: We should adjust the max number of waves based on LDS size.
  unsigned SGPRLimit = getNumSGPRsAllowed(STI, STI.getMaxWavesPerCU());
  unsigned VGPRLimit = getNumVGPRsAllowed(STI.getMaxWavesPerCU());

  unsigned VSLimit = SGPRLimit + VGPRLimit;

  /**/
  if (SGPRPressureSets.test(Idx) && VGPRPressureSets.test(Idx)) {
    // FIXME: This is a hack. We should never be considering the pressure of
    // these since no virtual register should ever have this class.
    return VSLimit;
  }

  if (SGPRPressureSets.test(Idx))
    return SGPRLimit;

  return VGPRLimit;
}

zorvalth

full member

Activity: 223

Merit: 100

Quote from: UnclWish on February 13, 2017, 09:24:52 AM

zorvalth, what kind of software are you using in the background on your screens? The one which shows power, etc.
OMG, xintensity 4620! And this is the default bat? The default bat is --xintensity 512 or --xintensity 1024.
On your miner, within 1 min 20 sec there are already several HW errors on each GPU.

Corsair Link, but you need an "i" PSU from Corsair. --xintensity 512 doesn't change the speed.

mo35

member

Activity: 142

Merit: 10

Quote from: UnclWish on February 13, 2017, 09:24:52 AM

zorvalth, what kind of software are you using in the background on your screens? The one which shows power, etc.
OMG, xintensity 4620! And this is the default bat? The default bat is --xintensity 512 or --xintensity 1024.
On your miner, within 1 min 20 sec there are already several HW errors on each GPU.

That's Corsair Power Link, available for the i series models: RMi, HXi, AXi, etc. They are my latest favorites due to this functionality, and the RM1000i models are not mind-blowingly expensive. I'm running a few of those, no complaints so far.

UnclWish

sr. member

Activity: 1484

Merit: 253

zorvalth, what kind of software are you using in the background on your screens? The one which shows power, etc.
OMG, xintensity 4620! And this is the default bat? The default bat is --xintensity 512 or --xintensity 1024.
On your miner, within 1 min 20 sec there are already several HW errors on each GPU.

toptek

legendary

Activity: 1274

Merit: 1000

If you mean me, I did say about the same. Mine do 27 MH with CM and a modded BIOS, and I get about 25 to 26 MH with Gateless for ETH. But ETH I'm not too worried about, and I don't have a problem using CM's ETH miner. The 1% fee doesn't bother me like the fee for his other miners does, and I never complained about the ETH miner fee; it seems reasonable. I don't use any of his other miners, and I don't need to: Gateless or other miners match them, all but ZEC.

I use Windows because I like the freedom I get using it: on PCs it works out of the box with not much to set up, etc. I use Linux on Pis.

Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 165. (Read 214463 times)