
Topic: Gateless Gate Sharp 1.3.8: 30Mh/s (Ethash) on RX 480! - page 165.

sr. member
Activity: 728
Merit: 304
Miner Developer
Where are you, zawawa?

Oh, I'm right here. I was really focused on programming.
This LLVM thing is taking way too long, so I might take a break and work on the previous assembly version instead.
full member
Activity: 254
Merit: 100
Where are you, zawawa?
sr. member
Activity: 728
Merit: 304
Miner Developer
I can now see how sloppy the optimizations in AMD's drivers are...
They should be more aggressive with alloca promotion.

Code:
  // TODO: Have some sort of hint or other heuristics to guess occupancy based
  // on other factors..
  unsigned OccupancyHint
    = AMDGPU::getIntegerAttribute(F, "amdgpu-max-waves-per-eu", 0);
  if (OccupancyHint == 0)
    OccupancyHint = 7;

Code:
  // FIXME: There is no reason why we can't support larger arrays, we
  // are just being conservative for now.
  if (!AllocaTy ||
      AllocaTy->getElementType()->isVectorTy() ||
      AllocaTy->getNumElements() > 4 ||
      AllocaTy->getNumElements() < 2) {
    DEBUG(dbgs() << "  Cannot convert type to vector\n");
    return false;
  }
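The FIXME above rejects any array with fewer than 2 or more than 4 elements, which is what "more aggressive alloca promotion" would relax. A minimal sketch of the same predicate with the upper bound made configurable; `ArrayShape` is a hypothetical stand-in for the `llvm::ArrayType` properties the check reads, and `maxElements` is a hypothetical knob, not an existing LLVM option.

```cpp
#include <cstdint>

// Hypothetical stand-in for the properties of llvm::ArrayType used by the check.
struct ArrayShape {
    bool isArray;           // false plays the role of a null AllocaTy
    bool elementIsVector;   // element type is already a vector
    uint64_t numElements;
};

// Mirrors the quoted condition, with the hard-coded 4 replaced by a
// hypothetical maxElements parameter (the default of 4 reproduces the original).
bool canConvertToVector(const ArrayShape& ty, uint64_t maxElements = 4) {
    if (!ty.isArray || ty.elementIsVector ||
        ty.numElements > maxElements || ty.numElements < 2) {
        return false;  // corresponds to "Cannot convert type to vector"
    }
    return true;
}
```

With the default bound an 8-element private array stays in scratch memory; raising `maxElements` to 16 would let it be promoted to a vector held in VGPRs, at the cost of higher register pressure.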
sr. member
Activity: 652
Merit: 266
Use that profile for ETH mining
Quote
  "profiles": [
    {
      "name": "eth",
      "algorithm": "ethash",
      "xintensity": "1024",
      "worksize": "64",
      "gpu-threads": "1"
    }
  ]
sr. member
Activity: 728
Merit: 304
Miner Developer
Oh well. At least I don't have to suffer those pretty mysterious register spills any more.
It's quite scary to think about the possibility that this code might be running on hundreds of thousands of devices, though.
sr. member
Activity: 728
Merit: 304
Miner Developer
The VGPRs are partitioned among the waves.  More waves = fewer VGPRs available to each wave.


I know that. The problem here is that "unsigned WaveCount" is always set to 10, which results in an unnecessarily low return value. Please see my previous posts.
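The argument is that the wave count should follow the kernel's actual register usage instead of being fixed at 10. A minimal sketch of the usual GCN occupancy calculation, assuming 256 VGPRs per lane per SIMD, an allocation granularity of 4 VGPRs and at most 10 waves per SIMD; `wavesForVGPRs` is a hypothetical helper, not an LLVM function, and it ignores SGPR and LDS limits.

```cpp
#include <algorithm>

// Hypothetical helper: how many waves fit on one SIMD for a given per-lane
// VGPR count. Assumes GCN-style limits: 256 VGPRs, granularity 4, 10 waves max.
unsigned wavesForVGPRs(unsigned vgprs) {
    const unsigned kTotalVGPRs = 256;
    const unsigned kGranule = 4;
    const unsigned kMaxWaves = 10;
    if (vgprs == 0) return kMaxWaves;
    // Round the request up to the allocation granularity.
    unsigned allocated = (vgprs + kGranule - 1) / kGranule * kGranule;
    if (allocated > kTotalVGPRs) return 0;  // does not fit at all
    return std::min(kMaxWaves, kTotalVGPRs / allocated);
}
```

For example, a kernel that needs 40 VGPRs can still run 6 waves per SIMD, so capping its register budget at the 10-wave value of 24 is the restriction the post objects to.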
sr. member
Activity: 588
Merit: 251
So it seems that LLVM imposes an arbitrary limit on the maximum number of VGPRs per wavefront, which presumably results in premature register spills and inefficient VGPR utilization. I wonder if these pieces of code were responsible for AMD's recent low-quality drivers...

Code:
unsigned SIRegisterInfo::getNumVGPRsAllowed(unsigned WaveCount) const {
  switch(WaveCount) {
    case 10: return 24;
    case 9:  return 28;
    case 8:  return 32;
    case 7:  return 36;
    case 6:  return 40;
    case 5:  return 48;
    case 4:  return 64;
    case 3:  return 84;
    case 2:  return 128;
    default: return 256;
  }
}

The VGPRs are partitioned among the waves.  More waves = fewer VGPRs available to each wave.
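The partitioning accounts for every entry in the table. Assuming 256 VGPRs per lane per SIMD allocated in granules of 4, each value is 256 divided by the wave count, rounded down to a multiple of 4. A minimal sketch; `vgprBudget` is a hypothetical name, and it reproduces the switch only for wave counts 0 to 10.

```cpp
// Hypothetical: per-wave VGPR budget when `waves` waves share one SIMD.
// Assumes 256 VGPRs per lane and an allocation granularity of 4.
unsigned vgprBudget(unsigned waves) {
    const unsigned kTotalVGPRs = 256;
    const unsigned kGranule = 4;
    if (waves <= 1) return kTotalVGPRs;  // same as the switch's default case
    // Integer division floors; then round down to a whole granule.
    return kTotalVGPRs / waves / kGranule * kGranule;
}
```

For instance, 256 / 6 is about 42.7, which rounds down to 40, and 256 / 3 is about 85.3, which rounds down to 84, matching the hard-coded cases.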
sr. member
Activity: 588
Merit: 251
Wow, this keeps getting worse. No wonder AMD drivers suck balls...

Code:
  unsigned getMaxWavesPerCU() const {
    if (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS)
      return 10;

    // FIXME: Not sure what this is for other subtagets.
    return 8;
  }

Meh.  Nobody is really interested in maintaining pre-GCN devices.
sr. member
Activity: 652
Merit: 266
I was able to turn off VGPR spills by default.
With this change alone, it is totally worth having my own fork of LLVM, methinks.

Admirations
sr. member
Activity: 728
Merit: 304
Miner Developer
I was able to turn off VGPR spills by default.
With this change alone, it is totally worth having my own fork of LLVM, methinks.
sr. member
Activity: 1344
Merit: 252
Hi. Are there any plans in the near future to do a GG miner for CUDA?

I am planning to revisit optimizations for NVIDIA cards.
If OpenCL suffices for NVIDIA cards, I won't do CUDA.
(I could, but I just don't want to duplicate my efforts.)
At this point, the only pertinent issue seems to be the long-standing CPU busy-wait bug in NVIDIA's OpenCL drivers, but this issue should be manageable.
Ok, thanks for sharing!
sr. member
Activity: 728
Merit: 304
Miner Developer
So it seems that LLVM imposes an arbitrary limit on the maximum number of VGPRs per wavefront, which presumably results in premature register spills and inefficient VGPR utilization. I wonder if these pieces of code were responsible for AMD's recent low-quality drivers...

Code:
unsigned SIRegisterInfo::getNumVGPRsAllowed(unsigned WaveCount) const {
  switch(WaveCount) {
    case 10: return 24;
    case 9:  return 28;
    case 8:  return 32;
    case 7:  return 36;
    case 6:  return 40;
    case 5:  return 48;
    case 4:  return 64;
    case 3:  return 84;
    case 2:  return 128;
    default: return 256;
  }
}
sr. member
Activity: 728
Merit: 304
Miner Developer
Hi. Are there any plans in the near future to do a GG miner for CUDA?

I am planning to revisit optimizations for NVIDIA cards.
If OpenCL suffices for NVIDIA cards, I won't do CUDA.
(I could, but I just don't want to duplicate my efforts.)
At this point, the only pertinent issue seems to be the long-standing CPU busy-wait bug in NVIDIA's OpenCL drivers, but this issue should be manageable.
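One common way to keep a host thread from spinning at 100% CPU while the GPU works is to poll for completion and sleep between polls instead of blocking in the driver's wait call. A minimal sketch, assuming the completion check is passed in as a callback; in an OpenCL miner it could wrap `clGetEventInfo` with `CL_EVENT_COMMAND_EXECUTION_STATUS`, but that wiring, the hypothetical `waitWithSleep` name and the 1 ms default interval are assumptions, not Gateless Gate's actual code.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: poll isDone() until it returns true, sleeping between
// polls so the host thread yields the CPU instead of busy-waiting.
// Returns the number of polls performed.
unsigned waitWithSleep(const std::function<bool()>& isDone,
                       std::chrono::microseconds interval = std::chrono::microseconds(1000)) {
    unsigned polls = 0;
    while (true) {
        ++polls;
        if (isDone()) return polls;
        std::this_thread::sleep_for(interval);
    }
}
```

The trade-off is latency: a longer interval saves more CPU time but can add up to one interval of delay before the host notices that a kernel has finished.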
sr. member
Activity: 728
Merit: 304
Miner Developer
Wow, this keeps getting worse. No wonder AMD drivers suck balls...

Code:
  unsigned getMaxWavesPerCU() const {
    if (getGeneration() >= AMDGPUSubtarget::SOUTHERN_ISLANDS)
      return 10;

    // FIXME: Not sure what this is for other subtagets.
    return 8;
  }
sr. member
Activity: 1344
Merit: 252
Hi. Are there any plans in the near future to do a GG miner for CUDA?
sr. member
Activity: 728
Merit: 304
Miner Developer
WTF. Is this what I think it is? What a shoddy job...

Code:
unsigned SIRegisterInfo::getRegPressureSetLimit(const MachineFunction &MF,
                                                unsigned Idx) const {
  const SISubtarget &STI = MF.getSubtarget<SISubtarget>();
  // FIXME: We should adjust the max number of waves based on LDS size.
  unsigned SGPRLimit = getNumSGPRsAllowed(STI, STI.getMaxWavesPerCU());
  unsigned VGPRLimit = getNumVGPRsAllowed(STI.getMaxWavesPerCU());

  unsigned VSLimit = SGPRLimit + VGPRLimit;

  /**/
  if (SGPRPressureSets.test(Idx) && VGPRPressureSets.test(Idx)) {
    // FIXME: This is a hack. We should never be considering the pressure of
    // these since no virtual register should ever have this class.
    return VSLimit;
  }

  if (SGPRPressureSets.test(Idx))
    return SGPRLimit;

  return VGPRLimit;
}
full member
Activity: 223
Merit: 100
zorvalth, what software are you running in the background of your screenshots, the one that shows power etc.?
OMG, xintensity 4620! And this is the default .bat? The default .bat uses --xintensity 512 or --xintensity 1024.
Within 1 min 20 sec, your miner already shows several HW errors on each GPU.

Corsair Link, but you need an "i"-series PSU from Corsair. --xintensity 512 doesn't change the speed.
member
Activity: 142
Merit: 10
zorvalth, what software are you running in the background of your screenshots, the one that shows power etc.?
OMG, xintensity 4620! And this is the default .bat? The default .bat uses --xintensity 512 or --xintensity 1024.
Within 1 min 20 sec, your miner already shows several HW errors on each GPU.

That's Corsair Power Link, available for the i-series models (RMi, HXi, AXi, etc.). They are my latest favorites because of this functionality, and the RM1000i models aren't mind-blowingly expensive. I'm running a few of those, no complaints so far.
sr. member
Activity: 1484
Merit: 253
zorvalth, what software are you running in the background of your screenshots, the one that shows power etc.?
OMG, xintensity 4620! And this is the default .bat? The default .bat uses --xintensity 512 or --xintensity 1024.
Within 1 min 20 sec, your miner already shows several HW errors on each GPU.
legendary
Activity: 1274
Merit: 1000
If you mean me, I said about the same: mine do 27 MH with CM with a modded BIOS, and I get about 25 to 26 MH with Gateless for ETH. But I'm not too worried about ETH, and I don't have a problem using CM's ETH miner. The 1% fee doesn't bother me like the fee for his other miners does, and I never complained about the ETH miner fee; it seems reasonable. I don't use any of his other miners. I don't need to: Gateless or other miners match them... all but ZEC. I use Windows on PCs because I like the freedom I get using it (it works out of the box, not much to set up, etc.), and I use Linux on Pis...