Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 870. (Read 2347659 times)

legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
How much hashrate are you getting with that? And for which algos?

Personally I never even OC memory clocks as it never gives me any noticeable speed gain, just more power consumption and instability.

Boost only helps when mining ETH for me. From around 78 MH/s to 87 MH/s.
BTW not talking about memory OC, just reaching the max original boost clocks.

Ah, I see.
I just checked and all of my rigs with 970s are in p0 for some reason even after 25 days of uptime while I remember when I initially played with them they all eventually switched to p2.
The other thing is that EVGA Precision reports the memory clocks being at 3000mhz while everything else (GPU-Z, nvidia inspector) shows 3500 (1750/7000).

The hashrates are fine and with +150MHz GPU OC they boost to 1428-1491 MHz at 1212-1225 mV.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian
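For anyone who doesn't want to dig through the link: the trick boils down to pinning the application clocks with nvidia-smi. A minimal command sequence as a sketch (the 3505,1455 pair is just an example for a Maxwell card, not from the thread; pick a pair from your own supported list, and note this usually needs admin/root rights):

```shell
# Show the memory/graphics clock pairs the card supports
nvidia-smi -q -d SUPPORTED_CLOCKS

# Pin application clocks to <memoryMHz,graphicsMHz>
# (example values only -- use a pair from the list above)
nvidia-smi -ac 3505,1455

# Undo and return to the driver's default clock management
nvidia-smi -rac
```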


Holy shit, you're alive!

hahaha ... ditto that ...

pity we dont see you more often christian ...

#crysx
legendary
Activity: 1176
Merit: 1015

Boost only helps when mining ETH for me. From around 78 MH/s to 87 MH/s.
BTW not talking about memory OC, just reaching the max original boost clocks.

Are you saying you are mining ETH@87MH? Windows?

Yes, with 4 gpus. 3x970+1x980 making 719W at the wall. All cores overclocked and memory of all gpus at 7000MHz.
I'm not totally off-topic because I'm using the sp-mod version of genoil's miner  Grin

The same rig makes 76MH in quark algo, consuming 840-850W. 



Aah ok then... good luck fighting those AMD guys...
full member
Activity: 201
Merit: 100

Boost only helps when mining ETH for me. From around 78 MH/s to 87 MH/s.
BTW not talking about memory OC, just reaching the max original boost clocks.

Are you saying you are mining ETH@87MH? Windows?

Yes, with 4 gpus. 3x970+1x980 making 719W at the wall. All cores overclocked and memory of all gpus at 7000MHz.
I'm not totally off-topic because I'm using the sp-mod version of genoil's miner  Grin

The same rig makes 76MH in quark algo, consuming 840-850W. 

legendary
Activity: 1176
Merit: 1015

Boost only helps when mining ETH for me. From around 78 MH/s to 87 MH/s.
BTW not talking about memory OC, just reaching the max original boost clocks.

Are you saying you are mining ETH@87MH? Windows?
full member
Activity: 201
Merit: 100
You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will make code that does this automatically with an option.. --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each gpu in the rig
2. Set p1 mode and set the max possible gpu-boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before. (Windows only)

For 970 cards with 4pin+3pin connector, boosting is important to reach full speed. (but this costs more power)

Other cards will probably throttle and perform worse..

(e.g. all the gtx 970 mini cards, and the 970 cards that are made small to fit in a small box.)
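Step 1 of the list above (enumerate the modes and pick the fastest) could be sketched host-side by parsing `nvidia-smi -q -d SUPPORTED_CLOCKS` output. This is only an illustration of the idea, not sp_'s code; `maxGraphicsClock` and the output format handling are assumptions for the example:

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Scan nvidia-smi SUPPORTED_CLOCKS output for the highest
// "Graphics : NNNN MHz" entry. Hypothetical helper, not miner code.
static int maxGraphicsClock(const std::string& smiOutput)
{
    std::istringstream in(smiOutput);
    std::string line;
    int best = 0;
    while (std::getline(in, line)) {
        std::size_t pos = line.find("Graphics");
        if (pos == std::string::npos)
            continue;
        std::size_t colon = line.find(':', pos);
        if (colon == std::string::npos)
            continue;
        // atoi stops at the " MHz" suffix, so " 1455 MHz" parses to 1455
        int mhz = std::atoi(line.c_str() + colon + 1);
        if (mhz > best)
            best = mhz;
    }
    return best;
}
```

The miner would then shell out with the selected pair (Windows only, as described above), e.g. via `system("nvidia-smi -ac <mem>,<gfx>")`.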
I can't get my 980ti to boost at all on mem.... Undecided

Sometimes I'm using NVIDIA Inspector (from Guru3D) to find out the actual state and boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

How much hashrate are you getting with that? And for which algos?

Personally I never even OC memory clocks as it never gives me any noticeable speed gain, just more power consumption and instability.

Boost only helps when mining ETH for me. From around 78 MH/s to 87 MH/s.
BTW not talking about memory OC, just reaching the max original boost clocks.
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will make code that does this automatically with an option.. --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each gpu in the rig
2. Set p1 mode and set the max possible gpu-boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before. (Windows only)

For 970 cards with 4pin+3pin connector, boosting is important to reach full speed. (but this costs more power)

Other cards will probably throttle and perform worse..

(e.g. all the gtx 970 mini cards, and the 970 cards that are made small to fit in a small box.)
I can't get my 980ti to boost at all on mem.... Undecided

Sometimes I'm using NVIDIA Inspector (from Guru3D) to find out the actual state and boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

How much hashrate are you getting with that? And for which algos?

Personally I never even OC memory clocks as it never gives me any noticeable speed gain, just more power consumption and instability.
legendary
Activity: 1797
Merit: 1028

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian


INTEL PROCESSOR BOARD--

Christian, just before you lessened your presence on the CudaMiner thread, you posted about your purchase of an Intel processor board (Xeon Phi), and that you wanted to explore programming for it.  Is there a thread where you posted results?  Did you program any mining application for the board?

Really curious about the results...       --scryptr
legendary
Activity: 1797
Merit: 1028
You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will make code that does this automatically with an option.. --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each gpu in the rig
2. Set p1 mode and set the max possible gpu-boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before. (Windows only)

For 970 cards with 4pin+3pin connector, boosting is important to reach full speed. (but this costs more power)

Other cards will probably throttle and perform worse..

(e.g. all the gtx 970 mini cards, and the 970 cards that are made small to fit in a small box.)
I can't get my 980ti to boost at all on mem.... Undecided

Sometimes I'm using NVIDIA Inspector (from Guru3D) to find out the actual state and boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.
Thx....I'll give it a try.

NVIDIA INSPECTOR--

It works on 6 cards, and allows setting clocks and fans in Windows.  It was written by Orbmu2k, if you google his name, you will find his programming.       --scryptr
legendary
Activity: 1764
Merit: 1024

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian


I tried it some months ago on a 970 and it worked, but you lose the auto clock tuning based on temperature and fan speed, which you may or may not like ;-)
I personally like it and ended up with a better hashrate using the default settings than changing it with nvidia-smi, but YMMV.

Keep in mind there are only a handful of algos that get a boost out of memory speed... Juicing your memory on a 970 adds like an extra ~20 W.

And the altcoins are up, and the mining profit measured in BTC is up..
I have twice the hash power now and am making the same amount.  Undecided

Yeah they're still playing with BTC... probably will be a week or two before altcoins rebound, depending on how long they mess around with BTC. All hands are on deck for BTC right now.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I looked into this for a bit. I rewrote it like this:
block size == 128
lower byte of FNV_PRIME is 147, not 193. (0x01000193 & 0xFF = 0x93 == 147)

Then the precalc table would have to be 0x193 * 4 bytes big.
It doesn't work. But more importantly, your suggestion of only requiring the lower byte from the DAG entry seems wrong, since that's param b in the fnv function...

I haven't started on my mod yet. But it will come later.. I can remove many instructions, but I cannot remove the memory latency..

legendary
Activity: 3164
Merit: 1003
You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will make code that does this automatically with an option.. --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each gpu in the rig
2. Set p1 mode and set the max possible gpu-boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before. (Windows only)

For 970 cards with 4pin+3pin connector, boosting is important to reach full speed. (but this costs more power)

Other cards will probably throttle and perform worse..

(e.g. all the gtx 970 mini cards, and the 970 cards that are made small to fit in a small box.)
I can't get my 980ti to boost at all on mem.... Undecided

Sometimes I'm using NVIDIA Inspector (from Guru3D) to find out the actual state and boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.
Thx....I'll give it a try.
full member
Activity: 201
Merit: 100
You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will make code that does this automatically with an option.. --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each gpu in the rig
2. Set p1 mode and set the max possible gpu-boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before. (Windows only)

For 970 cards with 4pin+3pin connector, boosting is important to reach full speed. (but this costs more power)

Other cards will probably throttle and perform worse..

(e.g. all the gtx 970 mini cards, and the 970 cards that are made small to fit in a small box.)
I can't get my 980ti to boost at all on mem.... Undecided

Sometimes I'm using NVIDIA Inspector (from Guru3D) to find out the actual state and boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.
legendary
Activity: 3164
Merit: 1003
The EVGA_PrecisionX_16 does support more than 3 cards.....

MAXIMUM OF FOUR CARDS--

It only supports four at the most.  I had to remove EVGA PrecisionX 16 from my rig when I added a fifth card, it complained on boot-up.       --scryptr
Ok ...scryptr what are you using in place of it.   Thx
EDIT: It was complaining about the 4th card though.... that's what I have in there.

legendary
Activity: 1797
Merit: 1028
The EVGA_PrecisionX_16 does support more than 3 cards.....

MAXIMUM OF FOUR CARDS--

It only supports four at the most.  I had to remove EVGA PrecisionX 16 from my rig when I added a fifth card, it complained on boot-up.       --scryptr
sr. member
Activity: 438
Merit: 250
Thanks Smiley
I am looking at the Etherum miner now. I have some improvements.
Very curious what you come up with. I hope you can challenge me to look at the code once again; kind of lost interest with the whole TLB thrashing thing going on on Windows.

in the dagger.cuh:

__device__ uint4 fnv4(uint4 a, uint4 b)
{
   uint4 c;
   c.x = a.x * FNV_PRIME ^ b.x;
   c.y = a.y * FNV_PRIME ^ b.y;
   c.z = a.z * FNV_PRIME ^ b.z;
   c.w = a.w * FNV_PRIME ^ b.w;
   return c;
}


Since FNV_PRIME = 0x01000193 sits just above 2^24, and a.x * 2^24 == a.x << 24, this can be rewritten to:

__device__ uint4 fnv4(uint4 a, uint4 b)
{
   uint4 c;
   c.x = sharedmemprecalc[a.x & 0xff] ^ b.x;
   c.y = sharedmemprecalc[a.y & 0xff] ^ b.y;
   c.z = sharedmemprecalc[a.z & 0xff] ^ b.z;
   c.w = sharedmemprecalc[a.w & 0xff] ^ b.w;
   return c;
}

The precalc buffer must be 32-bit (256 * 4 bytes) and the values shifted by 24 bits (shared memory / L1 cache):

xx000000

Code:
__shared__ uint32_t sharedmemprecalc[256];   // 256 entries * 4 bytes
for (int i = 0; i < 256; i++)
{
    sharedmemprecalc[i] = (193 * i) << 24;  // Since FNV_PRIME is a high number, the 24 highest bits of the product are ignored. We only need to know the 8 low bits.
}





Since you only need to read 1 byte and not the whole 4 bytes (32 bits), you might be able to solve it with 1/4th of the memory reads...

__device__ uint4 fnv4(uchar4 a, uint4 b)
{
   uint4 c;
   c.x = sharedmemprecalc[a.x] ^ b.x;
   c.y = sharedmemprecalc[a.y] ^ b.y;
   c.z = sharedmemprecalc[a.z] ^ b.z;
   c.w = sharedmemprecalc[a.w] ^ b.w;
   return c;
}

But you might have to reorganize/scramble the memory, and read 32 bits linearly in one read to fill the uchar4.



I looked into this for a bit. I rewrote it like this:

block size == 128
lower byte of FNV_PRIME is 147, not 193. (0x01000193 & 0xFF = 0x93 == 147)

Code:
__shared__ uint32_t sharedmemprecalc[256];

In compute_hash_shuffle:

sharedmemprecalc[threadIdx.x] = (147 * threadIdx.x) << 24; 
sharedmemprecalc[threadIdx.x + 128] = (147 * (threadIdx.x + 128)) << 24;
__syncthreads();

And this unmodified:

__device__ uint4 fnv4s(uint4 a, uint4 b)
{
    uint4 c;
    c.x = sharedmemprecalc[a.x & 0xff] ^ b.x;
    c.y = sharedmemprecalc[a.y & 0xff] ^ b.y;
    c.z = sharedmemprecalc[a.z & 0xff] ^ b.z;
    c.w = sharedmemprecalc[a.w & 0xff] ^ b.w;
    return c;
}

It doesn't work. But more importantly, your suggestion of only requiring the lower byte from the DAG entry seems wrong, since that's param b in the fnv function...
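For what it's worth, the arithmetic shows why a table alone can't replace the multiply: FNV_PRIME = 0x01000193 = 2^24 + 0x193, so modulo 2^32 the product splits as a * FNV_PRIME == ((a & 0xff) << 24) + a * 0x193. A lookup table can only supply the first term; the a * 0x193 part still depends on all 32 bits of a, so it can't be dropped. A quick host-side sanity check (plain C++, my own sketch, not code from the thread):

```cpp
#include <cstdint>

static const uint32_t FNV_PRIME = 0x01000193u;

// Split a * FNV_PRIME (mod 2^32) into the shifted-low-byte term a
// lookup table could provide, plus the residual multiply it cannot.
static uint32_t fnvMulDecomposed(uint32_t a)
{
    uint32_t tableTerm = (a & 0xffu) << 24;  // 2^24 contribution: only the low byte of a survives the shift
    uint32_t residual  = a * 0x193u;         // still needs the full 32-bit a
    return tableTerm + residual;             // wraps mod 2^32, identical to a * FNV_PRIME
}
```

So at best the table shaves off the 2^24 part, which the compiler already emits as a shift; the memory-bound part of the hash is untouched.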

legendary
Activity: 3164
Merit: 1003
The EVGA_PrecisionX_16 does support more than 3 cards.....
legendary
Activity: 3164
Merit: 1003
And the altcoins are up, and the mining profit measured in BTC is up..
I have twice the hash power now and am making the same amount.  Undecided
legendary
Activity: 3164
Merit: 1003

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian

Hi Christian, how you been?.... Thank you for the info.
legendary
Activity: 3164
Merit: 1003
You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will make code that does this automatically with an option.. --Max-Boost --Max-memspeed

1. List all possible modes, parse the information and select the fastest for each gpu in the rig
2. Set p1 mode and set the max possible gpu-boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before. (Windows only)

For 970 cards with 4pin+3pin connector, boosting is important to reach full speed. (but this costs more power)

Other cards will probably throttle and perform worse..

(e.g. all the gtx 970 mini cards, and the 970 cards that are made small to fit in a small box.)
I can't get my 980ti to boost at all on mem.... Undecided