CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 870.

bathrobehero

legendary

Activity: 2002

Merit: 1051

ICO? Not even once.

Quote from: dominuspro on November 05, 2015, 02:47:02 PM

Quote from: bathrobehero on November 05, 2015, 02:31:20 PM

How much hashrate are you getting with that? And for which algos?

Personally I never even OC memory clocks, as it never gives me any noticeable speed, just more power consumption and instability.

The boost only helps ETH mining for me: from around 78 MH to 87 MH.
BTW, I'm not talking about memory OC, just reaching the max original boost clocks.

Ah, I see.
I just checked, and all of my rigs with 970s are in P0 for some reason, even after 25 days of uptime. I remember that when I initially played with them, they all eventually switched to P2.
The other thing is that EVGA Precision reports the memory clocks as being at 3000 MHz, while everything else (GPU-Z, NVIDIA Inspector) shows 3500 (1750/7000).

The hashrates are fine, and with a +150 MHz GPU OC they boost to 1428-1491 MHz at 1212-1225 mV.
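The three memory figures in this post (1750 / 3500 / 7000) are the same GDDR5 clock in three reporting conventions, related by factors of 1:2:4. A minimal sketch of that conversion follows; the struct and function names are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

// One GDDR5 memory clock expressed in the three conventions tools tend to show.
struct Gddr5Clocks {
    int command_clock_mhz;  // e.g. 1750
    int doubled_mhz;        // e.g. 3500
    int effective_mtps;     // e.g. 7000 (the "7000 MHz" marketing figure)
};

// Convert from the command clock (the smallest of the three numbers).
constexpr Gddr5Clocks from_command_clock(int mhz) {
    return {mhz, mhz * 2, mhz * 4};
}

// Convert from the doubled figure (the middle of the three numbers).
constexpr Gddr5Clocks from_doubled(int mhz) {
    return {mhz / 2, mhz, mhz * 2};
}

static_assert(from_command_clock(1750).doubled_mhz == 3500, "1750 -> 3500");
static_assert(from_command_clock(1750).effective_mtps == 7000, "1750 -> 7000");
```

Under the same convention, the 3000 MHz that EVGA Precision reports would correspond to 1500 / 3000 / 6000. Whether that reflects a different performance state or just a reporting difference is not established in this thread.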

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: ?? on ??

Quote from: cbuchner1 on November 05, 2015, 04:47:24 AM

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian

Holy shit, you're alive!

hahaha ... ditto that ...

pity we don't see you more often, Christian ...

#crysx

antantti

legendary

Activity: 1176

Merit: 1015

Quote from: dominuspro on November 05, 2015, 03:49:32 PM

Quote from: antantti on November 05, 2015, 03:34:27 PM

Quote from: dominuspro on November 05, 2015, 02:47:02 PM

The boost only helps ETH mining for me: from around 78 MH to 87 MH.
BTW, I'm not talking about memory OC, just reaching the max original boost clocks.

Are you saying you are mining ETH@87MH? Windows?

Yes, with 4 GPUs: 3x970 + 1x980, drawing 719 W at the wall. All cores are overclocked and the memory of all GPUs is at 7000 MHz.
I'm not totally off-topic, because I'm using the sp-mod version of Genoil's miner.

The same rig makes 76 MH on the quark algo, consuming 840-850 W.

Aah ok then... good luck with fighting with those amd guys...

dominuspro

full member

Activity: 201

Merit: 100

Quote from: antantti on November 05, 2015, 03:34:27 PM

Quote from: dominuspro on November 05, 2015, 02:47:02 PM

The boost only helps ETH mining for me: from around 78 MH to 87 MH.
BTW, I'm not talking about memory OC, just reaching the max original boost clocks.

Are you saying you are mining ETH@87MH? Windows?

Yes, with 4 GPUs: 3x970 + 1x980, drawing 719 W at the wall. All cores are overclocked and the memory of all GPUs is at 7000 MHz.
I'm not totally off-topic, because I'm using the sp-mod version of Genoil's miner.

The same rig makes 76 MH on the quark algo, consuming 840-850 W.
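For reference, the numbers in this post can be turned into a boost gain and a per-watt figure. This is only an arithmetic sketch with hypothetical helper names. The two MH/W values belong to different algorithms, so they are useful for comparing rigs on the same algo, not for comparing ETH against quark.

```cpp
#include <cassert>

// Percentage gain going from `before` to `after`.
constexpr double percent_gain(double before, double after) {
    return (after - before) / before * 100.0;
}

// Hashrate per watt measured at the wall.
constexpr double mh_per_watt(double mh, double watts) {
    return mh / watts;
}

// Midpoint of a reported power range such as 840-850 W.
constexpr double midpoint(double low, double high) {
    return (low + high) / 2.0;
}
```

With the figures above, `percent_gain(78, 87)` is about 11.5 %, `mh_per_watt(87, 719)` is about 0.121 MH/W for ETH, and `mh_per_watt(76, midpoint(840, 850))` is about 0.090 MH/W for quark.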

antantti

legendary

Activity: 1176

Merit: 1015

Quote from: dominuspro on November 05, 2015, 02:47:02 PM

The boost only helps ETH mining for me: from around 78 MH to 87 MH.
BTW, I'm not talking about memory OC, just reaching the max original boost clocks.

Are you saying you are mining ETH@87MH? Windows?

dominuspro

full member

Activity: 201

Merit: 100

Quote from: bathrobehero on November 05, 2015, 02:31:20 PM

Quote from: dominuspro on November 05, 2015, 12:02:55 PM

Quote from: tbearhere on November 05, 2015, 10:41:59 AM

Quote from: sp_ on November 05, 2015, 06:03:43 AM

You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will write code that does this automatically with an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information, and select the fastest for each GPU in the rig
2. Set p1 mode and set the max possible GPU boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse.

(For example, all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small box.)

I can't get my 980ti to boost at all on mem....

Sometimes I use NVIDIA Inspector (from guru3d) to find out the actual state and to boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

How much hashrate are you getting with that? And for which algos?

Personally I never even OC memory clocks, as it never gives me any noticeable speed, just more power consumption and instability.

The boost only helps ETH mining for me: from around 78 MH to 87 MH.
BTW, I'm not talking about memory OC, just reaching the max original boost clocks.

bathrobehero

legendary

Activity: 2002

Merit: 1051

ICO? Not even once.

Quote from: dominuspro on November 05, 2015, 12:02:55 PM

Quote from: tbearhere on November 05, 2015, 10:41:59 AM

Quote from: sp_ on November 05, 2015, 06:03:43 AM

You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will write code that does this automatically with an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information, and select the fastest for each GPU in the rig
2. Set p1 mode and set the max possible GPU boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse.

(For example, all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small box.)

I can't get my 980ti to boost at all on mem....

Sometimes I use NVIDIA Inspector (from guru3d) to find out the actual state and to boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

How much hashrate are you getting with that? And for which algos?

Personally I never even OC memory clocks, as it never gives me any noticeable speed, just more power consumption and instability.

scryptr

legendary

Activity: 1797

Merit: 1028

Quote from: cbuchner1 on November 05, 2015, 04:47:24 AM

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian

INTEL PROCESSOR BOARD--

Christian, just before you lessened your presence on the CudaMiner thread, you posted about your purchase of an Intel processor board (Xeon Phi), and that you wanted to explore programming for it. Is there a thread where you posted results? Did you program any mining application for the board?

Really curious about the results... --scryptr

scryptr

legendary

Activity: 1797

Merit: 1028

Quote from: tbearhere on November 05, 2015, 12:19:23 PM

Quote from: dominuspro on November 05, 2015, 12:02:55 PM

Quote from: tbearhere on November 05, 2015, 10:41:59 AM

Quote from: sp_ on November 05, 2015, 06:03:43 AM

You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will write code that does this automatically with an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information, and select the fastest for each GPU in the rig
2. Set p1 mode and set the max possible GPU boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse.

(For example, all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small box.)

I can't get my 980ti to boost at all on mem....

Sometimes I use NVIDIA Inspector (from guru3d) to find out the actual state and to boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

Thx....I'll give it a try.

NVIDIA INSPECTOR--

It works on 6 cards and allows setting clocks and fans in Windows. It was written by Orbmu2k; if you google his name, you will find his software. --scryptr

bensam1231

legendary

Activity: 1764

Merit: 1024

Quote from: pallas on November 05, 2015, 05:43:35 AM

Quote from: cbuchner1 on November 05, 2015, 04:47:24 AM

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian

I tried it some months ago on a 970 and it worked, but you lose the auto clock tuning based on temperature and fan speed, which you may or may not like ;-)
I personally like it and ended up with better hashrate using the default settings than changing it with nvidia-smi, but YMMV.

Keep in mind there are only a handful of algos that get a boost out of memory speed... Juicing your memory on a 970 adds something like an extra ~20 W.

Quote from: tbearhere on November 05, 2015, 10:46:14 AM

Quote from: sp_ on November 05, 2015, 04:21:04 AM

And the altcoins are up, and the mining profit measured in BTC is up.

I have twice the hash power now and am making the same amount.

Yeah they're still playing with BTC... probably will be a week or two before altcoins rebound, depending on how long they mess around with BTC. All hands are on deck for BTC right now.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: Genoil on November 05, 2015, 11:12:22 AM

I looked into this for a bit. I rewrote it like this:
block size == 128
lower byte of FNV_PRIME is 147, not 193. (0x01000193 & 0xFF = 0x93 == 147)

Then the precalc table would have to be 0x193 * 4 bytes big.

Quote from: Genoil on November 05, 2015, 11:12:22 AM

It doesn't work. But more importantly, your suggestion of only requiring the lower byte from the DAG entry seems wrong, since that's param b in the fnv function...

I haven't started on my mod yet, but it will come later. I can remove many instructions, but I cannot remove the memory latency.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: dominuspro on November 05, 2015, 12:02:55 PM

Quote from: tbearhere on November 05, 2015, 10:41:59 AM

Quote from: sp_ on November 05, 2015, 06:03:43 AM

You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will write code that does this automatically with an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information, and select the fastest for each GPU in the rig
2. Set p1 mode and set the max possible GPU boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse.

(For example, all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small box.)

I can't get my 980ti to boost at all on mem....

Sometimes I use NVIDIA Inspector (from guru3d) to find out the actual state and to boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

Thx....I'll give it a try.

dominuspro

full member

Activity: 201

Merit: 100

Quote from: tbearhere on November 05, 2015, 10:41:59 AM

Quote from: sp_ on November 05, 2015, 06:03:43 AM

You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will write code that does this automatically with an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information, and select the fastest for each GPU in the rig
2. Set p1 mode and set the max possible GPU boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse.

(For example, all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small box.)

I can't get my 980ti to boost at all on mem....

Sometimes I use NVIDIA Inspector (from guru3d) to find out the actual state and to boost the memory itself.
Maybe it can help you if you are using Windows. It works on my 970 and 980.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: scryptr on November 05, 2015, 11:19:23 AM

Quote from: tbearhere on November 05, 2015, 11:02:26 AM

The EVGA_PrecisionX_16 does support more than 3 cards.....

MAXIMUM OF FOUR CARDS--

It only supports four at the most. I had to remove EVGA PrecisionX 16 from my rig when I added a fifth card; it complained on boot-up. --scryptr

OK... scryptr, what are you using in place of it? Thx
EDIT: It was complaining about the 4th card though... that's what I have in there.

scryptr

legendary

Activity: 1797

Merit: 1028

Quote from: tbearhere on November 05, 2015, 11:02:26 AM

The EVGA_PrecisionX_16 does support more than 3 cards.....

MAXIMUM OF FOUR CARDS--

It only supports four at the most. I had to remove EVGA PrecisionX 16 from my rig when I added a fifth card; it complained on boot-up. --scryptr

Genoil

sr. member

Activity: 438

Merit: 250

Quote from: sp_ on November 03, 2015, 11:39:42 AM

Quote from: Genoil on November 03, 2015, 11:25:10 AM

Quote from: sp_ on November 03, 2015, 10:06:45 AM

Thanks

I am looking at the Ethereum miner now. I have some improvements.

Very curious what you come up with. I hope you can challenge me to look at the code once again. I kind of lost interest with the whole TLB thrashing thing going on on Windows.

in the dagger.cuh:

__device__ uint4 fnv4(uint4 a, uint4 b)
{
   uint4 c;
   c.x = a.x * FNV_PRIME ^ b.x;
   c.y = a.y * FNV_PRIME ^ b.y;
   c.z = a.z * FNV_PRIME ^ b.z;
   c.w = a.w * FNV_PRIME ^ b.w;
   return c;
}

Since a.x*2^24= a.x<<24

This can be rewritten to:

__device__ uint4 fnv4(uint4 a, uint4 b)
{
   uint4 c;  // result lanes
   c.x = sharedmemprecalc[a.x & 0xff] ^ b.x;
   c.y = sharedmemprecalc[a.y & 0xff] ^ b.y;
   c.z = sharedmemprecalc[a.z & 0xff] ^ b.z;
   c.w = sharedmemprecalc[a.w & 0xff] ^ b.w;
   return c;
}

The precalcbuffer must be 32bit (256*4 bytes) and the values shifted by 24 bits (shared mem level1cache):

xx000000

Code:

		// 256 entries of 4 bytes each, indexed by the low byte of a.
		__shared__ uint32_t sharedmemprecalc[256];
		for (int i = 0; i < 256; i++)
		{
			// Since FNV_PRIME is a high number, the 24 highest bits of the product are ignored.
			// We only need to know the 8 low bits.
			sharedmemprecalc[i] = (193 * i) << 24;
		}

Since you only need to read 1 byte and not the whole 4 bytes (32 bits), you might be able to solve it with 1/4th of the memory reads...

__device__ uint4 fnv4(uchar4 a, uint4 b)
{
   uint4 c;  // result lanes
   c.x = sharedmemprecalc[a.x] ^ b.x;
   c.y = sharedmemprecalc[a.y] ^ b.y;
   c.z = sharedmemprecalc[a.z] ^ b.z;
   c.w = sharedmemprecalc[a.w] ^ b.w;
   return c;
}

But you might have to reorganize/scramble the memory and read 32 bits linearly in one read to fill the uchar4.

I looked into this for a bit. I rewrote it like this:

block size == 128
lower byte of FNV_PRIME is 147, not 193. (0x01000193 & 0xFF = 0x93 == 147)

Code:

__shared__ uint32_t sharedmemprecalc[256];

In compute_hash_shuffle:

sharedmemprecalc[threadIdx.x] = (147 * threadIdx.x) << 24;  
sharedmemprecalc[threadIdx.x + 128] = (147 * (threadIdx.x + 128)) << 24;
__syncthreads();

And this unmodified:

__device__ uint4 fnv4s(uint4 a, uint4 b)
{
	uint4 c;
	c.x = sharedmemprecalc[a.x & 0xff] ^ b.x;
	c.y = sharedmemprecalc[a.y & 0xff] ^ b.y;
	c.z = sharedmemprecalc[a.z & 0xff] ^ b.z;
	c.w = sharedmemprecalc[a.w & 0xff] ^ b.w;
	return c;
}

It doesn't work. But more importantly, your suggestion of only requiring the lower byte from the DAG entry seems wrong, since that's param b in the fnv function...
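A host-side C++ sketch of why the table shortcut cannot match the original function, assuming FNV_PRIME is 0x01000193 as stated above. Since 0x01000193 = 2^24 + 0x193, the product a * FNV_PRIME equals (a << 24) + a * 0x193 modulo 2^32, and the second term depends on all 32 bits of a, not just the low byte. The function names `fnv_split` and `fnv_table_style` are hypothetical, and the table variant is written as a direct expression rather than a shared-memory lookup.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FNV_PRIME = 0x01000193u;  // 2^24 + 0x193

// Reference: one lane of fnv4 as written in dagger.cuh.
constexpr uint32_t fnv(uint32_t a, uint32_t b) {
    return (a * FNV_PRIME) ^ b;
}

// Exact equivalent: a * (2^24 + 0x193) = (a << 24) + a * 0x193 (mod 2^32).
constexpr uint32_t fnv_split(uint32_t a, uint32_t b) {
    return ((a << 24) + a * 0x193u) ^ b;
}

// Table-style shortcut from the thread: the result depends only on the low byte of a.
constexpr uint32_t fnv_table_style(uint32_t a, uint32_t b) {
    return ((147u * (a & 0xFFu)) << 24) ^ b;
}

static_assert((FNV_PRIME & 0xFFu) == 147u, "low byte of FNV_PRIME is 0x93 == 147");
static_assert(fnv(0x12345678u, 0u) == fnv_split(0x12345678u, 0u), "split form is exact");
static_assert(fnv(2u, 0u) != fnv_table_style(2u, 0u), "shortcut differs even for a == 2");
```

For example, a = 0x100 has a low byte of zero, so the table-style version returns 0 for b = 0, while the real product is 0x00019300. The split form still needs the full multiply by 0x193, so it is shown only to make the algebra explicit, not as a speedup.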

tbearhere

legendary

Activity: 3164

Merit: 1003

The EVGA_PrecisionX_16 does support more than 3 cards.....

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: sp_ on November 05, 2015, 04:21:04 AM

And the altcoins are up, and the mining profit measured in BTC is up.

I have twice the hash power now and am making the same amount.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: cbuchner1 on November 05, 2015, 04:47:24 AM

Hi all, could this thread be relevant for Maxwell based mining? Supposedly there is a trick to boost memclock safely using nvidia-smi.

https://devtalk.nvidia.com/default/topic/892842/cuda-programming-and-performance/one-weird-trick-to-get-a-maxwell-v2-gpu-to-reach-its-max-memory-clock-/

Christian

Hi Christian, how have you been?.... Thank you for the info.

tbearhere

legendary

Activity: 3164

Merit: 1003

Quote from: sp_ on November 05, 2015, 06:03:43 AM

You can boost the core clock as well if you put the gpu in the p1 state.

p2 state: adjust memory
p1 state: adjust gpu boost clock

I think I will write code that does this automatically with an option: --Max-Boost --Max-memspeed

1. List all possible modes, parse the information, and select the fastest for each GPU in the rig
2. Set p1 mode and set the max possible GPU boost clock
3. Set p2 mode and set the max possible memory clock

The NVIDIA API is broken in x86 builds, so I will call the command line from the C++ code in the same way I have done before (Windows only).

For 970 cards with a 4pin+3pin connector, boosting is important to reach full speed (but this costs more power).

Other cards will probably throttle and perform worse.

(For example, all the GTX 970 mini cards, and the 970 cards that are made small to fit in a small box.)

I can't get my 980ti to boost at all on mem....
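A rough C++ sketch of the three-step plan quoted above (list the modes, pick the fastest, apply it through the command line). It is not sp_'s implementation. It assumes the mode list comes from `nvidia-smi --query-supported-clocks=mem,gr --format=csv,noheader,nounits` and that clocks are applied with `nvidia-smi -i <index> -ac <mem>,<graphics>`. It does not set P-states explicitly, and whether a given GeForce card and driver accept application clocks is not confirmed in this thread. All struct and function names are hypothetical, and the sketch only parses text and builds the command string; it does not run anything.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One supported clock combination, in MHz.
struct ClockPair {
    int mem_mhz;
    int gr_mhz;
};

// Step 1: parse lines of the form "3505, 1531" (memory, graphics).
// Lines that do not match are skipped.
std::vector<ClockPair> parse_supported_clocks(const std::string& csv) {
    std::vector<ClockPair> out;
    std::istringstream in(csv);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        ClockPair p{};
        char comma = 0;
        if (fields >> p.mem_mhz >> comma >> p.gr_mhz && comma == ',') {
            out.push_back(p);
        }
    }
    return out;
}

// Step 2: pick the highest memory clock, breaking ties by graphics clock.
// Returns {0, 0} when the list is empty.
ClockPair pick_fastest(const std::vector<ClockPair>& modes) {
    ClockPair best{0, 0};
    for (const ClockPair& m : modes) {
        if (m.mem_mhz > best.mem_mhz ||
            (m.mem_mhz == best.mem_mhz && m.gr_mhz > best.gr_mhz)) {
            best = m;
        }
    }
    return best;
}

// Step 3: build the command line that would apply the chosen clocks.
std::string build_set_clocks_command(int gpu_index, ClockPair c) {
    std::ostringstream cmd;
    cmd << "nvidia-smi -i " << gpu_index << " -ac " << c.mem_mhz << ',' << c.gr_mhz;
    return cmd.str();
}
```

The resulting string could be handed to `system()` or `_popen()` on Windows, in line with calling the command line from the C++ code. Preferring memory clock over graphics clock is just one possible tie-break; cards that throttle under boost may do better with a lower pair.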
