
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1034. (Read 2347601 times)

legendary
Activity: 3164
Merit: 1003
Same thing over and over.  It says.
'cuda_check_cpu_setTarget '  at line 28 : unknown error
This happens on any version.

Try removing the overclock. Remember to upgrade the driver to the latest version.
Other things to try:
Add 16GB of virtual memory in Windows
or reduce the intensity
I did all of that ... no good.
I have 16 GB of RAM.
It has been doing this for some time, but right now I can't get it to last 5 minutes.

Remove one of the cards. Mine for a while. If it works, change the riser cable to a powered riser, and mount the card again. Check the power supply and remove dust.
'cuda_check_cpu_setTarget '  at line 28 : unknown error

They are on powered risers... I think ccminer is the problem, because after a crash it won't restart either.
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
Quark hashrate is about double that of x11, so it's still the most profitable for me, except when neoscrypt goes over 7 BTC/Gh/day.
This miner is extremely good at mining neoscrypt; it uses less power than on quark (it's the opposite on AMD, afaik).

quark is very 'profitable' in terms of mining compared to most other algos - but we funnel our farm ( in three sections ) through mintsy and rent the hash out while we mine other coins and trade those ...

this is the 'test' we are undertaking currently to see what is the most profitable ( by means of btc ) way of mining ...

we were mining with nicehash exclusively for a little over two weeks - now we will go about this way for the next two weeks ...

the preliminary results will show which 'seems' to be the better way of mining with this simple short testing phase for us ...

there are too many things going on at the same time - so the farm needs to just be running at the best rate as it can - and x11 with rental options seems to be proving more than straight mining ...

BUT - that is not conclusive nor accurate for the two days that it has been ... apart from that - the electrical systems we have in place have been 'shaky' to be put bluntly ...

the issue with this setup though - is that we cannot direct the hashrate TO nicehash FROM mintsy due to the incompatible extranonce issues between the two sites and their stratums ...

will keep you posted if you like ...

#crysx
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Same thing over and over.  It says.
'cuda_check_cpu_setTarget '  at line 28 : unknown error
This happens on any version.

Try removing the overclock. Remember to upgrade the driver to the latest version.
Other things to try:
Add 16GB of virtual memory in Windows
or reduce the intensity
I did all of that ... no good.
I have 16 GB of RAM.
It has been doing this for some time, but right now I can't get it to last 5 minutes.

Remove one of the cards. Mine for a while. If it works, change the riser cable to a powered riser, and mount the card again. Check the power supply and remove dust.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Quark hashrate is about double that of x11, so it's still the most profitable for me, except when neoscrypt goes over 7 BTC/Gh/day.
This miner is extremely good at mining neoscrypt; it uses less power than on quark (it's the opposite on AMD, afaik).
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
5.6MHASH@38W. Low clockspeed/memory speed




that is more our rates ...

we have stopped mining quark for the time being - due to the personal / work environment we are in at the moment ...

for the moment though - our favourite algo - x11 - is on the farm again and is hashing at a comfortable rate ...

the latest commit dropped the hash by about 20KH from the previous compile - so we have reassigned the previous compile until further testing can be done ...

5 x gigabyte 750ti oc lp ( nvidia test system - fedora 20 x64 - cuda 6.5 ) = ~14000KH with the latest compile ... and ~14022KH with the previous compile on x11 ...

keep up the awesome work sp ...

we now have a total of 4 servers - yours being one of them - for the donation links fully active ...

a few issues still - but that seems to be from the amount of algos we have running on the one server - so will probably end up dropping a few of the less used algos from the server ...

as for the farm - we are sticking to x11 for the time being - while the upgrades to the electricity and miners and software are completed in the next week ...

#crysx
legendary
Activity: 3164
Merit: 1003
Same thing over and over.  It says.
'cuda_check_cpu_setTarget '  at line 28 : unknown error
This happens on any version.


Try removing the overclock. Remember to upgrade the driver to the latest version.

Other things to try:

Add 16GB of virtual memory in Windows
or reduce the intensity

I did all of that ... no good.
I have 16 GB of RAM.
It has been doing this for some time, but right now I can't get it to last 5 minutes.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Seems that a gtx750 with a high oc is more energy efficient than a 750ti at low clocks )))
Code:
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe

Tue Jul 21 13:11:42 2015
+------------------------------------------------------+
| NVIDIA-SMI 353.30     Driver Version: 353.30         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750    WDDM  | 0000:02:00.0     Off |                  N/A |
|100%   52C    P0    35W /  65W |    858MiB /  1024MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       464  C+G   Insufficient Permissions                     N/A      |
|    0      3836    C   J:\Mining\release55++\ccminer.exe            N/A      |
+-----------------------------------------------------------------------------+

quark at 6 MH/s with this setup ... but neoscrypt at 190 kH/s is more profitable now

Did you mod the BIOS to get the 65W power cap? My cards are at 38W and 45W (early 750ti cards without the 6-pin power connector). Does your card have a 6-pin power connector?
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Same thing over and over.  It says.
'cuda_check_cpu_setTarget '  at line 28 : unknown error
This happens on any version.


Try removing the overclock. Remember to upgrade the driver to the latest version.

Other things to try:

Add 16GB of virtual memory in Windows
or reduce the intensity
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe  It's not working for me. What do I need?

You need to upgrade to the latest driver.

NVIDIA-SMI 353.30
legendary
Activity: 3164
Merit: 1003
Same thing over and over.  It says.

'cuda_check_cpu_setTarget '  at line 28 : unknown error

This happens on any version.


sr. member
Activity: 248
Merit: 250
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe  It's not working for me. What do I need?
legendary
Activity: 1510
Merit: 1003
Seems that a gtx750 with a high oc is more energy efficient than a 750ti at low clocks )))
Code:
C:\Program Files\NVIDIA Corporation\NVSMI>nvidia-smi.exe

Tue Jul 21 13:11:42 2015
+------------------------------------------------------+
| NVIDIA-SMI 353.30     Driver Version: 353.30         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750    WDDM  | 0000:02:00.0     Off |                  N/A |
|100%   52C    P0    35W /  65W |    858MiB /  1024MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       464  C+G   Insufficient Permissions                     N/A      |
|    0      3836    C   J:\Mining\release55++\ccminer.exe            N/A      |
+-----------------------------------------------------------------------------+

quark at 6 MH/s with this setup ... but neoscrypt at 190 kH/s is more profitable now
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
5.6MHASH@38W. Low clockspeed/memory speed


legendary
Activity: 1764
Merit: 1024
Quote
Quote from: hashbrown9000 on Today at 01:25:29 AM
looks like the Quark honeymoon is over. Now 0.187 on hashpower
Explain plz?
Well, for people that pay for electricity anyway. For me, at $0.11/kWh, it's just about breaking even, so there's no point in keeping the rigs running.

Quark is paying 0.23 BTC/day (nicehash 24-hour average).
If you mine with the 750ti, quark will still be profitable at 0.07 BTC/day (per GH/s).
 


Weird, when I figured things out, a 750ti was closer to a 970 in terms of efficiency. How many watts per 750ti and how much hash?

I'm at .1144 per kWh and it takes closer to .14 BTC/day to be profitable

Quark
190 W per 970, 16.5 MH/s

Is there a special TDP I'm supposed to be running 970s at? I know the efficiency goes up when you drop the TDP, but not by that much and you lose a lot of hash.
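
For comparing cards on efficiency, the figures posted on this page can be reduced to hash rate per watt. This is only a rough sketch in plain host-side C++: the struct, table and function names are made up for illustration, and pairing the "6 MH/s" GTX 750 claim with the 35 W nvidia-smi reading is an assumption, since that reading is a single snapshot.

```cpp
#include <cstdio>

// Hash rate per watt in kH/s per W, from a rate in MH/s and a draw in W.
constexpr double khs_per_watt(double mhs, double watts) {
    return mhs * 1000.0 / watts;
}

// Illustrative record type; not part of ccminer.
struct Card {
    const char* name;
    double mhs;    // Quark hash rate as quoted in the thread
    double watts;  // power draw as quoted in the thread
};

// Figures as posted on this page. The GTX 750 row assumes the 6 MH/s claim
// and the 35 W nvidia-smi snapshot describe the same run.
constexpr Card kCards[] = {
    {"GTX 750 (high OC)",  6.0,  35.0},
    {"750ti (low clocks)", 5.6,  38.0},
    {"750ti (build 861)",  6.6,  50.0},
    {"GTX 970",            16.5, 190.0},
};

// Prints one line per card with its efficiency.
void print_efficiency() {
    for (const Card& c : kCards) {
        std::printf("%-20s %6.1f kH/s per W\n", c.name, khs_per_watt(c.mhs, c.watts));
    }
}
```

On these numbers the 750-class cards land at roughly 130 to 170 kH/s per W against about 87 for the 970 at 190 W, which is why the electricity break-even differs so much between the two setups.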
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Quote
Quote from: hashbrown9000 on Today at 01:25:29 AM
looks like the Quark honeymoon is over. Now 0.187 on hashpower
Explain plz?
Well, for people that pay for electricity anyway. For me, at $0.11/kWh, it's just about breaking even, so there's no point in keeping the rigs running.

Quark is paying 0.23 BTC/day (nicehash 24-hour average).
If you mine with the 750ti, quark will still be profitable at 0.07 BTC/day (per GH/s).
 
BUILD 861-
I compiled commit 861 and it is now running on my Linux rigs.  My 750ti cards are now mining Quark in the 6500kh/s range, with an occasional 6600kh/s reading.  This is an improvement of 100kh/s over previous builds.  
We will see about stability over the next few days.
--scryptr

6.6MHASH@50watt

7575W for 1 ghash/s of quark

Electricity cost:
7575 * $0.11 *24/1000= $20

Mining income per day:
0.23 * 287 = $66

Net gain: $46 a day.

You need 151 750ti cards, or around 40 980ti cards.
legendary
Activity: 1470
Merit: 1114
80 GH/s on Quark on nicehash even at 0.23 BTC/GH, how's that even profitable?

On the mining side I don't know why some of the quark volume hasn't moved to other algos.
At current quark rates some algos are becoming more competitive.

For rentals the margin hasn't changed much. The rental price on Nicehash is in line with what
SAK is paying on ffpool as it was when SAK was double the current price. If it was profitable
then it must still be profitable now. With margins that tight I don't see how it was ever profitable
to pay to mine SAK.
hero member
Activity: 1974
Merit: 502
80 GH/s on Quark on nicehash even at 0.23 BTC/GH, how's that even profitable?
dga
hero member
Activity: 737
Merit: 511
Thanks for keeping my toys profitable this week.  This isn't a particularly effective patch - sub-0.1% improvement on 750ti - but it's a little cleaner.  (I'm not sure if this part is the speedup or some other changes I made that I'll submit separately, but I figure the cleanup is worthwhile anyway).  The only real substance in here is ensuring that the temporary variable for the swaps is scoped more tightly; the rest just shifts them to using typesafe inline functions instead of the existing macros.

Code:
diff --git a/bitslice_transformations_quad.cu b/bitslice_transformations_quad.cu
index fa81e83..67786c0 100644
--- a/bitslice_transformations_quad.cu
+++ b/bitslice_transformations_quad.cu
@@ -10,46 +10,53 @@
 #define merge8(z, x, y, b)\
                z=__byte_perm(x, y, b); \
 
-#define SWAP8(x,y)\
-               x=__byte_perm(x, y, 0x5410); \
-               y=__byte_perm(x, y, 0x7632);
-
-#define SWAP4(x,y)\
-               t = (y<<4); \
-               t = (x ^ t); \
-               t = 0xf0f0f0f0UL & t; \
-               x = (x ^ t); \
-               t=  t>>4;\
-               y=  y ^ t;
-
-#define SWAP4_final(x,y)\
-               t = (y<<4); \
-               t = (x ^ t); \
-               t = 0xf0f0f0f0UL & t; \
-               x = (x ^ t); \
-
-
-#define SWAP2(x,y)\
-               t = (y<<2); \
-               t = (x ^ t); \
-               t = 0xccccccccUL & t; \
-               x = (x ^ t); \
-               t=  t>>2;\
-               y=  y ^ t;
-
-#define SWAP1(x,y)\
-               t = (y+y); \
-               t = (x ^ t); \
-               t = 0xaaaaaaaaUL & t; \
-               x = (x ^ t); \
-               t=  t>>1;\
-               y=  y ^ t;
+__device__ __forceinline__
+void SWAP8(uint32_t &x, uint32_t &y) {
+       x = __byte_perm(x, y, 0x5410);
+       y = __byte_perm(x, y, 0x7632);
+}
+
+__device__ __forceinline__
+void SWAP4(uint32_t &x, uint32_t &y) {
+       uint32_t t = (y<<4) ^ x;
+       t = 0xf0f0f0f0UL & t;
+       x = (x ^ t);
+       t = t>>4;
+       y = y ^ t;
+}
+
+__device__ __forceinline__
+void SWAP4_final(uint32_t &x, const uint32_t y) {
+       uint32_t t = (y<<4);
+       t = (x ^ t);
+       t = 0xf0f0f0f0UL & t;
+       x = (x ^ t);
+}
+
+__device__ __forceinline__
+void SWAP2(uint32_t &x, uint32_t &y) {
+       uint32_t t = (y<<2);
+       t = (x ^ t);
+       t = 0xccccccccUL & t;
+       x = (x ^ t);
+       t = t>>2;
+       y = y ^ t;
+}
+
+__device__ __forceinline__
+void SWAP1(uint32_t &x, uint32_t &y) {
+       uint32_t t = (y+y);
+       t = (x ^ t);
+       t = 0xaaaaaaaaUL & t;
+       x = (x ^ t);
+       t = t>>1;
+       y = y ^ t;
+}
 
 __device__ __forceinline__
 void to_bitslice_quad(uint32_t *const __restrict__ input, uint32_t *const __restrict__ output)
 {
     uint32_t other[8];
-       uint32_t t;
 
        uint32_t perm = (threadIdx.x & 1) ? 0x7362 : 0x5140;
        const unsigned int n = threadIdx.x & 3;
@@ -90,7 +97,6 @@ void to_bitslice_quad(uint32_t *const __restrict__ input, uint32_t *const __rest
 __device__ __forceinline__
 void from_bitslice_quad(const uint32_t *const __restrict__ input, uint32_t *const __restrict__ output)
 {
-       uint32_t t;
        const uint32_t perm = 0x7531;//(threadIdx.x & 1) ? 0x3175 : 0x7531;
 
                output[0] = __byte_perm(input[0], input[4], perm);
@@ -158,7 +164,6 @@ void from_bitslice_quad(const uint32_t *const __restrict__ input, uint32_t *cons
 __device__ __forceinline__
 void from_bitslice_quad_final(const uint32_t *const __restrict__ input, uint32_t *const __restrict__ output)
 {
-       uint32_t t;
        const uint32_t perm = 0x7531;//(threadIdx.x & 1) ? 0x3175 : 0x7531;
 
        if (threadIdx.x & 3)
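
The macro-to-function change in the diff above can be sanity-checked on the host, since SWAP4 is plain integer arithmetic. The sketch below is ordinary C++ rather than device code, and the names `SWAP4_MACRO`, `swap4` and `swap4_versions_agree` are made up for the comparison. It shows the two points from the patch description: the macro depends on a `t` declared by the caller, while the function keeps its temporary local and takes typed references.

```cpp
#include <cstdint>

// Old style: the macro silently relies on a variable named `t` in the caller's scope.
#define SWAP4_MACRO(x, y) \
    t = ((y) << 4); \
    t = ((x) ^ t); \
    t = 0xf0f0f0f0UL & t; \
    (x) = ((x) ^ t); \
    t = t >> 4; \
    (y) = (y) ^ t;

// New style: the temporary is local and the arguments are type-checked references.
// Effect: swaps the high nibble of each byte of x with the low nibble of the
// corresponding byte of y.
inline void swap4(uint32_t& x, uint32_t& y) {
    uint32_t t = ((y << 4) ^ x) & 0xf0f0f0f0u;
    x ^= t;
    y ^= t >> 4;
}

// Runs both versions on the same inputs and reports whether the results match.
inline bool swap4_versions_agree(uint32_t x, uint32_t y) {
    uint32_t mx = x, my = y;
    uint32_t t;  // the macro version needs this declared by the caller
    SWAP4_MACRO(mx, my)
    uint32_t fx = x, fy = y;
    swap4(fx, fy);
    return mx == fx && my == fy;
}
```

For example, `swap4` turns x = 0x12345678, y = 0x9abcdef0 into x = 0xa2c4e608, y = 0x91b3d5f7, and applying it a second time restores the original values.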

And to groestl functions:

Code:
diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..5b1cdb1 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -54,11 +56,9 @@ __device__ __forceinline__ void G256_AddRoundConstantP_quad(uint32_t &x7, uint32
 __device__ __forceinline__ void G16mul_quad(uint32_t &x3, uint32_t &x2, uint32_t &x1, uint32_t &x0,
        const uint32_t &y3, const uint32_t &y2, const uint32_t &y1, const uint32_t &y0)
 {
-    uint32_t t0,t1,t2;
-    
-    t0 = ((x2 ^ x0) ^ (x3 ^ x1)) & ((y2 ^ y0) ^ (y3 ^ y1));
-    t1 = ((x2 ^ x0) & (y2 ^ y0)) ^ t0;
-    t2 = ((x3 ^ x1) & (y3 ^ y1)) ^ t0 ^ t1;
+    uint32_t t0 = ((x2 ^ x0) ^ (x3 ^ x1)) & ((y2 ^ y0) ^ (y3 ^ y1));
+    uint32_t t1 = ((x2 ^ x0) & (y2 ^ y0)) ^ t0;
+    uint32_t t2 = ((x3 ^ x1) & (y3 ^ y1)) ^ t0 ^ t1;
 
     t0 = (x2^x3) & (y2^y3);
     x3 = (x3 & y3) ^ t0 ^ t1;
@@ -71,26 +71,24 @@ __device__ __forceinline__ void G16mul_quad(uint32_t &x3, uint32_t &x2, uint32_t
 
 __device__ __forceinline__ void G256_inv_quad(uint32_t &x7, uint32_t &x6, uint32_t &x5, uint32_t &x4, uint32_
 {
-    uint32_t t0,t1,t2,t3,t4,t5,t6,a,b;
-
-    t3 = x7;
-    t2 = x6;
-    t1 = x5;
-    t0 = x4;
+    uint32_t t3 = x7;
+    uint32_t t2 = x6;
+    uint32_t t1 = x5;
+    uint32_t t0 = x4;
 
     G16mul_quad(t3, t2, t1, t0, x3, x2, x1, x0);
 
-    a = (x4 ^ x0);
+    uint32_t a = (x4 ^ x0);
     t0 ^= a;
     t2 ^= (x7 ^ x3) ^ (x5 ^ x1);
     t1 ^= (x5 ^ x1) ^ a;
     t3 ^= (x6 ^ x2) ^ a;
 
-    b = t0 ^ t1;
-    t4 = (t2 ^ t3) & b;
+    uint32_t b = t0 ^ t1;
+    uint32_t t4 = (t2 ^ t3) & b;
     a = t4 ^ t3 ^ t1;
-    t5 = (t3 & t1) ^ a;
-    t6 = (t2 & t0) ^ a ^ (t2 ^ t0);
+    uint32_t t5 = (t3 & t1) ^ a;
+    uint32_t t6 = (t2 & t0) ^ a ^ (t2 ^ t0);
 
     t4 = (t5 ^ t6) & b;
     t1 = (t6 & t1) ^ t4;
@@ -107,9 +105,8 @@ __device__ __forceinline__ void G256_inv_quad(uint32_t &x7, uint32_t &x6, uint32
 
 __device__ __forceinline__ void transAtoX_quad(uint32_t &x0, uint32_t &x1, uint32_t &x2, uint32_t &x3, uint32
 {
-    uint32_t t0, t1;
-    t0 = x0 ^ x1 ^ x2;
-    t1 = x5 ^ x6;
+    uint32_t t0 = x0 ^ x1 ^ x2;
+    uint32_t t1 = x5 ^ x6;
     x2 = t0 ^ t1 ^ x7;
     x6 = t0 ^ x3 ^ x6;
     x3 = x0 ^ x1 ^ x3 ^ x4 ^ x7;    
@@ -122,19 +119,17 @@ __device__ __forceinline__ void transAtoX_quad(uint32_t &x0, uint32_t &x1, uint3
 
 __device__ __forceinline__ void transXtoA_quad(uint32_t &x0, uint32_t &x1, uint32_t &x2, uint32_t &x3, uint32
 {
-    uint32_t t0,t2,t3,t5;
-
     x1 ^= x4;
-    t0 = x1 ^ x6;
+    uint32_t t0 = x1 ^ x6;
     x1 ^= x5;
 
-    t2 = x0 ^ x2;
+    uint32_t t2 = x0 ^ x2;
     x2 = x3 ^ x5;
     t2 ^= x2 ^ x6;
     x2 ^= x7;
-    t3 = x4 ^ x2 ^ x6;
+    uint32_t t3 = x4 ^ x2 ^ x6;
 
-    t5 = x0 ^ x6;
+    uint32_t t5 = x0 ^ x6;
     x4 = x3 ^ x7;
     x0 = x3 ^ x5;
 
@@ -160,14 +155,12 @@ __device__ __forceinline__ void sbox_quad(uint32_t *const r)
 
 __device__ __forceinline__ void G256_ShiftBytesP_quad(uint32_t &x7, uint32_t &x6, uint32_t &x5, uint32_t &x4,
 {
-    uint32_t t0,t1;
-
        const uint32_t tpos = threadIdx.x & 0x03;
        const uint32_t shift1 = tpos << 1;
        const uint32_t shift2 = shift1 + 1 + ((tpos == 3) << 2);
 
-    t0 = __byte_perm(x0, 0, 0x1010)>>shift1;
-    t1 = __byte_perm(x0, 0, 0x3232)>>shift2;
+    uint32_t t0 = __byte_perm(x0, 0, 0x1010)>>shift1;
+    uint32_t t1 = __byte_perm(x0, 0, 0x3232)>>shift2;
     x0 = __byte_perm(t0, t1, 0x5410);
 
     t0 = __byte_perm(x1, 0, 0x1010)>>shift1;
@@ -201,14 +194,12 @@ __device__ __forceinline__ void G256_ShiftBytesP_quad(uint32_t &x7, uint32_t &x6
 
 __device__ __forceinline__ void G256_ShiftBytesQ_quad(uint32_t &x7, uint32_t &x6, uint32_t &x5, uint32_t &x4,
 {
-    uint32_t t0,t1;
-
        const uint32_t tpos = threadIdx.x & 0x03;
        const uint32_t shift1 = (1 - (tpos >> 1)) + ((tpos & 0x01) << 2);
        const uint32_t shift2 = shift1 + 2 + ((tpos == 1) << 2);
 
-    t0 = __byte_perm(x0, 0, 0x1010)>>shift1;
-    t1 = __byte_perm(x0, 0, 0x3232)>>shift2;
+    uint32_t t0 = __byte_perm(x0, 0, 0x1010)>>shift1;
+    uint32_t t1 = __byte_perm(x0, 0, 0x3232)>>shift2;
     x0 = __byte_perm(t0, t1, 0x5410);
 
     t0 = __byte_perm(x1, 0, 0x1010)>>shift1;

Cheers.
legendary
Activity: 1797
Merit: 1028
BUILD 861-

I compiled commit 861 and it is now running on my Linux rigs.  My 750ti cards are now mining Quark in the 6500kh/s range, with an occasional 6600kh/s reading.  This is an improvement of 100kh/s over previous builds. 

We will see about stability over the next few days.

--scryptr
legendary
Activity: 1764
Merit: 1024
Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.
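
In the PTX above, 1296222087 is just 0x4D42C787 in decimal, so the constant is loaded and then rotated at run time. One possible workaround is to fold the rotation at compile time with a `constexpr` helper. This assumes `ROTL32` is implemented with inline asm that the compiler cannot see through, which the "// inline asm" markers suggest but the post does not confirm. The sketch is plain C++ with hypothetical names (`rotl32_const`, `kX8`) and is not necessarily the fix that went into the miner.

```cpp
#include <cstdint>

// Portable rotate-left. Being constexpr, it is evaluated by the compiler
// when used to initialise a constexpr variable, as below.
constexpr uint32_t rotl32_const(uint32_t x, unsigned n) {
    return (x << (n & 31u)) | (x >> ((32u - n) & 31u));
}

// Hypothetical name for the pre-rotated constant from the post.
constexpr uint32_t kX8 = rotl32_const(0x4D42C787u, 7);

// Fails to compile if the folded value is not the expected one.
static_assert(kX8 == 0xA163C3A6u, "ROTL32(0x4D42C787, 7) folded at compile time");
```

Writing the literal 0xA163C3A6 directly into the kernel has the same effect; the helper just keeps the original constant and rotation amount visible in the source.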

x11 improvement coming soon. (luffacubehash512)

and just as i finished writing in the sgminer thread about wanting more x11 optimizations and how ive been asking for them for many many months - you come out with this ...

the farm will be grateful - and will mine donations when the donation links get up and running again next week mate ...

x11 opts makes us smile here ...

#crysx

I'd also vote for x11. It looks like the safest option right now or Neoscrypt or Cryptonote. But apparently Cryptonote is overrun with bots so it doesn't really even matter if it's improved. Neoscrypt is still relatively nice though.