CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1035.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: djm34 on July 19, 2015, 07:10:24 AM

Quote from: sp_ on July 18, 2015, 02:22:09 PM

stupid compiler.
x11 improvement coming soon. (luffacubehash512)

Why don't you just declare it as a constant? There's no point in doing a rotation at all if the result is known.

I did. Not much gain in x11 though, but qubit got a little bit faster.

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: sp_ on July 18, 2015, 02:22:09 PM

Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.

x11 improvement coming soon. (luffacubehash512)

Why don't you just declare it as a constant? There's no point in doing a rotation at all if the result is known.
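
A minimal sketch of that suggestion. It assumes the `// inline asm` markers in the PTX above mean `ROTL32` is implemented with inline assembly, which the compiler treats as opaque and therefore cannot fold. The names `HOST_DEVICE`, `rotl32_const` and `X8_INIT` are hypothetical and do not come from ccminer; the point is only that a plain C++ rotate can be evaluated at compile time.

```cpp
#include <stdint.h>

// Under nvcc, make the helper callable from device code; with a host
// compiler the qualifiers expand to nothing. (Hypothetical macro name.)
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Plain C++ rotate-left, valid for 0 < n < 32. It is constexpr and has no
// inline asm, so the compiler is free to evaluate it at compile time.
HOST_DEVICE constexpr uint32_t rotl32_const(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

// The value from the post, rotated once at compile time (hypothetical name).
constexpr uint32_t X8_INIT = rotl32_const(0x4D42C787u, 7);

// The same result written out as a literal, checked by the compiler.
static_assert(X8_INIT == 0xA163C3A6u, "precomputed rotation mismatch");
```

With something like this, `x8 = X8_INIT;` should need only a single move of an immediate instead of a move followed by `shf.l.wrap.b32`.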

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Submitted a small boost in the qubit algo.

lawrencelyl

member

Activity: 94

Merit: 10

Quote from: hashbrown9000 on July 18, 2015, 10:04:08 PM

Quote

Quote from: hashbrown9000 on Today at 01:25:29 AM
looks like the Quark honeymoon is over. Now 0.187 on hashpower

Explain plz?

Well, for people that pay for electricity anyways. For me, at $0.11/kWh, it's just about breaking even, so no point keeping the rigs running.

In my country, I pay $0.137/kWh excluding tax. :'(

hashbrown9000

sr. member

Activity: 427

Merit: 250

Quote

Quote from: hashbrown9000 on Today at 01:25:29 AM
looks like the Quark honeymoon is over. Now 0.187 on hashpower

Explain plz?

Well, for people that pay for electricity anyways. For me, at $0.11/kWh, it's just about breaking even, so no point keeping the rigs running.

bensam1231

legendary

Activity: 1764

Merit: 1024

Yeah, all the algos are a flaming wreck right now. Hopefully the Chinese find some other shit coin they're interested in.

CapnBDL

hero member

Activity: 1064

Merit: 500

MOBU

Quote from: hashbrown9000 on July 18, 2015, 08:25:29 PM

looks like the Quark honeymoon is over. Now 0.187 on hashpower

Explain plz?

hashbrown9000

sr. member

Activity: 427

Merit: 250

looks like the Quark honeymoon is over. Now 0.187 on hashpower

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: sp_ on July 18, 2015, 02:22:09 PM

Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.

x11 improvement coming soon. (luffacubehash512)

and just as i finished writing in the sgminer thread about wanting more x11 optimizations and how i've been asking for them for many many months - you come out with this ...

the farm will be grateful - and will mine donations when the donation links get up and running again next week mate ...

x11 opts make us smile here ...

#crysx

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.

x11 improvement coming soon. (luffacubehash512)

go6ooo1212

legendary

Activity: 1512

Merit: 1000

quarkchain.io

Just did a small donation to SP_ , TxID:
58b1cd0e556f708bd6da1a7d5cad19b3a7dbca88b75d39ce63ccc4b921f7fb9f

rednoW

legendary

Activity: 1510

Merit: 1003

Quote from: chrysophylax on July 18, 2015, 08:32:06 AM

stock? ...
or you overclock also? ...
#crysx

All my results are at the maximum possible overclock for the algo.
For quark it is 1510/1600.
6114 khash, a new record.

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

   #if __CUDA_ARCH__ > 500
   #pragma unroll
   #endif
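
For context, a sketch of how those three lines sit in front of a loop. `__CUDA_ARCH__` is defined only while nvcc compiles device code and holds the target compute capability as a three-digit number (500 for sm_50 cards such as the GTX 750 Ti, 520 for sm_52 cards such as the GTX 960/970), so `> 500` applies the unroll to the 9xx cards only. The macro `DEVICE_FN`, the function `mix_rounds` and its body are placeholders for illustration, not ccminer code.

```cpp
#include <stdint.h>

#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__ __forceinline__
#else
#define DEVICE_FN inline   // lets the sketch build with a host compiler too
#endif

// Placeholder round loop: 14 fixed rounds, like the loop in the quark patch.
DEVICE_FN uint32_t mix_rounds(uint32_t state)
{
// Ask for a full unroll only when compiling device code for a compute
// capability above 5.0; everywhere else the loop is left to the compiler.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ > 500
#pragma unroll
#endif
    for (int round = 0; round < 14; round++) {
        state += (uint32_t)round;   // toy round body
    }
    return state;
}
```

The guard is resolved at compile time for each target architecture, so it only separates the cards if the binary actually contains code built for both sm_50 and sm_52.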

djm34

legendary

Activity: 1400

Merit: 1050

Quote from: pallas on July 18, 2015, 10:34:10 AM

Quote from: sp_ on July 18, 2015, 10:29:31 AM

Quote from: pallas on July 18, 2015, 09:28:47 AM

I'm curious to see if it's good for the 750 as well.

no good for the 750ti. A drop of 400khash.

I'm new to cuda: is there an ifdef or something we can put to enable the unroll for 9xx cards only?

#pragma unroll n (and #pragma nounroll )
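
A sketch of the numeric form, assuming nvcc: `#pragma unroll 4` asks for the loop body to be replicated four times per iteration, and `#pragma unroll 1` asks the compiler not to unroll at all. The macro `DEVICE_FN` and the function `sum_words` are hypothetical names used only for this example.

```cpp
#include <stdint.h>

#ifdef __CUDACC__
#define DEVICE_FN __host__ __device__ __forceinline__
#else
#define DEVICE_FN inline   // host-only build: no CUDA qualifiers
#endif

// Placeholder loop: adds up 16 words.
DEVICE_FN uint32_t sum_words(const uint32_t *w)
{
    uint32_t acc = 0;
// The pragma is only emitted for nvcc, so other compilers never see it.
// Factor 4 requests a partial unroll: four copies of the body per iteration.
#ifdef __CUDACC__
#pragma unroll 4
#endif
    for (int i = 0; i < 16; i++) {
        acc += w[i];
    }
    return acc;
}
```

The factor can be combined with a `__CUDA_ARCH__` check to pick a different value per card generation; which factor is fastest has to be measured per kernel.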

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: sp_ on July 18, 2015, 10:29:31 AM

Quote from: pallas on July 18, 2015, 09:28:47 AM

I'm curious to see if it's good for the 750 as well.

no good for the 750ti. A drop of 400khash.

I'm new to cuda: is there an ifdef or something we can put to enable the unroll for 9xx cards only?

sp_

legendary

Activity: 2954

Merit: 1087

Team Black developer

Quote from: pallas on July 18, 2015, 09:28:47 AM

I'm curious to see if it's good for the 750 as well.

no good for the 750ti. A drop of 400khash.

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: chrysophylax on July 18, 2015, 09:15:28 AM

Quote from: pallas on July 18, 2015, 09:07:04 AM

This little patch is giving quark +300 Kh/s on my 970:

diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..3ac3c5d 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -285,7 +285,7 @@ __device__ __forceinline__ void G256_MixFunction_quad(uint32_t *r)

__device__ __forceinline__ void groestl512_perm_P_quad(uint32_t *const r)
{
-
+#pragma unroll
for(int round=0;round<14;round++)
{
G256_AddRoundConstantP_quad(r[7], r[6], r[5], r[4], r[3], r[2], r[1], r[0], round);

where does the patch go? ...

#crysx

"patch -p1" should work (the diff uses git-style a/ and b/ path prefixes, which -p1 strips).
If you want to do it manually, the file name is at the start of the patch.
I'm curious to see if it's good for the 750 as well.

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: pallas on July 18, 2015, 09:07:04 AM

This little patch is giving quark +300 Kh/s on my 970:

diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..3ac3c5d 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -285,7 +285,7 @@ __device__ __forceinline__ void G256_MixFunction_quad(uint32_t *r)

__device__ __forceinline__ void groestl512_perm_P_quad(uint32_t *const r)
{
-
+#pragma unroll
for(int round=0;round<14;round++)
{
G256_AddRoundConstantP_quad(r[7], r[6], r[5], r[4], r[3], r[2], r[1], r[0], round);

where does the patch go? ...

#crysx

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

This little patch is giving quark +300 Kh/s on my 970:

diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..3ac3c5d 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -285,7 +285,7 @@ __device__ __forceinline__ void G256_MixFunction_quad(uint32_t *r)

__device__ __forceinline__ void groestl512_perm_P_quad(uint32_t *const r)
{
-
+#pragma unroll
for(int round=0;round<14;round++)
{
G256_AddRoundConstantP_quad(r[7], r[6], r[5], r[4], r[3], r[2], r[1], r[0], round);

chrysophylax

legendary

Activity: 2940

Merit: 1091

--- ChainWorks Industries ---

Quote from: rednoW on July 18, 2015, 08:22:34 AM

Quote from: sp_ on July 18, 2015, 07:35:01 AM

Submitted another optimization in bmw512.
Visible improvement in the quark algo: the GTX 960 is up 50-100 khash.

+10-15 khash with gtx750 )))

stock? ...

or you overclock also? ...

#crysx
