Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1035. (Read 2347659 times)

sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
stupid compiler.
x11 improvement coming soon. (luffacubehash512)
why don't you just declare it as constant ? no point in doing a rotation at all, if the result is known

I did. Not much gain in x11 though, but qubit got a little bit faster.
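
For reference, a minimal sketch of the "declare it as a constant" idea: do the rotation once at compile time so there is nothing left for the compiler to emit at run time. The names `rotl32_const` and `X8_INIT` are hypothetical and not from ccminer. The `// inline asm` markers in the PTX quoted in this thread suggest the real ROTL32 goes through inline asm, which the compiler cannot fold, but that is an assumption about the tree, not something confirmed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from ccminer): a 32-bit left rotate written in
// plain C++ so the compiler can evaluate it at compile time.
// n must be in the range 1..31.
constexpr uint32_t rotl32_const(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

// The value from the post, rotated at compile time instead of at run time.
constexpr uint32_t X8_INIT = rotl32_const(0x4D42C787u, 7);

// Fails the build if the folded value is not what we expect.
static_assert(X8_INIT == 0xA163C3A6u, "ROTL32(0x4D42C787, 7) folded at compile time");
```

In a kernel the same effect can be had by simply writing the precomputed literal, i.e. `x8 = 0xA163C3A6;`.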
legendary
Activity: 1400
Merit: 1050
Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.

x11 improvement coming soon. (luffacubehash512)
why don't you just declare it as constant ? no point in doing a rotation at all, if the result is known
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Submitted a small boost in the qubit algo.
member
Activity: 94
Merit: 10
Quote
Quote from: hashbrown9000 on Today at 01:25:29 AM
looks like the Quark honeymoon is over. Now 0.187 on hashpower

Explain plz?

Well, for people that pay for electricity anyway. For me, at $0.11/kWh, it's just about breaking even = no point to keep the rigs running.
In my country, I pay $0.137/kWh excluding tax.
sr. member
Activity: 427
Merit: 250
Quote
Quote from: hashbrown9000 on Today at 01:25:29 AM
looks like the Quark honeymoon is over. Now 0.187 on hashpower

Explain plz?

Well, for people that pay for electricity anyway. For me, at $0.11/kWh, it's just about breaking even = no point to keep the rigs running.
legendary
Activity: 1764
Merit: 1024
Yeah, all the algos are a flaming wreck right now. Hopefully the Chinese find some other shit coin they're interested in.
hero member
Activity: 1064
Merit: 500
MOBU
looks like the Quark honeymoon is over. Now 0.187 on hashpower

Explain plz?
sr. member
Activity: 427
Merit: 250
looks like the Quark honeymoon is over. Now 0.187 on hashpower
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.

x11 improvement coming soon. (luffacubehash512)

and just as i finished writing in the sgminer thread about wanting more x11 optimizations and how ive been asking for them for many many months - you come out with this ...

the farm will be grateful - and will mine donations when the donation links get up and running again next week mate ...

x11 opts makes us smile here ...

#crysx
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Thanks for the support guys.

I found another improvement:

   x8 = ROTL32(0x4D42C787, 7);

compiles to:

   // inline asm
   mov.u32    %r4599, 1296222087;
   // inline asm
   shf.l.wrap.b32 %r4597, %r4599, %r4599, %r4756;

stupid compiler.

x11 improvement coming soon. (luffacubehash512)
legendary
Activity: 1512
Merit: 1000
quarkchain.io
Just did a small donation to SP_ , TxID:
58b1cd0e556f708bd6da1a7d5cad19b3a7dbca88b75d39ce63ccc4b921f7fb9f
legendary
Activity: 1510
Merit: 1003
stock? ...
or you overclock also? ...
#crysx

All my results are at the max possible overclock for the algo.
For quark it is 1510/1600.
6114 khash, new record.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
   #if __CUDA_ARCH__ > 500
   #pragma unroll
   #endif   
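
A sketch of how that guard might sit around a loop. `__CUDA_ARCH__` is 500 on the 750 Ti (sm_50) and 520 on the 960/970 (sm_52), so `> 500` turns the unroll on for the 9xx cards only. The function `mix_rounds`, the `HD` macro and the loop body are made up for illustration and are not from ccminer. The fallback branch of the macro is only there so the snippet also builds with a plain host compiler, where `__CUDA_ARCH__` is undefined and the pragma is skipped.

```cpp
#include <cassert>
#include <cstdint>

// HD is a hypothetical convenience macro: host+device and force-inlined
// under nvcc, a plain inline function with any other compiler.
#ifdef __CUDACC__
#define HD __host__ __device__ __forceinline__
#else
#define HD inline
#endif

// Toy stand-in for a 14-round permutation (not the real groestl code).
HD uint32_t mix_rounds(uint32_t v)
{
    // Request unrolling only when compiling device code for an
    // architecture above sm_50; everything else keeps the rolled loop.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ > 500
#pragma unroll
#endif
    for (int round = 0; round < 14; round++) {
        // Rotate left by one bit, then mix in the round number.
        v = ((v << 1) | (v >> 31)) ^ static_cast<uint32_t>(round);
    }
    return v;
}
```

Because the check happens per compilation target, a fat binary built for both sm_50 and sm_52 gets the rolled loop in one and the unrolled loop in the other.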
legendary
Activity: 1400
Merit: 1050
I'm curious to see if it's good for the 750 as well.

no good for the 750ti. A drop of 400khash.

I'm new to cuda: is there an ifdef or something we can put to enable the unroll for 9xx cards only?
#pragma unroll n (and #pragma nounroll )
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I'm curious to see if it's good for the 750 as well.

no good for the 750ti. A drop of 400khash.

I'm new to cuda: is there an ifdef or something we can put to enable the unroll for 9xx cards only?
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I'm curious to see if it's good for the 750 as well.

no good for the 750ti. A drop of 400khash.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
This little patch is giving quark +300 Kh/s on my 970:

diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..3ac3c5d 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -285,7 +285,7 @@ __device__ __forceinline__ void G256_MixFunction_quad(uint32_t *r)

 __device__ __forceinline__ void groestl512_perm_P_quad(uint32_t *const r)
 {
-
+#pragma unroll
        for(int round=0;round<14;round++)
     {
         G256_AddRoundConstantP_quad(r[7], r[6], r[5], r[4], r[3], r[2], r[1], r[0], round);

where does the patch go? ...

#crysx

"patch -p0" should work.
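If you want to do it manually, the file name is at the start of the patch. If -p0 can't find the file, try "patch -p1", which strips the a/ and b/ prefixes git puts on the paths.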
I'm curious to see if it's good for the 750 as well.
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
This little patch is giving quark +300 Kh/s on my 970:

diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..3ac3c5d 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -285,7 +285,7 @@ __device__ __forceinline__ void G256_MixFunction_quad(uint32_t *r)

 __device__ __forceinline__ void groestl512_perm_P_quad(uint32_t *const r)
 {
-
+#pragma unroll
        for(int round=0;round<14;round++)
     {
         G256_AddRoundConstantP_quad(r[7], r[6], r[5], r[4], r[3], r[2], r[1], r[0], round);

where does the patch go? ...

#crysx
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
This little patch is giving quark +300 Kh/s on my 970:

diff --git a/groestl_functions_quad.cu b/groestl_functions_quad.cu
index c39e81d..3ac3c5d 100644
--- a/groestl_functions_quad.cu
+++ b/groestl_functions_quad.cu
@@ -285,7 +285,7 @@ __device__ __forceinline__ void G256_MixFunction_quad(uint32_t *r)

 __device__ __forceinline__ void groestl512_perm_P_quad(uint32_t *const r)
 {
-
+#pragma unroll
        for(int round=0;round<14;round++)
     {
         G256_AddRoundConstantP_quad(r[7], r[6], r[5], r[4], r[3], r[2], r[1], r[0], round);
legendary
Activity: 2940
Merit: 1091
--- ChainWorks Industries ---
Submitted another optimization in bmw512.
Visible improvement in the quark algo: the gtx960 is up 50-100 khash.
+10-15 khash with gtx750 )))

stock? ...

or you overclock also? ...

#crysx