[ANN][X11/X13] X11 (Darkcoin)/X13 (Marucoin) miner (based on sph-sgminer)

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: K1773R on February 06, 2015, 05:09:19 AM

Quote from: webprods on February 06, 2015, 02:42:12 AM

Quote from: restless on December 23, 2014, 04:34:04 AM

The latest X13 optimisations are not compatible with 6xxx and 5xxx Radeons.
The best way is to run one instance with the -d switch pointing to your 7970, and another instance using x13modold/marucoin-modold, again with -d but pointing to the 6950 card.
The best speed achieved by a 6970 is ~1.4 MH/s for X13.

I'm mining with 280Xs using sgminer 4.2.2-298-g3bb4 with Wolf0's kernel and got 8.2 MH/s for a single card and 34 MH/s with four 280Xs.
Here is my .bat file for a single card:
sgminer.exe --kernel darkcoin-mod --api-listen -o stratum+tcp://cann.suprnova.cc:4442 -u xxxx -p xxxxx -w 64 -g 2 --thread-concurrency 8192 --intensity 21 --lookup-gap 2 --no-submit-stale --gpu-powertune 20 --gpu-fan 55 --temp-cutoff 95 --gpu-engine 1150 --gpu-memclock 1450
From the sgminer screen:
sgminer 4.2.2-298-g3bb4 - Started: [2015-02-06 01:21:49] - [0 days 00:24:53]
--------------------------------------------------------------------------------
(5s):8.017M (avg):6.683Mh/s | A:2 R:0 HW:0 WU:0.094/m
ST: 2 SS: 5 NB: 25 LW: 1557 GF: 0 RF: 0
Connected to cann.suprnova.cc (stratum) diff 0.022 as user xxxxxxxx
Block: 4bd89bcc... Diff:37 Started: [01:46:42] .

Can you link that kernel, please?

He probably bought it, so he might not.

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: pallas on December 07, 2014, 12:43:52 PM

Quote from: Wolf0 on December 07, 2014, 11:07:12 AM

Quote from: Ignition75 on December 07, 2014, 10:47:42 AM

Quote from: ?? on ??

Quote from: djm34 on December 07, 2014, 10:40:02 AM

Being French, I like revolutions, so whose head shall we cut? Grin

X11 is like a banana republic: there is a revolution every now and then.
Really want to revolutionize things? Rewrite sgminer from scratch.

A piece of advice: stop mistaking a revolution for a pissing contest...

God, SGMiner DOES need rewriting...

What would be the advantages of rewriting SGMiner?

You've obviously never seen the code. One thing that holds back development is that no one wants to touch it. CGMiner was bad enough, and it was horrid; now we have more misfit pieces of code grafted onto it. It works, but that's about the only good thing you can say about that code.

Everybody uses it because, as far as I know, it's the best at handling OpenCL (and therefore AMD cards) ;-)

No, everybody uses it because there's no reasonable alternative. It's not the best at OpenCL; it's the ONLY one. Otherwise you must write it from scratch.

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: Ignition75 on December 07, 2014, 10:47:42 AM

Quote from: ?? on ??

Quote from: djm34 on December 07, 2014, 10:40:02 AM

Being French, I like revolutions, so whose head shall we cut? Grin

X11 is like a banana republic: there is a revolution every now and then.
Really want to revolutionize things? Rewrite sgminer from scratch.

A piece of advice: stop mistaking a revolution for a pissing contest...

God, SGMiner DOES need rewriting...

What would be the advantages of rewriting SGMiner?

You've obviously never seen the code. One thing that holds back development is that no one wants to touch it. CGMiner was bad enough, and it was horrid; now we have more misfit pieces of code grafted onto it. It works, but that's about the only good thing you can say about that code.

partmakeo

newbie

Activity: 8

Merit: 0

I see this as less of a problem. Let's say he has 1% of the overall network hashrate (180 kH/s); then he could only send those spammy transactions in 1/100 of the blocks found. This also assumes everyone has updated to the latest code.
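A rough sketch of that arithmetic, assuming a miner finds blocks in proportion to his hashrate share and that each block is an independent trial (the function names are hypothetical, for illustration only): with share p he expects p * N of N blocks, and the chance that he finds at least one of the next n blocks is 1 - (1 - p)^n.

```c
#include <assert.h>

/* Expected number of blocks found by a miner holding `share` (0..1)
 * of the network hashrate, out of `total_blocks` network blocks. */
double expected_blocks(double share, unsigned total_blocks)
{
    assert(share >= 0.0 && share <= 1.0);
    return share * (double)total_blocks;
}

/* Probability that the miner finds at least one of the next `n` blocks,
 * treating every block as an independent trial won with probability `share`. */
double p_at_least_one(double share, unsigned n)
{
    assert(share >= 0.0 && share <= 1.0);
    double miss_all = 1.0; /* probability of missing every one of the n blocks */
    for (unsigned i = 0; i < n; ++i)
        miss_all *= 1.0 - share;
    return 1.0 - miss_all;
}
```

For example, expected_blocks(0.01, 1000) comes out at about 10, and p_at_least_one(0.01, 100) at about 0.634, so a 1% miner does get blocks now and then, just not often.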

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: timetox on January 24, 2016, 10:20:45 AM

Help! Why am I getting an extremely high LW? What does this mean? Is it mining for someone else? Please help.

This miner is obsolete; please use sgminer and its related thread:

https://bitcointalksearch.org/topic/ann-sgminer-v5-optimized-x11x13neoscryptlyra2reetc-kernel-switch-miner-632503

timetox

newbie

Activity: 1

Merit: 0

Help! Why am I getting an extremely high LW? What does this mean? Is it mining for someone else? Please help.

Cryptozillah

hero member

Activity: 687

Merit: 502

Quote from: MidwestMiner on March 15, 2015, 09:58:36 AM

I am considering throwing a few old GPU rigs (15-20 R9 280/290X cards) at X13. Is there a stupid-simple miner I can run?

With the miner linked in this article I get about 13.5-14 MH/s with some overclocking while mining Quark at NiceHash with my 280X cards.
Either that or mining ETH should give you some pretty decent profits.

http://cryptomining-blog.com/4819-new-sgminer-with-optimized-quark-and-qubit-kernels/

fullintegrity

member

Activity: 110

Merit: 10

Please forgive me, this may be a dumb question.
Can I use that sgminer and run it on my Gridseeds?
If so, what does the batch file look like?

Eastwind

hero member

Activity: 896

Merit: 1000

Quote from: MidwestMiner on March 15, 2015, 09:58:36 AM

I am considering throwing a few old GPU rigs (15-20 R9 280/290X cards) at X13. Is there a stupid-simple miner I can run?

SGminer is simple and the most popular.

MidwestMiner

full member

Activity: 224

Merit: 100

I am considering throwing a few old GPU rigs (15-20 R9 280/290X cards) at X13. Is there a stupid-simple miner I can run?

K1773R

legendary

Activity: 1792

Merit: 1008

/dev/null

Quote from: webprods on February 06, 2015, 02:42:12 AM

Quote from: restless on December 23, 2014, 04:34:04 AM

The latest X13 optimisations are not compatible with 6xxx and 5xxx Radeons.
The best way is to run one instance with the -d switch pointing to your 7970, and another instance using x13modold/marucoin-modold, again with -d but pointing to the 6950 card.
The best speed achieved by a 6970 is ~1.4 MH/s for X13.

I'm mining with 280Xs using sgminer 4.2.2-298-g3bb4 with Wolf0's kernel and got 8.2 MH/s for a single card and 34 MH/s with four 280Xs.
Here is my .bat file for a single card:
sgminer.exe --kernel darkcoin-mod --api-listen -o stratum+tcp://cann.suprnova.cc:4442 -u xxxx -p xxxxx -w 64 -g 2 --thread-concurrency 8192 --intensity 21 --lookup-gap 2 --no-submit-stale --gpu-powertune 20 --gpu-fan 55 --temp-cutoff 95 --gpu-engine 1150 --gpu-memclock 1450
From the sgminer screen:
sgminer 4.2.2-298-g3bb4 - Started: [2015-02-06 01:21:49] - [0 days 00:24:53]
--------------------------------------------------------------------------------
(5s):8.017M (avg):6.683Mh/s | A:2 R:0 HW:0 WU:0.094/m
ST: 2 SS: 5 NB: 25 LW: 1557 GF: 0 RF: 0
Connected to cann.suprnova.cc (stratum) diff 0.022 as user xxxxxxxx
Block: 4bd89bcc... Diff:37 Started: [01:46:42] .

Can you link that kernel, please?

webprods

sr. member

Activity: 308

Merit: 250

Millionaires Club 47

Quote from: restless on December 23, 2014, 04:34:04 AM

The latest X13 optimisations are not compatible with 6xxx and 5xxx Radeons.
The best way is to run one instance with the -d switch pointing to your 7970, and another instance using x13modold/marucoin-modold, again with -d but pointing to the 6950 card.
The best speed achieved by a 6970 is ~1.4 MH/s for X13.

I'm mining with 280Xs using sgminer 4.2.2-298-g3bb4 with Wolf0's kernel and got 8.2 MH/s for a single card and 34 MH/s with four 280Xs.
Here is my .bat file for a single card:
sgminer.exe --kernel darkcoin-mod --api-listen -o stratum+tcp://cann.suprnova.cc:4442 -u xxxx -p xxxxx -w 64 -g 2 --thread-concurrency 8192 --intensity 21 --lookup-gap 2 --no-submit-stale --gpu-powertune 20 --gpu-fan 55 --temp-cutoff 95 --gpu-engine 1150 --gpu-memclock 1450
From the sgminer screen:
sgminer 4.2.2-298-g3bb4 - Started: [2015-02-06 01:21:49] - [0 days 00:24:53]
--------------------------------------------------------------------------------
(5s):8.017M (avg):6.683Mh/s | A:2 R:0 HW:0 WU:0.094/m
ST: 2 SS: 5 NB: 25 LW: 1557 GF: 0 RF: 0
Connected to cann.suprnova.cc (stratum) diff 0.022 as user xxxxxxxx
Block: 4bd89bcc... Diff:37 Started: [01:46:42] .

Oscilson

sr. member

Activity: 434

Merit: 250

Quote from: thevictimofuktyranny on January 16, 2015, 01:01:51 PM

Quote from: kopam on January 16, 2015, 12:34:18 PM

What are currently the best hashrates one can get with a 7950 or 280X?

I don't know about the 7950, but the 280X does about 6.6 MH/s with an overclock. This is on Windows 7 or 8. I don't know about Linux distros.

You need to use Wolf0's old modded kernel and bins leaked by LovesToShare on November 30: http://www.filedropper.com/optmizedsgminerkernels

I see you posted on the other thread as well: https://bitcointalk.org/index.php?topic=854257.320

X11: 6.6 MH/s overclocked (from a Wolf0 screenshot).
X13: I don't know, but an R9 290 does 5.1 MH/s not overclocked (my own card).

There is a modded kernel for NeoScrypt that gives an extra 4% on a 280X, again from Wolf0.

Copy and paste this replacement into the neoscrypt kernel file (delete the old contents).

(code snipped)

That is all I found myself.
Please remember, you can buy better mods directly from Wolf0. He has an X13 mod for sale that gives another 50% boost to the X13 hashrate. He is not selling his latest NeoScrypt mod.

I tried your NeoScrypt kernel file; it does not compile on my GPU. Can you upload a .bin file?

Which sgminer version do you use? Link?

thevictimofuktyranny

legendary

Activity: 1092

Merit: 1004

Quote from: kopam on January 16, 2015, 12:34:18 PM

What are currently the best hashrates one can get with a 7950 or 280X?

I don't know about the 7950, but the 280X does about 6.6 MH/s with an overclock. This is on Windows 7 or 8. I don't know about Linux distros.

You need to use Wolf0's old modded kernel and bins leaked by LovesToShare on November 30: http://www.filedropper.com/optmizedsgminerkernels

I see you posted on the other thread as well: https://bitcointalk.org/index.php?topic=854257.320

X11: 6.6 MH/s overclocked (from a Wolf0 screenshot).
X13: I don't know, but an R9 290 does 5.1 MH/s not overclocked (my own card).

There is a modded kernel for NeoScrypt that gives an extra 4% on a 280X, again from Wolf0.

Copy and paste this replacement into the neoscrypt kernel file (delete the old contents).

// NeoScrypt(128, 2, 1) with Salsa20/20 and ChaCha20/20

// Stupid AMD compiler ignores the unroll pragma in these two
#define SALSA_SMALL_UNROLL 3
#define CHACHA_SMALL_UNROLL 3

// If SMALL_BLAKE2S is defined, BLAKE2S_UNROLL is interpreted
// as the unroll factor; must divide cleanly into ten.
// Usually a bad idea.
//#define SMALL_BLAKE2S
//#define BLAKE2S_UNROLL 5

#define BLOCK_SIZE 64U
#define FASTKDF_BUFFER_SIZE 256U
#ifndef PASSWORD_LEN
#define PASSWORD_LEN 80U
#endif

#if !defined(cl_khr_byte_addressable_store)
#error "Device does not support unaligned stores"
#endif

// Swaps 128 bytes at a time without using temp vars
void SwapBytes128(void *restrict A, void *restrict B, uint len)
{
   #pragma unroll 2
   for(int i = 0; i < (len >> 7); ++i)
   {
   ((ulong16 *)A)[i] ^= ((ulong16 *)B)[i];
   ((ulong16 *)B)[i] ^= ((ulong16 *)A)[i];
   ((ulong16 *)A)[i] ^= ((ulong16 *)B)[i];
   }
}

void CopyBytes128(void *restrict dst, const void *restrict src, uint len)
{
   #pragma unroll 2
   for(int i = 0; i < len; ++i)
   ((ulong16 *)dst)[i] = ((ulong16 *)src)[i];
}

void CopyBytes(void *restrict dst, const void *restrict src, uint len)
{
   for(int i = 0; i < len; ++i)
   ((uchar *)dst)[i] = ((uchar *)src)[i];
}

//
// a bit of byte alignment checking goes a long way...
//
void XORBytesInPlace(void *restrict dst, const void *restrict src, uint mod)
{
  switch(mod % 4)
  {
  case 0:
   #pragma unroll 2
   for(int i = 0; i < 4; i+=2)
   {
   ((uint2 *)dst)[i] ^= ((uint2 *)src)[i];
   ((uint2 *)dst)[i+1] ^= ((uint2 *)src)[i+1];
   }
   break;

  case 2:
   #pragma unroll 8
   for(int i = 0; i < 16; i+=2)
   {
   ((uchar2 *)dst)[i] ^= ((uchar2 *)src)[i];
   ((uchar2 *)dst)[i+1] ^= ((uchar2 *)src)[i+1];
   }
   break;

  default:
  #pragma unroll 8
   for(int i = 0; i < 31; i+=4)
   {
   ((uchar *)dst)[i] ^= ((uchar *)src)[i];
   ((uchar *)dst)[i+1] ^= ((uchar *)src)[i+1];
   ((uchar *)dst)[i+2] ^= ((uchar *)src)[i+2];
   ((uchar *)dst)[i+3] ^= ((uchar *)src)[i+3];
   }
  }
}

void XORBytes(void *restrict dst, const void *restrict src1, const void *restrict src2, uint len)
{
   #pragma unroll 1
   for(int i = 0; i < len; ++i)
   ((uchar *)dst)[i] = ((uchar *)src1)[i] ^ ((uchar *)src2)[i];
}

// Blake2S

#define BLAKE2S_BLOCK_SIZE 64U
#define BLAKE2S_OUT_SIZE 32U
#define BLAKE2S_KEY_SIZE 32U

static const __constant uint BLAKE2S_IV[8] =
{
   0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
   0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19
};

static const __constant uchar BLAKE2S_SIGMA[10][16] =
{
   { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } ,
   { 14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3 } ,
   { 11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4 } ,
   { 7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8 } ,
   { 9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13 } ,
   { 2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9 } ,
   { 12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11 } ,
   { 13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10 } ,
   { 6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5 } ,
   { 10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13 , 0 } ,
};

#define BLAKE_G(idx0, idx1, a, b, c, d, key) do { \
   a += b + key[BLAKE2S_SIGMA[idx0][idx1]]; \
   d = rotate(d ^ a, 16U); \
   c += d; \
   b = rotate(b ^ c, 20U); \
   a += b + key[BLAKE2S_SIGMA[idx0][idx1 + 1]]; \
   d = rotate(d ^ a, 24U); \
   c += d; \
   b = rotate(b ^ c, 25U); \
} while(0)

void Blake2S(uint *restrict inout, const uint *restrict inkey)
{
   uint16 V;
   uint8 tmpblock;

   // Load first block (IV into V.lo) and constants (IV into V.hi)
   V.lo = V.hi = vload8(0U, BLAKE2S_IV);

   // XOR with initial constant
   V.s0 ^= 0x01012020;

   // Copy input block for later
   tmpblock = V.lo;

   // XOR length of message so far (including this block)
   // There are two uints for this field, but high uint is zero
   V.sc ^= BLAKE2S_BLOCK_SIZE;

   // Compress state, using the key as the key
   #ifdef SMALL_BLAKE2S
   #pragma unroll BLAKE2S_UNROLL
   #else
   #pragma unroll
   #endif
   for(int x = 0; x < 10; ++x)
   {
   BLAKE_G(x, 0x00, V.s0, V.s4, V.s8, V.sc, inkey);
   BLAKE_G(x, 0x02, V.s1, V.s5, V.s9, V.sd, inkey);
   BLAKE_G(x, 0x04, V.s2, V.s6, V.sa, V.se, inkey);
   BLAKE_G(x, 0x06, V.s3, V.s7, V.sb, V.sf, inkey);
   BLAKE_G(x, 0x08, V.s0, V.s5, V.sa, V.sf, inkey);
   BLAKE_G(x, 0x0A, V.s1, V.s6, V.sb, V.sc, inkey);
   BLAKE_G(x, 0x0C, V.s2, V.s7, V.s8, V.sd, inkey);
   BLAKE_G(x, 0x0E, V.s3, V.s4, V.s9, V.se, inkey);
   }

   // XOR low part of state with the high part,
   // then with the original input block.
   V.lo ^= V.hi ^ tmpblock;

   // Load constants (IV into V.hi)
   V.hi = vload8(0U, BLAKE2S_IV);

   // Copy input block for later
   tmpblock = V.lo;

   // XOR length of message into block again
   V.sc ^= BLAKE2S_BLOCK_SIZE << 1;

   // Last block compression - XOR final constant into state
   V.se ^= 0xFFFFFFFFU;

   // Compress block, using the input as the key
   #ifdef SMALL_BLAKE2S
   #pragma unroll BLAKE2S_UNROLL
   #else
   #pragma unroll
   #endif
   for(int x = 0; x < 10; ++x)
   {
   BLAKE_G(x, 0x00, V.s0, V.s4, V.s8, V.sc, inout);
   BLAKE_G(x, 0x02, V.s1, V.s5, V.s9, V.sd, inout);
   BLAKE_G(x, 0x04, V.s2, V.s6, V.sa, V.se, inout);
   BLAKE_G(x, 0x06, V.s3, V.s7, V.sb, V.sf, inout);
   BLAKE_G(x, 0x08, V.s0, V.s5, V.sa, V.sf, inout);
   BLAKE_G(x, 0x0A, V.s1, V.s6, V.sb, V.sc, inout);
   BLAKE_G(x, 0x0C, V.s2, V.s7, V.s8, V.sd, inout);
   BLAKE_G(x, 0x0E, V.s3, V.s4, V.s9, V.se, inout);
   }

   // XOR low part of state with high part, then with input block
   V.lo ^= V.hi ^ tmpblock;

   // Store result in input/output buffer
   vstore8(V.lo, 0, inout);
}

/* FastKDF, a fast buffered key derivation function:
* FASTKDF_BUFFER_SIZE must be a power of 2;
* password_len, salt_len and output_len should not exceed FASTKDF_BUFFER_SIZE;
* prf_output_size must be <= prf_key_size; */
void fastkdf(const uchar *restrict password, const uchar *restrict salt, const uint salt_len, uchar *restrict output, uint output_len)
{

   /* WARNING!
   * This algorithm uses byte-wise addressing for memory blocks.
   * Or in other words, trying to copy an unaligned memory region
   * will significantly slow down the algorithm, when copying uses
   * words or bigger entities. It even may corrupt the data, when
   * the device does not support it properly.
   * Therefore use byte copying, which will not be the fastest but will
   * at least give reliable results. */

   // BLOCK_SIZE 64U
   // FASTKDF_BUFFER_SIZE 256U
   // BLAKE2S_BLOCK_SIZE 64U
   // BLAKE2S_KEY_SIZE 32U
   // BLAKE2S_OUT_SIZE 32U
   uchar bufidx = 0;
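   // A: 256-byte password buffer + 64-byte tail (320 bytes); B: 256-byte salt buffer + 32-byte tail (288 bytes)
   uint8 Abuffer[10], Bbuffer[9] = { (uint8)(0) };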
   uchar *A = (uchar *)Abuffer, *B = (uchar *)Bbuffer;

   // Initialize the password buffer
   #pragma unroll 1
   for(int i = 0; i < (FASTKDF_BUFFER_SIZE >> 3); ++i) ((ulong *)A)[i] = ((ulong *)password)[i % 10];

   ((uint16 *)(A + FASTKDF_BUFFER_SIZE))[0] = ((uint16 *)password)[0];

   // Initialize the salt buffer
   if(salt_len == FASTKDF_BUFFER_SIZE)
   {
   ((ulong16 *)B)[0] = ((ulong16 *)salt)[0];
   ((ulong16 *)B)[1] = ((ulong16 *)salt)[1];
   // Tail mirrors the first BLAKE2S_KEY_SIZE bytes of the salt
   ((uint8 *)B)[8] = ((uint8 *)salt)[0];
   }
   else
   {
   // salt_len is 80 bytes here
   #pragma unroll 1
   for(int i = 0; i < (FASTKDF_BUFFER_SIZE >> 3); ++i) ((ulong *)B)[i] = ((ulong *)salt)[i % 10];

   // Tail mirrors the first BLAKE2S_KEY_SIZE bytes of the salt
   #pragma unroll 1
   for(int i = 0; i < (BLAKE2S_KEY_SIZE >> 3); ++i) ((ulong *)(B + FASTKDF_BUFFER_SIZE))[i] = ((ulong *)salt)[i];
   }

   // The primary iteration
   #pragma unroll 1
   for(int i = 0; i < 32; ++i)
   {
   // Make the key buffer twice the size of the key so it fits a Blake2S block
   // This way, we don't need a temp buffer in the Blake2S function.
   uchar input[BLAKE2S_BLOCK_SIZE], key[BLAKE2S_BLOCK_SIZE] = { 0 };

   // Copy input and key to their buffers
   CopyBytes(input, A + bufidx, BLAKE2S_BLOCK_SIZE);
   CopyBytes(key, B + bufidx, BLAKE2S_KEY_SIZE);

   // PRF
   Blake2S((uint *)input, (uint *)key);

   // Calculate the next buffer pointer
   bufidx = 0;

   for(int x = 0; x < BLAKE2S_OUT_SIZE; ++x)
   bufidx += input[x];

   // bufidx is a uchar now, so it always wraps mod 256
   //bufidx &= (FASTKDF_BUFFER_SIZE - 1);

   // Modify the salt buffer
   XORBytesInPlace(B + bufidx, input, bufidx);

   if(bufidx < BLAKE2S_KEY_SIZE)
   {
   // Head modified, tail updated
   // this was made off the original code... wtf
   //CopyBytes(B + FASTKDF_BUFFER_SIZE + bufidx, B + bufidx, min(BLAKE2S_OUT_SIZE, BLAKE2S_KEY_SIZE - bufidx));
   CopyBytes(B + FASTKDF_BUFFER_SIZE + bufidx, B + bufidx, BLAKE2S_KEY_SIZE - bufidx);
   }
   else if((FASTKDF_BUFFER_SIZE - bufidx) < BLAKE2S_OUT_SIZE)
   {
   // Tail modified, head updated
   CopyBytes(B, B + FASTKDF_BUFFER_SIZE, BLAKE2S_OUT_SIZE - (FASTKDF_BUFFER_SIZE - bufidx));
   }
   }

   // Modify and copy into the output buffer

   // Damned compiler crashes
   // Fuck you, AMD

   //for(uint i = 0; i < output_len; ++i, ++bufidx)
   // output[i] = B[bufidx] ^ A[i];

   uint left = FASTKDF_BUFFER_SIZE - bufidx;
   //uint left = (~bufidx) + 1

   if(left < output_len)
   {
   XORBytes(output, B + bufidx, A, left);
   XORBytes(output + left, B, A + left, output_len - left);
   }
   else
   {
   XORBytes(output, B + bufidx, A, output_len);
   }
}

#define SALSA_CORE(state) do { \
   state.s4 ^= rotate(state.s0 + state.sc, 7U); state.s8 ^= rotate(state.s4 + state.s0, 9U); \
   state.sc ^= rotate(state.s8 + state.s4, 13U); state.s0 ^= rotate(state.sc + state.s8, 18U); \
   state.s9 ^= rotate(state.s5 + state.s1, 7U); state.sd ^= rotate(state.s9 + state.s5, 9U); \
   state.s1 ^= rotate(state.sd + state.s9, 13U); state.s5 ^= rotate(state.s1 + state.sd, 18U); \
   state.se ^= rotate(state.sa + state.s6, 7U); state.s2 ^= rotate(state.se + state.sa, 9U); \
   state.s6 ^= rotate(state.s2 + state.se, 13U); state.sa ^= rotate(state.s6 + state.s2, 18U); \
   state.s3 ^= rotate(state.sf + state.sb, 7U); state.s7 ^= rotate(state.s3 + state.sf, 9U); \
   state.sb ^= rotate(state.s7 + state.s3, 13U); state.sf ^= rotate(state.sb + state.s7, 18U); \
   state.s1 ^= rotate(state.s0 + state.s3, 7U); state.s2 ^= rotate(state.s1 + state.s0, 9U); \
   state.s3 ^= rotate(state.s2 + state.s1, 13U); state.s0 ^= rotate(state.s3 + state.s2, 18U); \
   state.s6 ^= rotate(state.s5 + state.s4, 7U); state.s7 ^= rotate(state.s6 + state.s5, 9U); \
   state.s4 ^= rotate(state.s7 + state.s6, 13U); state.s5 ^= rotate(state.s4 + state.s7, 18U); \
   state.sb ^= rotate(state.sa + state.s9, 7U); state.s8 ^= rotate(state.sb + state.sa, 9U); \
   state.s9 ^= rotate(state.s8 + state.sb, 13U); state.sa ^= rotate(state.s9 + state.s8, 18U); \
   state.sc ^= rotate(state.sf + state.se, 7U); state.sd ^= rotate(state.sc + state.sf, 9U); \
   state.se ^= rotate(state.sd + state.sc, 13U); state.sf ^= rotate(state.se + state.sd, 18U); \
} while(0)

uint16 salsa_small_scalar_rnd(uint16 X)
{
   uint16 st = X;

   #if SALSA_SMALL_UNROLL == 1

   for(int i = 0; i < 10; ++i)
   {
   SALSA_CORE(st);
   }

   #elif SALSA_SMALL_UNROLL == 2

   for(int i = 0; i < 5; ++i)
   {
   SALSA_CORE(st);
   SALSA_CORE(st);
   }

   #elif SALSA_SMALL_UNROLL == 3

   for(int i = 0; i < 4; ++i)
   {
   SALSA_CORE(st);
   if(i == 3) break;
   SALSA_CORE(st);
   SALSA_CORE(st);
   }

   #elif SALSA_SMALL_UNROLL == 4

   for(int i = 0; i < 3; ++i)
   {
   SALSA_CORE(st);
   SALSA_CORE(st);
   if(i == 2) break;
   SALSA_CORE(st);
   SALSA_CORE(st);
   }

   #else

   for(int i = 0; i < 2; ++i)
   {
   SALSA_CORE(st);
   SALSA_CORE(st);
   SALSA_CORE(st);
   SALSA_CORE(st);
   SALSA_CORE(st);
   }

   #endif

   return(X + st);
}

#define CHACHA_CORE_PARALLEL(state) do { \
   state[0] += state[1]; state[3] = rotate(state[3] ^ state[0], (uint4)(16U, 16U, 16U, 16U)); \
   state[2] += state[3]; state[1] = rotate(state[1] ^ state[2], (uint4)(12U, 12U, 12U, 12U)); \
   state[0] += state[1]; state[3] = rotate(state[3] ^ state[0], (uint4)(8U, 8U, 8U, 8U)); \
   state[2] += state[3]; state[1] = rotate(state[1] ^ state[2], (uint4)(7U, 7U, 7U, 7U)); \
   \
   state[0] += state[1].yzwx; state[3].wxyz = rotate(state[3].wxyz ^ state[0], (uint4)(16U, 16U, 16U, 16U)); \
   state[2].zwxy += state[3].wxyz; state[1].yzwx = rotate(state[1].yzwx ^ state[2].zwxy, (uint4)(12U, 12U, 12U, 12U)); \
   state[0] += state[1].yzwx; state[3].wxyz = rotate(state[3].wxyz ^ state[0], (uint4)(8U, 8U, 8U, 8U)); \
   state[2].zwxy += state[3].wxyz; state[1].yzwx = rotate(state[1].yzwx ^ state[2].zwxy, (uint4)(7U, 7U, 7U, 7U)); \
} while(0)

uint16 chacha_small_parallel_rnd(uint16 X)
{
   uint4 t, st[4];

   ((uint16 *)st)[0] = X;

   #if CHACHA_SMALL_UNROLL == 1

   for(int i = 0; i < 10; ++i)
   {
   CHACHA_CORE_PARALLEL(st);
   }

   #elif CHACHA_SMALL_UNROLL == 2

   for(int i = 0; i < 5; ++i)
   {
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   }

   #elif CHACHA_SMALL_UNROLL == 3

   for(int i = 0; i < 4; ++i)
   {
   CHACHA_CORE_PARALLEL(st);
   if(i == 3) break;
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   }

   #elif CHACHA_SMALL_UNROLL == 4

   for(int i = 0; i < 3; ++i)
   {
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   if(i == 2) break;
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   }

   #else

   for(int i = 0; i < 2; ++i)
   {
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   CHACHA_CORE_PARALLEL(st);
   }

   #endif

   return(X + ((uint16 *)st)[0]);
}

void neoscrypt_blkmix(uint16 *XV, bool alg)
{

   /* NeoScrypt flow:                    Scrypt flow:
      Xa ^= Xd;  M(Xa'); Ya = Xa";       Xa ^= Xb;  M(Xa'); Ya = Xa";
      Xb ^= Xa"; M(Xb'); Yb = Xb";       Xb ^= Xa"; M(Xb'); Yb = Xb";
      Xc ^= Xb"; M(Xc'); Yc = Xc";       Xa" = Ya;
      Xd ^= Xc"; M(Xd'); Yd = Xd";       Xb" = Yb;
      Xa" = Ya;  Xb" = Yc;
      Xc" = Yb;  Xd" = Yd; */

   XV[0] ^= XV[3];

   if(!alg)
   {
   XV[0] = salsa_small_scalar_rnd(XV[0]); XV[1] ^= XV[0];
   XV[1] = salsa_small_scalar_rnd(XV[1]); XV[2] ^= XV[1];
   XV[2] = salsa_small_scalar_rnd(XV[2]); XV[3] ^= XV[2];
   XV[3] = salsa_small_scalar_rnd(XV[3]);
   }
   else
   {
   XV[0] = chacha_small_parallel_rnd(XV[0]); XV[1] ^= XV[0];
   XV[1] = chacha_small_parallel_rnd(XV[1]); XV[2] ^= XV[1];
   XV[2] = chacha_small_parallel_rnd(XV[2]); XV[3] ^= XV[2];
   XV[3] = chacha_small_parallel_rnd(XV[3]);
   }

   XV[1] ^= XV[2];
   XV[2] ^= XV[1];
   XV[1] ^= XV[2];
}

void ScratchpadStore(__global void *V, void *X, uchar idx)
{
   ((__global ulong16 *)V)[idx] = ((ulong16 *)X)[0];
   ((__global ulong16 *)V)[idx + 128] = ((ulong16 *)X)[1];
}
void ScratchpadMix(void *X, const __global void *V, uchar idx)
{
   ((ulong16 *)X)[0] ^= ((__global ulong16 *)V)[idx];
   ((ulong16 *)X)[1] ^= ((__global ulong16 *)V)[idx + 128];
}

void SMix(uint16 *X, __global uint16 *V, bool flag)
{
   #pragma unroll 1
   for(int i = 0; i < 128; ++i)
   {
   ScratchpadStore(V, X, i);
   neoscrypt_blkmix(X, flag);
   }

   #pragma unroll 1
   for(int i = 0; i < 128; ++i)
   {
   const uint idx = convert_uchar(((uint *)X)[48] & 0x7F);
   ScratchpadMix(X, V, idx);
   neoscrypt_blkmix(X, flag);
   }
}

__attribute__((reqd_work_group_size(WORKSIZE, 1, 1)))
__kernel void search(__global const uchar* restrict input, __global uint* restrict output, __global uchar *padcache, const uint target)
{
#define CONSTANT_N 128
#define CONSTANT_r 2
   // X = CONSTANT_r * 2 * BLOCK_SIZE(64); Z is a copy of X for ChaCha
   uint16 X[4], Z[4];
   /* V = CONSTANT_N * CONSTANT_r * 2 * BLOCK_SIZE */
   __global ulong16 *V = (__global ulong16 *)(padcache + (0x8000 * (get_global_id(0) % MAX_GLOBAL_THREADS)));
   uchar outbuf[32];
   uchar data[PASSWORD_LEN];

   ((ulong8 *)data)[0] = ((__global const ulong8 *)input)[0];
   ((ulong *)data)[8] = ((__global const ulong *)input)[8];
   ((uint *)data)[18] = ((__global const uint *)input)[18];
   ((uint *)data)[19] = get_global_id(0);

   // X = KDF(password, salt)
   fastkdf(data, data, PASSWORD_LEN, (uchar *)X, 256);

   // Run SMix with Salsa on X first, then with ChaCha on the copy (Z, swapped into X),
   // XOR the two results and run that through FastKDF
   CopyBytes128(Z, X, 2);

   // X = SMix(X); X & Z are swapped, repeat.
   for(bool flag = false;; ++flag)
   {
   SMix(X, V, flag);
   if(flag) break;
   SwapBytes128(X, Z, 256);
   }

   // blkxor(X, Z)
   ((ulong16 *)X)[0] ^= ((ulong16 *)Z)[0];
   ((ulong16 *)X)[1] ^= ((ulong16 *)Z)[1];

   // output = KDF(password, X)
   fastkdf(data, (uchar *)X, FASTKDF_BUFFER_SIZE, outbuf, 32);
   if(((uint *)outbuf)[7] <= target) output[atomic_add(output + 0xFF, 1)] = get_global_id(0);
}
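For anyone sizing the padcache buffer on the host side, the scratchpad arithmetic in the kernel above can be checked in plain C. This is a sketch under assumptions: it hard-codes NeoScrypt(128, 2, 1) as in the kernel, takes sizeof(ulong16) as 128 bytes, and the helper names are hypothetical. MAX_GLOBAL_THREADS is whatever the host defines when building the kernel, so here it is just a parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CONSTANT_N    128u /* scratchpad entries (N) */
#define CONSTANT_r      2u /* r */
#define BLOCK_SIZE     64u /* bytes per block */
#define ULONG16_BYTES 128u /* sizeof(ulong16) in OpenCL C */

/* Bytes of scratchpad V used by one work-item: N * r * 2 * BLOCK_SIZE. */
size_t scratchpad_bytes_per_item(void)
{
    return (size_t)CONSTANT_N * CONSTANT_r * 2u * BLOCK_SIZE;
}

/* Size of the padcache buffer when work-items are spread over
 * max_global_threads slots (the role MAX_GLOBAL_THREADS plays). */
size_t padcache_bytes(size_t max_global_threads)
{
    return scratchpad_bytes_per_item() * max_global_threads;
}

/* Entry read in SMix's second loop: uint 48 of X (first word of the
 * last 64-byte block) masked to N - 1, i.e. the kernel's & 0x7F. */
unsigned scratchpad_index(uint32_t x48)
{
    return x48 & (CONSTANT_N - 1u);
}

/* Byte offset inside one work-item's V of half `half` (0 or 1) of
 * entry `idx`, matching V[idx] and V[idx + 128] in ScratchpadStore. */
size_t scratchpad_offset(unsigned idx, unsigned half)
{
    assert(idx < CONSTANT_N && half < 2u);
    return ((size_t)idx + (size_t)half * CONSTANT_N) * ULONG16_BYTES;
}
```

scratchpad_offset(127, 1) + 128 equals 0x8000, so the highest entry ends exactly at the per-work-item stride the search kernel uses when it offsets into padcache.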

Delete the old neoscrypt .bin file, and the newly generated .bin will be 4% faster.
That is all I found myself.
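SwapBytes128 in the kernel above swaps X and Z with the three-XOR trick instead of a temporary. A minimal portable C sketch of the same idea, using 64-bit words instead of 128-byte ulong16 chunks (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap two buffers in place with three XORs and no temporary.
 * len is in bytes and must be a multiple of 8. The buffers must not
 * overlap: if a == b, every word is zeroed instead of swapped. */
void xor_swap_words(uint64_t *restrict a, uint64_t *restrict b, size_t len)
{
    assert(len % sizeof(uint64_t) == 0);
    for (size_t i = 0; i < len / sizeof(uint64_t); ++i) {
        a[i] ^= b[i]; /* a holds a ^ b                 */
        b[i] ^= a[i]; /* b holds b ^ (a ^ b) = old a   */
        a[i] ^= b[i]; /* a holds (a ^ b) ^ old a = old b */
    }
}
```

The restrict qualifiers are doing real work here: with aliased pointers the first XOR clears the data, which is why the kernel only ever passes the two distinct 256-byte arrays X and Z (two 128-byte iterations of its loop).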

Please remember, you can buy better mods directly from Wolf0. He has an X13 mod for sale that gives another 50% boost to the X13 hashrate. He is not selling his latest NeoScrypt mod.
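One part of fastkdf worth spelling out is its buffer bookkeeping: B is 256 bytes plus a 32-byte tail that mirrors its head, so the 32-byte key read at any bufidx never has to wrap, and bufidx is the byte sum of the Blake2S output, which wraps mod 256 because it is a uchar. The plain C sketch below covers only that bookkeeping (no Blake2S, no password buffer A); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KDF_BUF  256u /* FASTKDF_BUFFER_SIZE in the kernel */
#define KEY_SIZE  32u /* BLAKE2S_KEY_SIZE */
#define OUT_SIZE  32u /* BLAKE2S_OUT_SIZE */

/* Next buffer index: byte sum of the PRF output. Keeping it in a
 * uint8_t makes it wrap modulo 256, like the kernel's uchar bufidx. */
uint8_t next_bufidx(const uint8_t out[OUT_SIZE])
{
    uint8_t idx = 0;
    for (unsigned i = 0; i < OUT_SIZE; ++i)
        idx += out[i];
    return idx;
}

/* XOR `out` into B at idx, then keep the tail B[256..287] equal to the
 * head B[0..31]. B must hold KDF_BUF + KEY_SIZE bytes. */
void update_salt(uint8_t B[KDF_BUF + KEY_SIZE], uint8_t idx, const uint8_t out[OUT_SIZE])
{
    for (unsigned i = 0; i < OUT_SIZE; ++i)
        B[idx + i] ^= out[i];
    if (idx < KEY_SIZE) {
        /* Head modified: copy the changed head bytes into the tail. */
        memcpy(B + KDF_BUF + idx, B + idx, KEY_SIZE - idx);
    } else if (KDF_BUF - idx < OUT_SIZE) {
        /* Tail modified: copy the changed tail bytes back into the head. */
        memcpy(B, B + KDF_BUF, OUT_SIZE - (KDF_BUF - idx));
    }
}
```

The design choice is to pay a small copy on the rare iterations that touch either end of the buffer, so that every read and XOR in the hot loop is a straight, non-wrapping run of bytes.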

kopam

hero member

Activity: 518

Merit: 500

What are currently the best hashrates one can get with 7950 or 280x ?

thevictimofuktyranny

legendary

Activity: 1092

Merit: 1004

More up-to-date info on mods?

See these threads:
https://bitcointalksearch.org/topic/x11-x13-x15-with-50-more-hashrate-7mhs-on-280x-or-10-mhs-on-290x-854257

and

https://bitcointalksearch.org/topic/ann-sgminer-v5-optimized-x11x13neoscryptlyra2reetc-kernel-switch-miner-632503

K1773R

legendary

Activity: 1792

Merit: 1008

/dev/null

Quote from: Blawpaw on December 21, 2014, 03:46:02 PM

Has anyone succeeded in getting two GPUs (AMD 6950 and 7970) working with the X13 algo?
If so, please show me your config file!

I've tried the x13modold kernel and it doesn't work...

Any advice? Huh

It does, even with 5xxx cards.
Tell us what isn't working and provide more details.

restless

legendary

Activity: 1151

Merit: 1001

The latest X13 optimisations are not compatible with 6xxx and 5xxx Radeons.
The best way is to run one instance with the -d switch pointing to your 7970, and another instance using x13modold/marucoin-modold, again with -d but pointing to the 6950 card.
The best speed achieved by a 6970 is ~1.4 MH/s for X13.

Blawpaw

legendary

Activity: 1596

Merit: 1027

Has anyone succeeded in getting two GPUs (AMD 6950 and 7970) working with the X13 algo?
If so, please show me your config file!

I've tried the x13modold kernel and it doesn't work...

Any advice? Huh

kenshirothefist

sr. member

Activity: 457

Merit: 273

Guys, why are you keeping this thread alive? All the excellent work done by the OP author, lasybear and other developers has already been included in the "official" sgminer release (it's a community effort, so I guess it can be called anything; if you don't like "official", call it the "sgminer-dev" version). Information gets lost when it is spread over several separate threads, so I invite you to discuss topics regarding sgminer and related stuff in this thread: https://bitcointalksearch.org/topic/ann-sgminer-v5-optimized-x11x13neoscryptlyra2reetc-kernel-switch-miner-632503

If you agree, we could consider this thread closed and continue the debate here: https://bitcointalk.org/index.php?topic=632503.new#new
