
Topic: Krnlx Nvidia xevan miner - 3.3+ mh on 1070, ~6mh on 80ti FREE, OPENSOURCE - page 13. (Read 30123 times)

sr. member
Activity: 266
Merit: 250
I don't understand why you spend so much time on a non-profitable shitcoin. So many coders, and no progress. I built the shit on Windows in 30 minutes and it works perfectly fine... 5 pages of puppet postings already.

So sell it for 200 bucks a copy and claim the bounties. Make yourself richer, you know it.

It's just fun; debugging always trains your skills.
copper member
Activity: 970
Merit: 287
Per aspera ad astra
Dear sp_,

Stop being an ass for once.
Seriously now, is it worth it?!
full member
Activity: 161
Merit: 100
I don't understand why you spend so much time on a non-profitable shitcoin. So many coders, and no progress. I built the shit on Windows in 30 minutes and it works perfectly fine... 5 pages of puppet postings already.
If it's a shitcoin and not profitable, why don't you share the binaries?
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I don't understand why you spend so much time on a non-profitable shitcoin. So many coders, and no progress. I built the shit on Windows in 30 minutes and it works perfectly fine... 5 pages of puppet postings already.
sr. member
Activity: 266
Merit: 250
It works fine with CUDA 8, Ubuntu 16 and Suprnova; a GTX 1080 Ti gets 5.6 MH/s at core clock +100, 200 watts.
Compiled without problems under Ubuntu 16.04 and CUDA 8.
But when I start ccminer, it asks me for the CUDA 7.5 library...
How do I make it use the CUDA 8 library (CUDA 7.5 is not installed on this new rig)?

Check /usr/local for CUDA folders. The default in configure.sh is CUDA 7.5.
legendary
Activity: 1260
Merit: 1046
It works fine with CUDA 8, Ubuntu 16 and Suprnova; a GTX 1080 Ti gets 5.6 MH/s at core clock +100, 200 watts.
Compiled without problems under Ubuntu 16.04 and CUDA 8.
But when I start ccminer, it asks me for the CUDA 7.5 library...
How do I make it use the CUDA 8 library (CUDA 7.5 is not installed on this new rig)?
newbie
Activity: 1
Merit: 0
I tried to build it in Visual Studio 2013 but I get the following error:

error LNK1181: cannot open input file 'libcurl.x64.lib' ccminer

Can someone help me?
newbie
Activity: 15
Merit: 0
Somehow I got stuck at the api.cpp stage during compilation:

ccminer-util.o: In function `stratum_handle_method':
util.cpp:(.text+0x4af3): undefined reference to `gpu_power(cgpu_info*)'
util.cpp:(.text+0x4b51): undefined reference to `nvml_get_current_clocks(int, unsigned int*, unsigned int*)'
util.cpp:(.text+0x4b59): undefined reference to `gpu_info(cgpu_info*)'
ccminer-api.o: In function `gpustatus(int)':
api.cpp:(.text+0x554): undefined reference to `gpu_busid(cgpu_info*)'
api.cpp:(.text+0x560): undefined reference to `gpu_temp(cgpu_info*)'
api.cpp:(.text+0x56b): undefined reference to `gpu_fanpercent(cgpu_info*)'
api.cpp:(.text+0x57b): undefined reference to `gpu_fanrpm(cgpu_info*)'
api.cpp:(.text+0x587): undefined reference to `gpu_power(cgpu_info*)'
ccminer-api.o: In function `gpuhwinfos(int)':
api.cpp:(.text+0xbf0): undefined reference to `gpu_busid(cgpu_info*)'
api.cpp:(.text+0xbfc): undefined reference to `gpu_temp(cgpu_info*)'
api.cpp:(.text+0xc07): undefined reference to `gpu_fanpercent(cgpu_info*)'
api.cpp:(.text+0xc17): undefined reference to `gpu_fanrpm(cgpu_info*)'
api.cpp:(.text+0xc23): undefined reference to `gpu_pstate(cgpu_info*)'
api.cpp:(.text+0xc2f): undefined reference to `gpu_power(cgpu_info*)'
api.cpp:(.text+0xc42): undefined reference to `gpu_info(cgpu_info*)'
collect2: error: ld returned 1 exit status
make[2]: *** [ccminer] Error 1

Any ideas on how to fix this? Compiling on Ubuntu 14.04 & CUDA 8.0
copper member
Activity: 970
Merit: 287
Per aspera ad astra
It definitely has something to do with the platform target (x86/x64).
See below for how it behaves on x86.

sr. member
Activity: 266
Merit: 250
Next part of check, now GPU part. All the same, x17.cu is following:


http://paste.ubuntu.com/25681179/

Thank you, you're really helping. I'll check now.

UPDATE: Hmm, the behaviour is the same, but what's even stranger is that the pool accepts the shares. That shouldn't happen; my mind is blown. I think everybody should wait for krnlx's comments, he's the code author.
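Checking the GPU path stage by stage, as the debug build of x17.cu in this thread does with its print_hash() dumps, comes down to comparing each kernel's output with the matching step of the CPU reference in x17hash() for the first nonce of the batch. Below is a minimal C++ sketch of the CPU side of that comparison. It assumes each hash step has been wrapped as a function that transforms the 128-byte buffer in place; State, Stage, run_traced and first_mismatch are hypothetical names, not part of ccminer, and the toy stages in the self-check only stand in for the real sph_* calls.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// 128-byte working buffer, the same size as hash[32] in x17hash().
using State = std::array<uint32_t, 32>;
// One hash step that transforms the buffer in place (hypothetical wrapper).
using Stage = std::function<void(State &)>;

struct StageResult {
    std::string name;  // label of the stage, e.g. "groestl512"
    State after;       // buffer contents right after this stage ran
};

// Run the stages in order and keep a snapshot of the buffer after each one.
inline std::vector<StageResult> run_traced(
        State s, const std::vector<std::pair<std::string, Stage>> &stages) {
    std::vector<StageResult> trace;
    for (const auto &stage : stages) {
        stage.second(s);
        trace.push_back({stage.first, s});
    }
    return trace;
}

// Compare the first `words` words of each CPU snapshot with the matching GPU
// dump. Returns the index of the first stage that differs, or -1 if none do.
inline int first_mismatch(const std::vector<StageResult> &cpu,
                          const std::vector<State> &gpu, std::size_t words) {
    const std::size_t stages = cpu.size() < gpu.size() ? cpu.size() : gpu.size();
    const std::size_t n = words < 32 ? words : 32;
    for (std::size_t i = 0; i < stages; ++i)
        for (std::size_t w = 0; w < n; ++w)
            if (cpu[i].after[w] != gpu[i][w]) return static_cast<int>(i);
    return -1;
}

// Self-check with toy stages standing in for real hash functions.
inline void check_stage_trace() {
    State init{};
    init[0] = 1;
    const std::vector<std::pair<std::string, Stage>> stages = {
        {"add_one", [](State &s) { s[0] += 1; }},
        {"times_three", [](State &s) { s[0] *= 3; }},
    };
    const std::vector<StageResult> cpu = run_traced(init, stages);
    assert(cpu.size() == 2 && cpu[0].after[0] == 2 && cpu[1].after[0] == 6);
    std::vector<State> gpu(2, State{});
    gpu[0][0] = 2;
    gpu[1][0] = 7;  // pretend the second kernel disagrees
    assert(first_mismatch(cpu, gpu, 16) == 1);
}
```

The debug build copies back 16 words (64 bytes) of thread 0's hash after each kernel, so passing words = 16 compares exactly what it prints.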
newbie
Activity: 7
Merit: 0
sr. member
Activity: 266
Merit: 250
Next part of check, now GPU part. All the same, x17.cu is following:

Code:
/**
 * X17 algorithm (X15 + sha512 + haval256)
 */

extern "C" {
#include "sph/sph_blake.h"
#include "sph/sph_bmw.h"
#include "sph/sph_groestl.h"
#include "sph/sph_skein.h"
#include "sph/sph_jh.h"
#include "sph/sph_keccak.h"

#include "sph/sph_luffa.h"
#include "sph/sph_cubehash.h"
#include "sph/sph_shavite.h"
#include "sph/sph_simd.h"
#include "sph/sph_echo.h"

#include "sph/sph_hamsi.h"
#include "sph/sph_fugue.h"

#include "sph/sph_shabal.h"
#include "sph/sph_whirlpool.h"

#include "sph/sph_sha2.h"
#include "sph/sph_haval.h"
}

#include "miner.h"
#include "cuda_helper.h"
#include "x11/cuda_x11.h"

#define NBN 2

// Memory for the hash functions
static uint32_t *d_hash[MAX_GPUS];
static uint32_t *d_resNonce[MAX_GPUS];
static uint32_t *h_resNonce[MAX_GPUS];

extern void x13_hamsi_fugue512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);

extern void x14_shabal512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);

extern void x15_whirlpool_cpu_init(int thr_id, uint32_t threads, int mode);
extern void x15_whirlpool_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x15_whirlpool_cpu_free(int thr_id);

extern void x17_sha512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);

extern void x17_haval256_cpu_hash_64_final(int thr_id, uint32_t threads, uint32_t *d_hash, uint32_t* resNonce, uint64_t target);
extern void bmw256_cpu_hash_32_full(int thr_id, uint32_t threads, uint32_t *g_hash);
extern void quark_bmw512_cpu_hash_64x(int thr_id, uint32_t threads, uint32_t *d_nonceVector, uint32_t *d_hash);
extern void quark_groestl512(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void groestl512_cpu_init(int thr_id, uint32_t threads);
extern void groestl512_cpu_hash(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_skein512(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void keccak_xevan_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void qubit_luffa512_cpu_hash_80(int thr_id, uint32_t threads, uint32_t startNounce, uint32_t *d_outputHash);
extern void x11_cubehash512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_shavite512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_shavite512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_echo512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_echo512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_simd512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_simd512_cpu_init(int thr_id, uint32_t threads);
extern void xevan_simd512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x13_hamsi512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x13_fugue512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_whirlpool_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_sha512_cpu_hash_64(int thr_id, int threads, uint32_t *d_hash);
extern void xevan_haval512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void quark_blake512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_nonceVector, uint32_t *d_outputHash);
extern void xevan_blake512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_haval512_cpu_hash_64_final(int thr_id, uint32_t threads, uint32_t *d_hash, uint32_t *resNonce, uint64_t target);
extern void xevan_groestl512_cpu_hash(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void keccak_xevan_cpu_hash_64_A(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void quark_blake512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_nonceVector, uint32_t *d_outputHash);
extern void quark_blake512_cpu_hash_128(int thr_id, uint32_t threads, uint32_t *d_outputHash);
extern void quark_groestl512_cpu_hash_128(int thr_id, uint32_t threads,  uint32_t *d_hash);
extern void x11_luffa512_cpu_hash_128(int thr_id, uint32_t threads,uint32_t *d_hash);



// X17 CPU Hash (Validation)
extern "C" void x17hash(void *output, const void *input)
{
uint32_t _ALIGN(64) hash[32]; // 128 bytes required
const int dataLen = 128;
//return;
sph_blake512_context     ctx_blake;
sph_bmw512_context       ctx_bmw;
sph_groestl512_context   ctx_groestl;
sph_skein512_context     ctx_skein;
sph_jh512_context        ctx_jh;
sph_keccak512_context    ctx_keccak;
sph_luffa512_context     ctx_luffa;
sph_cubehash512_context  ctx_cubehash;
sph_shavite512_context   ctx_shavite;
sph_simd512_context      ctx_simd;
sph_echo512_context      ctx_echo;
sph_hamsi512_context     ctx_hamsi;
sph_fugue512_context     ctx_fugue;
sph_shabal512_context    ctx_shabal;
sph_whirlpool_context    ctx_whirlpool;
sph_sha512_context       ctx_sha512;
sph_haval256_5_context   ctx_haval;

//print_hash(input,20);
sph_blake512_init(&ctx_blake);
sph_blake512(&ctx_blake, input, 80);
sph_blake512_close(&ctx_blake, hash);
//print_hash(hash,32);
memset(&hash[16], 0, 64);

sph_bmw512_init(&ctx_bmw);
sph_bmw512(&ctx_bmw, hash, dataLen);
sph_bmw512_close(&ctx_bmw, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_groestl512_init(&ctx_groestl);
sph_groestl512(&ctx_groestl, hash, dataLen);
sph_groestl512_close(&ctx_groestl, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;

sph_skein512_init(&ctx_skein);
sph_skein512(&ctx_skein, hash, dataLen);
sph_skein512_close(&ctx_skein, hash);

//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_jh512_init(&ctx_jh);
sph_jh512(&ctx_jh, hash, dataLen);
sph_jh512_close(&ctx_jh, hash);
//print_hash(hash,32);

sph_keccak512_init(&ctx_keccak);
sph_keccak512(&ctx_keccak, hash, dataLen);
sph_keccak512_close(&ctx_keccak, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_luffa512_init(&ctx_luffa);
sph_luffa512(&ctx_luffa, hash, dataLen);
sph_luffa512_close(&ctx_luffa, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_cubehash512_init(&ctx_cubehash);
sph_cubehash512(&ctx_cubehash, hash, dataLen);
sph_cubehash512_close(&ctx_cubehash, hash);
//print_hash(hash,32);
sph_shavite512_init(&ctx_shavite);
sph_shavite512(&ctx_shavite, hash, dataLen);
sph_shavite512_close(&ctx_shavite, hash);
//print_hash(hash,32);
sph_simd512_init(&ctx_simd);
sph_simd512(&ctx_simd, hash, dataLen);
sph_simd512_close(&ctx_simd, hash);
//print_hash(hash,32);
sph_echo512_init(&ctx_echo);
sph_echo512(&ctx_echo, hash, dataLen);
sph_echo512_close(&ctx_echo, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_hamsi512_init(&ctx_hamsi);
sph_hamsi512(&ctx_hamsi, hash, dataLen);
sph_hamsi512_close(&ctx_hamsi, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_fugue512_init(&ctx_fugue);
sph_fugue512(&ctx_fugue, hash, dataLen);
sph_fugue512_close(&ctx_fugue, hash);
//print_hash(hash,32);
sph_shabal512_init(&ctx_shabal);
sph_shabal512(&ctx_shabal, hash, dataLen);
sph_shabal512_close(&ctx_shabal, hash);
//print_hash(hash,32);
sph_whirlpool_init(&ctx_whirlpool);
sph_whirlpool(&ctx_whirlpool, hash, dataLen);
sph_whirlpool_close(&ctx_whirlpool, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_sha512_init(&ctx_sha512);
sph_sha512(&ctx_sha512,(const void*) hash, dataLen);
sph_sha512_close(&ctx_sha512,(void*) hash);
//print_hash(hash,32);
sph_haval256_5_init(&ctx_haval);
sph_haval256_5(&ctx_haval,(const void*) hash, dataLen);
sph_haval256_5_close(&ctx_haval, hash);
//print_hash(hash,32);

memset(&hash[8], 0, dataLen - 32);

sph_blake512_init(&ctx_blake);
sph_blake512(&ctx_blake, hash, dataLen);
sph_blake512_close(&ctx_blake, hash);

//print_hash(hash,32);

sph_bmw512_init(&ctx_bmw);
sph_bmw512(&ctx_bmw, hash, dataLen);
sph_bmw512_close(&ctx_bmw, hash);

sph_groestl512_init(&ctx_groestl);
sph_groestl512(&ctx_groestl, hash, dataLen);
sph_groestl512_close(&ctx_groestl, hash);

sph_skein512_init(&ctx_skein);
sph_skein512(&ctx_skein, hash, dataLen);
sph_skein512_close(&ctx_skein, hash);

sph_jh512_init(&ctx_jh);
sph_jh512(&ctx_jh, hash, dataLen);
sph_jh512_close(&ctx_jh, hash);

sph_keccak512_init(&ctx_keccak);
sph_keccak512(&ctx_keccak, hash, dataLen);
sph_keccak512_close(&ctx_keccak, hash);

sph_luffa512_init(&ctx_luffa);
sph_luffa512(&ctx_luffa, hash, dataLen);
sph_luffa512_close(&ctx_luffa, hash);

sph_cubehash512_init(&ctx_cubehash);
sph_cubehash512(&ctx_cubehash, hash, dataLen);
sph_cubehash512_close(&ctx_cubehash, hash);

sph_shavite512_init(&ctx_shavite);
sph_shavite512(&ctx_shavite, hash, dataLen);
sph_shavite512_close(&ctx_shavite, hash);

sph_simd512_init(&ctx_simd);
sph_simd512(&ctx_simd, hash, dataLen);
sph_simd512_close(&ctx_simd, hash);

sph_echo512_init(&ctx_echo);
sph_echo512(&ctx_echo, hash, dataLen);
sph_echo512_close(&ctx_echo, hash);

sph_hamsi512_init(&ctx_hamsi);
sph_hamsi512(&ctx_hamsi, hash, dataLen);
sph_hamsi512_close(&ctx_hamsi, hash);

sph_fugue512_init(&ctx_fugue);
sph_fugue512(&ctx_fugue, hash, dataLen);
sph_fugue512_close(&ctx_fugue, hash);

sph_shabal512_init(&ctx_shabal);
sph_shabal512(&ctx_shabal, hash, dataLen);
sph_shabal512_close(&ctx_shabal, hash);

sph_whirlpool_init(&ctx_whirlpool);
sph_whirlpool(&ctx_whirlpool, hash, dataLen);
sph_whirlpool_close(&ctx_whirlpool, hash);

sph_sha512_init(&ctx_sha512);
sph_sha512(&ctx_sha512,(const void*) hash, dataLen);
sph_sha512_close(&ctx_sha512,(void*) hash);

//print_hash(hash,32);
sph_haval256_5_init(&ctx_haval);
sph_haval256_5(&ctx_haval,(const void*) hash, dataLen);
sph_haval256_5_close(&ctx_haval, hash);
//print_hash(hash,8);
memcpy(output, hash, 32);
}

static bool init[MAX_GPUS] = { 0 };


void print_hash(unsigned int *data,int size){
for(int i=0;i<size;i++) gpulog(LOG_WARNING, 0,"%x ",data[i]);
gpulog(LOG_WARNING, 0,"-------------");
}


extern "C" int scanhash_x17(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done){

int dev_id = device_map[thr_id];

uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t first_nonce = pdata[19];
/*
uint32_t default_throughput = 1<<20;

if (strstr(device_name[dev_id], "GTX 970")) default_throughput+=256*256*6;
if (strstr(device_name[dev_id], "GTX 980")) default_throughput =1<<22;

uint32_t throughput = cuda_default_throughput(thr_id, default_throughput); // 19=256*256*8;
*/
uint32_t default_throughput;
if(device_sm[dev_id]<=500) default_throughput = 1<<20;
else if(device_sm[dev_id]<=520) default_throughput = 1<<21;
else if(device_sm[dev_id]>520) default_throughput = (1<<22) + (1<<21);

if((strstr(device_name[dev_id], "1070")))default_throughput = 1<<20;
if((strstr(device_name[dev_id], "1080")))default_throughput = 1<<20;

uint32_t throughput = cuda_default_throughput(thr_id, default_throughput); // 19=256*256*8;
if (init[thr_id]) throughput = min(throughput, max_nonce - first_nonce);

throughput&=0xFFFFFF80; //multiples of 128 due to simd_echo kernel

if (opt_benchmark)
((uint32_t*)ptarget)[7] = 0xff;

gpulog(LOG_INFO,thr_id,"target %x %x %x",ptarget[5], ptarget[6], ptarget[7]);
        gpulog(LOG_INFO,thr_id,"target %llx",*(uint64_t*)&ptarget[6]);

if (!init[thr_id])
{
cudaSetDevice(device_map[thr_id]);
if (opt_cudaschedule == -1 && gpu_threads == 1) {
cudaDeviceReset();
// reduce cpu usage
cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
// cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
}
gpulog(LOG_INFO,thr_id, "Intensity set to %g, %u cuda threads", throughput2intensity(throughput), throughput);

// x11_simd_echo_512_cpu_init(thr_id, throughput);
x15_whirlpool_cpu_init(thr_id, throughput, 0);
groestl512_cpu_init(thr_id, throughput);
x11_simd512_cpu_init(thr_id, throughput);
//for(;;);
CUDA_SAFE_CALL(cudaMalloc(&d_hash[thr_id], 8 * sizeof(uint64_t) * throughput));
CUDA_SAFE_CALL(cudaMalloc(&d_resNonce[thr_id], NBN * sizeof(uint32_t)));
h_resNonce[thr_id] = (uint32_t*) malloc(NBN  * 8 * sizeof(uint32_t));
if(h_resNonce[thr_id] == NULL){
gpulog(LOG_ERR,thr_id,"Host memory allocation failed");
exit(EXIT_FAILURE);
}
init[thr_id] = true;
}

uint32_t _ALIGN(64) endiandata[20];
for (int k=0; k < 20; k++)
be32enc(&endiandata[k], pdata[k]);
// endiandata[k]=0;
// print_hash(endiandata,20);
quark_blake512_cpu_setBlock_80(thr_id, endiandata);
cudaMemset(d_resNonce[thr_id], 0xff, NBN*sizeof(uint32_t));
// x11_simd512_cpu_init(thr_id, throughput);
// for(;;);
do {
// Hash with CUDA


quark_blake512_cpu_hash_80(thr_id, throughput, pdata[19], d_hash[thr_id]);//A

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

quark_groestl512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id],16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

quark_skein512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

quark_jh512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);//A //fast

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));
// keccak_xevan_cpu_hash_64_A(thr_id, throughput,  d_hash[thr_id]);//A

//cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));
// x11_luffa512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //P
//cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
//print_hash(h_resNonce[thr_id],16);
//cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));

x11_luffa512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);//A

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

//cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
//print_hash(h_resNonce[thr_id],16);
//for(;;);

x11_cubehash512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //A 256

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

xevan_shavite512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);//P slow r2

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                x11_simd512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);  //A slow r3

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));


//                cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));


// xevan_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //slow r1

//                cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
// print_hash(h_resNonce[thr_id],16);


  //              cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));

x11_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);//A

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));



//                cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
 //               print_hash(h_resNonce[thr_id],16);

//for(;;);

                x13_hamsi512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

x13_fugue512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast ++

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

x14_shabal512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

xevan_whirlpool_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //opt2

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

xevan_sha512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

xevan_haval512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));



// xevan_blake512_cpu_hash_64(thr_id, throughput,  d_hash[thr_id]);//BAD
quark_blake512_cpu_hash_128(thr_id, throughput,  d_hash[thr_id]);//BAD

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));


//
                quark_bmw512_cpu_hash_64x(thr_id, throughput, NULL, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

//                xevan_groestl512_cpu_hash(thr_id, throughput, d_hash[thr_id]);
quark_groestl512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);


//                xevan_skein512(thr_id, throughput, d_hash[thr_id]);
                quark_skein512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));


                quark_jh512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

//                keccak_xevan_cpu_hash_64_A(thr_id, throughput,  d_hash[thr_id]);
//                x11_luffa512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                x11_luffa512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);//A

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));


                x11_cubehash512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                xevan_shavite512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);//move to shared

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                x11_simd512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));


//                xevan_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                x11_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));


                x13_hamsi512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                x13_fugue512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                x14_shabal512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                xevan_whirlpool_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));

                xevan_sha512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16 * sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id], 16);
cudaMemset(d_hash[thr_id], 0x00, 16 * sizeof(uint32_t));








/*
for(int i = 10000;i< 10016;i++){
                cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][16*i], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id],8);
}
for(;;);

*/
xevan_haval512_cpu_hash_64_final(thr_id, throughput, d_hash[thr_id],d_resNonce[thr_id],*(uint64_t*)&ptarget[6]);

cudaMemcpy(h_resNonce[thr_id], d_resNonce[thr_id], NBN*sizeof(uint32_t), cudaMemcpyDeviceToHost);

print_hash(h_resNonce[thr_id], 16);

if (h_resNonce[thr_id][0] != UINT32_MAX){
const uint32_t Htarg = ptarget[7];
const uint32_t startNounce = pdata[19];
uint32_t vhash64[8];
be32enc(&endiandata[19], startNounce + h_resNonce[thr_id][0]);
x17hash(vhash64, endiandata);
// *hashes_done = pdata[19] - first_nonce + throughput + 1;
// pdata[19] = startNounce + h_resNonce[thr_id][0];
gpulog(LOG_WARNING, 0,"NONCE FOUND ");
// return 1;
if (vhash64[7] <= Htarg && fulltest(vhash64, ptarget)) {
int res = 1;
*hashes_done = pdata[19] - first_nonce + throughput + 1;
work_set_target_ratio(work, vhash64);
pdata[19] = startNounce + h_resNonce[thr_id][0];
if (h_resNonce[thr_id][1] != UINT32_MAX) {
pdata[21] = startNounce+h_resNonce[thr_id][1];
if(!opt_quiet)
gpulog(LOG_BLUE,dev_id,"Found 2nd nonce: %08x", pdata[21]);
be32enc(&endiandata[19], pdata[21]);
x17hash(vhash64, endiandata);
if (bn_hash_target_ratio(vhash64, ptarget) > work->shareratio[0]){
work_set_target_ratio(work, vhash64);
xchg(pdata[19],pdata[21]);
}
res++;
}
return res;
}
else {
gpulog(LOG_WARNING, thr_id, "result for %08x does not validate on CPU!", h_resNonce[thr_id][0]);
cudaMemset(d_resNonce[thr_id], 0xff, NBN*sizeof(uint32_t));
}
}

pdata[19] += throughput;
} while (!work_restart[thr_id].restart && ((uint64_t)max_nonce > (uint64_t)throughput + pdata[19]));

*hashes_done = pdata[19] - first_nonce + 1;

return 0;
}

// cleanup
extern "C" void free_x17(int thr_id)
{
if (!init[thr_id])
return;

cudaDeviceSynchronize();

free(h_resNonce[thr_id]);
cudaFree(d_resNonce[thr_id]);
cudaFree(d_hash[thr_id]);

x11_simd_echo_512_cpu_free(thr_id);
x15_whirlpool_cpu_free(thr_id);
cudaDeviceSynchronize();
init[thr_id] = false;
}
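The throughput line in this x17.cu says the value must be a multiple of 128 because of the simd_echo kernel. Clearing the low 7 bits with 0xFFFFFF80 (~127u) does that rounding; 0xFFFFFF70 does not, because it keeps bits 4 to 6 and clears bit 7. A small plain C++ sketch of both masks (the function names are illustrative only):

```cpp
#include <cassert>
#include <cstdint>

// Round a thread count down to a multiple of 128 by clearing its low 7 bits.
// For a 32-bit value, ~127u is 0xFFFFFF80.
constexpr uint32_t round_down_128(uint32_t threads) {
    return threads & ~127u;
}

// 0xFFFFFF70 clears bits 0-3 and bit 7 but keeps bits 4-6, so the result is
// only guaranteed to be a multiple of 16.
constexpr uint32_t mask_ffffff70(uint32_t threads) {
    return threads & 0xFFFFFF70u;
}

static_assert(~127u == 0xFFFFFF80u, "assumes 32-bit unsigned int");
static_assert(round_down_128(200) == 128, "200 -> 128");
static_assert(mask_ffffff70(200) == 64, "200 -> 64, not a multiple of 128");

inline void check_throughput_rounding() {
    // A power-of-two style default has no low bits set, so neither mask changes it.
    const uint32_t t = (1u << 22) + (1u << 21);
    assert(round_down_128(t) == t && mask_ffffff70(t) == t);
    // A clamp such as max_nonce - first_nonce can produce arbitrary values.
    assert(round_down_128(1000) == 896);
    assert(round_down_128(1000) % 128 == 0);
}
```

Because the defaults picked from device_sm have no low bits set, the difference may rarely show; it matters when throughput is clamped to max_nonce - first_nonce. Rounding down also turns a clamped value below 128 into 0, so a caller may want to guard against a zero throughput.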

Thanks to everybody who is helping with the investigation.

UPDATE: I think I've found the bug. It's in the xevan_haval512_cpu_hash_64_final function: h_resNonce[0] and h_resNonce[1] are always random on Windows with exactly the same input data... I still need confirmation that the behaviour is different on *nix, of course. A CUDA memcpy bug?
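The source of xevan_haval512_cpu_hash_64_final is not quoted in this thread, so the following is only a sketch under an assumption: that its hit path follows the pattern common in ccminer forks, where a thread whose hash meets the target does an atomicExch into resNonce[0] and moves the displaced value into resNonce[1]. It simulates that two-slot protocol on the CPU with std::atomic to show what the host should read back for a given set of hits; ResultBuffer, report_hit and collect are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kSlots = 2;                // mirrors NBN in x17.cu
constexpr uint32_t kEmpty = UINT32_MAX;  // what cudaMemset(..., 0xff, ...) leaves behind

struct ResultBuffer {
    std::atomic<uint32_t> slot[kSlots];
    // CPU analogue of cudaMemset(d_resNonce, 0xff, NBN * sizeof(uint32_t)).
    void reset() {
        for (auto &s : slot) s.store(kEmpty);
    }
};

// Assumed hit path for one thread whose hash met the target: the newest hit
// takes slot 0 and the value it displaced moves to slot 1.
inline void report_hit(ResultBuffer &buf, uint32_t thread_index) {
    const uint32_t previous = buf.slot[0].exchange(thread_index);
    if (previous != kEmpty) buf.slot[1].store(previous);
}

// Host side: convert occupied slots to absolute nonces (startNounce + offset).
inline std::vector<uint32_t> collect(const ResultBuffer &buf, uint32_t start_nonce) {
    std::vector<uint32_t> nonces;
    for (const auto &s : buf.slot) {
        const uint32_t v = s.load();
        if (v != kEmpty) nonces.push_back(start_nonce + v);
    }
    return nonces;
}

inline void check_result_buffer() {
    ResultBuffer buf;
    buf.reset();
    assert(collect(buf, 1000).empty());  // no hits: both slots still 0xFFFFFFFF
    report_hit(buf, 5);
    report_hit(buf, 9);
    const std::vector<uint32_t> got = collect(buf, 1000);
    assert(got.size() == 2 && got[0] == 1009 && got[1] == 1005);
}
```

On a real GPU, when more than two threads hit in one launch, which two offsets survive depends on thread scheduling, so h_resNonce can legitimately differ between runs. A value that then fails the CPU check in x17hash() points elsewhere, for example at how the hash or the target reaches the kernel.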
newbie
Activity: 7
Merit: 0
Welp, I tried for my cookie anyway. If there's anything else I can do to help, let me know.
sr. member
Activity: 266
Merit: 250
UPDATE: Hashing is exactly the same as on Windows, so this turns out to be a very weird problem. The one thing I suspect is a difference in how the compilers handle ulong and uint on Windows and *nix. For example, unsigned long is 64 bits wide (the same as uint64_t) in 64-bit Linux builds, but only 32 bits on Windows, even in 64-bit builds, and in any 32-bit build. I'll look into this; maybe I'll come up with something.
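On 64-bit Linux (LP64) unsigned long is 64 bits, while on Windows (LLP64) it stays 32 bits in both x86 and x64 builds. The plain C++ sketch below shows the failure mode this could cause: if a 64-bit value such as the *(uint64_t*)&ptarget[6] target passed to xevan_haval512_cpu_hash_64_final went through an unsigned long anywhere on its way to the kernel, Windows would silently drop its high 32 bits while Linux would not. Whether the xevan sources actually do this is not shown in this thread; via_ulong() is a hypothetical illustration, not ccminer code.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// True when unsigned long is 64 bits wide (LP64, e.g. 64-bit Linux);
// false on Windows (LLP64) and on 32-bit targets.
constexpr bool ulong_is_64bit() {
    return sizeof(unsigned long) * CHAR_BIT == 64;
}

// Hypothetical: a 64-bit target that passes through an unsigned long.
inline uint64_t via_ulong(uint64_t target) {
    // Conversion to a narrower unsigned type keeps only the low bits.
    unsigned long narrowed = static_cast<unsigned long>(target);
    return narrowed;
}

// Fixed-width types have the same size on every platform.
static_assert(sizeof(uint64_t) * CHAR_BIT == 64, "uint64_t is always 64 bits");

// The round trip is lossless only where unsigned long is 64 bits.
inline void check_ulong_roundtrip() {
    const uint64_t target = 0x000000FF00001234ULL;  // high word 0xFF, low word 0x1234
    if (ulong_is_64bit())
        assert(via_ulong(target) == target);
    else
        assert(via_ulong(target) == 0x1234ULL);
}
```

Declaring such parameters as uint64_t (or unsigned long long) rather than unsigned long keeps host code identical on both platforms.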
full member
Activity: 144
Merit: 100
Eager to learn
Copied from my Linux build: x17.cu. Hopefully it helps.


Code:
/**
 * X17 algorithm (X15 + sha512 + haval256)
 */

extern "C" {
#include "sph/sph_blake.h"
#include "sph/sph_bmw.h"
#include "sph/sph_groestl.h"
#include "sph/sph_skein.h"
#include "sph/sph_jh.h"
#include "sph/sph_keccak.h"

#include "sph/sph_luffa.h"
#include "sph/sph_cubehash.h"
#include "sph/sph_shavite.h"
#include "sph/sph_simd.h"
#include "sph/sph_echo.h"

#include "sph/sph_hamsi.h"
#include "sph/sph_fugue.h"

#include "sph/sph_shabal.h"
#include "sph/sph_whirlpool.h"

#include "sph/sph_sha2.h"
#include "sph/sph_haval.h"
}

#include "miner.h"
#include "cuda_helper.h"
#include "x11/cuda_x11.h"

#define NBN 2

// Memory for the hash functions
static uint32_t *d_hash[MAX_GPUS];
static uint32_t *d_resNonce[MAX_GPUS];
static uint32_t *h_resNonce[MAX_GPUS];

extern void x13_hamsi_fugue512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);

extern void x14_shabal512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);

extern void x15_whirlpool_cpu_init(int thr_id, uint32_t threads, int mode);
extern void x15_whirlpool_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x15_whirlpool_cpu_free(int thr_id);

extern void x17_sha512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);

extern void x17_haval256_cpu_hash_64_final(int thr_id, uint32_t threads, uint32_t *d_hash, uint32_t* resNonce, uint64_t target);
extern void bmw256_cpu_hash_32_full(int thr_id, uint32_t threads, uint32_t *g_hash);
extern void quark_bmw512_cpu_hash_64x(int thr_id, uint32_t threads, uint32_t *d_nonceVector, uint32_t *d_hash);
extern void quark_groestl512(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void groestl512_cpu_init(int thr_id, uint32_t threads);
extern void groestl512_cpu_hash(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_skein512(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void keccak_xevan_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void qubit_luffa512_cpu_hash_80(int thr_id, uint32_t threads, uint32_t startNounce, uint32_t *d_outputHash);
extern void x11_cubehash512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_shavite512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_shavite512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_echo512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_echo512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_simd512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_simd512_cpu_init(int thr_id, uint32_t threads);
extern void xevan_simd512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x13_hamsi512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x13_fugue512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_whirlpool_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_sha512_cpu_hash_64(int thr_id, int threads, uint32_t *d_hash);
extern void xevan_haval512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void quark_blake512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_nonceVector, uint32_t *d_outputHash);
extern void xevan_blake512_cpu_hash_64(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void xevan_haval512_cpu_hash_64_final(int thr_id, uint32_t threads, uint32_t *d_hash, uint32_t *resNonce, uint64_t target);
extern void xevan_groestl512_cpu_hash(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void keccak_xevan_cpu_hash_64_A(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void quark_blake512_cpu_hash_128(int thr_id, uint32_t threads, uint32_t *d_outputHash);
extern void quark_groestl512_cpu_hash_128(int thr_id, uint32_t threads, uint32_t *d_hash);
extern void x11_luffa512_cpu_hash_128(int thr_id, uint32_t threads, uint32_t *d_hash);



// Xevan CPU hash (validation): the 17 X17 functions applied twice over a 128-byte buffer
extern "C" void x17hash(void *output, const void *input)
{
uint32_t _ALIGN(64) hash[32]; // 128 bytes required
const int dataLen = 128;
//return;
sph_blake512_context     ctx_blake;
sph_bmw512_context       ctx_bmw;
sph_groestl512_context   ctx_groestl;
sph_skein512_context     ctx_skein;
sph_jh512_context        ctx_jh;
sph_keccak512_context    ctx_keccak;
sph_luffa512_context     ctx_luffa;
sph_cubehash512_context  ctx_cubehash;
sph_shavite512_context   ctx_shavite;
sph_simd512_context      ctx_simd;
sph_echo512_context      ctx_echo;
sph_hamsi512_context     ctx_hamsi;
sph_fugue512_context     ctx_fugue;
sph_shabal512_context    ctx_shabal;
sph_whirlpool_context    ctx_whirlpool;
sph_sha512_context       ctx_sha512;
sph_haval256_5_context   ctx_haval;

//print_hash(input,20);
sph_blake512_init(&ctx_blake);
sph_blake512(&ctx_blake, input, 80);
sph_blake512_close(&ctx_blake, hash);
//print_hash(hash,32);
memset(&hash[16], 0, 64);

sph_bmw512_init(&ctx_bmw);
sph_bmw512(&ctx_bmw, hash, dataLen);
sph_bmw512_close(&ctx_bmw, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_groestl512_init(&ctx_groestl);
sph_groestl512(&ctx_groestl, hash, dataLen);
sph_groestl512_close(&ctx_groestl, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;

sph_skein512_init(&ctx_skein);
sph_skein512(&ctx_skein, hash, dataLen);
sph_skein512_close(&ctx_skein, hash);

//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_jh512_init(&ctx_jh);
sph_jh512(&ctx_jh, hash, dataLen);
sph_jh512_close(&ctx_jh, hash);
//print_hash(hash,32);

sph_keccak512_init(&ctx_keccak);
sph_keccak512(&ctx_keccak, hash, dataLen);
sph_keccak512_close(&ctx_keccak, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_luffa512_init(&ctx_luffa);
sph_luffa512(&ctx_luffa, hash, dataLen);
sph_luffa512_close(&ctx_luffa, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_cubehash512_init(&ctx_cubehash);
sph_cubehash512(&ctx_cubehash, hash, dataLen);
sph_cubehash512_close(&ctx_cubehash, hash);
//print_hash(hash,32);
sph_shavite512_init(&ctx_shavite);
sph_shavite512(&ctx_shavite, hash, dataLen);
sph_shavite512_close(&ctx_shavite, hash);
//print_hash(hash,32);
sph_simd512_init(&ctx_simd);
sph_simd512(&ctx_simd, hash, dataLen);
sph_simd512_close(&ctx_simd, hash);
//print_hash(hash,32);
sph_echo512_init(&ctx_echo);
sph_echo512(&ctx_echo, hash, dataLen);
sph_echo512_close(&ctx_echo, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_hamsi512_init(&ctx_hamsi);
sph_hamsi512(&ctx_hamsi, hash, dataLen);
sph_hamsi512_close(&ctx_hamsi, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_fugue512_init(&ctx_fugue);
sph_fugue512(&ctx_fugue, hash, dataLen);
sph_fugue512_close(&ctx_fugue, hash);
//print_hash(hash,32);
sph_shabal512_init(&ctx_shabal);
sph_shabal512(&ctx_shabal, hash, dataLen);
sph_shabal512_close(&ctx_shabal, hash);
//print_hash(hash,32);
sph_whirlpool_init(&ctx_whirlpool);
sph_whirlpool(&ctx_whirlpool, hash, dataLen);
sph_whirlpool_close(&ctx_whirlpool, hash);
//print_hash(hash,32);
//for(int i=0;i<32;i++)hash[i]=0;
sph_sha512_init(&ctx_sha512);
sph_sha512(&ctx_sha512,(const void*) hash, dataLen);
sph_sha512_close(&ctx_sha512,(void*) hash);
//print_hash(hash,32);
sph_haval256_5_init(&ctx_haval);
sph_haval256_5(&ctx_haval,(const void*) hash, dataLen);
sph_haval256_5_close(&ctx_haval, hash);
//print_hash(hash,32);

memset(&hash[8], 0, dataLen - 32);

sph_blake512_init(&ctx_blake);
sph_blake512(&ctx_blake, hash, dataLen);
sph_blake512_close(&ctx_blake, hash);

//print_hash(hash,32);

sph_bmw512_init(&ctx_bmw);
sph_bmw512(&ctx_bmw, hash, dataLen);
sph_bmw512_close(&ctx_bmw, hash);

sph_groestl512_init(&ctx_groestl);
sph_groestl512(&ctx_groestl, hash, dataLen);
sph_groestl512_close(&ctx_groestl, hash);

sph_skein512_init(&ctx_skein);
sph_skein512(&ctx_skein, hash, dataLen);
sph_skein512_close(&ctx_skein, hash);

sph_jh512_init(&ctx_jh);
sph_jh512(&ctx_jh, hash, dataLen);
sph_jh512_close(&ctx_jh, hash);

sph_keccak512_init(&ctx_keccak);
sph_keccak512(&ctx_keccak, hash, dataLen);
sph_keccak512_close(&ctx_keccak, hash);

sph_luffa512_init(&ctx_luffa);
sph_luffa512(&ctx_luffa, hash, dataLen);
sph_luffa512_close(&ctx_luffa, hash);

sph_cubehash512_init(&ctx_cubehash);
sph_cubehash512(&ctx_cubehash, hash, dataLen);
sph_cubehash512_close(&ctx_cubehash, hash);

sph_shavite512_init(&ctx_shavite);
sph_shavite512(&ctx_shavite, hash, dataLen);
sph_shavite512_close(&ctx_shavite, hash);

sph_simd512_init(&ctx_simd);
sph_simd512(&ctx_simd, hash, dataLen);
sph_simd512_close(&ctx_simd, hash);

sph_echo512_init(&ctx_echo);
sph_echo512(&ctx_echo, hash, dataLen);
sph_echo512_close(&ctx_echo, hash);

sph_hamsi512_init(&ctx_hamsi);
sph_hamsi512(&ctx_hamsi, hash, dataLen);
sph_hamsi512_close(&ctx_hamsi, hash);

sph_fugue512_init(&ctx_fugue);
sph_fugue512(&ctx_fugue, hash, dataLen);
sph_fugue512_close(&ctx_fugue, hash);

sph_shabal512_init(&ctx_shabal);
sph_shabal512(&ctx_shabal, hash, dataLen);
sph_shabal512_close(&ctx_shabal, hash);

sph_whirlpool_init(&ctx_whirlpool);
sph_whirlpool(&ctx_whirlpool, hash, dataLen);
sph_whirlpool_close(&ctx_whirlpool, hash);

sph_sha512_init(&ctx_sha512);
sph_sha512(&ctx_sha512,(const void*) hash, dataLen);
sph_sha512_close(&ctx_sha512,(void*) hash);

//print_hash(hash,32);
sph_haval256_5_init(&ctx_haval);
sph_haval256_5(&ctx_haval,(const void*) hash, dataLen);
sph_haval256_5_close(&ctx_haval, hash);
//print_hash(hash,8);
memcpy(output, hash, 32);
}

static bool init[MAX_GPUS] = { 0 };


void print_hash(unsigned int *data,int size){
for(int i=0;i<size;i++)
gpulog(LOG_WARNING, 0,"%x ",data[i]);
gpulog(LOG_WARNING, 0,"-------------");
}


extern "C" int scanhash_x17(int thr_id, struct work* work, uint32_t max_nonce, unsigned long *hashes_done){

int dev_id = device_map[thr_id];

uint32_t *pdata = work->data;
uint32_t *ptarget = work->target;
const uint32_t first_nonce = pdata[19];
/*
uint32_t default_throughput = 1<<20;

if (strstr(device_name[dev_id], "GTX 970")) default_throughput+=256*256*6;
if (strstr(device_name[dev_id], "GTX 980")) default_throughput =1<<22;

uint32_t throughput = cuda_default_throughput(thr_id, default_throughput); // 19=256*256*8;
*/
uint32_t default_throughput;
if(device_sm[dev_id]<=500) default_throughput = 1<<20;
else if(device_sm[dev_id]<=520) default_throughput = 1<<21;
else default_throughput = (1<<22) + (1<<21);

if((strstr(device_name[dev_id], "1070")))default_throughput = 1<<20;
if((strstr(device_name[dev_id], "1080")))default_throughput = 1<<20;

uint32_t throughput = cuda_default_throughput(thr_id, default_throughput); // 19=256*256*8;
if (init[thr_id]) throughput = min(throughput, max_nonce - first_nonce);

throughput&=0xFFFFFF80; //multiples of 128 due to simd_echo kernel (clears the low 7 bits)

if (opt_benchmark)
((uint32_t*)ptarget)[7] = 0xff;

gpulog(LOG_INFO,thr_id,"target %x %x %x",ptarget[5], ptarget[6], ptarget[7]);
gpulog(LOG_INFO,thr_id,"target %llx",(unsigned long long)*(uint64_t*)&ptarget[6]);

if (!init[thr_id])
{
cudaSetDevice(device_map[thr_id]);
if (opt_cudaschedule == -1 && gpu_threads == 1) {
cudaDeviceReset();
// reduce cpu usage
cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
// cudaDeviceSetCacheConfig(cudaFuncCachePreferShared);
}
gpulog(LOG_INFO,thr_id, "Intensity set to %g, %u cuda threads", throughput2intensity(throughput), throughput);

// x11_simd_echo_512_cpu_init(thr_id, throughput);
x15_whirlpool_cpu_init(thr_id, throughput, 0);
groestl512_cpu_init(thr_id, throughput);
x11_simd512_cpu_init(thr_id, throughput);
//for(;;);
CUDA_SAFE_CALL(cudaMalloc(&d_hash[thr_id], 8 * sizeof(uint64_t) * throughput));
CUDA_SAFE_CALL(cudaMalloc(&d_resNonce[thr_id], NBN * sizeof(uint32_t)));
h_resNonce[thr_id] = (uint32_t*) malloc(NBN  * 8 * sizeof(uint32_t));
if(h_resNonce[thr_id] == NULL){
gpulog(LOG_ERR,thr_id,"Host memory allocation failed");
exit(EXIT_FAILURE);
}
init[thr_id] = true;
}

uint32_t _ALIGN(64) endiandata[20];
for (int k=0; k < 20; k++)
be32enc(&endiandata[k], pdata[k]);
// endiandata[k]=0;
// print_hash(endiandata,20);
quark_blake512_cpu_setBlock_80(thr_id, endiandata);
cudaMemset(d_resNonce[thr_id], 0xff, NBN*sizeof(uint32_t));
// x11_simd512_cpu_init(thr_id, throughput);
// for(;;);
do {
// Hash with CUDA


quark_blake512_cpu_hash_80(thr_id, throughput, pdata[19], d_hash[thr_id]);//A
quark_groestl512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);

quark_skein512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);
quark_jh512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);//A //fast
// keccak_xevan_cpu_hash_64_A(thr_id, throughput,  d_hash[thr_id]);//A

//cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));
// x11_luffa512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //P
//cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
//print_hash(h_resNonce[thr_id],16);
//cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));

x11_luffa512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);//A
//cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
//print_hash(h_resNonce[thr_id],16);
//for(;;);

x11_cubehash512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //A 256
xevan_shavite512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);//P slow r2
                x11_simd512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);  //A slow r3

//                cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));


// xevan_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //slow r1

//                cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
// print_hash(h_resNonce[thr_id],16);


  //              cudaMemset(d_hash[thr_id], 0x00, 16*sizeof(uint32_t));

x11_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);//A


//                cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][0], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
 //               print_hash(h_resNonce[thr_id],16);

//for(;;);

                x13_hamsi512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast
x13_fugue512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast ++
x14_shabal512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast
xevan_whirlpool_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //opt2
xevan_sha512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast
xevan_haval512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]); //fast


// xevan_blake512_cpu_hash_64(thr_id, throughput,  d_hash[thr_id]);//BAD
quark_blake512_cpu_hash_128(thr_id, throughput,  d_hash[thr_id]);//BAD

//
                quark_bmw512_cpu_hash_64x(thr_id, throughput, NULL, d_hash[thr_id]);
//                xevan_groestl512_cpu_hash(thr_id, throughput, d_hash[thr_id]);
quark_groestl512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);

//                xevan_skein512(thr_id, throughput, d_hash[thr_id]);
                quark_skein512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);

                quark_jh512_cpu_hash_64(thr_id, throughput, NULL, d_hash[thr_id]);
//                keccak_xevan_cpu_hash_64_A(thr_id, throughput,  d_hash[thr_id]);
//                x11_luffa512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                x11_luffa512_cpu_hash_128(thr_id, throughput, d_hash[thr_id]);//A

                x11_cubehash512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                xevan_shavite512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);//move to shared
                x11_simd512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

//                xevan_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                x11_echo512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);

                x13_hamsi512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                x13_fugue512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                x14_shabal512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                xevan_whirlpool_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);
                xevan_sha512_cpu_hash_64(thr_id, throughput, d_hash[thr_id]);







/*
for(int i = 10000;i< 10016;i++){
                cudaMemcpy(h_resNonce[thr_id], &d_hash[thr_id][16*i], 16*sizeof(uint32_t), cudaMemcpyDeviceToHost);
print_hash(h_resNonce[thr_id],8);
}
for(;;);

*/
xevan_haval512_cpu_hash_64_final(thr_id, throughput, d_hash[thr_id],d_resNonce[thr_id],*(uint64_t*)&ptarget[6]);

cudaMemcpy(h_resNonce[thr_id], d_resNonce[thr_id], NBN*sizeof(uint32_t), cudaMemcpyDeviceToHost);

if (h_resNonce[thr_id][0] != UINT32_MAX){
const uint32_t Htarg = ptarget[7];
const uint32_t startNounce = pdata[19];
uint32_t vhash64[8];
be32enc(&endiandata[19], startNounce + h_resNonce[thr_id][0]);
x17hash(vhash64, endiandata);
// *hashes_done = pdata[19] - first_nonce + throughput + 1;
// pdata[19] = startNounce + h_resNonce[thr_id][0];
gpulog(LOG_WARNING, 0,"NONCE FOUND ");
// return 1;
if (vhash64[7] <= Htarg && fulltest(vhash64, ptarget)) {
int res = 1;
*hashes_done = pdata[19] - first_nonce + throughput + 1;
work_set_target_ratio(work, vhash64);
pdata[19] = startNounce + h_resNonce[thr_id][0];
if (h_resNonce[thr_id][1] != UINT32_MAX) {
pdata[21] = startNounce+h_resNonce[thr_id][1];
if(!opt_quiet)
gpulog(LOG_BLUE,dev_id,"Found 2nd nonce: %08x", pdata[21]);
be32enc(&endiandata[19], pdata[21]);
x17hash(vhash64, endiandata);
if (bn_hash_target_ratio(vhash64, ptarget) > work->shareratio[0]){
work_set_target_ratio(work, vhash64);
xchg(pdata[19],pdata[21]);
}
res++;
}
return res;
}
else {
gpulog(LOG_WARNING, thr_id, "result for %08x does not validate on CPU!", h_resNonce[thr_id][0]);
cudaMemset(d_resNonce[thr_id], 0xff, NBN*sizeof(uint32_t));
}
}

pdata[19] += throughput;
} while (!work_restart[thr_id].restart && ((uint64_t)max_nonce > (uint64_t)throughput + pdata[19]));

*hashes_done = pdata[19] - first_nonce + 1;

return 0;
}

// cleanup
extern "C" void free_x17(int thr_id)
{
if (!init[thr_id])
return;

cudaDeviceSynchronize();

free(h_resNonce[thr_id]);
cudaFree(d_resNonce[thr_id]);
cudaFree(d_hash[thr_id]);

x11_simd_echo_512_cpu_free(thr_id);
x15_whirlpool_cpu_free(thr_id);
cudaDeviceSynchronize();
init[thr_id] = false;
}
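The posted scanhash_x17 rounds the thread count before allocating buffers, and its comment says the simd/echo kernel needs a multiple of 128. Below is a minimal sketch of that rounding. It assumes the goal is simply "the largest multiple of 128 not above the requested count". The helper names are hypothetical and do not come from the miner.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a thread count down to a multiple of 128
// by clearing the low 7 bits (127 == 0x7F), i.e. AND with 0xFFFFFF80.
constexpr uint32_t round_down_128(uint32_t threads) {
    return threads & ~UINT32_C(127);
}

// For contrast: a mask ending in 0x70 (such as 0xFFFFFF70) clears bits 0..3
// and bit 7 but keeps bits 4..6, so the result is only a multiple of 16.
constexpr uint32_t mask_0x70(uint32_t threads) {
    return threads & UINT32_C(0xFFFFFF70);
}

// Power-of-two defaults like 1<<20 have none of the masked bits set,
// so both masks leave them unchanged.
static_assert(round_down_128(1u << 20) == (1u << 20), "2^20 is a multiple of 128");
static_assert(mask_0x70(1u << 20) == (1u << 20), "2^20 has no bits in 0x8F");
```

The difference only shows up when an intensity setting produces a count that is not already a multiple of 128. For example, 1000 rounds to 896 with the 127 mask, but to 864 with the 0x70 mask, and 864 is not a multiple of 128.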
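The final kernel receives its target as `*(uint64_t*)&ptarget[6]`. On a little-endian host, that reads target word 6 as the low half and word 7 as the high half.

The sketch below builds the same value without the pointer cast. It also shows a hypothetical early-reject test that compares only the most significant 64 bits of a candidate hash. Whether the posted haval kernel does exactly this is an assumption. The CPU side still confirms every candidate with `x17hash()` and `fulltest()`.

```cpp
#include <cassert>
#include <cstdint>

// Same value as *(uint64_t*)&target[6] on a little-endian host:
// word 7 is the high 32 bits, word 6 the low 32 bits.
uint64_t target_hi64(const uint32_t target[8]) {
    return (static_cast<uint64_t>(target[7]) << 32) | target[6];
}

// Hypothetical GPU-style pre-check: compare only hash words 6..7 (the most
// significant part of the 256-bit value) against the 64-bit target.
// A hash that passes still needs a full 256-bit check on the CPU.
bool passes_hi64(const uint32_t hash[8], uint64_t target64) {
    uint64_t h = (static_cast<uint64_t>(hash[7]) << 32) | hash[6];
    return h <= target64;
}
```

In benchmark mode the source sets `ptarget[7] = 0xff`, so the 64-bit target becomes `0x000000ff` in its high half. Only hashes whose top word is at most 0xff survive the pre-check.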
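The scan loop keeps running while `(uint64_t)max_nonce > (uint64_t)throughput + pdata[19]`. Widening both operands to 64 bits matters near the top of the 32-bit nonce space. The sketch below uses hypothetical function names and contrasts that check with the same comparison done in 32-bit arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// The check used by scanhash_x17: does another full batch of `throughput`
// nonces fit below max_nonce? The 64-bit sum cannot wrap around.
bool another_batch_fits(uint32_t max_nonce, uint32_t next_nonce, uint32_t throughput) {
    return static_cast<uint64_t>(max_nonce) >
           static_cast<uint64_t>(throughput) + next_nonce;
}

// The same test in 32-bit arithmetic, for contrast only: next_nonce + throughput
// wraps modulo 2^32 near UINT32_MAX and the loop would wrongly continue.
bool another_batch_fits_32(uint32_t max_nonce, uint32_t next_nonce, uint32_t throughput) {
    return max_nonce > next_nonce + throughput;
}
```

Consider `max_nonce = 0xFFFFFFFF`, a next nonce of `0xFFF00000`, and `throughput = 1<<21`. The 64-bit version correctly stops. The 32-bit sum wraps to `0x00100000`, so that version would report that another batch fits.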
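`d_resNonce` holds NBN = 2 slots. `cudaMemset(..., 0xff, ...)` presets them to 0xFFFFFFFF, and the host treats `UINT32_MAX` as "no result". It then adds `startNounce` to whatever thread index a slot holds.

One common lock-free way for a kernel to fill two such slots is an atomic exchange on slot 0 that pushes the previous winner into slot 1. That is an assumption about the final haval kernel, not something visible in the posted source. The host-side sketch below mimics the pattern with `std::atomic`. The struct and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int kSlots = 2;                // mirrors NBN in the posted source
constexpr uint32_t kEmpty = UINT32_MAX;  // what a 0xff memset leaves in each slot

struct ResultSlots {
    std::atomic<uint32_t> nonce[kSlots];
    ResultSlots() { for (auto &n : nonce) n.store(kEmpty); }
};

// Record a winning thread index: the newest winner takes slot 0 and the
// previous occupant of slot 0, if any, moves to slot 1. With more than two
// winners, only two indices end up stored.
void report(ResultSlots &r, uint32_t thread_index) {
    uint32_t prev = r.nonce[0].exchange(thread_index);
    if (prev != kEmpty)
        r.nonce[1].store(prev);
}
```

This lines up with the host code: `h_resNonce[thr_id][0]` is checked first, and `h_resNonce[thr_id][1]` is only looked at when it is not `UINT32_MAX`.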
sr. member
Activity: 266
Merit: 250
http://paste.ubuntu.com/25680610/

These are the output results after replacing x17.cu with the one palgin posted and recompiling.


Thank you, will check it!
