
Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 1235. (Read 2347426 times)

legendary
Activity: 2660
Merit: 1106
Just a note:

ccminer-52.exe (tpruvot 1.4.6) for the 900 series => fastest on the 900 series so far (8.2-8.3 MH/s X11 per card)

ccminermod.exe by sp_ => fastest on the 750 Ti => 2.85 MH/s X11 per card

Cheers
legendary
Activity: 1512
Merit: 1000
quarkchain.io
Good improvement, I'll test the 900s later Smiley
legendary
Activity: 2660
Merit: 1106
980 doing 8.2MH/s X11.
But miner crashes when closing.
legendary
Activity: 3122
Merit: 1003
before you take all the credits, i released a new version 1.4.6 :

https://github.com/tpruvot/ccminer/releases/

From https://github.com/tsiv/ccminer/releases (s3 algo): ~9 MH/s on a 750 Ti, overclocked some

EDIT: at 60 watts per
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
before you take all the credits, i released a new version 1.4.6 :

https://github.com/tpruvot/ccminer/releases/
sr. member
Activity: 285
Merit: 250
I have hit 7000 kH/s with a 970!! Using a 250 clock OC with 37 mV overvoltage.
This is using SP's first release; I feel that with his latest release it can go even higher Cheesy

The price/performance ratio of the 970 is unmatched right now.

Good work!

sr. member
Activity: 285
Merit: 250
I have the 970, posted the numbers
full member
Activity: 160
Merit: 100
Would be very interested in this, but I'd be going to buy some 970s from Micro Center to do so. I checked a few pages back and didn't see any hard numbers on the 970s (only 980s) in terms of kh/s and wattage. I have a bunch of rigs with various 270Xs but I'm looking to downscale and convert to nvidia cards due to their increased efficiency.
sp_
legendary
Activity: 2884
Merit: 1087
Team Black developer
The shared-memory tweaks were a dead end, but I managed to squeeze out another 1.5% over the Schleicher implementation in the echo hash.

This code can be optimized:

         uint32_t t;
         t = ((ab & 0x80808080) >> 7);
         uint32_t abx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         t = ((bc & 0x80808080) >> 7);
         uint32_t bcx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         t = ((cd & 0x80808080) >> 7);
         uint32_t cdx = t<<4 ^ t<<3 ^ t<<1 ^ t;

         abx ^= ((ab & 0x7F7F7F7F) << 1);
         bcx ^= ((bc & 0x7F7F7F7F) << 1);
         cdx ^= ((cd & 0x7F7F7F7F) << 1);

because

(ab & 0x7F7F7F7F) == ab ^ (ab & 0x80808080)

This saves a register, some moves, and dead code, and with the proper configuration gives 1.5% more hashrate in ECHO.
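
The identity is easy to check on the host. Below is a minimal C sketch, not the actual ccminer kernel code: the names `xtime4_original`, `xtime4_masked_xor` and `check_xtime4` are made up for illustration, and the second variant is only one guess at how the identity might be applied (reusing the already-computed high-bit mask instead of a second AND with a different constant). Whether this really frees a register on the GPU depends on what the compiler does with it.

```c
#include <assert.h>
#include <stdint.h>

/* Doubles four packed GF(2^8) bytes, written like the snippet in the post. */
uint32_t xtime4_original(uint32_t ab)
{
    uint32_t t = (ab & 0x80808080u) >> 7;              /* 1 in each byte whose top bit was set */
    uint32_t abx = (t << 4) ^ (t << 3) ^ (t << 1) ^ t; /* 0x1b in exactly those bytes */
    abx ^= (ab & 0x7F7F7F7Fu) << 1;                    /* shift the low 7 bits of each byte */
    return abx;
}

/* Same result, using (ab & 0x7F7F7F7F) == ab ^ (ab & 0x80808080). */
uint32_t xtime4_masked_xor(uint32_t ab)
{
    uint32_t hi = ab & 0x80808080u;                    /* computed once, used twice */
    uint32_t t = hi >> 7;
    uint32_t abx = (t << 4) ^ (t << 3) ^ (t << 1) ^ t;
    abx ^= (ab ^ hi) << 1;                             /* ab with the top bit of each byte cleared */
    return abx;
}

/* Compares both variants on a pseudo-random sample of inputs. */
void check_xtime4(void)
{
    uint32_t x = 0x12345678u;
    for (int i = 0; i < 100000; i++) {
        assert((x & 0x7F7F7F7Fu) == (x ^ (x & 0x80808080u)));
        assert(xtime4_original(x) == xtime4_masked_xor(x));
        x = x * 1664525u + 1013904223u;                /* simple LCG step */
    }
}
```

Calling `check_xtime4()` from a test harness runs through the sample without tripping an assert if the two variants agree.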
hero member
Activity: 672
Merit: 500
x11: 2680khash (ccminer-djm34)
x13: 2030khash (ccminer-djm34)
x15: 1800khash (ccminer-djm34)

zotac 750ti stock

x11: 2820khash (ccminer-sp)
x13: 2090khash (ccminer-sp)
x15: 1880khash (ccminer-sp)


I will test 970 and 980 tomorrow.
Sadly my Suarez lost the game in his first show. Sad , upset
sp_
legendary
Activity: 2884
Merit: 1087
Team Black developer
Not Skein, I meant Shavite. I was mixing up the X's here. They both share the implementation in cuda_x11_aes.cu, in the aes_round() method.

I want to merge two table reads into one, but this will require 256 KB of shared memory, so I need to split the bits and write separate code for the upper-bit combinations.
 
sharedMemory[__byte_perm(x0, 0, 0x4440)]^sharedMemory[__byte_perm(x1, 0, 0x4441) + 256],
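
To make the indexing in that line easier to follow, here is a host-side C sketch. `byte_perm_host` is a hypothetical stand-in that emulates the documented behaviour of CUDA's `__byte_perm(x, y, s)` (the low three bits of each selector nibble pick one of the eight bytes of `y:x`); it is not the device intrinsic itself. `table_bytes` is also just an illustrative helper and assumes 4-byte table entries. With y = 0, selector 0x4440 yields byte 0 of x0 and 0x4441 yields byte 1 of x1. Folding two such 8-bit indices into one 16-bit index is presumably where the 256 KB figure comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Host emulation of __byte_perm: result byte i is the input byte chosen by
   the low 3 bits of selector nibble i. Bytes 0-3 come from x, 4-7 from y. */
uint32_t byte_perm_host(uint32_t x, uint32_t y, uint32_t s)
{
    uint64_t pool = ((uint64_t)y << 32) | x;
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t sel = (s >> (4 * i)) & 0x7u;
        uint32_t byte = (uint32_t)(pool >> (8 * sel)) & 0xFFu;
        r |= byte << (8 * i);
    }
    return r;
}

/* Size in bytes of a lookup table of 32-bit words with the given index width. */
size_t table_bytes(unsigned index_bits)
{
    return ((size_t)1 << index_bits) * sizeof(uint32_t);
}
```

For example, `table_bytes(8)` is 1024 (the 1 KB per-byte table) and `table_bytes(16)` is 262144 (256 KB), which is more shared memory than a block can use and so forces the split described in the post.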

legendary
Activity: 1400
Merit: 1050
I re-wrote two new hashing algorithms yesterday. They have some bugs, so the numbers are not ready yet, but it looks like 3.5 MHASH on the 750 Ti on X11 with 38 watts at the wall per card. 9.2 watts per MHASH.
Impressive! Smiley

But I can't get it to work properly :(. The idea was to reduce the number of shared-memory accesses in echo and skein (ahs). ccminer currently does a lookup for each byte, but I want to use the larger and improved shared memory on Maxwell to look up more bits at once. The 1 KB table can become 32 KB or 48 KB. Block latency should stay low, as the probability of hitting two equal addresses is lower with 48 KB of combinations.
Hmm, not sure what shared memory would bring to Skein... there is no big lookup table, everything is just calculated for every nonce and thread, so there is nothing to share between threads in the first place.
Strangely, I tried it while working on skein-1024 and the performance was just terrible, a major slowdown compared to the version not using it...
sp_
legendary
Activity: 2884
Merit: 1087
Team Black developer
I re-wrote two new hashing algorithms yesterday. They have some bugs, so the numbers are not ready yet, but it looks like 3.5 MHASH on the 750 Ti on X11 with 38 watts at the wall per card. 9.2 watts per MHASH.
Impressive! Smiley

But I can't get it to work properly :(. The idea was to reduce the number of shared-memory accesses in echo and skein (ahs). ccminer currently does a lookup for each byte, but I want to use the larger and improved shared memory on Maxwell to look up more bits at once. The 1 KB table can become 32 KB or 48 KB. Block latency should stay low, as the probability of hitting two equal addresses is lower with 48 KB of combinations.
legendary
Activity: 1512
Merit: 1000
quarkchain.io
You can send me a PM with your email address and I will send you the beta miner (Sunday's build).

Nice work, I PM'd you already Smiley
hero member
Activity: 789
Merit: 501
I re-wrote two new hashing algorithms yesterday. They have some bugs, so the numbers are not ready yet, but it looks like 3.5 MHASH on the 750 Ti on X11 with 38 watts at the wall per card. 9.2 watts per MHASH.

Impressive! Smiley
sp_
legendary
Activity: 2884
Merit: 1087
Team Black developer
You can send me a PM with your email address and I will send you the beta miner (Sunday's build).
sr. member
Activity: 248
Merit: 250
Can someone give the download URL for the miner? Smiley
legendary
Activity: 2660
Merit: 1106
I re-wrote two new hashing algorithms yesterday. They have some bugs, so the numbers are not ready yet, but it looks like 3.5 MHASH on the 750 Ti on X11 with 38 watts at the wall per card. 9.2 watts per MHASH.

Beautiful. Just adding to the post: your latest miner doesn't crash when closing the app like the old ones did, so now I can close it without problems.
sp_
legendary
Activity: 2884
Merit: 1087
Team Black developer
I re-wrote two new hashing algorithms yesterday. They have some bugs, so the numbers are not ready yet, but it looks like 3.5 MHASH on the 750 Ti on X11 with 38 watts at the wall per card. 9.2 watts per MHASH.
legendary
Activity: 2660
Merit: 1106
Hit 9MH/s on my new 290X, clocks 1110/1625.

That is very good. I hope you don't release it. 90% of the GPU mining rigs are from AMD. Smiley I will try to push the 980 above 10 MHASH, and then keep it private if the speed gets better.

9.45MH/s on 290X - race you to 10MH!  Grin

What's the power consumption?

Last I checked, over 270X, 280X, 290X, and 7950 all undervolted, I got 22.13MH/s for 610W.

EDIT: I'll go test high performance numbers against the stock miner with wattages in a bit.

Wow, what a mix of cards! Nice results.
I'm getting 490-500 W for 3x 980 + 1x 750 Ti at stock voltages, overclocked, for 25.3 MH/s on X11.

I built her for testing - Pitcairn, Tahiti, and Hawaii - just need the R9 285 for Tonga.

Cool, testing on different chips.
Can't wait to see a 980 mining at 10 MH/s!
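
For reference, the wall-power figures quoted in this exchange can be turned into watts per MH/s with a couple of lines of C. This is only a sketch of the arithmetic: the helper name `watts_per_mhs` is made up for illustration, and the inputs are simply the numbers posted above (610 W for 22.13 MH/s on the undervolted AMD mix, 490-500 W for 25.3 MH/s on the 3x 980 + 750 Ti rig).

```c
#include <stdio.h>

/* Wall power divided by hashrate: lower is better. */
double watts_per_mhs(double watts, double mhs)
{
    return watts / mhs;
}

/* Prints the efficiency of the two rigs mentioned in the thread. */
void print_efficiency(void)
{
    printf("AMD mix:           %.1f W per MH/s\n", watts_per_mhs(610.0, 22.13));
    printf("3x 980 + 750 Ti:   %.1f to %.1f W per MH/s\n",
           watts_per_mhs(490.0, 25.3), watts_per_mhs(500.0, 25.3));
}
```

On these numbers the AMD mix lands at roughly 27.6 W per MH/s and the NVIDIA rig at roughly 19.4 to 19.8 W per MH/s.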