Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels.

legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
Thanks for the commit. With that I will be able to profile it on Linux to measure the real improvement.

X11 is a chain of 11 algos, and optimizing one sometimes doesn't improve the overall perf the way you'd expect (for the Xn funcs at least).

I've seen that with constant arrays...
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
My God, that CubeHash... you should need a license to practice C programming, I swear...

The compiler does a good job. The whole hash is done in 45 registers with no address calculations. My CubeHash change was just to help the compiler do its job. The implementation is almost identical to the original in ccminer, but mine is around 9% faster.
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
In the echo hash I replaced 4 shifts and 3 XORs (eor) with 1 multiply. I also removed one AND mask by using an XOR trick.

OLD:
         uint32_t t;
         t = ((ab & 0x80808080) >> 7);
         uint32_t abx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         t = ((bc & 0x80808080) >> 7);
         uint32_t bcx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         t = ((cd & 0x80808080) >> 7);
         uint32_t cdx = t<<4 ^ t<<3 ^ t<<1 ^ t;
         abx ^= ((ab & 0x7F7F7F7F) << 1);
         bcx ^= ((bc & 0x7F7F7F7F) << 1);
         cdx ^= ((cd & 0x7F7F7F7F) << 1);


NEW:
         uint32_t t, t2, t3;
         t = (ab & 0x80808080);
         t2 = (bc & 0x80808080);
         t3 = (cd & 0x80808080);

         uint32_t abx = (t >> 7) * 27 ^ ((ab^t) << 1);
         uint32_t bcx = (t2 >> 7) * 27 ^ ((bc^t2) << 1);
         uint32_t cdx = (t3 >> 7) * 27 ^ ((cd^t3) << 1);
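Why the multiply is equivalent: after `(t >> 7)` each byte holds 0 or 1, so the shifted copies in `t<<4 ^ t<<3 ^ t<<1 ^ t` never overlap and XOR behaves like addition, giving `t * 27` (27 = 0x1B, the AES reduction constant) with no carry between bytes. Likewise `ab ^ t` clears the top bit of every byte, exactly like `ab & 0x7F7F7F7F`. Below is a minimal host-side C sketch, assuming (as the masks suggest) that `ab`, `bc` and `cd` each hold four packed bytes being doubled in GF(2^8) for the AES-style rounds ECHO uses. The function names are hypothetical, not taken from the miner source; the sketch only checks that the old and new forms agree with a byte-by-byte reference.

```c
#include <assert.h>
#include <stdint.h>

/* OLD form: double every byte of x in GF(2^8) (AES "xtime") with shifts and XORs. */
static uint32_t xtime4_shift(uint32_t x)
{
    uint32_t t = (x & 0x80808080u) >> 7;        /* 1 in each byte whose top bit is set */
    uint32_t r = t << 4 ^ t << 3 ^ t << 1 ^ t;   /* 0x1B in those bytes */
    return r ^ ((x & 0x7F7F7F7Fu) << 1);         /* shift the low 7 bits of every byte */
}

/* NEW form: one multiply, and x ^ t replaces the 0x7F7F7F7F mask. */
static uint32_t xtime4_mul(uint32_t x)
{
    uint32_t t = x & 0x80808080u;               /* top bit of each byte */
    /* (t >> 7) holds 0 or 1 per byte, so * 27 (0x1B) never carries into the next
       byte; x ^ t clears each top bit so the << 1 cannot carry either. */
    return (t >> 7) * 27u ^ ((x ^ t) << 1);
}

/* Byte-wise reference: b * 2 modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime_byte(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1B : 0x00));
}

/* Nonzero when both packed forms agree with the byte-wise reference. */
static int xtime4_matches(uint32_t x)
{
    uint32_t ref = 0;
    for (int i = 0; i < 4; i++)
        ref |= (uint32_t)xtime_byte((uint8_t)(x >> (8 * i))) << (8 * i);
    return xtime4_shift(x) == ref && xtime4_mul(x) == ref;
}

static void xtime4_self_test(void)
{
    assert(xtime_byte(0x57) == 0xAE && xtime_byte(0xAE) == 0x47); /* FIPS-197 examples */
    for (uint32_t i = 0; i < 0x10000u; i++)
        assert(xtime4_matches(i | i << 16));     /* every byte value appears in every lane */
}
```

The sketch checks correctness only; whether the multiply is actually faster than the shift/XOR chain depends on the GPU's integer multiply throughput, which is what the benchmark numbers in the thread are measuring.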
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
Just curious: any advantage to replacing the tables in Whirlpool with rotates?
I tried this some time ago but wasn't impressed by the result... (haven't tried on the 980 though, considering it has more shared memory and it doesn't change anything in terms of random access...)

I tried (AES). Rotating the precalculation buffer reduced the shared memory needed to 1/4. I tried to look up more bits per memory access, but on the 750 Ti, using more than 32 KB of shared memory slows the chip down.
On the 980 the shared memory has been increased to 96 KB, but I guess only 48 KB is usable without a slowdown. So my first attempt failed to improve the speed.
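The rotate trick works because the four AES round tables are byte rotations of one another, so only one 1 KB table has to sit in shared memory instead of four. A minimal host-side C sketch, assuming the "precalculation buffer" means AES-style T-tables and assuming a byte layout with byte 0 in the least significant position (other code may use the mirror-image layout and rotate the other way). All names here are hypothetical and not taken from the miner; the S-box is computed rather than pasted so the block stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n) { return (uint8_t)((x << n) | (x >> (8 - n))); }
static uint32_t rotl32(uint32_t x, int n) { return n ? (x << n) | (x >> (32 - n)) : x; }

/* AES S-box: multiplicative inverse (brute force) followed by the affine map. */
static uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = 0;
    for (int y = 1; x != 0 && y < 256; y++)
        if (gmul(x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Pack four bytes, b0 in the least significant position (assumed layout). */
static uint32_t pack(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return (uint32_t)b0 | (uint32_t)b1 << 8 | (uint32_t)b2 << 16 | (uint32_t)b3 << 24;
}

/* The only table kept: 256 entries x 4 bytes = 1 KB instead of 4 KB for T0..T3. */
static uint32_t T0[256];

static void build_t0(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t s = aes_sbox((uint8_t)x);
        T0[x] = pack(gmul(s, 2), s, s, gmul(s, 3));
    }
}

/* Tk[x] for k = 0..3, obtained by rotating the T0 entry by 8*k bits. */
static uint32_t t_lookup(int k, uint8_t x)
{
    return rotl32(T0[x], 8 * k);
}

/* Tk[x] written out from its definition, used only to check t_lookup. */
static uint32_t t_explicit(int k, uint8_t x)
{
    uint8_t s = aes_sbox(x), s2 = gmul(s, 2), s3 = gmul(s, 3);
    switch (k) {
    case 0:  return pack(s2, s, s, s3);
    case 1:  return pack(s3, s2, s, s);
    case 2:  return pack(s, s3, s2, s);
    default: return pack(s, s, s3, s2);
    }
}

static void t_tables_self_test(void)
{
    build_t0();
    assert(aes_sbox(0x00) == 0x63 && aes_sbox(0x01) == 0x7C && aes_sbox(0x53) == 0xED);
    for (int k = 0; k < 4; k++)
        for (int x = 0; x < 256; x++)
            assert(t_lookup(k, (uint8_t)x) == t_explicit(k, (uint8_t)x));
}
```

The trade-off matches the post: the memory footprint drops to a quarter, but every lookup still hits a random address in the table, so the saving only pays off if the freed shared memory lets more work run per SM.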
legendary
Activity: 1400
Merit: 1050
Just curious: any advantage to replacing the tables in Whirlpool with rotates?
I tried this some time ago but wasn't impressed by the result... (haven't tried on the 980 though, considering it has more shared memory and it doesn't change anything in terms of random access...)
legendary
Activity: 1400
Merit: 1050
My God, that CubeHash... you should need a license to practice C programming, I swear...
5-dimensional matrices... I didn't even know that was possible.
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
Here is my first commit:

https://github.com/sp-hash/ccminer

+50 KHASH on the 750 Ti (improved CubeHash, Echo, Groestl and SIMD).

This is forked from the 1.4.6 release.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
sp_, you should join us on #ccminer (freenode)
legendary
Activity: 3122
Merit: 1003
I received the 0.003 indeed, yes it's my IRL name :p

I will create a list of the gifts. I reduced the target amount by $1.
Cool... I didn't know that was you.
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
I have also spent months optimizing all 11 of the X algos. I have released 5 betas with different optimizations. Each exe has just a couple of the kernels changed, and not all of them are included. My optimized code could be merged into any of the clones to make it the fastest miner on the planet, but yours is the cleanest and best fork. Now I am preparing my first check-in.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
The 1% was the last tsiv commit. It's the first thing I did months ago: X11/X15 tuning for the 750 Ti (before the CUDA 6.5 beta program).
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
I will probably check in 4 optimized kernels tonight. We are not optimizing the same parts of the code, so your 14.6 combined with my 14.5 changes will boost the total hashrate.

My problem is that some of the optimizations crash the miner or hurt performance, and some run fast on the 750 Ti but slower on the 970/980.
Today I will check in "safe" optimizations in Groestl, Echo, SIMD and JH512. That is much more than the 1% you claim in your post.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
I received the 0.003 indeed, yes it's my IRL name :p

I will create a list of the gifts. I reduced the target amount by $1.
legendary
Activity: 3122
Merit: 1003
I don't like to ask, but I need to, so please read: https://pledgie.com/campaigns/27288

And sp_, I expect a few from you too.
Epsylon3, are you tpruvot? Anyhow, I sent him (or you) some BTC. I hope everyone does the same.

Edit: why don't you (or he) show the BTC donors? That would be great.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
I don't like to ask, but I need to, so please read: https://pledgie.com/campaigns/27288

And sp_, I expect a few from you too.
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
I've received the 970 and 980 cards now. The 980 seems to have an issue, because the fan is spinning at 100%. Perhaps I need to update the BIOS? Or change some setting on the motherboard? Or return it to the shop. To make the build stable I had to remove some of the optimizations.
I will work some more on it and release a new beta. Current stable hashrate at stock clocks (X11) is 6600 KHASH on the 970 and 7700 on the 980. No lost nonces, and no crashes.
full member
Activity: 126
Merit: 100
Waves address 3PP4npcijfECsxrf4Ms4rSxPuZq1kamGn7H
Hello,
My laptop has an old NVIDIA GT 630M GPU (compute 2.1). Could you also add support for compute 2.0 and 2.1 cards? It would be very much appreciated!
hero member
Activity: 672
Merit: 500
There seems to be a bug in the latest release as well. X13 and X15 get a few out-of-range errors. X11 does as well, but only 1 out of 100 or so.
I need to merge my changes one kernel at a time to find the broken X.
I'm back for a short trip. Is there any new miner that can push my 750 Ti up to 3 MH/s? I will donate if it works.
sp_
legendary
Activity: 2898
Merit: 1087
Team Black developer
I did, but when I change the benchmark parameters to 0xFFFF (a really low diff), a few nonces are calculated wrong in X13 and X15, even on the 750 Ti. The 52 build (compute 5.2) is untested; I don't have 970 and 980 cards to test on yet.

About the crashes: this miner seems to crash once in a while when it loses the connection to the pool. If you have been mining at NiceHash lately, you will have noticed that it has been under heavy DDoS attacks for days now.


"We are currently being hit by 8+ Gbps DDoS attack which is not so easy to mitigate ... we'll keep you updated when we'll have more information available."

member
Activity: 146
Merit: 10
The miner is fast and beautiful, but still buggy.
I launched it on 4x750 Ti and 5x750 Ti rigs, watched it for a while and left the farm running. After 12 hours it became clear that the farm had stalled: the miner window was hanging and the card load was zero. The miner stopped finding solutions within 1-2 hours after launch. Meanwhile, on another farm that uses ccminer 1.2, the cards kept working...
Unfortunately, I have to go back to the old version of the miner.