Topic: CCminer(SP-MOD) Modded NVIDIA Maxwell / Pascal kernels. - page 777. (Read 2347664 times)

sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Faster but not profitable. I didn't reach 5 Mhash yet.
Well, did you modify SIMD in it? And if you feel like sharing, how much was gained from SIMD alone in X11 speed percentage?
Why don't you stick with AMD? Disassemble the kochur bins and check for yourself.
I am, thanks. What I'm interested in is how much there is to be gained from SIMD. While the architecture is different, many things are similar - if there's an unexpectedly massive improvement from SIMD on Nvidia GPUs, it is quite likely there is on AMD.
Also, why so defensive? I have no intention of encroaching on your turf here - I could do more CUDA if I wanted, but for now it does not interest me. You don't need to see me as a threat.

You are no competition to me...
jr. member
Activity: 64
Merit: 1
Perhaps compiling the ether miner for 32-bit will help? Cached pointer sizes will go from 64-bit to 32-bit (and double the TLB limit?). You need to remove the CPU verification code because it uses 64-bit libraries, I think.

Thought of that, but it's going to be troublesome. You only have a 4GB address space, with Windows already sucking up ~half. Then you have to load the 1.3GB DAG from disk and allocate 1.3GB of GPU RAM (which, AFAIK, sits in the same space, although it isn't pinned to host). This doesn't fit. So then you would have to read the DAG from disk in small chunks and copy it over to GPU RAM. And when that's all done, you will have to pass all solutions on to a special light version of ethminer that does light verification, as it can't load a DAG into RAM for the same reasons. Or you simply don't verify and risk some Boos.

Then, when that's all done, you're not even sure if it fixes the problem. You could try getting a 32-bit version of dagSimCL to work.

I believe Epsylon3/tpruvot has been trying to get a 32-bit version of ethminer to work a while back. Can't find the source anymore.
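
For anyone following the address-space argument, here is a rough sketch of the arithmetic. The figures are the ones from the post (4GB total, roughly half taken by Windows, a 1.3GB DAG on the host and a 1.3GB GPU allocation); the function name and the idea that the GPU allocation consumes the same 32-bit space are assumptions taken from the "AFAIK" above, not something verified here.

```python
# Rough 32-bit address-space budget, using the figures quoted in the post.
# `address_space_budget` is a hypothetical helper, not part of ethminer.

def address_space_budget(total_gb=4.0, os_reserved_gb=2.0,
                         dag_host_gb=1.3, gpu_mapping_gb=1.3):
    """Return the GB left over; a negative value means it does not fit."""
    usable = total_gb - os_reserved_gb          # what the process can address
    return usable - dag_host_gb - gpu_mapping_gb

# Whole DAG resident on the host at once: does not fit.
whole = address_space_budget()
print(f"whole DAG in host RAM: {whole:+.1f} GB left")

# Staging the DAG through a small host buffer instead (64MB, an arbitrary choice).
staged = address_space_budget(dag_host_gb=0.0625)
print(f"64MB staging buffer:   {staged:+.1f} GB left")
```

With the whole DAG resident the budget goes negative, which is the "this doesn't fit" point; with a small staging buffer it comes out positive, which is why reading in chunks is the suggested workaround.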

Hi Genoil,

This is a small contribution from me to your work on ethminer --- 0xc3e7bda79b60fb4f34fbe467a8c3c3084e9008bbdc9014d8ade8e8c22cd1352a

I am using your miner and making some ETH and I thought that you deserve part of it (as I did with all devs improving the miners we are using).

Keep up the good work!

Thanks! Are you by any chance mining on Windows? Or if you built on Linux, can you find me the commit hash? I have an issue with 9x0 with the latest source, and I'm trying to figure out at which point it went wrong.

Sent you a PM.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Faster but not profitable. I didn't reach 5 Mhash yet.

Well, did you modify SIMD in it? And if you feel like sharing, how much was gained from SIMD alone in X11 speed percentage?

Why don't you stick with AMD? Disassemble the kochur bins and check for yourself.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Faster but not profitable. I didn't reach 5 Mhash yet.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I said +30% in the quark algo. Quark doesn't have the SIMD512 and you should know it.
No shit - I assumed if you've got a private Quark, you've also got a private X11. If not, sorry, the question is pointless in that case.

Somebody does. X11 is not profitable with the public bins.

A well-coded X11 should do 4-5 Mhash on the 750 Ti.
sr. member
Activity: 438
Merit: 250
Perhaps compiling the ether miner for 32-bit will help? Cached pointer sizes will go from 64-bit to 32-bit (and double the TLB limit?). You need to remove the CPU verification code because it uses 64-bit libraries, I think.

Thought of that, but it's going to be troublesome. You only have a 4GB address space, with Windows already sucking up ~half. Then you have to load the 1.3GB DAG from disk and allocate 1.3GB of GPU RAM (which, AFAIK, sits in the same space, although it isn't pinned to host). This doesn't fit. So then you would have to read the DAG from disk in small chunks and copy it over to GPU RAM. And when that's all done, you will have to pass all solutions on to a special light version of ethminer that does light verification, as it can't load a DAG into RAM for the same reasons. Or you simply don't verify and risk some Boos.

Then, when that's all done, you're not even sure if it fixes the problem. You could try getting a 32-bit version of dagSimCL to work.

I believe Epsylon3/tpruvot has been trying to get a 32-bit version of ethminer to work a while back. Can't find the source anymore.
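
The "read the DAG in small chunks and copy it over" idea could look roughly like the sketch below. It is only an illustration of the staging loop: `stream_in_chunks` and `fake_upload` are hypothetical names, and the upload callback stands in for whatever host-to-device copy the real miner would use. Nothing here is taken from the ethminer source.

```python
import io

def stream_in_chunks(src, upload, chunk_size=64 * 1024 * 1024):
    """Read `src` in fixed-size chunks and hand each one to `upload`.

    `upload(offset, chunk)` is a stand-in for a host-to-device copy.
    Only one chunk is held in host memory at a time.
    Returns the total number of bytes transferred.
    """
    offset = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:                 # end of file
            break
        upload(offset, chunk)
        offset += len(chunk)
    return offset

# Usage with in-memory stand-ins for the DAG file and the GPU buffer.
fake_dag = io.BytesIO(bytes(range(256)) * 40)    # 10,240 bytes of test data
device = bytearray(10240)                        # pretend device memory

def fake_upload(offset, chunk):
    device[offset:offset + len(chunk)] = chunk

total = stream_in_chunks(fake_dag, fake_upload, chunk_size=4096)
```

The host-side footprint is bounded by `chunk_size` rather than by the DAG size, which is the whole point in a cramped 32-bit process. Verification would still need the light path mentioned above.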

Hi Genoil,

This is a small contribution from me to your work on ethminer --- 0xc3e7bda79b60fb4f34fbe467a8c3c3084e9008bbdc9014d8ade8e8c22cd1352a

I am using your miner and making some ETH and I thought that you deserve part of it (as I did with all devs improving the miners we are using).

Keep up the good work!

Thanks! Are you by any chance mining on Windows? Or if you built on Linux, can you find me the commit hash? I have an issue with 9x0 with the latest source, and I'm trying to figure out at which point it went wrong.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I said +30% in the quark algo. Quark doesn't have the SIMD512 and you should know it.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
I can open-source my 5% faster miner, but then all the donors would be angry, wouldn't they? Works on Linux as well.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
joblo, does your quark optimisation work at the end? not sure I understand your conversation with sp_ fully: where does the +30% come from?
My private quark is +30% up from release 74. The buyable private is +5%
If you didn't work on SIMD, I'm surprised and disappointed.
If I did, I wouldn't open-source it, would I? What is the point? Why don't you open-source yours?
Because I didn't do it yet. And I don't want you to open source it - I just wanted confirmation that you worked on it.
Last open-source check-in was on 9 December.
So yes, I have worked on it.
https://github.com/sp-hash/ccminer/commits/windows/x11/cuda_x11_simd512.cu
No, no, I wanted to know if some of your private optimizations that AREN'T for sale include optimizations to SIMD.

Why don't you disassemble it and check?
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
What's with all the vanillacoin talk? It's not profitable to mine and has FPGAs working for it right now.
It's changing algo.
It's now on Blake, so quick work for an FPGA ;-)

Yes. Mining it on NVIDIA is a waste.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
joblo, does your quark optimisation work at the end? not sure I understand your conversation with sp_ fully: where does the +30% come from?
My private quark is +30% up from release 74. The buyable private is +5%
If you didn't work on SIMD, I'm surprised and disappointed.
If I did, I wouldn't open-source it, would I? What is the point? Why don't you open-source yours?
Because I didn't do it yet. And I don't want you to open source it - I just wanted confirmation that you worked on it.

Last open-source check-in was on 9 December.

So yes, I have worked on it.

https://github.com/sp-hash/ccminer/commits/windows/x11/cuda_x11_simd512.cu

legendary
Activity: 2716
Merit: 1094
Black Belt Developer
What's with all the vanillacoin talk? It's not profitable to mine and has FPGAs working for it right now.

It's changing algo.

It's now on Blake, so quick work for an FPGA ;-)
legendary
Activity: 1764
Merit: 1024
What's with all the vanillacoin talk? It's not profitable to mine and has FPGAs working for it right now.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
joblo, does your quark optimisation work at the end? not sure I understand your conversation with sp_ fully: where does the +30% come from?
My private quark is +30% up from release 74. The buyable private is +5%
If you didn't work on SIMD, I'm surprised and disappointed.

If I did, I wouldn't open-source it, would I? What is the point? Why don't you open-source yours?
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
joblo, does your quark optimisation work at the end? not sure I understand your conversation with sp_ fully: where does the +30% come from?

My private quark is +30% up from release 74. The buyable private is +5%
legendary
Activity: 1154
Merit: 1001
joblo, does your quark optimisation work at the end? not sure I understand your conversation with sp_ fully: where does the +30% come from?

Joblo's optimization impacts CPU validation of any found shares. This is usually insignificant, but since he's also mining with all CPU cores, it did have an impact for him. It was that his CPU mining was slowing down ccminer.

Joblo: You're invited for a beer over at #ccminer @freenode: there's friendlier dev talk there, some collaboration now and then, and certainly a lot less BS ;)
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
joblo, does your quark optimisation work at the end? not sure I understand your conversation with sp_ fully: where does the +30% come from?
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
While you were trolling my thread I added another 0.4% in the decred algo.
I will try to do 5% and include it in my donation miner.
Since I forked cpuminer I've increased performance up to 92% (x13), 75% (x15), 36% (qubit)
and 27% (quark). I can't take credit for all of it because it was just plugging in faster
functions that already existed. But all the gains in quark are mine.

This is good.
sp_
legendary
Activity: 2954
Merit: 1087
Team Black developer
Used allanmac's code (https://gist.github.com/allanmac/f91b67c112bcba98649d) from devtalk.nvidia.com to test TLB thrashing and compiled it with nvcc -m32. It didn't alleviate the TLB thrashing issue; 970 still dives like a stone past 2GB. Not sure if this is representative of memory bandwidth in Ethash though, just sharing to see if anyone can tweak and improve the TLB situation.

Did you try it on a 32-bit OS?
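
As a back-of-the-envelope illustration of why throughput might fall off once the working set outgrows what the TLB can map, here is a first-order model for uniformly random accesses. It is not allanmac's benchmark and it is not a measurement. The 2GB "reach" is a hypothetical value picked only because the post reports the drop past 2GB; the real TLB parameters of the 970 are not given in this thread.

```python
GIB = 1024 ** 3

def tlb_hit_rate(working_set_bytes, tlb_reach_bytes):
    """Expected TLB hit rate for uniformly random accesses.

    If the TLB can map `tlb_reach_bytes` out of a `working_set_bytes`
    buffer, a random access lands on a mapped page with probability
    reach / working_set, capped at 1.
    """
    if working_set_bytes <= 0:
        raise ValueError("working set must be positive")
    return min(1.0, tlb_reach_bytes / working_set_bytes)

# HYPOTHETICAL reach, chosen to match the "past 2GB" observation in the post.
ASSUMED_REACH = 2 * GIB

for size_gib in (1, 2, 3, 4):
    rate = tlb_hit_rate(size_gib * GIB, ASSUMED_REACH)
    print(f"{size_gib} GiB working set -> modelled hit rate {rate:.2f}")
```

Under this model, shrinking pointers to 32 bits does not change the hit rate unless it also changes how much memory the TLB can map. That would be consistent with the -m32 build not helping, though the model by itself does not prove it.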
legendary
Activity: 1470
Merit: 1114
While you were trolling my thread I added another 0.4% in the decred algo.
I will try to do 5% and include it in my donation miner.

Since I forked cpuminer I've increased performance up to 92% (x13), 75% (x15), 36% (qubit)
and 27% (quark). I can't take credit for all of it because it was just plugging in faster
functions that already existed. But all the gains in quark are mine.