Topic: [ANN] ccminer 2.3 - opensource - GPL (tpruvot) - page 141. (Read 500113 times)

legendary
Activity: 1470
Merit: 1114
That leaves cache performance as the most likely cause for both issues. If the total memory requirements of all
threads exceed the available cache, it will significantly affect cache performance. It's a step function as each cache
level overflows.

^^^^^ This

Seems like going too cheap with a CPU for a mining rig isn't a good idea.

True. But it's very difficult to roi with cpu anyways, 'cause of botnets.

My comments were aimed at the CPU performance of GPU mining. My CPU mining comparison was made
only to illustrate a possible similar problem.

However, I agree that it's very difficult, if not impossible, to ROI with CPU mining.
legendary
Activity: 1154
Merit: 1001
@joblo,
Cryptonight on CPU is a particular case. There's a 2MB scratchpad per thread (or something else whose proper name I don't recall). For whatever CPU you have, the ideal number of threads will always be the cache size in MB divided by 2. Most i7s have 8MB of cache, so optimal threads = 4.
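
A rough Python sketch of that rule of thumb. The 2 MB per-thread figure is taken from the post; the function name and the sample cache size are only illustrative assumptions, not anything from cpuminer itself.

```python
# Rule of thumb from the post: one Cryptonight thread per 2 MB of cache.
SCRATCHPAD_MB = 2  # per-thread scratchpad size quoted above


def cryptonight_threads(cache_mb: int, scratchpad_mb: int = SCRATCHPAD_MB) -> int:
    """Return how many scratchpads fit in the cache (never fewer than 1 thread)."""
    return max(1, cache_mb // scratchpad_mb)


# Example: an i7 with 8 MB of cache
print(cryptonight_threads(8))  # 4
```

Treat the result as a starting point and benchmark one thread above and below it.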

As for the rest of the details that you posted, they're way over my head. /searching
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
That leaves cache performance as the most likely cause for both issues. If the total memory requirements of all
threads exceed the available cache, it will significantly affect cache performance. It's a step function as each cache
level overflows.

^^^^^ This

Seems like going too cheap with a CPU for a mining rig isn't a good idea.

True. But it's very difficult to roi with cpu anyways, 'cause of botnets.
legendary
Activity: 1470
Merit: 1114
^ It may also be worthwhile checking whether multiple ccminer instances help.
So you could launch instance 1 with -d 0,1,2, and instance 2 with -d 3,4,5.
In addition, set CPU affinity so that specific CPU cores are reserved for instance 1 and other CPU cores for instance 2.

I had to think about this a bit. ccminer should already create multiple CPU threads spread over all cores.
This can be confirmed by using the -D option to enable debug output. Even so, with three threads per core
it could introduce scheduling latency. That, combined with the small cache, could easily cause a 10% degradation
in performance.

The algo probably factors into it as well. Does the degradation occur with other algos?

I have seen some odd performance differences while testing cryptonight on cpuminer that I still don't understand.
At first I thought it was due to some affinity tricks, but I haven't found anything in the code to explain it.
In short, cryptonight performs radically differently on different CPUs/OSs. On a 6700K running Linux I get the best
CPU mining performance with 4 threads. Adding more threads causes the total hashrate to drop to as low as half the 4 thread
rate. Most other algos perform much better with more threads. The CPUs also run pretty cool on cryptonight, suggesting
they are often stalled waiting for data (i.e. memory bound).

There shouldn't be any scheduling delays because the number of running threads is less than the available virtual
cores. Any thread contention would occur during execution and be mitigated by hyperthreading.

That leaves cache performance as the most likely cause for both issues. If the total memory requirements of all
threads exceed the available cache, it will significantly affect cache performance. It's a step function as each cache
level overflows.
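
To make the step function concrete, here is a small Python sketch. The cache capacities are hypothetical example values, not measurements of any particular CPU, and the helper name is made up for illustration.

```python
# Hypothetical cache capacities in MB, smallest (fastest) level first.
# Example values only; substitute the figures for your own CPU.
CACHE_LEVELS_MB = [("L2", 1.0), ("L3", 8.0)]


def level_holding(threads: int, per_thread_mb: float, levels=CACHE_LEVELS_MB) -> str:
    """Return the first cache level large enough for all threads combined."""
    total_mb = threads * per_thread_mb
    for name, capacity_mb in levels:
        if total_mb <= capacity_mb:
            return name
    return "RAM"  # the working set overflowed every cache level


# With 2 MB per thread, adding threads eventually spills out of the last level.
for n in (1, 4, 8):
    print(n, "threads ->", level_holding(n, 2.0))
```

Each time the combined working set crosses a capacity boundary, every access that misses has to go one level further out, which is where the sudden drop comes from.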

Seems like going too cheap with a CPU for a mining rig isn't a good idea.
member
Activity: 106
Merit: 10
hmm..
My miner crashes after a few hours. This is the error report:
Code:
Cuda error in func 'decred_cpu_setBlock_52' at line 321 : unknown error. 

any idea about this?

Turn down the intensity; mine was doing that too.
member
Activity: 67
Merit: 10
hmm..
My miner crashes after a few hours. This is the error report:
Code:
Cuda error in func 'decred_cpu_setBlock_52' at line 321 : unknown error. 

any idea about this?
legendary
Activity: 1154
Merit: 1001
^ It may also be worthwhile checking whether multiple ccminer instances help.
So you could launch instance 1 with -d 0,1,2, and instance 2 with -d 3,4,5.
In addition, set CPU affinity so that specific CPU cores are reserved for instance 1 and other CPU cores for instance 2.
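
A minimal Python sketch of that idea, under a few assumptions: Linux (os.sched_setaffinity is not available on Windows), a ccminer binary on the PATH, and the decred algo. Pool options (-o, -u, -p) are left out, so a real launch would need them appended. The helper names are made up for illustration.

```python
import os
import subprocess


def chunk(items, n):
    """Split items into n contiguous, nearly equal chunks."""
    size, extra = divmod(len(items), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        out.append(items[start:end])
        start = end
    return out


def plan_instances(devices, cores, n):
    """Pair each chunk of GPU indices with its own chunk of CPU cores."""
    plans = []
    for devs, cpus in zip(chunk(devices, n), chunk(cores, n)):
        # Pool options are omitted here; add them before a real launch.
        argv = ["ccminer", "-a", "decred", "-d", ",".join(map(str, devs))]
        plans.append((argv, set(cpus)))
    return plans


def launch(plans):
    """Start each instance pinned to its own cores (Linux only)."""
    procs = []
    for argv, cpus in plans:
        # The affinity is set in the child before exec, so every ccminer
        # thread inherits it.
        pin = lambda c=cpus: os.sched_setaffinity(0, c)
        procs.append(subprocess.Popen(argv, preexec_fn=pin))
    return procs


# Six GPUs and two CPU cores, split across two instances.
for argv, cpus in plan_instances([0, 1, 2, 3, 4, 5], [0, 1], 2):
    print(" ".join(argv), "on cores", sorted(cpus))
```

Nothing is started until launch() is called, so the plan can be checked first.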
legendary
Activity: 1470
Merit: 1114
I only get 445 MH/s per 750 Ti. What am I missing? Do I need a certain driver or CUDA toolkit or something?

Edit: apparently I get close to 500 if I only use one card, but with 6 cards and a dual-core Pentium G3240 I only get about 445 per card.
I presume that's because the first round of Blake is done on the CPU, as sp_ said.

It could be the CPU that is the bottleneck (you could check usage), but try increasing your pagefile size, assuming you're on Windows.

16GB pagefile with ~27% CPU usage but it's still slower than 1 card.

Small cache maybe?
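
For scale, a quick Python calculation using the figures quoted above. The 500 MH/s single-card number is approximate ("close to 500"), so the result is a ballpark only; the function names are just for illustration.

```python
def relative_loss(alone_mhs: float, in_rig_mhs: float) -> float:
    """Fraction of per-card hashrate lost when running in the full rig."""
    return (alone_mhs - in_rig_mhs) / alone_mhs


def rig_deficit(alone_mhs: float, in_rig_mhs: float, cards: int) -> float:
    """Total MH/s missing across the rig compared with ideal scaling."""
    return cards * (alone_mhs - in_rig_mhs)


single_card = 500.0      # MH/s, roughly what one card does alone
per_card_in_rig = 445.0  # MH/s per card with six cards installed

print(f"{relative_loss(single_card, per_card_in_rig):.0%} lost per card")
print(f"{rig_deficit(single_card, per_card_in_rig, 6):.0f} MH/s lost rig-wide")
```

That is in the same range as the roughly 10% degradation discussed elsewhere in the thread.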
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
I only get 445 MH/s per 750 Ti. What am I missing? Do I need a certain driver or CUDA toolkit or something?

Edit: apparently I get close to 500 if I only use one card, but with 6 cards and a dual-core Pentium G3240 I only get about 445 per card.
I presume that's because the first round of Blake is done on the CPU, as sp_ said.

It could be the CPU that is the bottleneck (you could check usage), but try increasing your pagefile size, assuming you're on Windows.

16GB pagefile with ~27% CPU usage but it's still slower than 1 card.
legendary
Activity: 1470
Merit: 1114
I only get 445 MH/s per 750 Ti. What am I missing? Do I need a certain driver or CUDA toolkit or something?

Edit: apparently I get close to 500 if I only use one card, but with 6 cards and a dual-core Pentium G3240 I only get about 445 per card.
I presume that's because the first round of Blake is done on the CPU, as sp_ said.

It could be the CPU that is the bottleneck (you could check usage), but try increasing your pagefile size, assuming you're on Windows.
legendary
Activity: 2002
Merit: 1051
ICO? Not even once.
I only get 445 MH/s per 750 Ti. What am I missing? Do I need a certain driver or CUDA toolkit or something?

Edit: apparently I get close to 500 if I only use one card, but with 6 cards and a dual-core Pentium G3240 I only get about 445 per card.
I presume that's because the first round of Blake is done on the CPU, as sp_ said.
full member
Activity: 224
Merit: 100
can this version be improved?
sr. member
Activity: 318
Merit: 250
hi, what is the speed of a single 750ti card on Decred?

Around 500 MH/s, depending on clocks.
sr. member
Activity: 440
Merit: 250
hi, what is the speed of a single 750ti card on Decred?
full member
Activity: 224
Merit: 100
ah sry i'll ask in the other thread
legendary
Activity: 1154
Merit: 1001
^ Please move that discussion to the appropriate thread. It's bad enough having the advertisements leak over to this end.
full member
Activity: 224
Merit: 100
My version is faster but has some issues in solo mining. If you mine on a pool you get 10-15% more coins with 1.5.74 / 1.5.78 in most algos. My private version is up to 10% faster than the public one (quark, lyra2v2).

I submit stale shares by default. If you have a slow connection you might get rejects, but my private farm is at 99.5% accepted. The NiceHash miner uses my version 1.5.74 on Maxwell cards because it is the fastest.

But your version is missing the most profitable algos, Decred and Ethereum, so it's pointless. When will you add those?
legendary
Activity: 1400
Merit: 1050
My version is faster but has some issues in solo mining. If you mine on a pool you get 10-15% more coins with 1.5.74 / 1.5.78 in most algos. My private version is up to 10% faster than the public one (quark, lyra2v2).

I submit stale shares by default. If you have a slow connection you might get rejects, but my private farm is at 99.5% accepted. The NiceHash miner uses my version 1.5.74 on Maxwell cards because it is the fastest.
and _sp is the most modest dev
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
My version is faster but has some issues in solo mining. If you mine on a pool you get 10-15% more coins with 1.5.74 / 1.5.78 in most algos. My private version is up to 10% faster than the public one (quark, lyra2v2).

I submit stale shares by default. If you have a slow connection you might get rejects, but my private farm is at 99.5% accepted. The NiceHash miner uses my version 1.5.74 on Maxwell cards because it is the fastest.
legendary
Activity: 1154
Merit: 1001

Generally speaking, this version right here is considered by some (such as myself) to be the most efficient. SP_'s versions are a wee bit faster for some algorithms, but end up producing more invalid hashes or rejects, which then results in overall worse returns.
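
The trade-off can be sketched in a few lines of Python. The hashrates and accept rates below are made-up illustration values, not measurements of either version; only the idea (raw speed multiplied by the accepted share fraction) comes from the paragraph above.

```python
def effective_rate(raw_mhs: float, accepted_fraction: float) -> float:
    """Hashrate that actually earns: raw speed times the share accept rate."""
    return raw_mhs * accepted_fraction


# Hypothetical numbers, for illustration only.
steady = effective_rate(100.0, 0.995)  # slower build, few rejects
fast = effective_rate(105.0, 0.93)     # 5% faster build, more rejects

print(f"steady: {steady:.2f} MH/s effective")
print(f"fast:   {fast:.2f} MH/s effective")
```

With these example values the faster build ends up behind, which is the situation described; with a high enough accept rate the faster build would come out ahead instead.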

SP does have some private versions for sale; I think there are multiple posts about it on just about every page of the linked thread. Those might be worth it if you're into that sort of thing, and if you have enough hash power to really make up for the expense...