Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 169. (Read 444067 times)

member
Activity: 83
Merit: 10
Okay then, explain this: https://gist.github.com/hmage/2a1fdbd7bdad252cd08c9b4166c5727a

on Core i5-4570S:
Code:
hmage@dhmd:~/test$ cat /proc/cpuinfo |fgrep name|head -1
model name      : Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz
hmage@dhmd:~/test$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
      workfunc(): 0.002082 microseconds per call, 480308.777k per second
  workloopfunc(): 0.001774 microseconds per call, 563746.643k per second

on Core i7-4770:
Code:
hmage@vhmd:~$ cat /proc/cpuinfo |fgrep name|head -1
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
hmage@vhmd:~$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
      workfunc(): 0.001776 microseconds per call, 562932.922k per second
  workloopfunc(): 0.001506 microseconds per call, 664150.879k per second


Dereferencing on every call _is_ a big performance hit, unless you have another explanation.

Latency numbers every programmer should know -- https://gist.github.com/hellerbarde/2843375

Oh, I already know, you get angry.

It looks to me that it was you who got angry. I apologise for my blunt approach.
legendary
Activity: 1470
Merit: 1114
I have given you the benefit of the doubt and tried to probe you for more info in areas where I didn't have the confidence
to call you out. But so far it's come up empty. When you challenge me on one of my strengths you'd better be well
prepared.

I don't care if I challenge you or not, I'm not here for your entertainment.

10 runs of cpuminer-opt are giving results that are consistently less than 10 runs of cpuminer-multi on the algos listed above. Simple as that.

You're free to ignore this fact, of course. But I thought it'd be nice if you knew it.

When I give you constructive feedback you seem to get angry which is counterproductive. I thank you for your work
but it was not enough to draw any conclusions. A 2% difference is statistically insignificant. But let's assume it is significant.

You suggested it was caused by the use of function pointers by algo-gate. I countered that my measurements when
algo-gate was implemented showed an improvement. That disproves your theory, one that was not supported by any
evidence BTW. So if the difference is real it must be caused by something else. There are a lot of possibilities.
Differences in CPU architecture (I don't mean capabilities) can cause measurable differences between algos. Cache
size and organization, execution environment, memory interface, etc can all cause different algos to perform differently
on different CPUs. If you look at HOdl it performs well on an i7 but poorly on an i5 due to the smaller cache. As it turns
out it was specifically optimized for the size of the i7 cache.

You need to do your research, get your facts straight and present a coherent case if you want to get any attention,
especially when you are criticizing someone's work. I have a thick skin, thicker than yours apparently, so I can take
it and give it back. Put yourself in my position: how would you react to someone taking pot shots about what you're
doing wrong and how you should do things? Oh, I already know, you get angry.
member
Activity: 83
Merit: 10
I have given you the benefit of the doubt and tried to probe you for more info in areas where I didn't have the confidence
to call you out. But so far it's come up empty. When you challenge me on one of my strengths you'd better be well
prepared.

I don't care if I challenge you or not, I'm not here for your entertainment.

10 runs of cpuminer-opt are giving results that are consistently less than 10 runs of cpuminer-multi on the algos listed above. Simple as that.

You're free to ignore this fact, of course. But I thought it'd be nice if you knew it.
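
One way to make "consistently less" checkable is to test whether the two sets of runs overlap at all. The sketch below is only an illustration: the function name is hypothetical and no measured data is included.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when every run in slow[] is below every
 * run in fast[], i.e. the two samples do not overlap at all. */
static int runs_fully_separated(const double *slow, size_t n_slow,
                                const double *fast, size_t n_fast)
{
    if (n_slow == 0 || n_fast == 0)
        return 0;

    double max_slow = slow[0];
    double min_fast = fast[0];

    for (size_t i = 1; i < n_slow; i++)
        if (slow[i] > max_slow)
            max_slow = slow[i];
    for (size_t i = 1; i < n_fast; i++)
        if (fast[i] < min_fast)
            min_fast = fast[i];

    return max_slow < min_fast;
}
```

If the runs are independent and there were no real difference, 10 runs of one miner all landing below 10 runs of the other would happen by chance in only 1 of the 184756 possible orderings, so full separation is hard to dismiss as noise even when the gap is small.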
member
Activity: 83
Merit: 10
Your suggestion would add the overhead of a function call and return on every iteration to save a pointer deref.
Looks like a bad trade to me.

I meant moving the dereference out of the iteration completely: give each algo its own iteration loop so the loop body doesn't go through a pointer dereference.

Note - all of this is speculation. I still haven't measured exactly where the slowdown is or why it's slower. I'm just reporting that for some reason non-AES versions of algos are slower in cpuminer-opt compared to cpuminer-multi. This needs further investigation.

On the same CPU, these algos are slower on cpuminer-opt compared to cpuminer-multi:
Code:

    "groestl"       =>    1109819 / 1000, // cpuminer-opt
    "groestl"       =>    1125917 / 1000, // cpuminer-nicehash
    "keccak"        =>    6964234 / 1000, // cpuminer-opt
    "keccak"        =>    8332952 / 1000, // cpuminer-nicehash
    "luffa"         =>    2728931 / 1000, // cpuminer-opt
    "luffa"         =>    3177996 / 1000, // cpuminer-nicehash
    "lyra2"         =>     716945 / 1000, // cpuminer-opt
    "lyra2"         =>     921109 / 1000, // cpuminer-nicehash
    "neoscrypt"     =>      27583 / 1000, // cpuminer-opt
    "neoscrypt"     =>      28891 / 1000, // cpuminer-nicehash
    "pentablake"    =>    3479320 / 1000, // cpuminer-opt
    "pentablake"    =>    3609862 / 1000, // cpuminer-nicehash
    "pluck"         =>       1722 / 1000, // cpuminer-opt
    "pluck"         =>       1818 / 1000, // cpuminer-nicehash
    "s3"            =>    1086149 / 1000, // cpuminer-opt
    "s3"            =>    1201897 / 1000, // cpuminer-nicehash
    "scrypt"        =>      91557 / 1000, // cpuminer-opt
    "scrypt"        =>      99702 / 1000, // cpuminer-nicehash
    "sha256d"       =>   53122339 / 1000, // cpuminer-opt
    "sha256d"       =>   54669375 / 1000, // cpuminer-nicehash
    "shavite3"      =>    2232258 / 1000, // cpuminer-opt
    "shavite3"      =>    2343704 / 1000, // cpuminer-nicehash
    "skein"         =>    6405675 / 1000, // cpuminer-opt
    "skein"         =>    6586806 / 1000, // cpuminer-nicehash
    "skein2"        =>    7985012 / 1000, // cpuminer-opt
    "skein2"        =>    8167405 / 1000, // cpuminer-nicehash

I'm using this version of cpuminer-multi — https://github.com/nicehash/cpuminer-multi
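
The size of the gap varies a lot between algos, which is easier to see as a percentage. A minimal sketch, using four rows copied from the list above (figures as posted, without the / 1000 scaling, which cancels out in the ratio); the struct and function names are made up for illustration.

```c
#include <stddef.h>

struct algo_rate {
    const char *name;
    double opt;    /* cpuminer-opt figure as posted */
    double multi;  /* cpuminer-multi (nicehash build) figure as posted */
};

/* A few rows from the list above, not the full table. */
static const struct algo_rate posted_rates[] = {
    { "groestl",  1109819.0,  1125917.0 },
    { "keccak",   6964234.0,  8332952.0 },
    { "lyra2",     716945.0,   921109.0 },
    { "sha256d", 53122339.0, 54669375.0 },
};

/* Percentage by which cpuminer-opt trails cpuminer-multi for one algo. */
static double slowdown_percent(const struct algo_rate *r)
{
    return (r->multi - r->opt) / r->multi * 100.0;
}

/* Index of the row with the largest slowdown. */
static size_t worst_slowdown(const struct algo_rate *rows, size_t n)
{
    size_t worst = 0;
    for (size_t i = 1; i < n; i++)
        if (slowdown_percent(&rows[i]) > slowdown_percent(&rows[worst]))
            worst = i;
    return worst;
}
```

For these rows the gap runs from about 1.4% (groestl) to about 22% (lyra2), so a single cause that costs the same on every call would not obviously explain all of them.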
legendary
Activity: 1470
Merit: 1114
I've noticed another performance regression compared to cpuminer-multi.

Algos that have a very high number of calls per second tend to be slower on cpuminer-opt because of algo-gate callback functions.

When calling through a function pointer, the pointer needs to be dereferenced before the jump. When the function in question is fast enough, that dereference could reduce performance compared to a direct function call.

One way to fix that is to move the dereference outside the loop.

pseudocode before:
Code:
func = &hash_sha256;
while(true) { func(); }

pseudocode after:
Code:
funcloop = &hashloop_sha256;
funcloop();

hashloop_sha256() { while(true) { hash_sha256(); } }

This moves the dereference so that it is done only once, at the start of the loop.

Have you measured a regression? My measurements between 3.0.7 (pre algo-gate) and 3.1 showed a modest
improvement in performance across the board.

Your suggestion would add the overhead of a function call and return on every iteration to save a pointer deref.
Looks like a bad trade to me.
member
Activity: 83
Merit: 10
I've noticed another performance regression compared to cpuminer-multi.

Algos that have a very high number of calls per second tend to be slower on cpuminer-opt because of algo-gate callback functions.

When calling through a function pointer, the pointer needs to be dereferenced before the jump. When the function in question is fast enough, that dereference could reduce performance compared to a direct function call.

One way to fix that is to move the dereference outside the loop.

pseudocode before:
Code:
func = &hash_sha256;
while(true) { func(); }

pseudocode after:
Code:
funcloop = &hashloop_sha256;
funcloop();

hashloop_sha256() { while(true) { hash_sha256(); } }

This moves the dereference so that it is done only once, at the start of the loop.
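
The pseudocode above could look roughly like this in C. It is a sketch with a toy stand-in for the hash, not cpuminer-opt code: toy_hash, scan_indirect, scan_toy_hash and the typedefs are all hypothetical names.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cheap per-nonce hash function. */
static uint32_t toy_hash(uint32_t nonce)
{
    return nonce * 2654435761u;   /* multiplicative mix, wraps mod 2^32 */
}

typedef uint32_t (*hash_fn)(uint32_t nonce);
typedef uint32_t (*scan_fn)(uint32_t first, uint32_t count);

/* "Before": the loop is generic and makes one indirect call per nonce. */
static uint32_t scan_indirect(hash_fn hash, uint32_t first, uint32_t count)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < count; i++)
        acc ^= hash(first + i);
    return acc;
}

/* "After": the loop lives next to the hash, so the call inside it is
 * direct and the compiler is free to inline it. A caller would store
 * this function in a scan_fn and pay for one indirect call per batch. */
static uint32_t scan_toy_hash(uint32_t first, uint32_t count)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < count; i++)
        acc ^= toy_hash(first + i);
    return acc;
}
```

Both variants compute the same result. Whether the second is measurably faster depends on the compiler, the CPU's indirect branch prediction and how expensive the hash itself is, so it needs to be benchmarked rather than assumed.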
legendary
Activity: 1470
Merit: 1114
I also use hyperthreading. I haven't intentionally touched either algo in several releases and 3.2 was a restructuring release with
no intended change in functionality. If you see it again, let me know and I'll take another look. But it's pretty clear from the code
that the last TOTAL rate displays the same data as the last line.

Yes, the output is the same as the total, just in a format that's easier to parse; no problem with that.

The problem was that right before the end, it would spit out thousands of lines in a second with an ever-increasing hashrate that would inflate the total result.

Do you keep an archive of older versions of cpuminer-opt? I'd like to check an older version and I foolishly deleted my local copy.

A few of the DL links are still active. If you want a specific release let me know and I'll reactivate it. Keep in mind
there have been some problem releases along the way which you probably want to avoid. The post for each release
is still in the thread and should help you find the most stable ones.

Personally I don't think it's worth the effort to go back. If the problem reoccurs with the current release you can
collect more data and we can pursue it from there.



member
Activity: 83
Merit: 10
I also use hyperthreading. I haven't intentionally touched either algo in several releases and 3.2 was a restructuring release with
no intended change in functionality. If you see it again, let me know and I'll take another look. But it's pretty clear from the code
that the last TOTAL rate displays the same data as the last line.

Yes, the output is the same as the total, just in a format that's easier to parse; no problem with that.

The problem was that right before the end, it would spit out thousands of lines in a second with an ever-increasing hashrate that would inflate the total result.

Do you keep an archive of older versions of cpuminer-opt? I'd like to check an older version and I foolishly deleted my local copy.
legendary
Activity: 1470
Merit: 1114
Edit3: I ran the test several times with both algos and it always produced the correct result.
Could your script be misinterpreting? Without further information on how to reproduce
I consider this issue closed.

I was running it without the script and was getting the wrong output.

I'm talking about the last line: 463916

This is the line I'm parsing in the script.

Maybe it's because I have hyperthreading enabled on my CPU? Could be a thread synchronization issue.

It doesn't always happen for me either. I've been trying to reproduce the problem for you with v3.2.1 and asciinema and so far no luck.

I was experiencing the problem on 3.1.17. Maybe you fixed it in v3.2.1? I see you've changed things that could be relevant since then.

I also use hyperthreading. I haven't intentionally touched either algo in several releases and 3.2 was a restructuring release with
no intended change in functionality. If you see it again, let me know and I'll take another look. But it's pretty clear from the code
that the last TOTAL rate displays the same data as the last line.

From looking at the code the time_limit stuff seems in an odd place, the end of the loop would seem more appropriate with the
rest of the display code. I may consider moving it on speculation if the problem returns.
member
Activity: 83
Merit: 10
Edit3: I ran the test several times with both algos and it always produced the correct result.
Could your script be misinterpreting? Without further information on how to reproduce
I consider this issue closed.

I was running it without the script and was getting the wrong output.

I'm talking about the last line: 463916

This is the line I'm parsing in the script.

Maybe it's because I have hyperthreading enabled on my CPU? Could be a thread synchronization issue.

It doesn't always happen for me either. I've been trying to reproduce the problem for you with v3.2.1 and asciinema and so far no luck.

I was experiencing the problem on 3.1.17. Maybe you fixed it in v3.2.1? I see you've changed things that could be relevant since then.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt v3.2.1 is available.

zr5: fixed duplicate shares
decred: fixed invalid extranonce2 suffix
x17: pool tested and fully supported

https://drive.google.com/file/d/0B0lVSGQYLJIZWHUxY1BtMWdLd00/view?usp=sharing
legendary
Activity: 1470
Merit: 1114
X17 algo is now supported by cpuminer-opt. It has been tested at zpool with v3.1.18.

The message warning that the algo has not been tested can be ignored. It will be removed from the
next release.
legendary
Activity: 1470
Merit: 1114
I've noticed that some algos misreport their benchmark results if you run them with --benchmark and --time-limit.

A good example is c11:

Code:
cpuminer-opt --benchmark --time-limit 120 -a c11
This will run and show a steady 700 kH/s on my i7 4770, but at the end it will spike and report 1791 kH/s.

I am automating the benchmarks and caught this too late -- cpuminers print a single-line result, in hashes per second, as the last line.

I extract that and put it into a mining profitability calculation, as seen here: https://hmage.net/minerstats.php#cpu_i7_4770_oc
I then graph that to see the trends here: http://grafana.hmage.net/dashboard/db/miner-stats

As a result, I will need to manually rerun all benchmarks that showed cpuminer-opt as favourable.

[EDIT]: Same with sib -- last line reports 1333333, but real speed is 486350.

I haven't paid any attention to benchmark other than a sanity test before connecting to a pool.
Didn't realize anyone was using it for anything useful.

I know of one change I made that can affect first and last result, I moved the share submission up
before the hash display. The change never produced a significant improvement but I never backed
it out. I'll look into it further when I solve the ZR5 problem.

Edit: I presume the error you are seeing is in the "Benchmark:" hashrate display. I don't see how the change
I mentioned above could have caused this. Furthermore this display uses a unique formatting function.
I don't think it's my bug. I had made another change which would show a lower benchmark rate because I
was counting the entire miner thread loop time instead of just the scan time but, again, I don't see how that change
could have produced the error you are seeing.

The "Benchmark" rate should be the same number as the previous "Total" rate because it is using the same variable.
This suggests a formatting error. Anyway I'll have to investigate further tomorrow.

Edit2: This is what I get, I don't see the problem.

Code:
[2016-05-11 09:39:03] CPU #5: 246.93 kH, 47.87 kH/s
[2016-05-11 09:39:03] CPU #2: 122.98 kH, 50.54 kH/s
[2016-05-11 09:39:03] CPU #0: 55.23 kH, 48.31 kH/s
[2016-05-11 09:39:04] CPU #6: 52.29 kH, 48.96 kH/s
[2016-05-11 09:39:04] CPU #1: 167.29 kH, 55.89 kH/s
[2016-05-11 09:39:04] CPU #3: 260.19 kH, 46.91 kH/s
[2016-05-11 09:39:04] CPU #7: 49.19 kH, 57.17 kH/s
[2016-05-11 09:39:04] Total: 1206.99 kH, 406.06 kH/s
[2016-05-11 09:39:06] CPU #7: 262.14 kH, 115.02 kH/s
[2016-05-11 09:39:06] Total: 1419.94 kH, 463.92 kH/s
[2016-05-11 09:39:07] CPU #5: 262.14 kH, 67.54 kH/s
[2016-05-11 09:39:08] CPU #4: 262.14 kH, 52.88 kH/s
[2016-05-11 09:39:08] CPU #0: 262.14 kH, 55.62 kH/s
[2016-05-11 09:39:08] Benchmark: 463.92 kH/s
463916


Edit3: I ran the test several times with both algos and it always produced the correct result.
Could your script be misinterpreting? Without further information on how to reproduce
I consider this issue closed.
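
A benchmark script could guard against an inflated final line by cross-checking it against the last "Total:" rate, since the two are expected to agree. The sketch below assumes the line format shown in the log above and that rate units carry either no prefix, 'k' or 'M'; the function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the rate from a "Total:" line in H/s, or -1.0 if the line is
 * not a Total line in the expected format. */
static double parse_total_rate(const char *line)
{
    const char *p = strstr(line, "Total:");
    double hashes, rate;
    char hash_unit[8], rate_unit[8];

    if (p == NULL)
        return -1.0;
    if (sscanf(p, "Total: %lf %7[^,], %lf %7s",
               &hashes, hash_unit, &rate, rate_unit) != 4)
        return -1.0;

    /* Assumption: the unit is H/s, kH/s or MH/s. */
    if (rate_unit[0] == 'k')
        rate *= 1e3;
    else if (rate_unit[0] == 'M')
        rate *= 1e6;
    return rate;
}

/* Compare the bare number on the final line with the last Total rate.
 * Returns 1 if they agree within rel_tol, 0 if not, -1 if data is missing. */
static int final_matches_total(const char *const *lines, size_t n,
                               double rel_tol)
{
    double last_total = -1.0;

    if (n == 0)
        return -1;
    for (size_t i = 0; i + 1 < n; i++) {
        double r = parse_total_rate(lines[i]);
        if (r >= 0.0)
            last_total = r;
    }
    if (last_total <= 0.0)
        return -1;

    char *end;
    double final_rate = strtod(lines[n - 1], &end);
    if (end == lines[n - 1])
        return -1;              /* final line did not start with a number */

    double diff = final_rate - last_total;
    if (diff < 0.0)
        diff = -diff;
    return diff <= rel_tol * last_total;
}
```

With a tolerance of around 1%, the run shown above passes (463.92 kH/s against 463916), while a final line that jumps far above the last Total would be flagged instead of being fed into later calculations.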
member
Activity: 83
Merit: 10
I've noticed that some algos misreport their benchmark results if you run it under --benchmark with --time-limit.

A good example is c11:

Code:
cpuminer-opt --benchmark --time-limit 120 -a c11
This will run and show a steady 700 kH/s on my i7 4770, but at the end it will spike and report 1791 kH/s.

I am automating the benchmarks and caught this too late -- cpuminers print a single-line result, in hashes per second, as the last line.

I extract that and put it into a mining profitability calculation, as seen here: https://hmage.net/minerstats.php#cpu_i7_4770_oc
I then graph that to see the trends here: http://grafana.hmage.net/dashboard/db/miner-stats

As a result, I will need to manually rerun all benchmarks that showed cpuminer-opt as favourable.

[EDIT]: Same with sib -- last line reports 1333333, but real speed is 486350.
sr. member
Activity: 312
Merit: 250
cpuminer-opt v3.2 is released. This is a restructuring release with no new algos or optimizations.

   - algo_gate is now used to select RPC version in many instances
   - Significant restructuring of algo_gate with realignment and
     renaming of many functions to be more descriptive and logical.
   - Some gate functions were removed or replaced with variables.
   - Code cleanup, fixed some compile warnings.

https://drive.google.com/file/d/0B0lVSGQYLJIZTzR2WU1NRGpYRnM/view?usp=sharing

Nice!

I think that this version has a serious bug: on ZR5 it submits one share, and after that every share is rejected as a duplicate.
Code:
         **********  cpuminer-opt 3.2  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........YES.
   Algo supports AES_NI.......YES.
Start mining with AES_NI optimizations...

[2016-05-10 09:47:54] Starting Stratum on stratum+tcp://ziftrpool.io:3032
[2016-05-10 09:47:54] 8 miner threads started, using 'zr5' algorithm.
[2016-05-10 09:47:57] Stratum difficulty set to 0.001
[2016-05-10 09:47:57] ziftrpool.io:3032 zr5 block 613501
[2016-05-10 09:47:58] CPU #0: 62.55 kH, 84.38 kH/s
[2016-05-10 09:47:59] accepted: 1/1 (100%), 62.55 kH, 84.38 kH/s yes!
[2016-05-10 09:47:59] CPU #0: 62.55 kH, 106.76 kH/s
[2016-05-10 09:47:59] accepted: 1/2 (50%), 62.55 kH, 106.76 kH/s nooooo
[2016-05-10 09:47:59] reject reason: duplicate share
[2016-05-10 09:48:00] CPU #0: 62.55 kH, 107.13 kH/s
[2016-05-10 09:48:00] accepted: 1/3 (33%), 62.55 kH, 107.13 kH/s nooooo
[2016-05-10 09:48:00] reject reason: duplicate share
[2016-05-10 09:48:00] CPU #0: 62.55 kH, 107.00 kH/s
[2016-05-10 09:48:00] accepted: 1/4 (25%), 62.55 kH, 107.00 kH/s nooooo
[2016-05-10 09:48:00] reject reason: duplicate share
[2016-05-10 09:48:01] CPU #0: 62.55 kH, 107.15 kH/s
[2016-05-10 09:48:01] accepted: 1/5 (20%), 62.55 kH, 107.15 kH/s nooooo
[2016-05-10 09:48:01] reject reason: duplicate share
[2016-05-10 09:48:01] CPU #0: 62.55 kH, 106.72 kH/s
[2016-05-10 09:48:01] accepted: 1/6 (17%), 62.55 kH, 106.72 kH/s nooooo
[2016-05-10 09:48:01] reject reason: duplicate share
[2016-05-10 09:48:02] CPU #0: 62.55 kH, 107.09 kH/s
[2016-05-10 09:48:02] accepted: 1/7 (14%), 62.55 kH, 107.09 kH/s nooooo
[2016-05-10 09:48:02] reject reason: duplicate share
[2016-05-10 09:48:03] CPU #0: 62.55 kH, 106.65 kH/s
[2016-05-10 09:48:03] accepted: 1/8 (12%), 62.55 kH, 106.65 kH/s nooooo
[2016-05-10 09:48:03] reject reason: duplicate share
[2016-05-10 09:48:03] CPU #0: 62.55 kH, 107.12 kH/s
[2016-05-10 09:48:03] accepted: 1/9 (11%), 62.55 kH, 107.12 kH/s nooooo
[2016-05-10 09:48:03] reject reason: duplicate share
[2016-05-10 09:48:04] CPU #0: 62.55 kH, 107.12 kH/s
^C[2016-05-10 09:48:04] SIGINT received, exiting
legendary
Activity: 1470
Merit: 1114
cpuminer-opt v3.2 is released. This is a restructuring release with no new algos or optimizations.

   - algo_gate is now used to select RPC version in many instances
   - Significant restructuring of algo_gate with realignment and
     renaming of many functions to be more descriptive and logical.
   - Some gate functions were removed or replaced with variables.
   - Code cleanup, fixed some compile warnings.

https://drive.google.com/file/d/0B0lVSGQYLJIZTzR2WU1NRGpYRnM/view?usp=sharing
legendary
Activity: 1470
Merit: 1114
I have a problem with CentOS 6.5 (final). I tried to install with the build.sh command but I ran into problems (I'm a Windows user, so I don't know much about this). Does anyone know how to install on CentOS 6.5? Are specific parameters needed for the configure command?

Thank you. Great job.

There should be no special procedure for centos. If this is your first time it is likely you are missing some
dependencies. Some new ones were added recently to support new algos. Post your errors if you still have
problems.

I switched to Ubuntu. After installing some dependencies and making a few fixes I compiled it, but I think I have a problem:

Code:
         **********  cpuminer-opt 3.1.18  *********** 
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........YES.
   Algo supports AES_NI.......YES.
Start mining with AES_NI optimizations...

[2016-05-02 15:03:08] 2 miner threads started, using 'hodl' algorithm.
[2016-05-02 15:03:08] Starting Stratum on stratum+tcp://hodl.suprnova.cc:4693
[2016-05-02 15:03:17] Stratum difficulty set to 1
[2016-05-02 15:04:46] hodl.suprnova.cc:4693 hodl block 48575
[2016-05-02 15:04:46] CPU #0: 206 H, 2.31 H/s
[2016-05-02 15:04:46] CPU #1: 193 H, 2.17 H/s
[2016-05-02 15:05:09] CPU #1: 21 H, 0.90 H/s
[2016-05-02 15:05:10] accepted: 1/1 (100%), 227 H, 3.21 H/s yes!
[2016-05-02 15:05:14] CPU #0: 30 H, 1.09 H/s
[2016-05-02 15:05:14] accepted: 2/2 (100%), 51


That's a pretty low hashrate, you should be getting about 10 times that rate.
You're only running 2 threads, is that intentional? Does the hashrate remain that
low or does it increase and stabilize over time? The only thing I can think of that would
cause it is RAM. How much do you have and are you swapping?
full member
Activity: 239
Merit: 100
I have a problem with CentOS 6.5 (final). I tried to install with the build.sh command but I ran into problems (I'm a Windows user, so I don't know much about this). Does anyone know how to install on CentOS 6.5? Are specific parameters needed for the configure command?

Thank you. Great job.

There should be no special procedure for centos. If this is your first time it is likely you are missing some
dependencies. Some new ones were added recently to support new algos. Post your errors if you still have
problems.

I switched to Ubuntu. After installing some dependencies and making a few fixes I compiled it, but I think I have a problem:

Code:
         **********  cpuminer-opt 3.1.18  *********** 
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI extension.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0 and Jeff Garzik.

Checking CPU capatibility...
        Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
   CPU arch supports AES_NI...YES.
   SW built for AES_NI........YES.
   Algo supports AES_NI.......YES.
Start mining with AES_NI optimizations...

[2016-05-02 15:03:08] 2 miner threads started, using 'hodl' algorithm.
[2016-05-02 15:03:08] Starting Stratum on stratum+tcp://hodl.suprnova.cc:4693
[2016-05-02 15:03:17] Stratum difficulty set to 1
[2016-05-02 15:04:46] hodl.suprnova.cc:4693 hodl block 48575
[2016-05-02 15:04:46] CPU #0: 206 H, 2.31 H/s
[2016-05-02 15:04:46] CPU #1: 193 H, 2.17 H/s
[2016-05-02 15:05:09] CPU #1: 21 H, 0.90 H/s
[2016-05-02 15:05:10] accepted: 1/1 (100%), 227 H, 3.21 H/s yes!
[2016-05-02 15:05:14] CPU #0: 30 H, 1.09 H/s
[2016-05-02 15:05:14] accepted: 2/2 (100%), 51