Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 168. (Read 444040 times)

legendary
Activity: 1470
Merit: 1114
The miners I tried used getwork gbt version 112. This fork should work similarly. I'm willing to do the testing. I've been solo mining ESPERS.

I appreciate your offer to test. Looking forward to your results. I hope I haven't broken anything.
legendary
Activity: 1470
Merit: 1114
What do you need to know? Everything should be in the README.md file. Debian should be no problem, nor any other
major distro. I didn't mention it because I assumed most Windows users would find Ubuntu less intimidating.

So, here's the tentative plan:

1. set up a new VM with all cores and 4 or 8 GB of RAM
2. install Debian on it, probably a net install of Debian 8
3. run build.sh
4. install whatever coin wallet software and set it up so I can mine to localhost.

This is almost what I do with my Windows VM: I installed Win 10 on it, downloaded the wallet software, and ran three different versions of cpuminer on it. (I come from the ESPERS thread; I think there have been three versions of cpuminer there.)

I run Windows Server 2012 R2, so I use Hyper-V on one of my little boxes. I got it cheap, used: something like $600 USD for a dual quad-core rack server with 48 GB of RAM.

Only stratum has been tested so I have no idea how it will behave if you try anything else.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt v3.2.4 with support for hmq1725 (espers).

https://drive.google.com/file/d/0B0lVSGQYLJIZY1BVV1RGZFlJclU/view?usp=sharing

It's 56% faster than cpuminer-hmq1725 on CPUs with AES-NI and 17% faster with SSE2.
I recommend using the GPU port at suprnova; the CPU port has a rough start before the difficulty
settles.


Is there a way to get windows version?


You can use VirtualBox to create a virtual Linux machine; performance is the same if you allocate
all CPUs to the VM.
legendary
Activity: 1470
Merit: 1114
Is this better than the Wolf0 version for hodl and espers?

It should be equal to Wolf0 on hodl; espers is not implemented yet.

Edit: I'm taking a look at espers and I think I can improve it.
legendary
Activity: 1470
Merit: 1114
cpuminer-opt v3.2.3 is released.

More restructuring, code cleanup and bug fixes. This should be the best release yet.

https://drive.google.com/file/d/0B0lVSGQYLJIZMWdsV21XM0tob0U/view?usp=sharing
legendary
Activity: 1470
Merit: 1114
I still have a lot to learn about c/c++. I got burned by pointer arithmetic this week. It seemed only logical
to me that "p + n" would be a byte offset while "p[ i ]" would be scaled. Surprise, they're both scaled.

My next issue is how to consolidate the definitions of frequently used text strings. In my native language
it's a simple matter of defining the strings in a header file and referencing them in many source files. This approach
causes multi-def warnings in c/c++.

I often see #define macros but they result in the strings being copied by every reference.

Does anyone know of a way in c/c++ to have one definition with multiple references that don't make copies?

-fmerge-constants
Attempt to merge identical constants (string constants and floating-point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

Thanks Pallas.

The description of this option indicates it tries to merge multiple explicit definitions of the same constant which means
I'm worrying about nothing. I'm trying to merge explicitly defined identical constants by making a single
definition while it seems the compiler will do it transparently. Outsmarted by the compiler again, I just wish things
wouldn't break when the compiler overrides my code.
legendary
Activity: 1470
Merit: 1114
Download cpuminer-opt v3.2.2:

https://drive.google.com/file/d/0B0lVSGQYLJIZX1F4dHd2NlBHSXc/view?usp=sharing

I finally found the root cause of the zr5 bug. I still don't understand why it seemed to
work in v3.2.1, since the original bug from v3.2 was still present. This release is what
v3.2 should have been.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I agree; what counts is moving forward on the path of knowledge.
Everybody does it in his own way. Some just stand still, but that's not the kind of people who usually post here.
legendary
Activity: 1470
Merit: 1114
It was my assumption that you would have already done that. We both made assumptions, not a good idea.

Yeap. I have only glanced briefly at the source code.

Anyway, I should apologise for my behaviour; it was unprofessional and that led to less productive results. You weren't perfect either, but everyone has faults since everyone is human. Every suggestion or problem report felt like a court trial, given how much work needed to be done on my end compared to what I saw being done on your end regarding the issue or suggestion (you always asked me to do research or provide more data, seemingly without doing any research of your own before passing judgement). I really like your work so far and very much appreciate it, though, and don't want to distract you from it more than I already have.

Your perception of a court trial is pretty accurate. I was thinking something similar, a lawyer gets one crack at presenting
a case. If the lawyer comes to court unprepared the case gets tossed and he doesn't get another chance.

Although I'm an atheist, a Bible passage comes to mind: let he who is without sin throw the first stone. The implication is
that no one is without sin. I simply picked up the stones and threw them back.

An apology is not required, coming to an understanding and learning from it is more important, and applies to both of us.
Nevertheless you offered one and I accept. For my part I'm not one to apologize for my actions, too stubborn, I guess.
But in hindsight I think the timing was bad. I had just released v3.2 and broken zr5, which was embarrassing, and I was
trying to focus on that issue. In fact, I am not pleased with the overall quality of my releases; there have been too many bad ones.
I expect better of myself. Am I losing my edge, or is it because I forgot what it was like to be on a steep learning curve
after being a subject matter expert for so long? Yeah, I'm arrogant too.

No hard feelings. Cheers.
legendary
Activity: 1470
Merit: 1114
I did acknowledge the overhead of the deref but was at a loss to explain why I observed a performance
gain.

You didn't provide numbers, unfortunately, and you didn't provide a way to recreate the benchmarks to verify your claims either, since there's no archive of older versions of cpuminer-opt to build against. If it were on github, for example, that would have been easier to test.

Each scan takes seconds to run, so the overhead of one extra pointer deref every few
seconds is immeasurable. The same goes if you go up a level to the miner_thread loop: there are maybe 20
gated function calls per loop, and 20 extra derefs every few seconds is still immeasurable.

That was the info I was looking for, thank you.

This whole debate went on too long either because I didn't communicate clearly enough that I was assuming it is done on every hash call, or because you didn't recognize that when reading. The pseudocode should have been a big hint.

Either way, this debate is pointless; 20 calls a second isn't something to worry about. The observed slowdown must be caused by other factors.


I think you hit the nail on the head when you said you made an assumption. That was, IMO, your biggest mistake and why I
kept repeating that you need to do your homework before bringing it to my attention. Had you done that, you would have realized
yourself that the deref overhead was trivial and any observed performance difference was due to something else.

It was my assumption that you would have already done that. We both made assumptions, not a good idea.

I didn't have numbers because there was no way to run a controlled test with the necessary level of precision and accuracy.
That's also why I suggested it wasn't worth your effort to go back and retest previous releases.
legendary
Activity: 1470
Merit: 1114
Okay then, explain this: https://gist.github.com/hmage/2a1fdbd7bdad252cd08c9b4166c5727a

on Core i5-4570S:
Code:
hmage@dhmd:~/test$ cat /proc/cpuinfo |fgrep name|head -1
model name      : Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz
hmage@dhmd:~/test$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
      workfunc(): 0.002082 microseconds per call, 480308.777k per second
  workloopfunc(): 0.001774 microseconds per call, 563746.643k per second

on Core i7-4770:
Code:
hmage@vhmd:~$ cat /proc/cpuinfo |fgrep name|head -1
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
hmage@vhmd:~$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
      workfunc(): 0.001776 microseconds per call, 562932.922k per second
  workloopfunc(): 0.001506 microseconds per call, 664150.879k per second


Dereferencing on every call _is_ a big performance hit, unless you have another explanation.

Oh, I already know, you get angry.

It looks to me that it was you who got angry. I apologise for my blunt approach.

A little impatient maybe but not really angry. I try to stick to the issues.

Yes, dereferencing a pointer to call a function adds overhead, but it has to be taken in context.
How often does that occur in the big picture? Take scanhash, for example, the lowest-level function
that is gated. Each scan takes seconds to run, so the overhead of one extra pointer deref every few
seconds is immeasurable. The same goes if you go up a level to the miner_thread loop: there are maybe 20
gated function calls per loop, and 20 extra derefs every few seconds is still immeasurable.

Any change of program flow has overhead; that's why function inlining and loop unrolling exist.
But if the code size of an unrolled loop overflows the cache, you may end up losing more performance
to cache misses than you gained from inlining or unrolling.

This might answer your question:

https://bitcointalksearch.org/topic/m.13770966

I clearly stated that I did not predict a performance gain from algo-gate, and if you dig deeper you may find
where I acknowledged the overhead of the deref but was at a loss to explain why I observed a performance
gain. Maybe my observations were just noise; maybe some other change is responsible for the increase in
performance in spite of the gate. I just don't know. There are too many variables that can't be controlled, so
I dismiss such observations without a solid case to back them up.

Finally, what it comes down to, like any decision, is a balance. Algo-gate was never about performance; it was
about a better architecture that made it easier for developers to add new algos to the miner with minimal
disruption to the existing code. I judged the performance cost to be negligible.