[ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 168.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: Dabs on May 19, 2016, 08:24:00 PM

The miners I tried used getwork gbt version 112. This fork should work similarly. I'm willing to do the testing. I've been solo mining ESPERS.

I appreciate your offer to test. Looking forward to your results. I hope I haven't broken anything.

Dabs

legendary

Activity: 3416

Merit: 1912

The Concierge of Crypto

The miners I tried used getwork gbt version 112. This fork should work similarly. I'm willing to do the testing. I've been solo mining ESPERS.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: Dabs on May 19, 2016, 06:26:01 PM

Quote from: joblo on May 19, 2016, 05:56:43 PM

What do you need to know? Everything should be in the README.md file. Debian should be no problem, nor any other
major distro. I didn't mention it because I assumed most Windows users would find Ubuntu less intimidating.

So, here's the tentative plan:

1. set up a new VM with all cores and 4 or 8 GB of RAM
2. install Debian on it, probably a net install of Debian 8
3. run build.sh
4. install whatever coin wallet software and set it up so I can mine to localhost.

This is almost what I do with my Windows VM; installed Win 10 on it, downloaded the wallet software, and ran three different versions of cpuminer on it. (I come from the ESPERS thread, there have been 3 versions of cpuminers there I think.)

I run Windows Server 2012 R2, so I use Hyper-V on one of my little boxes. I got it cheap, used, something like $600 USD for a dual quad-core rack server with 48 GB of RAM.

Only stratum has been tested so I have no idea how it will behave if you try anything else.

Dabs

legendary

Activity: 3416

Merit: 1912

The Concierge of Crypto

Quote from: joblo on May 19, 2016, 05:56:43 PM

What do you need to know? Everything should be in the README.md file. Debian should be no problem, nor any other
major distro. I didn't mention it because I assumed most Windows users would find Ubuntu less intimidating.

So, here's the tentative plan:

1. set up a new VM with all cores and 4 or 8 GB of RAM
2. install Debian on it, probably a net install of Debian 8
3. run build.sh
4. install whatever coin wallet software and set it up so I can mine to localhost.

This is almost what I do with my Windows VM; installed Win 10 on it, downloaded the wallet software, and ran three different versions of cpuminer on it. (I come from the ESPERS thread, there have been 3 versions of cpuminers there I think.)

I run Windows Server 2012 R2, so I use Hyper-V on one of my little boxes. I got it cheap, used, something like $600 USD for a dual quad-core rack server with 48 GB of RAM.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: monoxide on May 19, 2016, 04:25:57 PM

Quote from: joblo on May 19, 2016, 01:18:05 PM

cpuminer-opt v3.2.4 with support for hmq1725 (espers).

https://drive.google.com/file/d/0B0lVSGQYLJIZY1BVV1RGZFlJclU/view?usp=sharing

It's 56% faster than cpuminer-hmq1725 on CPUs with AES_NI and 17% faster with SSE2.
I recommend using the GPU port at suprnova; the CPU port has a rough start before the diff
settles.

Is there a way to get a Windows version?

You can use VirtualBox to create a Linux virtual machine; performance is the same if you allocate
all CPUs to the VM.

monoxide

hero member

Activity: 774

Merit: 500

Quote from: joblo on May 19, 2016, 01:18:05 PM

cpuminer-opt v3.2.4 with support for hmq1725 (espers).

https://drive.google.com/file/d/0B0lVSGQYLJIZY1BVV1RGZFlJclU/view?usp=sharing

It's 56% faster than cpuminer-hmq1725 on CPUs with AES_NI and 17% faster with SSE2.
I recommend using the GPU port at suprnova; the CPU port has a rough start before the diff
settles.

Is there a way to get a Windows version?

joblo

legendary

Activity: 1470

Merit: 1114

cpuminer-opt v3.2.4 with support for hmq1725 (espers).

https://drive.google.com/file/d/0B0lVSGQYLJIZY1BVV1RGZFlJclU/view?usp=sharing

It's 56% faster than cpuminer-hmq1725 on CPUs with AES_NI and 17% faster with SSE2.
I recommend using the GPU port at suprnova; the CPU port has a rough start before the diff
settles.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: Ayers on May 19, 2016, 12:49:00 AM

Is this better than the Wolf0 version for hodl and espers?

It should be equal to Wolf0 on hodl; espers is not implemented yet.

Edit: I'm taking a look at espers and I think I can improve it.

Ayers

legendary

Activity: 2590

Merit: 1022

Is this better than the Wolf0 version for hodl and espers?

joblo

legendary

Activity: 1470

Merit: 1114

cpuminer-opt v3.2.3 is released.

More restructuring, code cleanup and bug fixes. This should be the best release yet.

https://drive.google.com/file/d/0B0lVSGQYLJIZMWdsV21XM0tob0U/view?usp=sharing

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: pallas on May 15, 2016, 10:13:51 AM

Quote from: joblo on May 15, 2016, 09:22:35 AM

I still have a lot to learn about c/c++. I got burned by pointer arithmetic this week. It seemed only logical
to me that "p + n" would be a byte offset while "p[ i ]" would be scaled. Surprise, they're both scaled.

My next issue is how to consolidate the definitions of frequently used text strings. In my native language
it's a simple matter of defining the strings in a header file and referencing them in many source files. This approach
causes multi-def warnings in c/c++.

I often see #define macros but they result in the strings being copied by every reference.

Does anyone know of a way in c/c++ to have one definition with multiple references that don't make copies?

-fmerge-constants
Attempt to merge identical constants (string constants and floating-point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

Thanks Pallas.

The description of this option indicates it tries to merge multiple explicit definitions of the same constant which means
I'm worrying about nothing. I'm trying to merge explicitly defined identical constants by making a single
definition while it seems the compiler will do it transparently. Outsmarted by the compiler again, I just wish things
wouldn't break when the compiler overrides my code.

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

Quote from: joblo on May 15, 2016, 09:22:35 AM

I still have a lot to learn about c/c++. I got burned by pointer arithmetic this week. It seemed only logical
to me that "p + n" would be a byte offset while "p[ i ]" would be scaled. Surprise, they're both scaled.

My next issue is how to consolidate the definitions of frequently used text strings. In my native language
it's a simple matter of defining the strings in a header file and referencing them in many source files. This approach
causes multi-def warnings in c/c++.

I often see #define macros but they result in the strings being copied by every reference.

Does anyone know of a way in c/c++ to have one definition with multiple references that don't make copies?

-fmerge-constants
Attempt to merge identical constants (string constants and floating-point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Enabled at levels -O, -O2, -O3, -Os.

joblo

legendary

Activity: 1470

Merit: 1114

I still have a lot to learn about c/c++. I got burned by pointer arithmetic this week. It seemed only logical
to me that "p + n" would be a byte offset while "p[ i ]" would be scaled. Surprise, they're both scaled.
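
A minimal sketch of the scaling behaviour described above. The function names and the `uint32_t` element type are made up for illustration. It measures, in bytes, how far `p + n`, `&p[n]` and a byte-pointer offset each land from the base pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes between base and base + n: pointer addition is scaled by the element size. */
static ptrdiff_t offset_by_addition(const uint32_t *base, size_t n)
{
    return (const char *)(base + n) - (const char *)base;
}

/* Bytes between base and &base[n]: base[n] is defined as *(base + n), so it is scaled too. */
static ptrdiff_t offset_by_index(const uint32_t *base, size_t n)
{
    return (const char *)&base[n] - (const char *)base;
}

/* Bytes between base and a true byte offset, which needs a byte-sized pointer type. */
static ptrdiff_t offset_by_bytes(const uint32_t *base, size_t n)
{
    return ((const char *)base + n) - (const char *)base;
}
```

For a `uint32_t` array and n = 3, the first two functions return 12 and the third returns 3. Casting to `char *` (or `uint8_t *`) before adding is the usual way to get an unscaled byte offset.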

My next issue is how to consolidate the definitions of frequently used text strings. In my native language
it's a simple matter of defining the strings in a header file and referencing them in many source files. This approach
causes multi-def warnings in c/c++.

I often see #define macros but they result in the strings being copied by every reference.

Does anyone know of a way in c/c++ to have one definition with multiple references that don't make copies?
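
One common C idiom for this is to put only an `extern` declaration in the header and the single definition in exactly one source file. The sketch below squeezes the three hypothetical files into one listing so it compiles as a unit; the file names and the `miner_name` identifier are assumptions for illustration, not names from cpuminer-opt.

```c
#include <stddef.h>
#include <string.h>

/* ---- strings.h (hypothetical header, included by many source files) ---- */
/* Declaration only: it reserves no storage, so including it everywhere
   causes neither copies nor multiple-definition complaints. */
extern const char miner_name[];

/* ---- strings.c (hypothetical; the one and only definition) ---- */
const char miner_name[] = "cpuminer-opt";

/* ---- a.c (hypothetical user of the string) ---- */
const char *name_seen_from_a(void)
{
    return miner_name;
}

/* ---- b.c (another hypothetical user) ---- */
const char *name_seen_from_b(void)
{
    return miner_name;
}

/* Both users can work with the shared string without owning a copy. */
size_t shared_name_length(void)
{
    return strlen(miner_name);
}
```

Both accessor functions return the same address, because every reference resolves to the one array defined in the single source file.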

joblo

legendary

Activity: 1470

Merit: 1114

Download cpuminer-opt v3.2.2:

https://drive.google.com/file/d/0B0lVSGQYLJIZX1F4dHd2NlBHSXc/view?usp=sharing

I finally found the root cause of the zr5 bug. I still don't understand why it seemed to
work in v3.2.1, since the original bug from v3.2 was still present. This release is what
v3.2 should have been.

pallas

legendary

Activity: 2716

Merit: 1094

Black Belt Developer

I agree, what counts is going ahead in the way of knowledge.
Everybody does it his way. Some just stand still but that's not the kind of people usually posting here.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: hmage on May 12, 2016, 05:34:38 PM

Quote from: joblo on May 12, 2016, 04:22:55 PM

It was my assumption that you would have already done that. We both made assumptions, not a good idea.

Yeap. I have only glanced briefly at the source code.

Anyway, I should apologise for my behaviour; it was unprofessional and led to less productive results. You weren't perfect either, but everyone has faults since everyone is human. Every suggestion or problem report felt like a court trial, judging by how much work I had to do on my end compared to what I saw being done on your end (you always asked me to do research or provide more data, seemingly without doing any research of your own before passing judgement). I really like your work so far and appreciate it very much, though, and I don't want to distract you from it more than I already have.

Your perception of a court trial is pretty accurate. I was thinking something similar, a lawyer gets one crack at presenting
a case. If the lawyer comes to court unprepared the case gets tossed and he doesn't get another chance.

Although I'm an atheist, a Bible passage comes to mind: let he who is without sin throw the first stone. The implication being
that no one is without sin. I simply picked up the stones and threw them back.

An apology is not required, coming to an understanding and learning from it is more important, and applies to both of us.
Nevertheless you offered one and I accept. For my part I'm not one to apologize for my actions, too stubborn, I guess.
But in hindsight I think the timing was bad. I had just released v3.2 and had broken zr5, which was embarrassing, and was
trying to focus on that issue. In fact I am not pleased with the overall quality of my releases, too many bad ones.
I expect better of myself. Am I losing my edge or is it because I forgot what it was like to be on a steep learning curve
after so long being a subject matter expert? Yeah, I'm arrogant too.

No hard feelings. Cheers.

hmage

member

Activity: 83

Merit: 10

Quote from: joblo on May 12, 2016, 04:22:55 PM

It was my assumption that you would have already done that. We both made assumptions, not a good idea.

Yeap. I have only glanced briefly at the source code.

Anyway, I should apologise for my behaviour; it was unprofessional and led to less productive results. You weren't perfect either, but everyone has faults since everyone is human. Every suggestion or problem report felt like a court trial, judging by how much work I had to do on my end compared to what I saw being done on your end (you always asked me to do research or provide more data, seemingly without doing any research of your own before passing judgement). I really like your work so far and appreciate it very much, though, and I don't want to distract you from it more than I already have.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: hmage on May 12, 2016, 03:06:01 PM

Quote from: joblo on May 12, 2016, 12:46:02 PM

I did acknowledge the overhead of the deref but was at a loss to explain why I observed a performance
gain.

You didn't provide numbers, unfortunately, and you didn't provide a way to recreate the benchmarks to verify your claims either, since there's no archive of older versions of cpuminer-opt to build against. If it were on github, for example, that would have been easier to test.

Quote from: joblo on May 12, 2016, 12:46:02 PM

Each scan takes seconds to run so the overhead of one extra pointer deref every few
seconds is immeasurable. Even if you go up a level to the miner_thread loop. There are maybe 20
gated function calls every loop. 20 extra derefs every few seconds is still immeasurable.

That was the info I was looking for, thank you.

This whole debate was too long just because either I didn't communicate clearly enough that I am assuming it is done on every hash call or because you didn't recognize that when reading. Pseudocode should have been a big hint at that.

Either way, this debate is pointless, 20 calls a second isn't something to worry about. The observed slowdown must be caused by other factors.

I think you hit the nail on the head when you said you made an assumption. That was, IMO, your biggest mistake and why I
kept repeating that you need to do your homework before bringing it to my attention. Had you done that you would have realized
yourself that the deref overhead was trivial and any observed performance diff was due to something else.

It was my assumption that you would have already done that. We both made assumptions, not a good idea.

I didn't have numbers because there was no way to run a controlled test with the necessary level of precision and accuracy.
And it's also why I suggested it wasn't worth your effort to go back and retest previous releases.

hmage

member

Activity: 83

Merit: 10

Quote from: joblo on May 12, 2016, 12:46:02 PM

I did acknowledge the overhead of the deref but was at a loss to explain why I observed a performance
gain.

You didn't provide numbers, unfortunately, and you didn't provide a way to recreate the benchmarks to verify your claims either, since there's no archive of older versions of cpuminer-opt to build against. If it were on github, for example, that would have been easier to test.

Quote from: joblo on May 12, 2016, 12:46:02 PM

Each scan takes seconds to run so the overhead of one extra pointer deref every few
seconds is immeasurable. Even if you go up a level to the miner_thread loop. There are maybe 20
gated function calls every loop. 20 extra derefs every few seconds is still immeasurable.

That was the info I was looking for, thank you.

This whole debate was too long just because either I didn't communicate clearly enough that I am assuming it is done on every hash call or because you didn't recognize that when reading. Pseudocode should have been a big hint at that.

Either way, this debate is pointless, 20 calls a second isn't something to worry about. The observed slowdown must be caused by other factors.

joblo

legendary

Activity: 1470

Merit: 1114

Quote from: hmage on May 12, 2016, 12:15:37 PM

Okay then, explain this: https://gist.github.com/hmage/2a1fdbd7bdad252cd08c9b4166c5727a

on Core i5-4570S:

Code:

hmage@dhmd:~/test$ cat /proc/cpuinfo |fgrep name|head -1
model name      : Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz
hmage@dhmd:~/test$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
      workfunc(): 0.002082 microseconds per call, 480308.777k per second
  workloopfunc(): 0.001774 microseconds per call, 563746.643k per second

on Core i7-4770:

Code:

hmage@vhmd:~$ cat /proc/cpuinfo |fgrep name|head -1
model name      : Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
hmage@vhmd:~$ gcc dereference_bench.c -O2 -o dereference_bench && ./dereference_bench
      workfunc(): 0.001776 microseconds per call, 562932.922k per second
  workloopfunc(): 0.001506 microseconds per call, 664150.879k per second

Dereferencing on every call _is_ a big performance hit, unless you have another explanation.

Quote from: joblo on May 12, 2016, 11:04:48 AM

Oh, I already know, you get angry.

It looks to me that it was you who got angry. I apologise for my blunt approach.

A little impatient maybe but not really angry. I try to stick to the issues.

Yes, dereferencing a pointer to call a function adds overhead but it has to be taken in context.
How often does that occur in the big picture? Take scanhash, for example, the lowest level function
that is gated. Each scan takes seconds to run so the overhead of one extra pointer deref every few
seconds is immeasurable. Even if you go up a level to the miner_thread loop. There are maybe 20
gated function calls every loop. 20 extra derefs every few seconds is still immeasurable.
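
To put a rough number on "immeasurable", here is a back-of-the-envelope sketch using the i5-4570S figures quoted above. It assumes the whole gap between the two benchmark results (0.002082 vs 0.001774 microseconds per call) is the cost of one extra dereference, and it takes a conservative loop period of one second instead of "a few seconds". The function name and parameters are made up for illustration.

```c
/* Fraction of wall-clock time spent on the extra dereferences.
   slow_us / fast_us: benchmark cost per call in microseconds.
   calls_per_loop:    gated calls per miner loop iteration (about 20 per the post).
   loop_seconds:      assumed duration of one loop iteration. */
static double deref_overhead_fraction(double slow_us, double fast_us,
                                      double calls_per_loop, double loop_seconds)
{
    double extra_us_per_call = slow_us - fast_us;
    double extra_us_per_loop = extra_us_per_call * calls_per_loop;
    return extra_us_per_loop / (loop_seconds * 1e6);
}
```

With `deref_overhead_fraction(0.002082, 0.001774, 20.0, 1.0)` the result is on the order of 6e-9, i.e. a few billionths of the run time, which is far below the noise of any hashrate measurement.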

Any change of program flow has overhead, that's why function inlining and loop unrolling exist.
But if the code size of an unrolled loop overflows the cache you may end up losing more performance
from cache misses than you gained from inlining.

This might answer your question:

https://bitcointalksearch.org/topic/m.13770966

I clearly stated I did not predict a performance gain from algo-gate and if you dig deeper you may find
where I did acknowledge the overhead of the deref but was at a loss to explain why I observed a performance
gain. Maybe my observations were just noise, maybe some other change is responsible for the increase in
performance in spite of the gate. I just don't know. There are too many variables that can't be controlled so
I dismiss such observations without a solid case to back it up.

Finally, what it comes down to, like any decision, is a balance. Algo-gate was never about performance; it was
about a better architecture that made it easier for developers to add new algos to the miner with minimal
disruption to the existing code. I judged the performance cost to be negligible.
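
For readers unfamiliar with the pattern, the following is a minimal sketch of a function-pointer gate of the kind discussed here. It is not the actual cpuminer-opt code: the struct layout, the `register_algo` and `mine_once` helpers, and the two toy "algos" are all hypothetical. The idea is that the gate is filled in once at startup and the miner loop then makes one indirect call per scan.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical gate: one function pointer per algo-specific operation. */
typedef struct {
    int (*scanhash)(uint32_t *nonce, uint32_t max_nonce);
} algo_gate_t;

/* Toy scanner A: pretends every nonce divisible by 7 is a share. */
static int scanhash_a(uint32_t *nonce, uint32_t max_nonce)
{
    while (*nonce < max_nonce) {
        if (++*nonce % 7 == 0)
            return 1;
    }
    return 0;
}

/* Toy scanner B: pretends every nonce divisible by 11 is a share. */
static int scanhash_b(uint32_t *nonce, uint32_t max_nonce)
{
    while (*nonce < max_nonce) {
        if (++*nonce % 11 == 0)
            return 1;
    }
    return 0;
}

/* Fill the gate once at startup; returns 0 on success, -1 for an unknown algo. */
static int register_algo(algo_gate_t *gate, const char *name)
{
    if (strcmp(name, "algo_a") == 0) { gate->scanhash = scanhash_a; return 0; }
    if (strcmp(name, "algo_b") == 0) { gate->scanhash = scanhash_b; return 0; }
    return -1;
}

/* The miner loop only sees the gate: one pointer dereference per scan. */
static int mine_once(const algo_gate_t *gate, uint32_t *nonce, uint32_t max_nonce)
{
    return gate->scanhash(nonce, max_nonce);
}
```

Adding a new algo in this sketch means writing one scanner function and one line in `register_algo`; the loop that calls `mine_once` does not change, which is the architectural benefit being traded against the single indirect call.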
