Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 187. (Read 444067 times)

legendary
Activity: 1470
Merit: 1114
Progress update.

I found 4% more hash in quark and I've tested some of the more obscure algos so another
3.0 update is coming before 3.1. I'll take another day to look for more low hanging fruit
and do a full suite of testing before releasing. I want this to be super stable.

Then I will start on windows, I promise.

V 3.0.7 almost ready.

Edit

I was checking some stats while testing and here is how much has been gained since
the project forked.

quark  + 27%
qubit  + 36%
x13    + 92%
x15    + 76%

It's come a long way.
legendary
Activity: 1470
Merit: 1114
4% boost in quark coming due to implementation of fast reinit_groestl. Maybe it
will work in ccminer too. It was simple, just clone init_groestl and remove the constant
initializations. It speeds up the init every time groestl is run. Just make sure to do
a full init the first time.

ccminer only does the init pass once per algo, until it finds a solution, for all the hash functions.

Quote
Edit: only worked for quark because quark runs twice in the chain. Only need to do a
reinit before the second run. Full init works but is slower. No init is even faster but
never finds blocks.

ccminer only runs groestl once.
Quote
In cpuminer it only appears once in the code but it runs twice.
Why don't you share your code on github so I can add 100%  Grin

You can get it from google drive, help yourself. I do intend to get on github when
things settle down. There are lots of algos that need AES_NI implementations,
so far only groestl and echo. A couple more have SSE2 and the rest are dead slow.

Wolf's cryptonight is nice. CPU runs cool and performance is higher than the other algos
when compared with ccminer. For mining XMR a CPU is better.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
4% boost in quark coming due to implementation of fast reinit_groestl. Maybe it
will work in ccminer too. It was simple, just clone init_groestl and remove the constant
initializations. It speeds up the init every time groestl is run. Just make sure to do
a full init the first time.

ccminer only does the init pass once per algo, until it finds a solution, for all the hash functions.

Quote
Edit: only worked for quark because quark runs twice in the chain. Only need to do a
reinit before the second run. Full init works but is slower. No init is even faster but
never finds blocks.

ccminer only runs groestl once.

Why don't you share your code on github so I can add 100%  Grin
legendary
Activity: 1470
Merit: 1114
Why is c11 slower than x11?

I haven't been able to integrate the groestl AES_NI optimizations yet. I'm having
that problem with many algos that use groestl. Only x11 and quark are working.
It could do wonders on some other algos, especially groestl itself. If I can get it
working it could be a 100% boost.

4% boost in quark coming due to implementation of fast reinit_groestl. Maybe it
will work in ccminer too. It was simple, just clone init_groestl and remove the constant
initializations. It speeds up the init every time groestl is run. Just make sure to do
a full init the first time.

Edit: only worked for quark because quark runs twice in the chain. Only need to do a
reinit before the second run. Full init works but is slower. No init is even faster but
never finds blocks.

Some of my improvements have come from optimizing the ctx init, avoiding
doing it for nothing. I haven't looked at ccminer but there may be opportunities there.
If it works for you don't forget where you got the idea. Wink
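
For anyone following along, the reinit idea described above can be sketched roughly as below. This is not the actual groestl code: toy_ctx, toy_init, toy_reinit and toy_update are hypothetical names, and the table contents are stand-ins. It only shows the shape of the trick: a full init that also sets up the constants, and a clone of it that resets just the per-hash state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_WORDS 256

/* Hypothetical hash context: constants plus a small per-hash state. */
typedef struct {
    uint32_t table[TABLE_WORDS]; /* constants, identical for every hash */
    uint32_t state[8];           /* chaining value, changes with every hash */
    uint32_t buf_len;            /* bytes absorbed so far */
    int      initialized;        /* set once by toy_init */
} toy_ctx;

/* Full init: fill in the constants and reset the per-hash state. */
void toy_init(toy_ctx *ctx)
{
    for (uint32_t i = 0; i < TABLE_WORDS; i++)
        ctx->table[i] = i * 2654435761u;  /* stand-in for real constants */
    memset(ctx->state, 0, sizeof ctx->state);
    ctx->state[7] = 512;                  /* stand-in for an output-length word */
    ctx->buf_len = 0;
    ctx->initialized = 1;
}

/* Fast reinit: a clone of toy_init with the constant setup removed.
 * Only valid after toy_init has run at least once on this context. */
void toy_reinit(toy_ctx *ctx)
{
    assert(ctx->initialized);
    memset(ctx->state, 0, sizeof ctx->state);
    ctx->state[7] = 512;
    ctx->buf_len = 0;
}

/* Toy absorb step, only here so the state visibly changes between runs. */
void toy_update(toy_ctx *ctx, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        ctx->state[i % 8] ^= ctx->table[data[i]];
        ctx->buf_len++;
    }
}
```

In a chain that runs the same function twice, the first run would use toy_init and the second toy_reinit. Skipping the reset altogether leaves the old state in place, which matches the "no init is faster but never finds blocks" observation.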
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
Why is c11 slower than x11?
legendary
Activity: 1470
Merit: 1114
I found my hash!

X11 back up to peak, it has been down a bit for a couple of releases but it's back.
I also restored cryptonight performance to the same level as Wolf0.

v3.0.6 should be the best release yet, the most algos, the highest performance
and the widest HW support.

It's baking in the oven.

This one turned out well. I'm running out of excuses to delay windows support.

https://drive.google.com/file/d/0B0lVSGQYLJIZSmFXUnZrdDFkTjg/view?usp=sharing

Edit: It has been observed that better performance is achieved when using the CPU name
instead of native for the arch argument. It doesn't seem to work for more recent CPUs;
the only accepted value is corei7-avx, rather than haswell, sandybridge etc.
On older CPUs such as the core2 it may make a difference. YMMV

Just when I thought there was no more optimising to do I found 4% more in quark. I'll
see if it works on other algos.
legendary
Activity: 1470
Merit: 1114
I found my hash!

X11 back up to peak, it has been down a bit for a couple of releases but it's back.
I also restored cryptonight performance to the same level as Wolf0.

v3.0.6 should be the best release yet, the most algos, the highest performance
and the widest HW support.

It's baking in the oven.

This one turned out well. I'm running out of excuses to delay windows support.

https://drive.google.com/file/d/0B0lVSGQYLJIZSmFXUnZrdDFkTjg/view?usp=sharing

Edit: It has been observed that better performance is achieved when using the CPU name
instead of native for the arch argument. It doesn't seem to work for more recent CPUs;
the only accepted value is corei7-avx, rather than haswell, sandybridge etc.
On older CPUs such as the core2 it may make a difference. YMMV
legendary
Activity: 1470
Merit: 1114
I found my hash!

X11 back up to peak, it has been down a bit for a couple of releases but it's back.
I also restored cryptonight performance to the same level as Wolf0.

v3.0.6 should be the best release yet, the most algos, the highest performance
and the widest HW support.

It's baking in the oven.
legendary
Activity: 1470
Merit: 1114
Thanks Joblo, I'll test release 3.0.5 later today :-)

I'm noticing cryptonight is about 10% slower than Wolf0's build. Let me know if you can
confirm it. I only made two significant changes that shouldn't have had such an impact.

Edit: never mind I found it. Will wait a while before a new release in case something else
pops up.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Thanks Joblo, I'll test release 3.0.5 later today :-)
legendary
Activity: 1470
Merit: 1114
I am about to drop support for the x86_64 target for two reasons: there is only one generation
of Intel 64 bit CPU it applies to, and no one has expressed an interest in it.

The x86_64 kernels will live on in cpuminer-multi and I will maintain the x86_64 kernels
if they are the best available for a particular algo. Will likely be dropped in the next release.
legendary
Activity: 1470
Merit: 1114
Tossed away everything I had done with cryptonight and started from scratch.
It took 2 hours to get my first share, 180 KH/s.

V3.0.4 coming right up.

Done



Edit: Thanks Wolf0

Edit: I may have overstated the hash rate, more like 170.

I think the addition of cryptonight broke the core2 compile. I've disabled the download
link for now.

Edit: v3.0.5

https://drive.google.com/file/d/0B0lVSGQYLJIZTXdKVHNpdGRuTW8/view?usp=sharing
legendary
Activity: 1470
Merit: 1114
Tossed away everything I had done with cryptonight and started from scratch.
It took 2 hours to get my first share, 180 KH/s.

V3.0.4 coming right up.

Done

Get v3.0.5

Edit: Thanks Wolf0

Edit: I may have overstated the hash rate, more like 170.
legendary
Activity: 1470
Merit: 1114
Honestly I don't think it makes any noticeable difference.
Initialisation is a very little part of the whole hash computation.

I found a measurable difference (<1%) taking the initialisation out of the loop, especially in the longer
chains with many ctxs to manage. I expect this to have even less. Low priority, I've got better things
to work on.
legendary
Activity: 1470
Merit: 1114
Intel GPU mining with cgminer.

I got the source compiled and it too only does blake256. I tried it and got all rejects
on blake but it works on blakecoin, woopee. But wait, it's not mining with the IGPU
but with my Nvidia GPUs. Crap.

A device query doesn't find the Intel IGPU.

Thoughts?
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
Honestly I don't think it makes any noticeable difference.
Initialisation is a very little part of the whole hash computation.
legendary
Activity: 1470
Merit: 1114
I would like to open a discussion with devs and dev wannabes regarding initializing structs.

It may be trivial but considering my inexperience with c/c++ I'm not clear on
a couple of points.

Initializing of contexts is a frequent thing in cpuminer and is done by assigning
each field one by one, very inefficient.

There are some shortcuts where a copy of an initialized context is saved and used
to reinitialize using memcpy, much more efficient.

However I am considering whether assigning a null version of the struct would be
better.

Ex:

typedef struct {blah, blah,blah} ctx_t;
// init null_ctx at compile time
const ctx_t null_ctx = { null_blah, null_blah, null_blah };
// init my_ctx at run time
ctx_t my_ctx = null_ctx;

Am I on the right track? It might not beat memset but it can handle
non-zero null fields.
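
A compilable version of the idea, as a minimal sketch: the field names and initial values here are hypothetical, not taken from any cpuminer context. In C, assigning one struct to another copies the whole object, and compilers commonly turn that into the same block copy memcpy would do, so the two resets below should behave the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context with non-zero initial values,
 * so a plain memset to zero would not be enough. */
typedef struct {
    uint32_t h[4];
    uint32_t count;
    uint8_t  buf[16];
} ctx_t;

/* Template built at compile time. */
static const ctx_t null_ctx = {
    .h     = { 0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u },
    .count = 0,
    .buf   = { 0 },
};

/* Reset by struct assignment. */
void ctx_reset_assign(ctx_t *ctx)
{
    assert(ctx != NULL);
    *ctx = null_ctx;
}

/* Equivalent reset with an explicit memcpy from the same template. */
void ctx_reset_memcpy(ctx_t *ctx)
{
    assert(ctx != NULL);
    memcpy(ctx, &null_ctx, sizeof *ctx);
}
```

Either form replaces the field-by-field init with one copy. Whether it is measurably faster would have to be benchmarked per algo.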
legendary
Activity: 1470
Merit: 1114
Time for another story.

The company I worked for produced large scale mission critical control systems
for institutional customers. They had an emergency response team that provided
24 hour service with a guarantee the customer was talking to an engineer within
5 minutes of the initial call.

The training for this team was interesting and included getting a call from a senior
engineer posing as a cleaner.

He would call and report that he's just a cleaner working near the computer room
and heard alarms. He went into the room and saw a poster with the phone number
to call in case of an emergency.

You can imagine how it went from there. It even involved getting the cleaner to change
circuit packs. One particular trainer would exploit any ambiguity or imprecision in the
instructions to deliberately do the wrong thing. Lots of fun.
legendary
Activity: 1470
Merit: 1114