Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 197. (Read 444067 times)

legendary
Activity: 1470
Merit: 1114
Progress update

I've worked through some of the windows compile errors. I was getting frustrated so I changed gears
and started working on sse2 support.

sse2 qubit works and will be included in the first release.

I can get sse2 working on only one groestl-based algo at a time. If I include the sse2 groestl files in two algos
at the same time I get multiple definition linker errors. The included files are full of macros so that kind of
explains it. I may try turning them into functions so the files that include them don't pull the code into
themselves. If that works it will probably have a performance impact, hopefully not too big.

This affects all the x algos and quark.

I'm going to change my approach slightly. Instead of cloning the macros into a function I'll just wrap
them in a function. I'm more optimistic about the performance impact of this. There will be a cost
in the overhead of the function call/return but this design opens up other optimization opportunities.
It remains to be seen what the net effect will be. If there is a net gain this change could be applied
to other macro-based sse2 sub-algos, which are still used by the aes_ni kernels, ultimately speeding
up the aes_ni kernels as well.

These changes are what caused the compile problem above; hopefully they'll magically disappear.
It's exciting times for a cpu miner.

Well, the compile errors didn't magically disappear. It's still a better design so I'll keep it.

I just declared and defined wrappers in the same files where the macros were defined, eliminating the need
for grso-nomacro files. And the wrappers do nothing else but call the macros. When I can get it to compile
it should work.
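
A minimal sketch of the wrapper idea described above, under stated assumptions: `GRS_MIX` is a made-up stand-in macro (the real groestl macros are much larger) and `grs_mix_wrapped` is a hypothetical wrapper name. In a real build the prototype would sit in the header and the single definition in one .c file, so each algo that includes the header sees only a declaration and the linker sees one definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a macro-based sse2 step; NOT the real groestl code. */
#define GRS_MIX(a, b)                           \
    do {                                        \
        (a) ^= (b);                             \
        (a) = ((a) << 1) | ((a) >> 63);         \
    } while (0)

/* Header part: the only thing other files would need to see. */
void grs_mix_wrapped(uint64_t *state, uint64_t word);

/* Source part: one definition; the wrapper does nothing but call the macro. */
void grs_mix_wrapped(uint64_t *state, uint64_t word)
{
    assert(state != NULL);
    GRS_MIX(*state, word);
}
```

The cost is one call/return per use instead of inlined macro text, which is the overhead mentioned above.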
legendary
Activity: 1470
Merit: 1114
Progress update

I've worked through some of the windows compile errors. I was getting frustrated so I changed gears
and started working on sse2 support.

sse2 qubit works and will be included in the first release.

I can get sse2 working on only one groestl-based algo at a time. If I include the sse2 groestl files in two algos
at the same time I get multiple definition linker errors. The included files are full of macros so that kind of
explains it. I may try turning them into functions so the files that include them don't pull the code into
themselves. If that works it will probably have a performance impact, hopefully not too big.

This affects all the x algos and quark.

I'm going to change my approach slightly. Instead of cloning the macros into a function I'll just wrap
them in a function. I'm more optimistic about the performance impact of this. There will be a cost
in the overhead of the function call/return but this design opens up other optimization opportunities.
It remains to be seen what the net effect will be. If there is a net gain this change could be applied
to other macro-based sse2 sub-algos, which are still used by the aes_ni kernels, ultimately speeding
up the aes_ni kernels as well.

These changes are what caused the compile problem above; hopefully they'll magically disappear.
It's exciting times for a cpu miner.
legendary
Activity: 1470
Merit: 1114
I have a tricky compile error that I'm sure experienced C coders can easily solve but it has
me stumped.

The functions involved are declared in grso-nomacro.h and defined in grso-nomacro.c.
They are called by x11_sse2.c.

Data follows.

Code:
algo/x11_sse2.c:124:36: error: expected expression before ‘BitSequence’
         update_grso( &ctx.groestl, BitSequence* hashbuf, (const BitSequence*)hash, 512 );
                                    ^
algo/x11_sse2.c:124:36: error: too few arguments to function ‘update_grso’
In file included from algo/x11_sse2.c:35:0:
algo/sse2/groestl/grso-nomacro/grso-nomacro.h:10:6: note: declared here
 void update_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash, DataLength_gr databitlen );
      ^
algo/x11_sse2.c:125:35: error: expected expression before ‘BitSequence’
         final_grso( &ctx.groestl,  BitSequence* hashbuf,(const BitSequence*)hash );
                                   ^
algo/x11_sse2.c:125:35: error: too few arguments to function ‘final_grso’
In file included from algo/x11_sse2.c:35:0:
algo/sse2/groestl/grso-nomacro/grso-nomacro.h:12:6: note: declared here
 void final_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash );
      ^

grso-nomacro.h

Code:
typedef unsigned char      BitSequence_gr;
typedef unsigned long long DataLength_gr;

void init_grso ( grsoState* sts_grs );

void update_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash, DataLength_gr databitlen );

void final_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash );


grso-nomacro.c

Code:
void update_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash, DataLength_gr databitlen )
{ /* function code */ }
void final_grso ( grsoState* sts_grs, BitSequence_gr* hashbuf, const BitSequence_gr* hash )
{/* function code */ }

x11_sse2.c

Code:
        update_grso( &ctx.groestl, BitSequence* hashbuf, (const BitSequence*)hash, 512 );
        final_grso( &ctx.groestl,  BitSequence* hashbuf,(const BitSequence*)hash );
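
The errors above look like what gcc reports when a parameter declaration (`BitSequence* hashbuf`) is written where a call expects an expression. Below is a minimal sketch of calls that should parse, assuming `hashbuf` and `hash` are byte buffers already in scope. The `grsoState` struct, the function bodies and the `hash_step` caller are placeholder stubs for illustration, not the real groestl code. Note also that the header defines `BitSequence_gr`, so the casts use that name; `BitSequence` would only work if it is defined elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char      BitSequence_gr;
typedef unsigned long long DataLength_gr;

/* Placeholder state: the real grsoState lives in the groestl sources. */
typedef struct { unsigned char buf[64]; } grsoState;

/* Stub bodies so the sketch compiles; they only copy bytes around. */
void update_grso(grsoState *sts_grs, BitSequence_gr *hashbuf,
                 const BitSequence_gr *hash, DataLength_gr databitlen)
{
    assert(databitlen / 8 <= sizeof sts_grs->buf);
    (void)hashbuf;
    memcpy(sts_grs->buf, hash, (size_t)(databitlen / 8));
}

void final_grso(grsoState *sts_grs, BitSequence_gr *hashbuf,
                const BitSequence_gr *hash)
{
    (void)hash;
    memcpy(hashbuf, sts_grs->buf, sizeof sts_grs->buf);
}

/* Hypothetical caller: pass expressions (optionally cast), never "Type* name". */
void hash_step(grsoState *ctx, void *hashbuf, const void *hash)
{
    update_grso(ctx, (BitSequence_gr *)hashbuf, (const BitSequence_gr *)hash, 512);
    final_grso(ctx, (BitSequence_gr *)hashbuf, (const BitSequence_gr *)hash);
}
```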
legendary
Activity: 1470
Merit: 1114

I spent the last few hours working on getting this compiled. I'm using a VM within windows 7.
I tried 4 different flavors of Linux, and either it doesn't like Linux Mint 32/64 bit or Ubuntu 64, or
I'm missing some dependencies it needs to compile correctly; it never makes the cpuminer executable.
What OS and version are you using? I'll give it another go once I get that info from you.
My Linux skills are pretty rusty "never was all that great to start with" but I'm fairly sure I was doing everything correctly.

I also use Fedora 20 x64. It has to be 64 bit but other than that Mint should work. A VM is not the issue; I build
windows in a VM.

You'll need libcurl-devel and some form of ssl development package (F20 has openssl-devel).
That's about all I can offer without more info. Make sure you follow the instructions in the README.md.

Code:
./autogen.sh # only needed if building from git repo
./configure CFLAGS="-O3 -march=native" --with-crypto --with-curl
make

Take care, -O3 is an upper case O not a zero.

If you can compile cpuminer-multi-1.1 you should be able to compile cpuminer-1.9-RC. If you can't compile
either it's something at your end.

Post your error messages for more help.

member
Activity: 72
Merit: 10
Progress update

I've worked through some of the windows compile errors. I was getting frustrated so I changed gears
and started working on sse2 support.

sse2 qubit works and will be included in the first release.

I can get sse2 working on only one groestl-based algo at a time. If I include the sse2 groestl files in two algos
at the same time I get multiple definition linker errors. The included files are full of macros so that kind of
explains it. I may try turning them into functions so the files that include them don't pull the code into
themselves. If that works it will probably have a performance impact, hopefully not too big.

This affects all the x algos and quark.

Here are some updated hash rates from my i7-4790K 4 GHz showing the sse2 performance.
This shows the difference between the aes_ni optimized kernels and sse2 on the same cpu.
Actually running it on an older cpu will probably give even lower performance.
I can't test sse2 on a real sse2-limited cpu because my core2 pc runs windows.

algo     aes_ni   sse2   sse2/aes_ni
x11      707k     529    .75
x13      320
x14
x15      280
quark    1080     907    .84
qubit    1045     755    .72
 

I spent the last few hours working on getting this compiled. I'm using a VM within windows 7.
I tried 4 different flavors of Linux, and either it doesn't like Linux Mint 32/64 bit or Ubuntu 64, or
I'm missing some dependencies it needs to compile correctly; it never makes the cpuminer executable.
What OS and version are you using? I'll give it another go once I get that info from you.
My Linux skills are pretty rusty "never was all that great to start with" but I'm fairly sure I was doing everything correctly.

Hey,

I used Fedora release 20 64bit with GCC v4.8.3 to compile and it worked fine. The release version and the GCC version are outdated
but hey it works. (:
full member
Activity: 231
Merit: 150
Progress update

I've worked through some of the windows compile errors. I was getting frustrated so I changed gears
and started working on sse2 support.

sse2 qubit works and will be included in the first release.

I can get sse2 working on only one groestl-based algo at a time. If I include the sse2 groestl files in two algos
at the same time I get multiple definition linker errors. The included files are full of macros so that kind of
explains it. I may try turning them into functions so the files that include them don't pull the code into
themselves. If that works it will probably have a performance impact, hopefully not too big.

This affects all the x algos and quark.

Here are some updated hash rates from my i7-4790K 4 GHz showing the sse2 performance.
This shows the difference between the aes_ni optimized kernels and sse2 on the same cpu.
Actually running it on an older cpu will probably give even lower performance.
I can't test sse2 on a real sse2-limited cpu because my core2 pc runs windows.

algo     aes_ni   sse2   sse2/aes_ni
x11      707k     529    .75
x13      320
x14
x15      280
quark    1080     907    .84
qubit    1045     755    .72
 

I spent the last few hours working on getting this compiled. I'm using a VM within windows 7.
I tried 4 different flavors of Linux, and either it doesn't like Linux Mint 32/64 bit or Ubuntu 64, or
I'm missing some dependencies it needs to compile correctly; it never makes the cpuminer executable.
What OS and version are you using? I'll give it another go once I get that info from you.
My Linux skills are pretty rusty "never was all that great to start with" but I'm fairly sure I was doing everything correctly.
legendary
Activity: 1470
Merit: 1114
Progress update

I've worked through some of the windows compile errors. I was getting frustrated so I changed gears
and started working on sse2 support.

sse2 qubit works and will be included in the first release.

I can get sse2 working on only one groestl-based algo at a time. If I include the sse2 groestl files in two algos
at the same time I get multiple definition linker errors. The included files are full of macros so that kind of
explains it. I may try turning them into functions so the files that include them don't pull the code into
themselves. If that works it will probably have a performance impact, hopefully not too big.

This affects all the x algos and quark.

Here are some updated hash rates from my i7-4790K 4 GHz showing the sse2 performance.
This shows the difference between the aes_ni optimized kernels and sse2 on the same cpu.
Actually running it on an older cpu will probably give even lower performance.
I can't test sse2 on a real sse2-limited cpu because my core2 pc runs windows.

algo     aes_ni   sse2   sse2/aes_ni
x11      707k     529    .75
x13      320
x14
x15      280
quark    1080     907    .84
qubit    1045     755    .72
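
The last column is just the sse2 rate divided by the aes_ni rate. A small sketch of that arithmetic, assuming both rates in a row use the same unit (the table does not state it explicitly); the function names are made up for illustration:

```c
#include <assert.h>

/* sse2 rate as a fraction of the aes_ni rate; both must use the same unit. */
double sse2_fraction(double aes_ni_rate, double sse2_rate)
{
    assert(aes_ni_rate > 0.0);
    return sse2_rate / aes_ni_rate;
}

/* Rate lost by falling back to sse2, as a percentage of the aes_ni rate. */
double sse2_loss_percent(double aes_ni_rate, double sse2_rate)
{
    return 100.0 * (1.0 - sse2_fraction(aes_ni_rate, sse2_rate));
}
```

For the x11 row, `sse2_fraction(707, 529)` comes to roughly 0.75, which matches the table.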
 
legendary
Activity: 1470
Merit: 1114
It seems my very first decision on this project was the wrong one. I forked from cpuminer-multi-1.2pre
instead of 1.1. I also failed to confirm 1.2pre would compile in windows.

Now I'm in a bind and it will delay windows support.

Hopefully the conflicts are only in files I haven't touched, if so I should be able to release with windows
support without too much delay. Otherwise I will go ahead and release with only linux support.

you are doing a superb job as it stands joblo ...

windows support ( as much as i dont like the os ) is important and can be sorted ...

the main thing is to get the core working and stable ... the port can be done shortly after ...

keep up the good work mate ...

#crysx

Thanks for the encouragement.
legendary
Activity: 1470
Merit: 1114
Hi joblo,

Currently testing the cpuminer on an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz ..
Running using Fedora release 20 64bit .. a bit outdated as I'm too lazy to upgrade it as there are a lot of things running on it..
So far it is doing quite well but I keep getting "Result does not validate on CPU!" ..

Tested with x11, Quark and Qubit algos.

Will keep you updated as each test takes a while to get any accepted..

theLosers106.


The invalid nonces also occur in the parent applications so I'm not too worried about them.
I included these messages only in the RC build just to make sure they don't get out of control.
It does represent wasted hash (no one likes to waste hash) but the hash rate at the pool seems
in line with the client. It's hard to tell with share submits so infrequent due to the low hash rate.
I've observed that the pool-reported rate varies between -50% and 200% of the stable rate in cpuminer. To me
this seems in line.

I recommend you read the release notes and the info displayed on startup. It might answer
some other questions that may arise.
legendary
Activity: 1470
Merit: 1114
It seems my very first decision on this project was the wrong one. I forked from cpuminer-multi-1.2pre
instead of 1.1. I also failed to confirm 1.2pre would compile in windows.

Now I'm in a bind and it will delay windows support.

Hopefully the conflicts are only in files I haven't touched, if so I should be able to release with windows
support without too much delay. Otherwise I will go ahead and release with only linux support.

Fortunately it was another noob mistake: I tried to compile debug win32. Back on track.
member
Activity: 72
Merit: 10
Hi joblo,

Currently testing the cpuminer on an Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz ..
Running using Fedora release 20 64bit .. a bit outdated as I'm too lazy to upgrade it as there are a lot of things running on it..
So far it is doing quite well but I keep getting "Result does not validate on CPU!" ..

EDIT2: Tested with x11, X13, X15, Quark and Qubit algos.

Will keep you updated as each test takes a while to get any accepted..

theLosers106.

legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
It seems my very first decision on this project was the wrong one. I forked from cpuminer-multi-1.2pre
instead of 1.1. I also failed to confirm 1.2pre would compile in windows.

Now I'm in a bind and it will delay windows support.

Hopefully the conflicts are only in files I haven't touched, if so I should be able to release with windows
support without too much delay. Otherwise I will go ahead and release with only linux support.

you are doing a superb job as it stands joblo ...

windows support ( as much as i dont like the os ) is important and can be sorted ...

the main thing is to get the core working and stable ... the port can be done shortly after ...

keep up the good work mate ...

#crysx
legendary
Activity: 1470
Merit: 1114
It seems my very first decision on this project was the wrong one. I forked from cpuminer-multi-1.2pre
instead of 1.1. I also failed to confirm 1.2pre would compile in windows.

Now I'm in a bind and it will delay windows support.

Hopefully the conflicts are only in files I haven't touched, if so I should be able to release with windows
support without too much delay. Otherwise I will go ahead and release with only linux support.
legendary
Activity: 1470
Merit: 1114
I'm still somewhat puzzled about
the different performance profile for your neoscrypt ccminer kernel vs DJM34's. Yours works better on Maxwell
but DJM34's hashes 12% faster than yours on my 780ti. Considering the nature of neoscrypt it's likely the HW
configuration is the reason for the difference, but I can't help thinking about why this performance
reversal happened. I did some analysis when your kernel was released, and tried mixing up parts of each kernel
to try to affect performance but had no success. As someone who is intimately familiar with both versions I
was hoping to jog your mind and maybe spark an idea for further optimization.

It's because DJM34's version has a separate kernel for compute 3.5 devices. It uses a memshift variable of 4 since a cache line on the 780ti is bigger than on the Maxwells.

Bigger cache line and a memory intensive algo. Makes sense.

Pallas suggested moving this to your thread but discussions about kepler aren't really on topic for SP_MOD..
Well it's my thread. I don't think there is much more to discuss but if it takes off we can start its own thread.

I'm satisfied with this explanation so the only solution^h^h^h^h^h workaround is a form of hybrid.
I have built such a hybrid but it is bloated because of a 2 dimensional growth in neoscrypt code.
Because I make the kernel selection at run time both versions of neoscrypt get built into all three
versions of cuda. Only the 3.5 cuda will select DJM34 neo and the maxwell code will only select the Pallas neo.
I'd like to move the check to compile time but haven't been motivated enough to implement it without
a ccminer fork willing to host it.
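
To illustrate the run time vs compile time trade-off described above, here is a rough sketch in plain C rather than CUDA. Everything in it is an assumption for illustration: the enum, the function names and the `TARGET_ARCH` macro are made up, and the mapping (3.5 picks the DJM34 kernel, anything else picks the Pallas kernel) is a simplification of what the post describes. In real device code the analogous compile-time switch would normally test nvcc's `__CUDA_ARCH__` macro.

```c
#include <assert.h>

/* Hypothetical labels for the two neoscrypt kernels. */
enum neo_kernel { NEO_DJM34, NEO_PALLAS };

/* Run-time selection: both kernels must be compiled into every build,
   because the choice is only known once the device is queried. */
enum neo_kernel select_neo_runtime(int compute_major, int compute_minor)
{
    assert(compute_major > 0 && compute_minor >= 0);
    return (compute_major == 3 && compute_minor == 5) ? NEO_DJM34 : NEO_PALLAS;
}

/* Compile-time selection: TARGET_ARCH is a made-up build flag
   (e.g. -DTARGET_ARCH=350); only the chosen branch is compiled. */
#ifndef TARGET_ARCH
#define TARGET_ARCH 500
#endif

enum neo_kernel select_neo_compiletime(void)
{
#if TARGET_ARCH == 350
    return NEO_DJM34;
#else
    return NEO_PALLAS;
#endif
}
```

With the compile-time form each build carries only one kernel, which is what would remove the bloat.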

If there is interest I can take another look once things settle down a bit with cpuminer.
sp_
legendary
Activity: 2926
Merit: 1087
Team Black developer
I'm still somewhat puzzled about
the different performance profile for your neoscrypt ccminer kernel vs DJM34's. Yours works better on Maxwell
but DJM34's hashes 12% faster than yours on my 780ti. Considering the nature of neoscrypt it's likely the HW
configuration is the reason for the difference, but I can't help thinking about why this performance
reversal happened. I did some analysis when your kernel was released, and tried mixing up parts of each kernel
to try to affect performance but had no success. As someone who is intimately familiar with both versions I
was hoping to jog your mind and maybe spark an idea for further optimization.

It's because DJM34's version has a separate kernel for compute 3.5 devices. It uses a memshift variable of 4 since a cache line on the 780ti is bigger than on the Maxwells.
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I can test on linux with recent amd cpu and older intel without aes-ni.

Thanks Pallas. The first release won't support CPUs without AES_NI and will just exit. However,
for the test build it will still try to run. My only goal is to look for false positives and false negatives that
indicate the AES_NI check isn't 100% accurate. I'm also curious to see what will happen if an older CPU
tries to run AES_NI code.

I've got all that covered for the test cycle but I'll keep your offer in mind for the second release which will
support non-AES_NI CPUs.


by "recent amd cpu" I mean one that supports aes-ni, an FX processor.

Also thanks for all your work, I'm sure there's much more than I'm aware of.

I'm still somewhat puzzled about
the different performance profile for your neoscrypt ccminer kernel vs DJM34's. Yours works better on Maxwell
but DJM34's hashes 12% faster than yours on my 780ti. Considering the nature of neoscrypt it's likely the HW
configuration is the reason for the difference, but I can't help thinking about why this performance
reversal happened. I did some analysis when your kernel was released, and tried mixing up parts of each kernel
to try to affect performance but had no success. As someone who is intimately familiar with both versions I
was hoping to jog your mind and maybe spark an idea for further optimization.

let's move this to the ccminer thread
legendary
Activity: 1470
Merit: 1114
I can test on linux with recent amd cpu and older intel without aes-ni.

Thanks Pallas. The first release won't support CPUs without AES_NI and will just exit. However,
for the test build it will still try to run. My only goal is to look for false positives and false negatives that
indicate the AES_NI check isn't 100% accurate. I'm also curious to see what will happen if an older CPU
tries to run AES_NI code.

I've got all that covered for the test cycle but I'll keep your offer in mind for the second release which will
support non-AES_NI CPUs.
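
For anyone curious what such a check can look like, here is a minimal sketch of one common approach on x86 with GCC or Clang: read CPUID leaf 1 and test the AES bit (ECX bit 25) and the SSE2 bit (EDX bit 26). This is an illustration only, not necessarily how cpuminer does its check; the helper names are made up.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1 feature bits. */
#define ECX_AES_BIT  (1u << 25)
#define EDX_SSE2_BIT (1u << 26)

/* Pure bit tests, kept separate so they can be checked on any machine. */
bool ecx_has_aes(unsigned int ecx)  { return (ecx & ECX_AES_BIT) != 0; }
bool edx_has_sse2(unsigned int edx) { return (edx & EDX_SSE2_BIT) != 0; }

/* True if the running CPU reports AES-NI; always false on non-x86 builds. */
bool cpu_has_aes_ni(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;   /* leaf 1 not available */
    return ecx_has_aes(ecx);
#else
    return false;
#endif
}
```

A miner could call `cpu_has_aes_ni()` once at startup and exit or fall back to sse2 code when it returns false.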

Also thanks for all your work, I'm sure there's much more than I'm aware of.

I'm still somewhat puzzled about
the different performance profile for your neoscrypt ccminer kernel vs DJM34's. Yours works better on Maxwell
but DJM34's hashes 12% faster than yours on my 780ti. Considering the nature of neoscrypt it's likely the HW
configuration is the reason for the difference, but I can't help thinking about why this performance
reversal happened. I did some analysis when your kernel was released, and tried mixing up parts of each kernel
to try to affect performance but had no success. As someone who is intimately familiar with both versions I
was hoping to jog your mind and maybe spark an idea for further optimization.
hero member
Activity: 672
Merit: 500
Hello Joblo, I've seen on the other thread you have asked for info about AES-NI on AMD.
AMD might be a hard buy but at least they have a somewhat coherent feature set. AES-NI is available on all recent processors AFAIK, including low-power Kabini as well as some low-power options that are half a decade old. In general, every chip above 30 bucks has it, including those in SoHo NAS. Thank you Intel for market segmentation!

Are people CPU mining qubit these days? A radeon 7750 will do twice the qubit hash at 10% less power.
By contrast the x11 performance looks cool.
legendary
Activity: 2912
Merit: 1091
--- ChainWorks Industries ---
im here and commenting so that i get updates on the thread ;) ...

great work mate ...

#crysx
legendary
Activity: 2716
Merit: 1094
Black Belt Developer
I can test on linux with recent amd cpu and older intel without aes-ni.