Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 196. (Read 444131 times)

legendary
Activity: 1470
Merit: 1114
I'm reluctant to distribute binaries but will do it as there is demand, if I ever get Windows to compile.

Igotid did the sse2 optimizations? Thanks I'll add him to the credits.

I think this will work out just as I envisioned and am glad to get your support. I respect your opinion about importing
lower quality optimizations. I'll stay focused on speed while you can continue with your approach. Our forks can
coexist nicely.

Don't worry about being direct. Sometimes it's the best or only way to get the message across. I can also be direct
at times if I think someone is being closed-minded. I'm ok with it as long as it doesn't get personal. As heated as some
of the discussion got on the other thread I was still ok with most of it. It came close to the line but never crossed it IMO.

I'll stay out of the base code for now and let you finish with 1.2. I'll go ahead and merge cryptonight and neoscrypt
and release it. Then I'll do argon2 for my next release. When you get some other alt algos in working condition
I can start integrating the optimized kernels.

I'll probably be using yiimp for testing them but I don't have any wallets. I could mine to your wallet address and you
can keep whatever is produced.

I'm looking forward to keeping the lights on for cpu mining.

Happy new year and happy mining.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
Thanks for the explanations. I do indeed plan to include the argon2 algo in the near future, but first I need to tag 1.2 (which is my linux branch with the --show-diff stuff).

Then I need to rebase my multi pool branch in the next version.

Sorry if I was a bit "direct" in my answer. I've read your posts this week and didn't see any source until today ;)

You are welcome on the project. I already know (at least a part of) these SSE implementations, seen in Ig0tik projects... But for me they were not stable enough to be imported. I want to keep at least VStudio and MinGW compatibility. I'm not focused on linux or windows, unlike what sp says for ccminer; I want it to work correctly on both ;)

IMO, an SSE or AVX implementation should only be imported if there is more than 20% improvement, and if the code doesn't break windows or linux support (which can be hard with VStudio not supporting inline ASM in x64 mode, but it has low level instructions for AVX, as seen in the nicehash axiom project).

Your way of copying the scan_hash functions is correct; we can then tune the -march flag in the Makefile.am (like what is done in ccminer). It's the same in vstudio: a unit can be optimized specifically for AVX.

Just... check my linux branch. I made a big change in the way the work structure is passed to these functions. It was already required by some algos (like ZR5), and this method seems better for the future.
legendary
Activity: 1470
Merit: 1114
I'm suggesting you're misusing the implementation such that there are symbols defined in multiple places! :D

Uh, yeah, the compiler told me that already. My problem is that my experience is on a completely different kind of
system with a different language, OS and HW architecture. And my understanding of the basic concepts is
based on that implementation. It's like when someone speaks in a foreign language: they still think in their native
language and have to translate in their mind in real time. That would be the third level of competence.

This project is getting more exciting every day.

Oh crap, I'm starting to sound like Tom Cruise on Scientology.

[rant]
PS Going Clear is a great movie. I was approached by Scientology back in my mid teens. I had very little money at the
time and since they're all about money nothing came of it. I took their personality test and it seemed pretty accurate.
Then came the pitch. They had courses available that could help me with the deficiencies in my personality. They were
new and didn't have the machine yet. The courses they offered cost hundreds of dollars, something no 15 YO has
unless they're selling drugs. I was initially intrigued by the word science in their name. It soon became clear (pun intended)
that the science angle was just a smoke screen for another scam artist using religion to get rich. I was already questioning
my religion at the time and I think it helped convert me to atheism. Not atheism as a religion, which is an oxymoron,
but truly the lack of any supreme creator. Religions cause wars and stifle progress. If we hadn't woken up from the
dark ages the world would still be flat and we'd still be using abacuses to count.
[/rant]
member
Activity: 81
Merit: 1002
It was only the wind.
Compiler errors with Groestl? Yeah, you're gonna kick yourself when you figure out why. I had to take a bit more than a cursory look at that optimized Groestl before I saw it.
legendary
Activity: 1470
Merit: 1114
Hi Tanguy.

Thanks for your input. As yours is the predecessor of my fork, I have a lot of respect for your opinions on the matter. Also, out of respect
for Pooler, I'll rename it and bump the release number to 3.0 to avoid overlap.

I'd like to think of our forks coexisting where I would focus on the older algos while you would continue with the emerging
algos and infrastructure improvements. I might do some of that also. I'm targeting lyra2v2 as the next most important
algo to work on. I might take a look at argon2 also, it also has its roots in cpuminer-multi and pooler design so it should
be easy to integrate. It will also give me some practice adding a new algo to the options list. I'll keep you informed of what
I'm up to so we don't trip over each other. I love that the modules are almost plug and play.

I fear I've already introduced some bloat as I have different files for each variant of each algo. There is a lot of cloned code
in each file but my problems with multiple definitions at link time drove me to do that. Once I understand what the hell
I'm doing wrong I can merge the common code and shrink the code size significantly. I also like the idea of a single binary
that does all the cpu identification and kernel selection transparently. Maybe some of it can be applied to ccminer also.

I'm not intentionally tweaking for specific CPUs, I want to maintain full backward compatibility but some of my optimizations
may help on some CPUs and hurt on others. Things like inlining are affected by cache size. Even with my CPU some hash
functions were faster inlined while others weren't. Quark for example is faster if the hash function and almost everything below
it is inlined while x11 is faster if the hash function is not inlined, but everything below it is. I'm trying to work my way
from the bottom, inlining and testing, and I stop inlining when the inlined code segment gets too large and cache performance
starts to suffer. I'm almost done with that.
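For anyone who wants to try the same bottom-up experiment, here is a rough sketch of how inlining can be pinned down per function with GCC/Clang attributes. All names (`demo_step`, `demo_hash`, `demo_scan`) and the mixing arithmetic are made up for illustration and are not from cpuminer; the macros fall back to plain `static inline` on other compilers.

```c
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define DEMO_FORCE_INLINE static inline __attribute__((always_inline))
#define DEMO_NO_INLINE    __attribute__((noinline))
#else
#define DEMO_FORCE_INLINE static inline
#define DEMO_NO_INLINE
#endif

/* Low-level step (hypothetical): small enough that forcing it inline
 * should not bloat the caller much. */
DEMO_FORCE_INLINE uint32_t demo_step(uint32_t x, uint32_t k)
{
    return ((x << 7) | (x >> 25)) + k;
}

/* Top-level hash kept out of line, like the x11 case described above:
 * everything below it is inlined, the hash function itself is not. */
DEMO_NO_INLINE uint32_t demo_hash(uint32_t nonce)
{
    uint32_t x = nonce;
    for (uint32_t r = 0; r < 8; r++)
        x = demo_step(x, r);
    return x;
}

/* Scan loop: calls the hash once per nonce and tracks the lowest value. */
uint32_t demo_scan(uint32_t first, uint32_t count)
{
    uint32_t best = 0xffffffffu;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t h = demo_hash(first + i);
        if (h < best)
            best = h;
    }
    return best;
}
```

Swapping `DEMO_NO_INLINE` for `DEMO_FORCE_INLINE` on `demo_hash` gives the "inline everything" variant; which one is faster can only be settled by timing on the target CPU.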

It's funny you mentioned SP. While doing some tinkering with static and inlining I was wondering if that's what SP was
doing. Maybe I am but I'm not asking for donations, just giving back to the community in a non-monetary way. I want to
cooperate with you and the other developers and avoid the sniping that goes on in the other thread.

I didn't mean to offend with my comments about CPU mining being obsolete, I'm just trying to be realistic and manage
my expectations. I thought CPU mining was obsolete before I discovered cpuminer-multi and got interested again.
Now I'm excited by what I have been able to do in the past couple of weeks. A good morale boost with mining revenue
so low lately.

I hear you about the name. I could stick with cpuminer-multi if you're ok with that or come up with a new variant.

Thanks again for your hard work with so many projects and I look forward to maybe graduating to ccminer some day.
legendary
Activity: 1484
Merit: 1082
ccminer/cpuminer developer
I don't know what you are trying to do... but I'm still the most active dev on the project... which is not "deprecated" or "obsolete" as you are saying.

CPU specific optimizations are free to be made by all, but that's specific and not the goal of my fork, which is meant to work on most platforms (including arm).

Where are your commits? Come on... Are you just tweaking one algo for a special cpu/os and claiming you made the project (like sp)?

There is still a lot to do on this project, like Wolf said, to handle specific variants of the algos (AVX, SSE, AES) at runtime and not at compile time, which requires a lot of binaries.

Edit: oh sorry, I didn't see you put up the sources. OK, it's a good start... I so often see infected binaries on the cloud ;)

But some advice: change the project name. The original cpuminer is pooler's project and their version number is already 2.4.2 -> https://github.com/pooler/cpuminer
member
Activity: 81
Merit: 1002
It was only the wind.
My github... I think. Let me look.

Yup: https://github.com/wolf9466/cpuminer-multi
legendary
Activity: 1470
Merit: 1114
The rapid development is likely to slow down so here are some things I'm planning to
do after I get through with cryptonight and neoscrypt.

Qubit sse2 could be improved if I could get the sse2 optimized luffa to work. It works on
x* but not qubit. No shares ever submitted.

Quark aesni might be improved if I could remove init from the scanhash loop. Groestl runs
twice on quark and I suspect that's the reason it doesn't work.

lyra2v2 produces 90% rejects and I'd like to dig into that. It submits some valid shares so it
must be doing something right. That's going to require digging into the guts of the algo which
is going to be a challenge for me.

Some of you may think I'm crazy doing all this for CPU mining which most people consider
obsolete. Other than the fact that I have the time and enjoy doing this kind of work I'm
learning a hell of a lot about crypto.  I think I've risen to the second level of competence.*

I'm hoping the knowledge I gain will eventually be applicable to cuda, where the real action
is. When things slow down here I'll take a run at ccminer. Look out SP, here I come.

* the 4 levels of competence, origin unknown

1. unconscious incompetence, you don't know what you don't know

2. conscious incompetence, you know enough to know you don't know much

3. conscious competence, you are competent if you think about it

4. unconscious competence, you are competent without thinking about it
legendary
Activity: 1470
Merit: 1114
Things have been moving fast. I was preparing to release a new version with more hash
for aesni cpus in quark, qubit and x*. Increases are generally in the 1% range. However,
some new opportunities came up.

Wolf0, thanks Wolf, has offered up his cryptonight miner which has aesni optimizations.
I know someone has been asking about cryptonight.

I also found a neoscrypt miner lying around on my hard drive that is faster than the current one.
It appears to be by John Doering.

Both use the same cpuminer-multi architecture so should be easy to integrate. In the meantime
here are the latest numbers...

Hash rates with Intel Core i7-4790K, 4 GHz, 8 threads.

              quark       qubit   x11   x13   x15   neoscrypt
              ---------   -----   ---   ---   ---   ---------
2.0.1 aesni   1096 kh/s   1046    722   344   291   -
2.0 aesni     1080        1041    707   338   290   -
sse2          906         765     -     -     -     30
x86_64        557         427     266   179   165   -
predecessor   904         427     684   179   165   35*

* not yet integrated
member
Activity: 81
Merit: 1002
It was only the wind.
Both. It's open now.
legendary
Activity: 1470
Merit: 1114
You're teasing me. Are you suggesting there's a bug in the macros?
legendary
Activity: 1470
Merit: 1114
Looks good based on the file name (aesni in it). I'll investigate and if faster and with your
permission I'll add it to my fork, with credit.

I don't have any wallets for this algo so I can mine to your address while testing. It probably
won't be a lot but I'll let it run for a bit.

I'm pretty tied up right now, more hash coming for the big 4 and I'm still trying to work around
macros causing multiple def errors on link. I think I'm getting there, I converted them all to inline
functions, cloned them and gave the clones unique names. I can't help thinking there is a better
way because of my inexperience with C/C++.

I don't know if an inline function is as fast as a macro. It should be, since all the work is done at compile time.

Which leads me to a question. Using functions means I have to access the context struct members using
'->' instead of '.'. This says to me there is an added level of indirection. Does this also apply to inline
functions or is the compiler smart enough to optimize it out and code it like it was a '.' as it would be
if the code was truly inline?
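A small sketch of the '->' versus '.' question, with made-up names (`demo_ctx`, `demo_update`) rather than anything from the real kernels. When a `static inline` helper is actually inlined and the context is a local struct, an optimizing compiler can generally see through the pointer, so the two versions below would be expected to compile to much the same code at -O2. That is worth confirming by comparing the generated assembly for the real functions.

```c
#include <stdint.h>

/* Hypothetical context, standing in for a hash context struct. */
typedef struct {
    uint32_t state;
    uint32_t count;
} demo_ctx;

/* Members accessed through a pointer, as any function taking the
 * context by address has to do. */
static inline void demo_update(demo_ctx *ctx, uint32_t word)
{
    ctx->state = ((ctx->state << 5) | (ctx->state >> 27)) ^ word;
    ctx->count++;
}

/* Pointer version: calls the inline helper on a local context. */
uint32_t demo_via_pointer(const uint32_t *data, uint32_t n)
{
    demo_ctx ctx = { 0x12345678u, 0 };
    for (uint32_t i = 0; i < n; i++)
        demo_update(&ctx, data[i]);
    return ctx.state + ctx.count;
}

/* Direct version: the same work written out with '.' access,
 * the way a macro expansion would look. */
uint32_t demo_direct(const uint32_t *data, uint32_t n)
{
    demo_ctx ctx = { 0x12345678u, 0 };
    for (uint32_t i = 0; i < n; i++) {
        ctx.state = ((ctx.state << 5) | (ctx.state >> 27)) ^ data[i];
        ctx.count++;
    }
    return ctx.state + ctx.count;
}
```

Both functions return the same value for the same input; only the generated code can say whether the pointer costs anything in a given build.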

It will be interesting to see if there is a penalty in converting to inliners as well as dealing with struct*
instead of struct. On the bright side splitting everything up into the standard init, update & close functions
allows me to take the init out of the loop.
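Taking the init out of the loop could look roughly like the sketch below: initialize one context before the scan loop and copy it for each nonce. The `toy_*` functions are a made-up stand-in with the init/update/close shape, not a real hash and not the cpuminer code.

```c
#include <stdint.h>
#include <string.h>

/* Toy context with the init/update/close split (hypothetical). */
typedef struct { uint32_t h[4]; } toy_ctx;

static void toy_init(toy_ctx *c)
{
    static const uint32_t iv[4] = {
        0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au
    };
    memcpy(c->h, iv, sizeof iv);
}

static void toy_update(toy_ctx *c, uint32_t w)
{
    for (unsigned i = 0; i < 4; i++)
        c->h[i] = (c->h[i] ^ w) * 0x01000193u + i;
}

static uint32_t toy_close(const toy_ctx *c)
{
    return c->h[0] ^ c->h[1] ^ c->h[2] ^ c->h[3];
}

/* Baseline: init runs for every nonce. */
uint32_t scan_init_each(uint32_t first, uint32_t count)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < count; i++) {
        toy_ctx c;
        toy_init(&c);
        toy_update(&c, first + i);
        acc ^= toy_close(&c);
    }
    return acc;
}

/* Init hoisted: one initialized context is copied per nonce. */
uint32_t scan_init_once(uint32_t first, uint32_t count)
{
    toy_ctx base;
    uint32_t acc = 0;
    toy_init(&base);
    for (uint32_t i = 0; i < count; i++) {
        toy_ctx c;
        memcpy(&c, &base, sizeof c);   /* fresh copy, never reuse a used context */
        toy_update(&c, first + i);
        acc ^= toy_close(&c);
    }
    return acc;
}
```

The copy only pays off when init is more expensive than a memcpy of the context. If the same function runs more than once per hash, each run would need its own fresh copy of the initialized context.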

Edit: I'm stumped about these multi defs. I must be missing something fundamental. I think I'll drop it for
now. I think I can get a little more out of aesni. After I release that I'll take a look at your cryptonight.
It looks like it's using some slower sub-algos so I should be able to improve on it.
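One common cause of multiple definition errors at link time, not necessarily the one in the Groestl code, is a header that defines (rather than declares) functions or tables and is included from more than one .c file. Include guards do not help there, since each .c file is compiled separately. A sketch of the usual layout, with hypothetical file and symbol names (`demo_round.h`, `demo_table`, `demo_round`), shown as one listing for brevity:

```c
/* ---- demo_round.h (hypothetical header) ---- */
#ifndef DEMO_ROUND_H
#define DEMO_ROUND_H
#include <stdint.h>

/* Data is only declared here; it is defined in exactly one .c file. */
extern const uint32_t demo_table[4];

/* static inline gives each including .c file its own private copy,
 * so no external symbol is emitted twice. */
static inline uint32_t demo_round(uint32_t x, unsigned i)
{
    return ((x << 3) | (x >> 29)) ^ demo_table[i & 3];
}

#endif /* DEMO_ROUND_H */

/* ---- demo_round.c: the single definition of the table ---- */
const uint32_t demo_table[4] = { 1u, 2u, 4u, 8u };
```

With this layout the variants of an algo can share one header without cloning and renaming the helpers.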
member
Activity: 81
Merit: 1002
It was only the wind.
I did the original CN code - I kinda doubt it's getting much faster...
legendary
Activity: 1470
Merit: 1114
Where can I get it? I've already given you credits in the startup header.
legendary
Activity: 1470
Merit: 1114
Open or private?
member
Activity: 63
Merit: 10
Forgive my stupidity, but can I use this to mine BCN faster than what I am mining right now with my quad core CPU?  I am thinking of getting more into this, but I need to learn more before I invest more money.  Are there any BCN miners out there that I can buy?
legendary
Activity: 1470
Merit: 1114
cpuminer already has cryptonight built in but I haven't looked at it yet. It's the same code as tpruvot's version.
I wanted to get the first version out quickly to implement the available optimizations.

Give cryptonight a try and let me know how it goes.
sr. member
Activity: 840
Merit: 252
Do you plan to release one for Cryptonight algorithm?
legendary
Activity: 1470
Merit: 1114
I am preparing the first public release of cpuminer-2.0.

Unless I make significant progress on windows compilation or sse support it will
be released in the next day or two pending results from the beta testers.

The current features:

Optimized AES-NI kernels for the X algos, quark and qubit.
Optimized SSE2 kernels for quark & qubit
Other miscellaneous kernel optimizations
CPU capabilities checking on startup to select the appropriate kernel
CPU info displayed on startup
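As an illustration of capability checking on startup, here is a minimal sketch, not the actual cpuminer code. It reads CPUID leaf 1 through the `<cpuid.h>` helper shipped with GCC and Clang (ECX bit 25 is AES-NI, EDX bit 26 is SSE2) and picks a kernel through a function pointer. The `kernel_*` functions, `detect_kernel` and `select_kernel` are hypothetical placeholders; on non-x86 targets or other compilers the sketch simply falls back to the generic kernel.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define DEMO_HAVE_CPUID 1
#endif

typedef uint32_t (*kernel_fn)(uint32_t);
typedef enum { KERNEL_GENERIC, KERNEL_SSE2, KERNEL_AESNI } kernel_id;

/* Placeholder kernels: a real build would keep each variant in its own
 * source file, compiled with the matching instruction set flags. */
static uint32_t kernel_generic(uint32_t x) { return x * 2654435761u; }
static uint32_t kernel_sse2(uint32_t x)    { return x * 2654435761u; }
static uint32_t kernel_aesni(uint32_t x)   { return x * 2654435761u; }

/* Pick the best kernel the CPU reports support for. */
kernel_id detect_kernel(void)
{
#ifdef DEMO_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        if (ecx & (1u << 25))
            return KERNEL_AESNI;
        if (edx & (1u << 26))
            return KERNEL_SSE2;
    }
#endif
    return KERNEL_GENERIC;
}

/* Map the detected id to a function pointer used by the scan loop. */
kernel_fn select_kernel(kernel_id id)
{
    switch (id) {
    case KERNEL_AESNI: return kernel_aesni;
    case KERNEL_SSE2:  return kernel_sse2;
    default:           return kernel_generic;
    }
}
```

Typical use would be a single `kernel_fn hash = select_kernel(detect_kernel());` at startup, with the scan loop calling `hash(...)` from then on.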

Possible additions:

Windows support
SSE2 support for X kernels