Topic: [ANN]: cpuminer-opt v3.8.8.1, open source optimized multi-algo CPU miner - page 149. (Read 444131 times)

legendary
Activity: 1470
Merit: 1114
[rambling rant removed]

Holy crap calm down. What does Nicehash and IP limiting have to do with cpuminer-opt? On second thought don't
answer that.

I've spent considerable time trying to help you with your compile problem and you've been needling me
from day one about conflating old CPUs with server class CPUs. I know damn well what a server class CPU is.
Some server class CPUs, not necessarily Intel or x86, have the ability to disable functional units if they aren't needed
in a specific environment. It was all irrelevant because all I wanted was to rule out a CPU issue and did so with your first
post with the miner output identifying your CPU's capabilities. I didn't know or care whether your CPU was bigger than
my CPU I just wanted to confirm it had AVX2. And I thought we had moved on.

Then you needle me again about cross compiling. I also let that go at the time but decided to bring up these issues
in a lighthearted way once the main issue was mostly understood. But you respond with an incoherent rant.

There was no point in bringing up semantics, and the strict definition of cross compiling or that I didn't use it correctly
or in context. There was no ambiguity in my language, a rarity it seems even with native English speakers. I am not
referring to you, you have been very clear.

What was your point in mentioning these trivial issues if you didn't expect me to respond to your jabs?
hero member
Activity: 589
Merit: 507
I don't buy nor sell anything here and never will.
That's what I was referring to when I wrote "compiler produced AVX2". AVX(2) provides SIMD instructions and it's unlikely something
like a printf would use it. A memcpy wouldn't use it because of the overhead of loading/storing the data to/from the ymm regs.
It's only useful for vector arith, and apparently the compiler isn't smart enough to convert conventionally coded array processing
loops to AVX2. I'm not even sure *if* the compiler can optimize in this fashion, the existence of so much hand coded AVX2 suggests
otherwise.

I wasn't talking about what is or isn't likely but what is theoretically possible. With -O flags and -march=native the compiler can use all instruction sets it has at its disposal, and if and how it decides to translate a higher-level language into machine code is entirely up to it, so if it feels like it, it will use AVX2 for printf and that's that.
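
For illustration, here is a minimal sketch of the kind of "conventionally coded" array loop under discussion. It is not taken from cpuminer-opt and the function name is hypothetical. It contains no intrinsics and no inline asm, yet when built with something like `gcc -O3 -march=native` on an AVX2 CPU, GCC's auto-vectorizer will typically compile the loop to ymm instructions. Whether that actually happens depends on the GCC version and flags, so check the output of `gcc -S` or `objdump -d` rather than taking it on faith.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain scalar C: no intrinsics, no inline asm. The restrict qualifiers
 * tell the compiler the arrays do not overlap, which makes the loop an
 * easy candidate for auto-vectorization. */
void add_u32(uint32_t *restrict dst, const uint32_t *restrict a,
             const uint32_t *restrict b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The practical consequence is the one argued above: a binary built with -march=native may contain AVX2 instructions in places nobody wrote them by hand.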

Since we're playing semantic games would you care to explain your concerns with my use of the term cross compiling?
IMO cross compiling can mean any compilation not done on the target machine and not executable on the build machine.

Of course you can have any opinion you want, that's your freedom, but that doesn't change the fact that cross compiling is when a different host/target is desired. Furthermore, your own personal definition doesn't fit this narrative either because we were talking about compiling on avx2 cpu and producing sse code. In such a case the code is executable on the build machine so not even by your very own definition would it be cross compiling.

And my comment about you maybe using a core2 was based on the symptoms you described and that some server CPUs can
be optimized for efficiency by removing/disabling unneeded features like floating point, AES or AVX.

No, core2 is not and never was a server cpu and I was very clearly talking about real servers in the data centers, not some home desktop computers that you call "servers". I started the whole discussion with complaining about Nicehash not working on my servers anymore and later I confirmed someone else's suggestion that it may be because I am using the same BTC address on all my machines, there are tens of them and each has a different IP. That in itself basically rules out any possible doubt because I could hardly have tens of computers in my living room and even then no ordinary household ISP would give me tens of public IPs. They don't even give a single public IP per household around here anymore by default, you have to ask for it and pay for it. And even furthermore, if I was to have tens of mining "servers" (i.e. desktops) in my home, full of power-hungry GPUs, I wouldn't be able to power them all because an ordinary household has a 20A fuse, that would translate into some 3500 Watts if the power supplies had 80% efficiency. You can maybe power 5 relatively decent mining machines with that but definitely not tens.


Now I don't want to sound harsh but this is enough. I very much support your efforts and try to help you and am eager to discuss any technical topic but I will not play this game, especially not at this level. I don't actually mind word games but here you have no real argument in any of what you just said and yet it seems as if you were trying to kind of get back at me regardless and that is just pointless and for you humiliating and for both of us a waste of time. I wouldn't dare to say a word about anything related to your AVX/AVX2 optimizing but when I do say something I try to pay attention on my wording so if you want to look for my errors you would have to try harder but if you really did that you would only confirm a very immature personality. I merely, totally innocently, rightfully and truthfully pointed out, in the brackets even, that cross compiling is something else than what you said and that triggered you and turned you into a rogue word warrior trying to reclaim his supremacy? Come on, kids do that, not adults.
legendary
Activity: 1470
Merit: 1114

Theoretically yes if there exists any earlier executed code that contained compiler produced AVX2 instructions from regular source.
That isn't likely since the capabilities check is done early in main.

Let's not forget that we ask gcc to compile and optimize (all those -O2 -O3 -Ofast) for the cpu it's being run on. So regardless whether you actually include any explicit AVX/AVX2 assembler in the code, even a simple printf("hi"); may produce AVX2 instruction(s) if the compiler feels like it. That's the whole point of the compiler compiling for the given cpu (-march=native) - it's allowed to use all the capabilities (and thus instruction sets) of the cpu.


That's what I was referring to when I wrote "compiler produced AVX2". AVX(2) provides SIMD instructions and it's unlikely something
like a printf would use it. A memcpy wouldn't use it because of the overhead of loading/storing the data to/from the ymm regs.
It's only useful for vector arith, and apparently the compiler isn't smart enough to convert conventionally coded array processing
loops to AVX2. I'm not even sure *if* the compiler can optimize in this fashion, the existence of so much hand coded AVX2 suggests
otherwise.

Since we're playing semantic games would you care to explain your concerns with my use of the term cross compiling?
IMO cross compiling can mean any compilation not done on the target machine and not executable on the build machine.

And my comment about you maybe using a core2 was based on the symptoms you described and that some server CPUs can
be optimized for efficiency by removing/disabling unneeded features like floating point, AES or AVX.

Not true. memcpy() and friends CAN and do use SSE/AVX - if the source/dests are aligned properly.

It doesn't really matter in this context whether it crashes before or after the warning message.

I'll take your word for it, but it doesn't seem to make much sense. It is essentially load, move, store, 256 bits wide. Where
are the savings? I presume it takes longer to load data into the ymm regs than general purpose ones. The same amount of data
has to be moved around in memory. Using AVX seems to make sense if you're going to do a lot of processing of the data while
in vector format.

There's my strawman, rip it apart.

Speaking of alignment I need to fix that up in my avx code. I used all loadu/storeu for convenience.

If it's aligned, then the load/stores don't take nearly as long. Also keep in mind that there's no such thing as a mov memaddr, memaddr opcode in x86 that I know of. Therefore, it's gotta go in a register (this is simplified, I know about things like DMA, but they don't come into play for the purposes of this discussion) and if it's aligned, it makes one hell of a lot more sense to stuff it in an AVX register, because it's a lot wider than a GPR. Even better if you're doing some kind of gather-scatter shit, possibly.

OK I'm with you now. There is a significant savings in instruction count. As long as AVX instructions execute as fast as regular ones
and the code is not data bound it will be faster.
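
The instruction-count argument can be sketched in C, assuming GCC or Clang (the `vector_size` and `may_alias` attributes are compiler extensions). The 32-byte `vec32` type stands in for a ymm register: with AVX enabled each assignment is typically one aligned vector load plus one store, while without AVX the compiler splits it into narrower moves. All names here are hypothetical, and this is not how glibc's memcpy is actually implemented. The return values simply count the load/store pairs written in the source loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-byte value, the width of one ymm register.
 * may_alias lets these types overlay arbitrary byte buffers. */
typedef uint8_t vec32 __attribute__((vector_size(32), may_alias));
typedef uint64_t word64 __attribute__((may_alias));

/* Returns nonzero if p sits on a 32-byte boundary. */
int is_aligned32(const void *p)
{
    return ((uintptr_t)p & 31u) == 0;
}

/* Copy n bytes through 8-byte words (GPR width on x86-64).
 * Returns the number of load/store pairs in the loop. */
size_t copy_words64(void *dst, const void *src, size_t n)
{
    assert(n % 8 == 0);
    assert(((uintptr_t)dst & 7u) == 0 && ((uintptr_t)src & 7u) == 0);
    word64 *d = dst;
    const word64 *s = src;
    for (size_t i = 0; i < n / 8; i++)
        d[i] = s[i];
    return n / 8;
}

/* Copy n bytes through 32-byte vectors; both pointers must be
 * 32-byte aligned. Returns the number of load/store pairs. */
size_t copy_vec32(void *dst, const void *src, size_t n)
{
    assert(n % 32 == 0 && is_aligned32(dst) && is_aligned32(src));
    vec32 *d = dst;
    const vec32 *s = src;
    for (size_t i = 0; i < n / 32; i++)
        d[i] = s[i];
    return n / 32;
}
```

For a 128-byte buffer the word version needs 16 load/store pairs and the vector version 4. The `is_aligned32` check is also the sort of test one could use to choose between an aligned path and a loadu/storeu fallback.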
legendary
Activity: 1470
Merit: 1114
There are two copies of scrypt-jane, one copy used by scrypt-jane itself and the other used by
argon2. Only the argon2 version is optimized and was taken from the argon2 branch of multi. The other copy was taken
from the windows branch and used by scrypt-jane. At some point I intend to integrate the optimized version for use
by scrypt-jane algo, but it's not a very popular algo. Maybe you could try compiling the argon2 branch of multi to see
if it also has the same error.

You are absolutely right. If I build the argon2 branch with the default build.sh (LTO and the other flags disabled) it compiles, but if I do my uncommenting I get the same error:


I think the situation is pretty well understood now.

gcc 5.4.0 has enhancements to LTO that are incompatible with the existing optimized scrypt-jane code used by argon2.
Those same enhancements improve performance if compiled with gcc 5.4.0 and -flto.
The short term workaround for users with gcc 5.4.0 is to disable argon2 by hiding the source directory from the compiler
and removing the registration of argon2 in algo-gate-api.c:register_algo_gate.
The long term solution is to find the cause of the compile error and plan a way to fix it, or rewrite the functions
in a way that doesn't try to outsmart the compiler.

In addition it is necessary to work around a compile error in grs. This is accomplished by hiding algo/groestl/sse2/ directory from
the compiler and removing algo/groestl/sse2/grso-asm.c from Makefile.am. This workaround will break the SSE2 compile.
A long term fix for this issue is unlikely and will probably result in reduced performance for some algos on SSE2 limited
CPUs in order to have more performance on newer ones.
hero member
Activity: 589
Merit: 507
I don't buy nor sell anything here and never will.
There are two copies of scrypt-jane, one copy used by scrypt-jane itself and the other used by
argon2. Only the argon2 version is optimized and was taken from the argon2 branch of multi. The other copy was taken
from the windows branch and used by scrypt-jane. At some point I intend to integrate the optimized version for use
by scrypt-jane algo, but it's not a very popular algo. Maybe you could try compiling the argon2 branch of multi to see
if it also has the same error.

You are absolutely right. If I build the argon2 branch with the default build.sh (LTO and the other flags disabled) it compiles, but if I do my uncommenting I get the same error:
Code:
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_sse2.lto_priv.317':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_sse2'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_sse2' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_ssse3.lto_priv.316':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_ssse3'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_ssse3' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_avx.lto_priv.315':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_avx' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_xop.lto_priv.314':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_xop'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_xop' follow
/tmp/ccAcmNi5.ltrans14.ltrans.o: In function `scrypt_ROMix_avx2.lto_priv.313':
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: undefined reference to `scrypt_ChunkMix_avx2'
/tmp/ccAcmNi5.ltrans14.ltrans.o:/root/z/cpuminer-multi/ar2/sj/scrypt-jane-romix-template.h:89: more undefined references to `scrypt_ChunkMix_avx2' follow
collect2: error: ld returned 1 exit status
Makefile:881: recipe for target 'cpuminer' failed
legendary
Activity: 1470
Merit: 1114
I just want to make sure I understand the problem definition

- multi is faster with -flto
- multi without -flto is slower than identically compiled opt
- multi with -flto is faster than pre-avx2 compiled without -flto
- opt fails to compile with gcc 5.4.0 with -flto
- -flto compiles with gcc 4.8.4 with no effect on performance.

The significant points are:

- -flto is faster with gcc 5.4.0
- code that compiles with -flto using gcc 4.8.4 fails to compile using gcc 5.4.0.

Yes I think you got all those points right. Well, we would have to deal with 5.4.0 sooner or later anyway, it's not going anywhere.

The code that fails to compile is pretty ugly. It uses asm function pointers to select targets at compile time.
I've never seen anything like this so it will take a while to understand what is going on. It looks like the code is
self contained and the error doesn't seem to be related to missing libraries.

Ugly or ingenious, crazy anyway. It took me a while to figure out where exactly the error was coming from :)

I just looked at the current tpruvot's and this culprit file scrypt-jane-romix-template.h is a bit different now. You know, it's pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.


Maybe too ingenious for the compiler.

I can't find any normal definition of scrypt_ChunkMix_avx2 but there is some code in an unfamiliar syntax in
scrypt-jane-mix_salsa64-avx2.h that may be it. I'm guessing the technique used to abstract asm functions by
building custom stack linkage as well as using asm function pointers is too much for LTO to handle.

I'll look into the TPruvot delta to see if it addresses this issue. Another alternative is to rewrite them in C but
I'm not motivated to do that at this time, given there are alternatives for optimized mining of argon2.

The GRS macros have been a pain for me since day 1. I had to make changes so they could be included in
multiple algos. It looks like LTO likes them as much as I do. The problem is they're faster than the SPH versions.
Do the errors occur only with -flto or do they also occur with gcc 5.4.0 with my build.sh options? This code is only
compiled for SSE2 builds, I may have to drop support for it or degrade performance by using SPH functions in the future.

There are two copies of scrypt-jane, one copy used by scrypt-jane itself and the other used by
argon2. Only the argon2 version is optimized and was taken from the argon2 branch of multi. The other copy was taken
from the windows branch and used by scrypt-jane. At some point I intend to integrate the optimized version for use
by scrypt-jane algo, but it's not a very popular algo. Maybe you could try compiling the argon2 branch of multi to see
if it also has the same error.
hero member
Activity: 589
Merit: 507
I don't buy nor sell anything here and never will.
The GRS macros have been a pain for me since day 1. I had to make changes so they could be included in
multiple algos. It looks like LTO likes them as much as I do. The problem is they're faster than the SPH versions.
Do the errors occur only with -flto or do they also occur with gcc 5.4.0 with my build.sh options? This code is only
compiled for SSE2 builds, I may have to drop support for it or degrade performance by using SPH functions in the future.

With your build.sh, without messing with LTO, I haven't seen any problem. But that was on avx/avx2 cpus. On non-avx cpu I compiled only today and that was straight with LTO. So now I did just that - downloaded 3.4.0 and ran the untouched build.sh. No error. But look at the remarkable speed difference when I compare it with 3.4.0 built with tpruvot's uncommented (LTO enabled) build.sh:
Code:
root@xxx:~/z/cpuminer-opt-3.4.0# ./cpuminer -a lyra2re --benchmark

         **********  cpuminer-multi 1.2-dev  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
CPU features: SSE2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2
Algo features: SSE2 AES AVX AVX2
AES not available, starting mining with SSE2 optimizations...

[2016-08-05 19:03:55] 16 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 19:03:56] Total: 983.04 kH, 872.93 kH/s
[2016-08-05 19:04:00] Total: 3218.22 kH, 927.88 kH/s
[2016-08-05 19:04:06] Total: 4515.37 kH, 896.32 kH/s
[2016-08-05 19:04:10] Total: 4143.24 kH, 907.80 kH/s

root@xxx:~/cpuminer-opt-sse# ./cpuminer -a lyra2re --benchmark

         **********  cpuminer-multi 1.2-dev  ***********
     A CPU miner with multi algo support and optimized for CPUs
     with AES_NI and AVX extensions.
     BTC donation address: 12tdvfF7KmAsihBXQXynT6E6th2c2pByTT
     Forked from TPruvot's cpuminer-multi with credits
     to Lucas Jones, elmad, palmd, djm34, pooler, ig0tik3d,
     Wolf0, Jeff Garzik and Optiminer.

CPU: Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
CPU features: SSE2
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2
Algo features: SSE2 AES AVX AVX2
AES not available, starting mining with SSE2 optimizations...

[2016-08-05 19:04:33] 16 miner threads started, using 'lyra2re' algorithm.
[2016-08-05 19:04:34] Total: 786.43 kH, 753.31 kH/s
[2016-08-05 19:04:38] Total: 3006.12 kH, 993.53 kH/s
[2016-08-05 19:04:43] Total: 4691.35 kH, 988.85 kH/s
[2016-08-05 19:04:48] Total: 4694.50 kH, 994.89 kH/s
[2016-08-05 19:04:53] Total: 4881.65 kH, 998.29 kH/s
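
To put a number on the difference between the two benchmark runs above, here is a small sketch. The helper names are hypothetical and the sample values are copied from the two logs. It skips the first sample of each run as warm-up, which is an assumption on my part, not something the miner documents.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n hash-rate samples (kH/s). */
double mean_rate(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Relative change from base to other; 0.10 means other is 10% faster. */
double relative_gain(double base, double other)
{
    assert(base > 0.0);
    return (other - base) / base;
}
```

Comparing the last reported rates, `relative_gain(907.80, 998.29)` comes out at about 0.10. Comparing the means without the first sample of each run (roughly 910.67 and 993.89 kH/s) gives about 0.09. Either way the second run is roughly 9 to 10 percent faster on this Xeon X5570.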

it's pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.

I can't figure out how to get my code into github. It must be simple but I haven't figured it out. I have an account and played with
it using the tutorial but I'm stuck.
I also want to make a gradual transition so I don't get bogged down trying to figure out github when I'm trying to focus
on the miner. The first phase is to continue development offline and upload releases.

I think this should be of help: https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/. Or, to make you feel better, you can always do what I did when I didn't know how to do this same thing: I created an empty repo on github, then git clone it locally, then copied all the files into the directory, then git add it and then just git push it back to github :)

If I remember correctly when you try to push it the first time it will setup your github credentials, if you don't want it to ask you for your password every single time you can use git config credential.helper store and it will save it and remember.

As for travis, it requires .travis.yml file to be present. This file already exists and it's quite self-explanatory if you look into it. Travis for public opensource projects is free and if you hook it up it will then automatically build after every commit you make. Those builds, for linux, are excellent in that they independently confirm your code is compilable (which you already knew anyway, but still..) but I wouldn't try to gather and publish the binaries, though. I wouldn't publish binaries for linux at all, you would just get into troubles and people are already used to compile themselves. But you can also setup building with win64 target using mingw and those I would publish so that you (the people) don't rely on some unknown third party. I don't want to spread paranoia or anything but you never know what did they do to the code, if they added something..

You can also get an .edu email for $2 from here: https://bitcointalksearch.org/topic/vouchededu-email-2paypal-acceptedmany-promoshero-sellertons-of-vouches-1321638 which will (among others) allow you to get a github student pack for free, which in turn will (among others) give you access to travis-ci.com (as opposite to travis-ci.org) which is for private building (normally it would cost money, this way it would be free) and that you could use as a sandbox maybe..

But if you are serious about github there is one crucial decision you will have to make, not necessarily now. Whether you want to keep working on top of years old tpruvot's code and keep ignoring all changes and possible improvements he has done ever since you took his code and he will keep doing in the future, or if you actually extract all your changes from the old code you took, fork his repo and put them on top of it. That way you will stay in sync with him and will keep your changes being reapplied after every his commit. This would require quite a lot of work, the part where you extract your changes, it's just about making a diff, fast-forward his code to the recent and reapplying the diff. Maybe someone young and enthusiastic could do that? Unfortunately I am old and tired so I am out of the game.

Theoretically yes if there exists any earlier executed code that contained compiler produced AVX2 instructions from regular source.
That isn't likely since the capabilities check is done early in main.

Let's not forget that we ask gcc to compile and optimize (all those -O2 -O3 -Ofast) for the cpu it's being run on. So regardless whether you actually include any explicit AVX/AVX2 assembler in the code, even a simple printf("hi"); may produce AVX2 instruction(s) if the compiler feels like it. That's the whole point of the compiler compiling for the given cpu (-march=native) - it's allowed to use all the capabilities (and thus instruction sets) of the cpu.

EDIT: oh, Fuzzbawls was faster. Btw, Fuzzbawls, you wouldn't get by "ok" with strictly using the github website, you would commit a suicide. Editing file directly via the github website is possible but it's so painful, don't even try it. Let alone everytime you click on "save", of every file you changed, it creates a new commit, so then you can't even review your changes easily. Horrible, terrible. But for non-commandline people many IDEs have git integrated so you don't have to use git but you really would very hardly survive by using just the github website.
hero member
Activity: 750
Merit: 500
If someone knows the command to login and upload a source tree it would get me unstuck and I could move forward.

quick n' dirty: https://help.github.com/articles/adding-an-existing-project-to-github-using-the-command-line/

this will take your existing source tree and upload it to whatever new repository you create on github, with the "First Commit" being a complete replica of your local source tree's current state (at the moment you make the commit). You will be prompted for your github username/password after step 9.

A quick read of https://git-scm.com/book/en/v2/Getting-Started-First-Time-Git-Setup, specifically the two CLI commands pertaining to "Your Identity" should be done between step 4 and 5 (user.email should match the email associated with your github account, but can also be username@users.noreply.github.com where username is your github username)
legendary
Activity: 1470
Merit: 1114
it's pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.


I can't figure out how to get my code into github. It must be simple but I haven't figured it out. I have an account and played with
it using the tutorial but I'm stuck.
I also want to make a gradual transition so I don't get bogged down trying to figure out github when I'm trying to focus
on the miner. The first phase is to continue development offline and upload releases.

I know this may sound counter-productive for focusing on the actual coding of the miner, but the sooner you learn git + github the easier your life becomes in regards to making ANY changes to code. Well worth the invested time!

It's important to remember, however, that github is a GUI (mostly) that is built on top of git. One can manage and get by "ok" with strictly using the github website for their code management, but knowing how to use the git CLI toolset is second to none.

Travis-CI, as mentioned, can also greatly improve the efficiency of any build testing you do. It is another system to learn, and has its own learning curve and, of course, changes over time...but is also a VERY powerful and customizable tool.

Hi Fuzzbawls, you are correct but I need to get to a certain level of proficiency with github (I mean both git cli and github gui) before
I can be as productive. I don't want to get bogged down because I'm lost in the dev environment.

If someone knows the command to login and upload a source tree it would get me unstuck and I could move forward.
legendary
Activity: 1470
Merit: 1114
My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, then crashed. This is what you should see when implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)

It's theoretically possible that the binary would crash even before this print if gcc decided to use some of those "illegal" instructions in the code preceding the print.

Theoretically yes, if any earlier executed code contained compiler-produced AVX2 instructions from regular source.
That isn't likely since the capabilities check is done early in main.
hero member
Activity: 750
Merit: 500
it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.


I can't figure out how to get my code into github. It must be simple but I haven't figured it out. I have an account and played with
it using the tutorial but I'm stuck.
I also want to make a gradual transition so I don't get bogged down trying to figure out github when I'm trying to focus
on the miner. The first phase is to continue development offline and upload releases.

I know this may sound counter-productive for focusing on the actual coding of the miner, but the sooner you learn git + github the easier your life becomes in regards to making ANY changes to code. Well worth the invested time!

It's important to remember, however, that github is a GUI (mostly) that is built on top of git. One can manage and get by "ok" with strictly using the github website for their code management, but knowing how to use the git CLI toolset is second to none.

Travis-CI, as mentioned, can also greatly improve the efficiency of any build testing you do. It is another system to learn, and has its own learning curve and, of course, changes over time...but is also a VERY powerful and customizable tool.
legendary
Activity: 1470
Merit: 1114
it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.


I can't figure out how to get my code into github. It must be simple but I haven't figured it out. I have an account and played with
it using the tutorial but I'm stuck.
I also want to make a gradual transition so I don't get bogged down trying to figure out github when I'm trying to focus
on the miner. The first phase is to continue development offline and upload releases.
legendary
Activity: 1470
Merit: 1114
I just want to make sure I understand the problem definition

- multi is faster with -flto
- multi without -flto is slower than identically compiled opt
- multi with -flto is faster than pre-avx2 compiled without -flto
- opt fails to compile with gcc 5.4.0 with -flto
- -flto compiles with gcc 4.8.4 with no effect on performance.

The significant points are:

- -flto is faster with gcc 5.4.0
- code that compiles with -flto using gcc 4.8.4 fails to compile using gcc 5.4.0.

Yes I think you got all those points right. Well, we would have to deal with 5.4.0 sooner or later anyway, it's not going anywhere.

The code that fails to compile is pretty ugly. It uses asm function pointers to select targets at compile time.
I've never seen anything like this so it will take a while to understand what is going on. It looks like the code is
self contained and the error doesn't seem to be related to missing libraries.

Ugly or ingenious, crazy anyway. It took me a while to figure out exactly where the error was coming from.

I just looked at tpruvot's current code and this culprit file scrypt-jane-romix-template.h is a bit different now. You know, it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.


Maybe too ingenious for the compiler.

I can't find any normal definition of scrypt_ChunkMix_avx2 but there is some code in an unfamiliar syntax in
scrypt-jane-mix_salsa64-avx2.h that may be it. I'm guessing the technique used to abstract asm functions by
building custom stack linkage as well as using asm function pointers is too much for LTO to handle.

I'll look into the TPruvot delta to see if it addresses this issue. Another alternative is to rewrite them in C but
I'm not motivated to do that at this time, given there are alternatives for optimized mining of argon2.

The GRS macros have been a pain for me since day 1. I had to make changes so they could be included in
multiple algos. It looks like LTO likes them as much as I do. The problem is they're faster than the SPH versions.
Do the errors occur only with -flto or do they also occur with gcc 5.4.0 with my build.sh options? This code is only
compiled for SSE2 builds, I may have to drop support for it or degrade performance by using SPH functions in the future.
hero member
Activity: 589
Merit: 507
I don't buy nor sell anything here and never will.
I just want to make sure I understand the problem definition

- multi is faster with -flto
- multi without -flto is slower than identically compiled opt
- multi with -flto is faster than pre-avx2 compiled without -flto
- opt fails to compile with gcc 5.4.0 with -flto
- -flto compiles with gcc 4.8.4 with no effect on performance.

The significant points are:

- -flto is faster with gcc 5.4.0
- code that compiles with -flto using gcc 4.8.4 fails to compile using gcc 5.4.0.

Yes I think you got all those points right. Well, we would have to deal with 5.4.0 sooner or later anyway, it's not going anywhere.

The code that fails to compile is pretty ugly. It uses asm function pointers to select targets at compile time.
I've never seen anything like this so it will take a while to understand what is going on. It looks like the code is
self contained and the error doesn't seem to be related to missing libraries.

Ugly or ingenious, crazy anyway. It took me a while to figure out exactly where the error was coming from.

I just looked at tpruvot's current code and this culprit file scrypt-jane-romix-template.h is a bit different now. You know, it's a pity you are not using a versioning system where you would have your improvements on top of tpruvot's. If you used github you could have had automatic linux/windows building done on travis-ci after every commit, including an automatic publishing of the binaries on github.
hero member
Activity: 589
Merit: 507
I don't buy nor sell anything here and never will.
Excellent work. The easiest way to block the compile error is to comment out the source dir for argon2 and remove the registration
call for argon2 in algo-gate-api.c:register_algo_gate. You can easily remove any algo this way.

You have demonstrated that LTO improves performance with the new compiler but has some incompatibilities with the existing
argon2 code. I will investigate argon2 to try to solve it.

I am glad I could help. When I was compiling on a Xeon X5570, which supports neither AVX nor AVX2, I also got this error at the final link (the compiler and the source code were the same, just a different CPU):
Code:
/tmp/ccmS1O9H.ltrans19.ltrans.o: In function `grsoQ1024ASM':
:(.text+0xa530): undefined reference to `grsoT0'
:(.text+0xa538): undefined reference to `grsoT1'
:(.text+0xa54a): undefined reference to `grsoT0'
:(.text+0xa552): undefined reference to `grsoT1'
:(.text+0xa564): undefined reference to `grsoT2'
:(.text+0xa56c): undefined reference to `grsoT3'
:(.text+0xa57e): undefined reference to `grsoT2'
:(.text+0xa586): undefined reference to `grsoT3'
:(.text+0xa598): undefined reference to `grsoT4'
:(.text+0xa5a0): undefined reference to `grsoT5'
:(.text+0xa5b2): undefined reference to `grsoT4'
:(.text+0xa5ba): undefined reference to `grsoT5'
:(.text+0xa5d0): undefined reference to `grsoT6'
:(.text+0xa5d8): undefined reference to `grsoT7'
:(.text+0xa5ea): undefined reference to `grsoT6'
:(.text+0xa5f2): undefined reference to `grsoT7'
:(.text+0xa600): undefined reference to `grsoT0'
:(.text+0xa608): undefined reference to `grsoT1'
:(.text+0xa61a): undefined reference to `grsoT0'
:(.text+0xa622): undefined reference to `grsoT1'
:(.text+0xa634): undefined reference to `grsoT2'
:(.text+0xa63c): undefined reference to `grsoT3'
:(.text+0xa64e): undefined reference to `grsoT2'
:(.text+0xa656): undefined reference to `grsoT3'
:(.text+0xa668): undefined reference to `grsoT4'
:(.text+0xa670): undefined reference to `grsoT5'
:(.text+0xa682): undefined reference to `grsoT4'
:(.text+0xa68a): undefined reference to `grsoT5'
:(.text+0xa6a0): undefined reference to `grsoT6'
:(.text+0xa6a8): undefined reference to `grsoT7'
:(.text+0xa6ba): undefined reference to `grsoT6'
:(.text+0xa6c2): undefined reference to `grsoT7'
:(.text+0xa730): undefined reference to `grsoT0'
:(.text+0xa738): undefined reference to `grsoT1'
:(.text+0xa74a): undefined reference to `grsoT0'
:(.text+0xa752): undefined reference to `grsoT1'
:(.text+0xa764): undefined reference to `grsoT2'
:(.text+0xa76c): undefined reference to `grsoT3'
:(.text+0xa77e): undefined reference to `grsoT2'
:(.text+0xa786): undefined reference to `grsoT3'
:(.text+0xa798): undefined reference to `grsoT4'
:(.text+0xa7a0): undefined reference to `grsoT5'
:(.text+0xa7b2): undefined reference to `grsoT4'
:(.text+0xa7ba): undefined reference to `grsoT5'
:(.text+0xa7d0): undefined reference to `grsoT6'
:(.text+0xa7d8): undefined reference to `grsoT7'
:(.text+0xa7ea): undefined reference to `grsoT6'
:(.text+0xa7f2): undefined reference to `grsoT7'
:(.text+0xa800): undefined reference to `grsoT0'
:(.text+0xa808): undefined reference to `grsoT1'
:(.text+0xa81a): undefined reference to `grsoT0'
:(.text+0xa822): undefined reference to `grsoT1'
:(.text+0xa834): undefined reference to `grsoT2'
:(.text+0xa83c): undefined reference to `grsoT3'
:(.text+0xa84e): undefined reference to `grsoT2'
:(.text+0xa856): undefined reference to `grsoT3'
:(.text+0xa868): undefined reference to `grsoT4'
:(.text+0xa870): undefined reference to `grsoT5'
:(.text+0xa882): undefined reference to `grsoT4'
:(.text+0xa88a): undefined reference to `grsoT5'
:(.text+0xa8a0): undefined reference to `grsoT6'
:(.text+0xa8a8): undefined reference to `grsoT7'
:(.text+0xa8ba): undefined reference to `grsoT6'
:(.text+0xa8c2): undefined reference to `grsoT7'
:(.text+0xa930): undefined reference to `grsoT0'
:(.text+0xa938): undefined reference to `grsoT1'
:(.text+0xa94a): undefined reference to `grsoT0'
:(.text+0xa952): undefined reference to `grsoT1'
:(.text+0xa964): undefined reference to `grsoT2'
:(.text+0xa96c): undefined reference to `grsoT3'
:(.text+0xa97e): undefined reference to `grsoT2'
:(.text+0xa986): undefined reference to `grsoT3'
:(.text+0xa998): undefined reference to `grsoT4'
:(.text+0xa9a0): undefined reference to `grsoT5'
:(.text+0xa9b2): undefined reference to `grsoT4'
:(.text+0xa9ba): undefined reference to `grsoT5'
:(.text+0xa9d0): undefined reference to `grsoT6'
:(.text+0xa9d8): undefined reference to `grsoT7'
:(.text+0xa9ea): undefined reference to `grsoT6'
:(.text+0xa9f2): undefined reference to `grsoT7'
:(.text+0xaa00): undefined reference to `grsoT0'
:(.text+0xaa08): undefined reference to `grsoT1'
:(.text+0xaa1a): undefined reference to `grsoT0'
:(.text+0xaa22): undefined reference to `grsoT1'
:(.text+0xaa34): undefined reference to `grsoT2'
:(.text+0xaa3c): undefined reference to `grsoT3'
:(.text+0xaa4e): undefined reference to `grsoT2'
:(.text+0xaa56): undefined reference to `grsoT3'
:(.text+0xaa68): undefined reference to `grsoT4'
:(.text+0xaa70): undefined reference to `grsoT5'
:(.text+0xaa82): undefined reference to `grsoT4'
:(.text+0xaa8a): undefined reference to `grsoT5'
:(.text+0xaaa0): undefined reference to `grsoT6'
:(.text+0xaaa8): undefined reference to `grsoT7'
:(.text+0xaaba): undefined reference to `grsoT6'
:(.text+0xaac2): undefined reference to `grsoT7'
:(.text+0xab30): undefined reference to `grsoT0'
:(.text+0xab38): undefined reference to `grsoT1'
:(.text+0xab4a): undefined reference to `grsoT0'
:(.text+0xab52): undefined reference to `grsoT1'
:(.text+0xab64): undefined reference to `grsoT2'
:(.text+0xab6c): undefined reference to `grsoT3'
:(.text+0xab7e): undefined reference to `grsoT2'
:(.text+0xab86): undefined reference to `grsoT3'
:(.text+0xab98): undefined reference to `grsoT4'
:(.text+0xaba0): undefined reference to `grsoT5'
:(.text+0xabb2): undefined reference to `grsoT4'
:(.text+0xabba): undefined reference to `grsoT5'
:(.text+0xabd0): undefined reference to `grsoT6'
:(.text+0xabd8): undefined reference to `grsoT7'
:(.text+0xabea): undefined reference to `grsoT6'
:(.text+0xabf2): undefined reference to `grsoT7'
:(.text+0xac00): undefined reference to `grsoT0'
:(.text+0xac08): undefined reference to `grsoT1'
:(.text+0xac1a): undefined reference to `grsoT0'
:(.text+0xac22): undefined reference to `grsoT1'
:(.text+0xac34): undefined reference to `grsoT2'
:(.text+0xac3c): undefined reference to `grsoT3'
:(.text+0xac4e): undefined reference to `grsoT2'
:(.text+0xac56): undefined reference to `grsoT3'
:(.text+0xac68): undefined reference to `grsoT4'
:(.text+0xac70): undefined reference to `grsoT5'
:(.text+0xac82): undefined reference to `grsoT4'
:(.text+0xac8a): undefined reference to `grsoT5'
:(.text+0xaca0): undefined reference to `grsoT6'
:(.text+0xaca8): undefined reference to `grsoT7'
:(.text+0xacba): undefined reference to `grsoT6'
:(.text+0xacc2): undefined reference to `grsoT7'
/tmp/ccmS1O9H.ltrans19.ltrans.o: In function `grsoP1024ASM':
:(.text+0xadbb): undefined reference to `grsoT0'
:(.text+0xadc3): undefined reference to `grsoT1'
:(.text+0xadd5): undefined reference to `grsoT0'
:(.text+0xaddd): undefined reference to `grsoT1'
:(.text+0xadef): undefined reference to `grsoT2'
:(.text+0xadf7): undefined reference to `grsoT3'
:(.text+0xae09): undefined reference to `grsoT2'
:(.text+0xae11): undefined reference to `grsoT3'
:(.text+0xae23): undefined reference to `grsoT4'
:(.text+0xae2b): undefined reference to `grsoT5'
:(.text+0xae3d): undefined reference to `grsoT4'
:(.text+0xae45): undefined reference to `grsoT5'
:(.text+0xae57): undefined reference to `grsoT6'
:(.text+0xae5f): undefined reference to `grsoT7'
:(.text+0xae6d): undefined reference to `grsoT6'
:(.text+0xae75): undefined reference to `grsoT7'
:(.text+0xae9b): undefined reference to `grsoT0'
:(.text+0xaea3): undefined reference to `grsoT1'
:(.text+0xaeb5): undefined reference to `grsoT0'
:(.text+0xaebd): undefined reference to `grsoT1'
:(.text+0xaecf): undefined reference to `grsoT2'
:(.text+0xaed7): undefined reference to `grsoT3'
:(.text+0xaee9): undefined reference to `grsoT2'
:(.text+0xaef1): undefined reference to `grsoT3'
:(.text+0xaf03): undefined reference to `grsoT4'
:(.text+0xaf0b): undefined reference to `grsoT5'
:(.text+0xaf1d): undefined reference to `grsoT4'
:(.text+0xaf25): undefined reference to `grsoT5'
:(.text+0xaf37): undefined reference to `grsoT6'
:(.text+0xaf3f): undefined reference to `grsoT7'
:(.text+0xaf4d): undefined reference to `grsoT6'
:(.text+0xaf55): undefined reference to `grsoT7'
:(.text+0xaf7b): undefined reference to `grsoT0'
:(.text+0xaf83): undefined reference to `grsoT1'
:(.text+0xaf95): undefined reference to `grsoT0'
:(.text+0xaf9d): undefined reference to `grsoT1'
:(.text+0xafaf): undefined reference to `grsoT2'
:(.text+0xafb7): undefined reference to `grsoT3'
:(.text+0xafc9): undefined reference to `grsoT2'
:(.text+0xafd1): undefined reference to `grsoT3'
:(.text+0xafe3): undefined reference to `grsoT4'
:(.text+0xafeb): undefined reference to `grsoT5'
:(.text+0xaffd): undefined reference to `grsoT4'
:(.text+0xb005): undefined reference to `grsoT5'
:(.text+0xb017): undefined reference to `grsoT6'
:(.text+0xb01f): undefined reference to `grsoT7'
:(.text+0xb02d): undefined reference to `grsoT6'
:(.text+0xb035): undefined reference to `grsoT7'
:(.text+0xb061): undefined reference to `grsoT0'
:(.text+0xb069): undefined reference to `grsoT1'
:(.text+0xb07b): undefined reference to `grsoT0'
:(.text+0xb083): undefined reference to `grsoT1'
:(.text+0xb095): undefined reference to `grsoT2'
:(.text+0xb09d): undefined reference to `grsoT3'
:(.text+0xb0af): undefined reference to `grsoT2'
:(.text+0xb0b7): undefined reference to `grsoT3'
:(.text+0xb0c9): undefined reference to `grsoT4'
:(.text+0xb0d1): undefined reference to `grsoT5'
:(.text+0xb0e3): undefined reference to `grsoT4'
:(.text+0xb0eb): undefined reference to `grsoT5'
:(.text+0xb0fd): undefined reference to `grsoT6'
:(.text+0xb105): undefined reference to `grsoT7'
:(.text+0xb113): undefined reference to `grsoT6'
:(.text+0xb11b): undefined reference to `grsoT7'
:(.text+0xb146): undefined reference to `grsoT0'
:(.text+0xb14e): undefined reference to `grsoT1'
:(.text+0xb160): undefined reference to `grsoT0'
:(.text+0xb168): undefined reference to `grsoT1'
:(.text+0xb17a): undefined reference to `grsoT2'
:(.text+0xb182): undefined reference to `grsoT3'
:(.text+0xb194): undefined reference to `grsoT2'
:(.text+0xb19c): undefined reference to `grsoT3'
:(.text+0xb1ae): undefined reference to `grsoT4'
:(.text+0xb1b6): undefined reference to `grsoT5'
:(.text+0xb1c8): undefined reference to `grsoT4'
:(.text+0xb1d0): undefined reference to `grsoT5'
:(.text+0xb1e2): undefined reference to `grsoT6'
:(.text+0xb1ea): undefined reference to `grsoT7'
:(.text+0xb1f8): undefined reference to `grsoT6'
:(.text+0xb200): undefined reference to `grsoT7'
:(.text+0xb22c): undefined reference to `grsoT0'
:(.text+0xb234): undefined reference to `grsoT1'
:(.text+0xb246): undefined reference to `grsoT0'
:(.text+0xb24e): undefined reference to `grsoT1'
:(.text+0xb260): undefined reference to `grsoT2'
:(.text+0xb268): undefined reference to `grsoT3'
:(.text+0xb27a): undefined reference to `grsoT2'
:(.text+0xb282): undefined reference to `grsoT3'
:(.text+0xb294): undefined reference to `grsoT4'
:(.text+0xb29c): undefined reference to `grsoT5'
:(.text+0xb2ae): undefined reference to `grsoT4'
:(.text+0xb2b6): undefined reference to `grsoT5'
:(.text+0xb2c8): undefined reference to `grsoT6'
:(.text+0xb2d0): undefined reference to `grsoT7'
:(.text+0xb2de): undefined reference to `grsoT6'
:(.text+0xb2e6): undefined reference to `grsoT7'
:(.text+0xb311): undefined reference to `grsoT0'
:(.text+0xb319): undefined reference to `grsoT1'
:(.text+0xb32b): undefined reference to `grsoT0'
:(.text+0xb333): undefined reference to `grsoT1'
:(.text+0xb345): undefined reference to `grsoT2'
:(.text+0xb34d): undefined reference to `grsoT3'
:(.text+0xb35f): undefined reference to `grsoT2'
:(.text+0xb367): undefined reference to `grsoT3'
:(.text+0xb379): undefined reference to `grsoT4'
:(.text+0xb381): undefined reference to `grsoT5'
:(.text+0xb393): undefined reference to `grsoT4'
:(.text+0xb39b): undefined reference to `grsoT5'
:(.text+0xb3ad): undefined reference to `grsoT6'
:(.text+0xb3b5): undefined reference to `grsoT7'
:(.text+0xb3c3): undefined reference to `grsoT6'
:(.text+0xb3cb): undefined reference to `grsoT7'
:(.text+0xb3d9): undefined reference to `grsoT0'
:(.text+0xb3e1): undefined reference to `grsoT1'
:(.text+0xb3f3): undefined reference to `grsoT0'
:(.text+0xb3fb): undefined reference to `grsoT1'
:(.text+0xb40d): undefined reference to `grsoT2'
:(.text+0xb415): undefined reference to `grsoT3'
:(.text+0xb427): undefined reference to `grsoT2'
:(.text+0xb42f): undefined reference to `grsoT3'
:(.text+0xb441): undefined reference to `grsoT4'
:(.text+0xb449): undefined reference to `grsoT5'
:(.text+0xb45b): undefined reference to `grsoT4'
:(.text+0xb463): undefined reference to `grsoT5'
:(.text+0xb475): undefined reference to `grsoT6'
:(.text+0xb47d): undefined reference to `grsoT7'
:(.text+0xb48b): undefined reference to `grsoT6'
:(.text+0xb493): undefined reference to `grsoT7'
collect2: error: ld returned 1 exit status
Makefile:1292: recipe for target 'cpuminer' failed
make[2]: *** [cpuminer] Error 1
make[2]: Leaving directory '/root/cpuminer-opt-sse'
Makefile:3320: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/root/cpuminer-opt-sse'
Makefile:658: recipe for target 'all' failed
make: *** [all] Error 2

I solved it by deleting almost all the assembler from algo/groestl/sse2/grso-asm.c.
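As a side note, a log like the one above is easier to read once it is reduced to the distinct missing symbols. A minimal sketch, assuming the linker output was saved to a file; the function name and link.log are hypothetical. For the log above it should come down to the eight tables grsoT0 through grsoT7.

```shell
#!/bin/sh
# Read linker output on stdin and print each undefined symbol once, sorted.
undefined_symbols() {
    # The "." in front of the capture group matches the opening quote
    # character that ld prints before the symbol name.
    sed -n 's/.*undefined reference to .\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Example (link.log is a hypothetical file holding the saved output):
#   undefined_symbols < link.log
```

Lines that are not "undefined reference" messages, such as the collect2 and make lines at the end, are simply ignored.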

CPU architecture selection is made at compile time. If you do a native compile on a CPU that supports AVX2 you can not run it
on a CPU with only AVX. If you want to cross compile you must specify the arch of the target CPU, and produce separate executables
for each desired architecture.

Of course you are perfectly right (although cross-compiling is when a different host/target is desired, not just a different cpu type), I just got a bit confused because I saw some remarks "choose runtime" in the source code so I thought maybe there is some decision at runtime.

So I have 3 binaries now for sse/avx/avx2. I made three directories and launch the corresponding binary like this:
Code:
grep -q avx /proc/cpuinfo && feat="avx" || feat="sse" && grep -q avx2 /proc/cpuinfo && feat="avx2"
/somewhere/cpuminer-opt-$feat/cpuminer ...
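The same selection can be spelled out as a small function, which is easier to extend if more builds are added later. This is only a sketch: pick_feat is a made-up name, and it assumes a Linux-style cpuinfo file with a space-separated flags line.

```shell
#!/bin/sh
# Print "avx2", "avx" or "sse" depending on the flags in a cpuinfo-style file.
# $1 is the file to inspect and defaults to /proc/cpuinfo (Linux only).
pick_feat() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches whole words, so "avx" does not match "avx2".
    if grep -qw avx2 "$cpuinfo"; then
        echo avx2
    elif grep -qw avx "$cpuinfo"; then
        echo avx
    else
        echo sse
    fi
}

# Example, following the directory layout used above:
#   /somewhere/cpuminer-opt-$(pick_feat)/cpuminer ...
```

Checking the most capable feature first means the first match wins, so no flag gets overwritten by a later test.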

The SSE and AVX versions are on par with tpruvot's speed-wise. When I compile them with tpruvot's uncommented build.sh they seem a tiny bit faster, and that build.sh has those align flags anyway, so I'm staying with it.

My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, then crashed. This is what you should see when implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)

It's theoretically possible that the binary would crash even before this print if gcc decided to use some of those "illegal" instructions in the code preceding the print.
legendary
Activity: 1470
Merit: 1114
Success!

[snip]

So I will be using joblo's cpuminer with tpruvot's (uncommented) build.sh because that build.sh has all those other flags (including -falign-*) which may or may not matter, so just to be safe..


EDIT: when I took the avx2 binary and tried to run it on an avx cpu I got this:
Code:
CPU:       Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Start mining with SSE2 AES AVX

Illegal instruction (core dumped)

But wasn't the whole idea that all the cpu features would be compiled in and the particular feature to use would be determined at runtime? It's not a big deal, I just recompiled it and I will have two versions (avx and avx2) and run the one that's appropriate to the cpu. I just thought I would report this.

Excellent work. The easiest way to block the compile error is to comment out the source dir for argon2 and remove the registration
call for argon2 in algo-gate-api.c:register_algo_gate. You can easily remove any algo this way.

You have demonstrated that LTO improves performance with the new compiler but has some incompatibilities with the existing
argon2 code. I will investigate argon2 to try to solve it.

CPU architecture selection is made at compile time. If you do a native compile on a CPU that supports AVX2 you can not run it
on a CPU with only AVX. If you want to cross compile you must specify the arch of the target CPU, and produce separate executables
for each desired architecture.
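To make "separate executables for each desired architecture" concrete, here is a rough sketch of a per-target build plan. The -msse2, -mavx and -mavx2 options are standard GCC x86 flags; the function names and the cpuminer-opt-sse/avx/avx2 directory names are only illustrative, and the real build.sh may use different or additional flags.

```shell
#!/bin/sh
# Map a feature level to the GCC flag that enables that instruction set.
arch_flags() {
    case "$1" in
        sse)  echo "-msse2" ;;
        avx)  echo "-mavx" ;;
        avx2) echo "-mavx2" ;;
        *)    echo "unknown feature level: $1" >&2; return 1 ;;
    esac
}

# Print one "build-directory flags" line per target architecture.
build_plan() {
    for feat in sse avx avx2; do
        printf 'cpuminer-opt-%s %s\n' "$feat" "$(arch_flags "$feat")"
    done
}

# Example: list what would be built, one source tree copy per line.
#   build_plan
```

Each line names a source tree copy and the flag to add to its CFLAGS, so a binary built with -mavx2 is never the one launched on an AVX-only machine.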

My logic for AVX2 isn't fully implemented yet in the capabilities checks. Had it been, it would have
displayed a message warning of the impending crash, then crashed. This is what you should see when implemented:

Code:
CPU features: SSE2 AES AVX
SW built on Aug  5 2016 with GCC 5.4.0
SW features: SSE2 AES AVX AVX2
Algo features: SSE2 AES AVX AVX2
Unsupported CPU or SW configuration, miner will likely crash!
Illegal instruction (core dumped)
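
The comparison behind that warning can be illustrated outside the miner. The sketch below is not the miner's actual check, which lives in the C code; it only shows the idea of listing features the software was built with that the CPU does not report. The function name is hypothetical and the feature strings are taken from the output above.

```shell
#!/bin/sh
# Print the features in the software list ($2) that are absent from the
# CPU list ($1). Both arguments are space-separated names, e.g. "SSE2 AES AVX".
missing_features() {
    cpu=" $1 "
    missing=""
    for f in $2; do
        case "$cpu" in
            *" $f "*) ;;                               # CPU reports this feature
            *) missing="${missing:+$missing }$f" ;;    # built in, not on the CPU
        esac
    done
    printf '%s\n' "$missing"
}

# Example with the values shown above:
#   missing_features "SSE2 AES AVX" "SSE2 AES AVX AVX2"
```

Padding the CPU list with spaces keeps AVX from being mistaken for AVX2; a non-empty result is the case where the warning would be printed.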

legendary
Activity: 1470
Merit: 1114
So when is the Windows bin out?

Cryptomining Blog has usually been good at producing binaries within a few hours of release.
I'm not sure why not this time. You could ask.

I can't build distributable Windows binaries, but mingw works to compile your own; instructions are in README.md.