
Topic: An (even more) optimized version of cpuminer (pooler's cpuminer, CPU-only)

full member
Activity: 128
Merit: 100
Pooler,

 Don't you think that NVidia CUDA could run some new assembly code for Scrypt faster than a CPU?
 I'm asking because I see that some guys are working on an open-source miner for SolidCoin (a.k.a. Shitcoin) that runs on CUDA...

Best!
Thiago

I might be wrong, but doesn't SolidCoin use a different algorithm?
Apart from that, I'm not a GPGPU expert in any way, but I don't think computing Scrypt on a GPU would be efficient, at least not with the hardware that is available at the moment.
They do use a different algorithm. Funnily enough, I remember asking him on IRC whether it used scrypt back when they launched SC2, since it was supposed to be CPU-only, and he said yes.
legendary
Activity: 2114
Merit: 1031
Intel Core 2 Duo P8700 went from 1.15 khash/s to 3.4 khash/s!
sr. member
Activity: 278
Merit: 250
The Pentium D's are running fresh installs of Ubuntu 10.04 64-bit, brought up to date with "aptitude dist-upgrade" and rebooted.

The ArtForz miner was pulled from git yesterday with "git pull https://github.com/ArtForz/cpuminer", as was pooler's with "git pull https://github.com/pooler/cpuminer". Both were compiled with CFLAGS="-march=native -O3 -Wall -msse2".

I even tried copying over the binaries that were compiled on the Sempron and Xeon boxes but got the same results.

I'm thinking there's either something about the P4D's that makes them bad at scrypt, or I've got a BIOS or OS setting messed up somewhere.

It seems odd that they are 2.8 GHz dual-core chips with each thread's rate in khash/s being exactly half the clock in GHz (1.40 khash/s at 2.8 GHz). Maybe there is some instruction that takes two clock cycles here but only one on newer chips?
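
To put a number on the "half the GHz" observation, the clock rate can be divided by the per-thread hash rate to get the cycles spent on each hash. This is only a back-of-the-envelope sketch in C: `cycles_per_hash` and `report_cycles` are hypothetical helper names, and the inputs are the figures quoted in this thread (2.8 GHz, 1.40 khash/s per thread).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: average CPU cycles spent per scrypt hash,
 * given the clock in GHz and one thread's rate in khash/s. */
static double cycles_per_hash(double clock_ghz, double khash_per_s)
{
    assert(clock_ghz > 0.0 && khash_per_s > 0.0);
    return (clock_ghz * 1e9) / (khash_per_s * 1e3);
}

/* Print the estimate for one CPU/thread measurement. */
static void report_cycles(const char *label, double clock_ghz,
                          double khash_per_s)
{
    printf("%s: %.0f cycles/hash\n", label,
           cycles_per_hash(clock_ghz, khash_per_s));
}
```

Calling `cycles_per_hash(2.8, 1.40)` gives roughly 2,000,000 cycles per hash. A budget that large suggests the neat 2:1 ratio may just be a coincidence of units rather than the signature of one instruction that takes two cycles, although the estimate alone cannot rule that out.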

If I can get some time, I might go try Windows and/or Ubuntu 11.10 on one of them to see if it makes a difference.

Yes, I would like to see the performance in 32-bit mode. The Pentium D's were very early 64-bit cpus, and are not as good at SSE as later Core-based models, but I expected them to get some improvement from the new code.

I put Ubuntu 11.10 i386 on one of the Pentium D boxes and it's not good: it only sees one core, with ArtForz's miner doing 0.84 kh/s and pooler's miner running at 1.26 kh/s.

There might be an SMP kernel, but I'd rather go back to the amd64 build and just count my blessings at 1.48 kh/s x2.
hero member
Activity: 842
Merit: 507
Pooler,

 Don't you think that NVidia CUDA could run some new assembly code for Scrypt faster than a CPU?
 I'm asking because I see that some guys are working on an open-source miner for SolidCoin (a.k.a. Shitcoin) that runs on CUDA...

Best!
Thiago

I might be wrong, but doesn't SolidCoin use a different algorithm?
Apart from that, I'm not a GPGPU expert in any way, but I don't think computing Scrypt on a GPU would be efficient, at least not with the hardware that is available at the moment.
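
One way to make the efficiency concern concrete is to look at how much memory each hash in flight needs. The thread does not state the scrypt parameters, so the sketch below assumes Litecoin's settings (N = 1024, r = 1); the function names are hypothetical, and only the formula for the scratchpad size (N blocks of 128*r bytes) comes from the scrypt definition.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of the scrypt scratchpad (the V array):
 * N blocks of 128 * r bytes each. */
static size_t scrypt_scratchpad_bytes(size_t n, size_t r)
{
    assert(n > 0 && r > 0);
    return 128 * r * n;
}

/* Total scratchpad memory if `lanes` hashes are computed concurrently,
 * each with its own scratchpad (an assumption of this sketch). */
static size_t scrypt_total_bytes(size_t n, size_t r, size_t lanes)
{
    return scrypt_scratchpad_bytes(n, r) * lanes;
}
```

With the assumed parameters, `scrypt_scratchpad_bytes(1024, 1)` is 131072 bytes (128 KiB) per hash, and `scrypt_total_bytes(1024, 1, 1000)` is 131072000 bytes (125 MiB) for a thousand hashes running side by side. Whether a given GPU can feed that much randomly accessed memory quickly enough is exactly the open question here.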
hero member
Activity: 742
Merit: 500
Went from 3.5 khash/sec to 4.7 on my 3.0GHz AMD Athlon II X2 250 Processor
legendary
Activity: 1204
Merit: 1000
฿itcoin: Currency of Resistance!
Pooler,

 Don't you think that NVidia CUDA could run some new assembly code for Scrypt faster than a CPU?
 I'm asking because I see that some guys are working on an open-source miner for SolidCoin (a.k.a. Shitcoin) that runs on CUDA...

Best!
Thiago
hero member
Activity: 842
Merit: 507
A 64-bit miner should work 50% faster if you compute 2 hashes at once. To get this bonus, you should duplicate your code in the following manner:

1st SSE instruction that computes hash #1 (using xmm0-xmm7)
1st SSE instruction that computes hash #2 (using xmm8-xmm15)
2nd SSE instruction that computes hash #1 (using xmm0-xmm7)
2nd SSE instruction that computes hash #2 (using xmm8-xmm15)
...
...
Nth SSE instruction that computes hash #1 (using xmm0-xmm7)
Nth SSE instruction that computes hash #2 (using xmm8-xmm15)

Thank you for the suggestion, I'll try to implement something like that as soon as I find some time. I'm currently trying to fix a couple bugs already present in the old minerd.
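
The interleaving idea can be sketched in C. This is not the miner's actual code: it uses the GCC/Clang `vector_size` extension as a stand-in for hand-written SSE2 assembly, the `hash_state` layout and all function names are made up for illustration, and the step shown is a Salsa20-style quarter-round (scrypt's inner loop is built on Salsa20/8). The two states never depend on each other, which is what lets their instructions overlap; 64-bit mode matters because it provides xmm8-xmm15 in addition to the xmm0-xmm7 available in 32-bit mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang extension: four 32-bit lanes, the width of one xmm register. */
typedef uint32_t v4u __attribute__((vector_size(16)));

/* Four vectors of working state for one hash (illustrative layout). */
typedef struct { v4u a, b, c, d; } hash_state;

/* Rotate every 32-bit lane left by n bits (0 < n < 32). */
static inline v4u rotl(v4u x, uint32_t n)
{
    assert(n > 0 && n < 32);
    v4u l = { n, n, n, n };
    v4u r = { 32 - n, 32 - n, 32 - n, 32 - n };
    return (x << l) | (x >> r);
}

/* Reference: one Salsa20-style quarter-round on a single state. */
static void round_single(hash_state *s)
{
    s->b ^= rotl(s->a + s->d, 7);
    s->c ^= rotl(s->b + s->a, 9);
    s->d ^= rotl(s->c + s->b, 13);
    s->a ^= rotl(s->d + s->c, 18);
}

/* The same round on two independent states, alternating step by step
 * (hash #1, hash #2, hash #1, ...). h1 and h2 must not alias. */
static void round_interleaved(hash_state *h1, hash_state *h2)
{
    h1->b ^= rotl(h1->a + h1->d, 7);
    h2->b ^= rotl(h2->a + h2->d, 7);
    h1->c ^= rotl(h1->b + h1->a, 9);
    h2->c ^= rotl(h2->b + h2->a, 9);
    h1->d ^= rotl(h1->c + h1->b, 13);
    h2->d ^= rotl(h2->c + h2->b, 13);
    h1->a ^= rotl(h1->d + h1->c, 18);
    h2->a ^= rotl(h2->d + h2->c, 18);
}

/* Build a deterministic state from a seed, for experiments. */
static hash_state make_state(uint32_t seed)
{
    hash_state s;
    s.a = (v4u){ seed, seed + 1, seed + 2, seed + 3 };
    s.b = (v4u){ seed + 4, seed + 5, seed + 6, seed + 7 };
    s.c = (v4u){ seed + 8, seed + 9, seed + 10, seed + 11 };
    s.d = (v4u){ seed + 12, seed + 13, seed + 14, seed + 15 };
    return s;
}

/* Byte-wise comparison of two states. */
static int states_equal(const hash_state *x, const hash_state *y)
{
    return memcmp(x, y, sizeof *x) == 0;
}
```

Running `round_interleaved` on two states gives the same results as calling `round_single` on each, so the reordering changes only the schedule, not the hashes. Note that a C compiler is free to reschedule these statements anyway; in hand-written assembly the order would be exactly as written, and any speedup depends on the CPU.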
legendary
Activity: 2142
Merit: 1010
Newbie
A 64-bit miner should work 50% faster if you compute 2 hashes at once. To get this bonus, you should duplicate your code in the following manner:

1st SSE instruction that computes hash #1 (using xmm0-xmm7)
1st SSE instruction that computes hash #2 (using xmm8-xmm15)
2nd SSE instruction that computes hash #1 (using xmm0-xmm7)
2nd SSE instruction that computes hash #2 (using xmm8-xmm15)
...
...
Nth SSE instruction that computes hash #1 (using xmm0-xmm7)
Nth SSE instruction that computes hash #2 (using xmm8-xmm15)
hero member
Activity: 842
Merit: 507
The Pentium D's are running fresh installs of Ubuntu 10.04 64-bit, brought up to date with "aptitude dist-upgrade" and rebooted.

The ArtForz miner was pulled from git yesterday with "git pull https://github.com/ArtForz/cpuminer", as was pooler's with "git pull https://github.com/pooler/cpuminer". Both were compiled with CFLAGS="-march=native -O3 -Wall -msse2".

I even tried copying over the binaries that were compiled on the Sempron and Xeon boxes but got the same results.

I'm thinking there's either something about the P4D's that makes them bad at scrypt, or I've got a BIOS or OS setting messed up somewhere.

It seems odd that they are 2.8 GHz dual-core chips with each thread's rate in khash/s being exactly half the clock in GHz (1.40 khash/s at 2.8 GHz). Maybe there is some instruction that takes two clock cycles here but only one on newer chips?

If I can get some time, I might go try Windows and/or Ubuntu 11.10 on one of them to see if it makes a difference.

Yes, I would like to see the performance in 32-bit mode. The Pentium D's were very early 64-bit cpus, and are not as good at SSE as later Core-based models, but I expected them to get some improvement from the new code.
sr. member
Activity: 278
Merit: 250
I have a handful of boxes with Pentium D 2.8 GHz CPUs (820s, I think), and this version of cpuminer is actually slower than ArtForz's.

I'm not sure what's going on with these boxes, but ArtForz's runs at 1.48 khash/s/thread (two threads) while pooler's is only doing 1.40. I know it's a tiny difference, but it still seems strange to me. I've tried recompiling each with varying CFLAGS and seen no change.

This is under Ubuntu 10.04 LTS, completely updated.

Anyone else seen this, or have any ideas?

Uhm, bizarre. I have never worked with Pentium D's, so... let me have a look at Wikipedia... ok, these basically seem to be 64-bit-enabled dual-core Pentium 4's (i.e. Netburst arch).
Judging from the results, I guess you are running a 32-bit environment, but still I don't understand how the new version could be slower.
Can anyone else with a Pentium D confirm this issue?

I'll check on a work machine in a bit, but I'm not sure which ArtForz miner he was using. I'm only going to see what the new kh/s is with a 32-bit OS and miner.

The Pentium D's are running fresh installs of Ubuntu 10.04 64-bit, brought up to date with "aptitude dist-upgrade" and rebooted.

The ArtForz miner was pulled from git yesterday with "git pull https://github.com/ArtForz/cpuminer", as was pooler's with "git pull https://github.com/pooler/cpuminer". Both were compiled with CFLAGS="-march=native -O3 -Wall -msse2".

I even tried copying over the binaries that were compiled on the Sempron and Xeon boxes but got the same results.

I'm thinking there's either something about the P4D's that makes them bad at scrypt, or I've got a BIOS or OS setting messed up somewhere.

It seems odd that they are 2.8 GHz dual-core chips with each thread's rate in khash/s being exactly half the clock in GHz (1.40 khash/s at 2.8 GHz). Maybe there is some instruction that takes two clock cycles here but only one on newer chips?

If I can get some time, I might go try Windows and/or Ubuntu 11.10 on one of them to see if it makes a difference.

sr. member
Activity: 252
Merit: 250
Here are my results:

Intel(R) Core(TM) i7 CPU       Q 740  @ 1.73GHz

8 threads
0,8 khash/thread => 2 khash/thread

Intel(R) Xeon(R) CPU           X3430  @ 2.40GHz

4 threads
2,2 khash/thread => 3,6 khash/thread

and just for the fun of it:

AMD Sempron(tm) Processor 3200+

2 threads
0,8 khash/thread => 0,9 khash/thread

Optimal CFLAGS determined with this method: http://blog.mybox.ro/2011/11/02/how-to-recompile-software-with-hardware-optimizations/
sr. member
Activity: 413
Merit: 250
I have a handful of boxes with Pentium D 2.8 GHz CPUs (820s, I think), and this version of cpuminer is actually slower than ArtForz's.

I'm not sure what's going on with these boxes, but ArtForz's runs at 1.48 khash/s/thread (two threads) while pooler's is only doing 1.40. I know it's a tiny difference, but it still seems strange to me. I've tried recompiling each with varying CFLAGS and seen no change.

This is under Ubuntu 10.04 LTS, completely updated.

Anyone else seen this, or have any ideas?

Uhm, bizarre. I have never worked with Pentium D's, so... let me have a look at Wikipedia... ok, these basically seem to be 64-bit-enabled dual-core Pentium 4's (i.e. Netburst arch).
Judging from the results, I guess you are running a 32-bit environment, but still I don't understand how the new version could be slower.
Can anyone else with a Pentium D confirm this issue?

I'll check on a work machine in a bit, but I'm not sure which ArtForz miner he was using. I'm only going to see what the new kh/s is with a 32-bit OS and miner.
sr. member
Activity: 352
Merit: 250
Firstbits: 1m8xa
Holy crap.

My hashrate jumped from 1.7 khash/sec per thread to 3.2 khash/sec per thread.

And with full firepower it's now 13 khash/sec in total. Amazing!
hero member
Activity: 842
Merit: 507
I have a handful of boxes with Pentium D 2.8 GHz CPUs (820s, I think), and this version of cpuminer is actually slower than ArtForz's.

I'm not sure what's going on with these boxes, but ArtForz's runs at 1.48 khash/s/thread (two threads) while pooler's is only doing 1.40. I know it's a tiny difference, but it still seems strange to me. I've tried recompiling each with varying CFLAGS and seen no change.

This is under Ubuntu 10.04 LTS, completely updated.

Anyone else seen this, or have any ideas?

Uhm, bizarre. I have never worked with Pentium D's, so... let me have a look at Wikipedia... ok, these basically seem to be 64-bit-enabled dual-core Pentium 4's (i.e. Netburst arch).
Judging from the results, I guess you are running a 32-bit environment, but still I don't understand how the new version could be slower.
Can anyone else with a Pentium D confirm this issue?
legendary
Activity: 2142
Merit: 1010
Newbie
It seems to me that the speed boost is higher for old machines than for new ones. If I'm right, then it's a very good feature: it helps those who own obsolete computers compete with the others.
hero member
Activity: 518
Merit: 500
I'm seeing very impressive speedups on Core 2 Duos and Quads, and less impressive ones on my AMD machines and an old P4. Unfortunately, this speed bump of course does little to increase the profitability of Litecoin mining, which is currently pretty much nonexistent, since everyone will upgrade, heh. But great job tweaking the code!
newbie
Activity: 28
Merit: 0
Wow thank you!

My MacBook Pro (i5 M, 2.53 GHz) had 1.18 kh/s x 4 = 4.72 kh/s; now it's got 2.9 kh/s x 4 = 11.6 kh/s!

Amazing!

Edit: When I think about it, maybe I built something wrong with the first minerd. I'm new to such things. This time I used the prebuilt binary.
mrx
member
Activity: 86
Merit: 10
Test results:

Linux x86-64, Intel Xeon, before (artforz, modified speed output):
Code:
[2011-12-20 13:02:41] thread 6: 15074 hashes, 3.01210 khash/sec
[2011-12-20 13:02:42] thread 7: 14079 hashes, 2.96454 khash/sec
[2011-12-20 13:02:42] thread 0: 14959 hashes, 2.99386 khash/sec
[2011-12-20 13:02:44] thread 1: 14920 hashes, 2.97748 khash/sec
[2011-12-20 13:02:44] thread 3: 14619 hashes, 2.87325 khash/sec
[2011-12-20 13:02:45] thread 2: 14765 hashes, 2.93248 khash/sec
[2011-12-20 13:02:45] thread 4: 15090 hashes, 3.01685 khash/sec
[2011-12-20 13:02:45] thread 5: 15079 hashes, 2.89249 khash/sec
[2011-12-20 13:02:46] thread 6: 15061 hashes, 3.01753 khash/sec


after (modified speed output):
Code:
[2011-12-20 13:06:44] thread 5: 20551 hashes, 4.28743 khash/s
[2011-12-20 13:06:45] thread 0: 21568 hashes, 4.31442 khash/s
[2011-12-20 13:06:45] thread 1: 20840 hashes, 4.18909 khash/s
[2011-12-20 13:06:46] thread 6: 21690 hashes, 4.33446 khash/s
[2011-12-20 13:06:48] thread 2: 21572 hashes, 4.30622 khash/s
[2011-12-20 13:06:49] thread 3: 21128 hashes, 4.27796 khash/s
[2011-12-20 13:06:49] thread 7: 21588 hashes, 4.25990 khash/s
[2011-12-20 13:06:49] thread 5: 21438 hashes, 4.32439 khash/s
[2011-12-20 13:06:50] thread 4: 21709 hashes, 3.77865 khash/s

Windows 32-bit, Intel Core 2 Duo, before (amdfam10-sse4a):
Code:
[2011-12-20 13:15:10] thread 1: 6553 hashes, 1.40 khash/sec
[2011-12-20 13:15:10] thread 0: 6553 hashes, 1.38 khash/sec

after:
Code:
[2011-12-20 13:17:05] thread 0: 16422 hashes, 3.49 khash/s
[2011-12-20 13:17:06] thread 1: 16346 hashes, 3.46 khash/s

Windows 32-bit, AMD Phenom II X4, before (amdfam10-sse4a):
Code:
[2011-12-20 13:22:01] thread 1: 9101 hashes, 1.70 khash/sec
[2011-12-20 13:22:04] thread 0: 6965 hashes, 1.76 khash/sec
[2011-12-20 13:22:04] thread 3: 9362 hashes, 1.87 khash/sec
[2011-12-20 13:22:05] thread 2: 8364 hashes, 1.62 khash/sec

after:
Code:
[2011-12-20 13:28:24] thread 1: 12141 hashes, 2.39 khash/s
[2011-12-20 13:28:24] thread 0: 11528 hashes, 2.31 khash/s
[2011-12-20 13:28:24] thread 2: 12009 hashes, 2.45 khash/s
[2011-12-20 13:28:24] thread 3: 11708 hashes, 2.35 khash/s


Splendid!
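
For anyone who wants to turn log excerpts like these into a single before/after figure, here is a rough sketch in C. It assumes the line layout shown in the logs above ("[date time] thread N: H hashes, X khash/s"); `parse_hashrate_line` and `total_khash` are hypothetical helpers, not part of minerd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Parse one log line such as
 *   "[2011-12-20 13:17:05] thread 0: 16422 hashes, 3.49 khash/s"
 * Returns 1 and fills the outputs on success, 0 otherwise. */
static int parse_hashrate_line(const char *line, int *thread,
                               unsigned long *hashes, double *khash)
{
    assert(line && thread && hashes && khash);
    return sscanf(line,
                  "[%*d-%*d-%*d %*d:%*d:%*d] thread %d: %lu hashes, %lf khash",
                  thread, hashes, khash) == 3;
}

/* Sum the per-thread rates over the lines that parse; lines that do not
 * match are skipped. Pass one line per thread, otherwise a thread that
 * reports twice is counted twice. */
static double total_khash(const char *const *lines, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        int thread;
        unsigned long hashes;
        double khash;
        if (parse_hashrate_line(lines[i], &thread, &hashes, &khash))
            total += khash;
    }
    return total;
}
```

Fed the two Core 2 Duo lines from each run above, this gives 2.78 khash/s before and 6.95 khash/s after, a speedup of 2.5x.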
sr. member
Activity: 278
Merit: 250
I have a handful of boxes with Pentium D 2.8 GHz CPUs (820s, I think), and this version of cpuminer is actually slower than ArtForz's.

I'm not sure what's going on with these boxes, but ArtForz's runs at 1.48 khash/s/thread (two threads) while pooler's is only doing 1.40. I know it's a tiny difference, but it still seems strange to me. I've tried recompiling each with varying CFLAGS and seen no change.

This is under Ubuntu 10.04 LTS, completely updated.

Anyone else seen this, or have any ideas?

hero member
Activity: 630
Merit: 500
Posts: 69
Saying nothing new here, but amazing.