
Topic: AMD FX-6100 Bulldozer - Does it suck for CPU scrypt mining? Yes! [Benchmarks] (Read 8196 times)

donator
Activity: 1218
Merit: 1079
Gerald Davis
No, again I disagree: none of the new instruction sets for the AMD are used in a miner yet ... so it's still not certain.

Then don't accept it; however, none of AMD's new instructions support integer math.  None are applicable to scrypt or SHA-256.  So there is never going to be a new miner which makes Bulldozer faster (on a relative basis) by using floating point instructions.

Maybe if Bulldozer II adds support for AVX integer ops using the SIMD registers that might be enough, but that would be a future chip with future instruction sets, not this one.
hero member
Activity: 774
Merit: 500
Lazy Lurker Reads Alot
Um the thread was:

"AMD FX-6100 Bulldozer - Does it suck for CPU scrypt mining?  Yes!"

Thus potential future floating point improvement isn't going to change that.

No, again I disagree: none of the new instruction sets for the AMD are used in a miner yet ... so it's still not certain.
And no, you cannot use an i7 version on the AMD or the other way around; it ends in failure or invalid blocks.
When a miner gets built with all the instruction sets I will accept that answer; before that, no way.

I can run the i7 version on my AMD and it then shows insane values ... but I will never believe my old X3 720 is doing an insane 7.2 kh/s ... it's simply impossible.
hero member
Activity: 774
Merit: 500
Lazy Lurker Reads Alot
In my mini-review I did mention that new software could change things.  I was using ArtForz's minerd compiled with the Wall flag.  That includes SSE4 and AVX on the Intel processor.  But you are right that there are other instruction set extensions that could be made use of.

I'm guessing your Q6600 does 1400 hash/sec, not 1400 K hash/sec.  How are you getting 28 kh/s from an X3?  I can't get more than 3.3 kh/s per core out of my X6 at a much higher clock speed.
I use the minerd version with the SSE4a instruction set; I have to search for where I found it.
The official name of the AMD version I found is: minerd-amdfam10-sse4a-64.exe

Sorry for the typo in the speed, I did make an error there. These figures are PER CORE, I assume:
Intel Q6600 @ 3.2 GHz: 1.41 kh/s
X3 720 at stock speed @ 2.8 GHz: 2.82 kh/s
so your much faster X6 at 3.30 kh/s of course beats my oldie xD
donator
Activity: 1218
Merit: 1079
Gerald Davis
Um the thread was:

"AMD FX-6100 Bulldozer - Does it suck for CPU scrypt mining?  Yes!"

Thus potential future floating point improvement isn't going to change that.
hero member
Activity: 774
Merit: 500
Lazy Lurker Reads Alot
lol
You're right, those were not intended for nor useful for mining, but I am just talking about the overall verdict on the CPU: it is already being written off as a failure while, in my view, it has not been tested enough yet.

It is being burned in every post on the net, when it should really just be compared to an Intel 2500K, and yes, I still do not think it's all that bad.
Sure, the X6 is, funnily enough, competition for the Bulldozer, but does the X6 have the advanced instruction sets? No.
So I am just saying it's not really so bad a CPU in my view yet. I am hoping the next version will use a bit less juice; when it does I am going to buy one for sure. I expect it to become a bit cheaper soon as well. So all in all it's not an i7 killer, and I don't think it will be one anytime soon.
But hopefully it can be useful in some way or can amaze us at some other task; I dare not expect it to run much better under Win 8.

Why, you ask, do I keep away from being all negative about AMD? Simply because everything made or done is Intel/NVidia-minded or made by them.
Even Microsoft, in developing its software, just concentrates on the products from Intel ... think what happens when AMD is gone ...
You are going to pay a super price for the same CPU for many years if that happens. I would even dare to say we would still be using Pentium 4 CPUs if AMD had not made Intel wake up from time to time.
For me it's important that AMD stays in the race even though it has not made the killer we wanted yet; and even if they do, Intel's enormous amount of money will make sure they beat it again, count on that.
Simply said, it's the flea that wakes up the giant once in a while, and this giant needs that to keep us happy campers.
donator
Activity: 1218
Merit: 1079
Gerald Davis
As soon as we get a miner which uses SSE4.1, SSE4.2, AVX and, last but not least, AES and CLMUL, as well as the future instruction sets proposed by AMD (XOP and FMA4) ... I am not sure if AMD put FMAx into this FX yet.
But to make a long story short, the race has yet to be run; if some of the CPU programming gurus out here get their hands on these sets, I fear the outcome might change a lot.

How do you use floating point instructions to increase integer performance?
How do you use AES instructions to accelerate a non-AES cipher?
How do you use finite field instructions to speed up integer performance?
hero member
Activity: 546
Merit: 500
Well lol, I am glad you tested it, but ... here comes the catch: you do not have the right program.
First of all, none of the current miners use any instruction sets other than SSE2, SSE3 and SSSE3, and the only one I see using SSE4a is minerd, which blows away my Q6600 at 3.2 GHz: that only does 1400 Khash, while the AMD X3 720 at 2.8 GHz does 2800 khash ... just by using SSE4a.
As soon as we get a miner which uses SSE4.1, SSE4.2, AVX and, last but not least, AES and CLMUL, as well as the future instruction sets proposed by AMD (XOP and FMA4) ... I am not sure if AMD put FMAx into this FX yet.
But to make a long story short, the race has yet to be run; if some of the CPU programming gurus out here get their hands on these sets, I fear the outcome might change a lot.
I just ask you to see what the i7-avx version of minerd does ... since I do not have an i7 ... yet.
But for me it was an instant surprise that my old X3 beats my much faster Intel by so much PER CORE, more than double the speed. Now of course these are oldies and do not compete with monsters like the X6 or i7.
So before we burn down the Bulldozer we should see if any programmer will get us the advanced instruction sets like they did on the i7.
If that has been done and the Bulldozer does not beat an Intel 2500, then I will consider the Bulldozer a fail as well. And I still want to see what FMA4 is going to do, since my guess is it is going to make a huge dent in calculation on the CPU.

In my mini-review I did mention that new software could change things.  I was using ArtForz's minerd compiled with the Wall flag.  That includes SSE4 and AVX on the Intel processor.  But you are right that there are other instruction set extensions that could be made use of.

I'm guessing your Q6600 does 1400 hash/sec, not 1400 K hash/sec.  How are you getting 28 kh/s from an X3?  I can't get more than 3.3 kh/s per core out of my X6 at a much higher clock speed.
hero member
Activity: 774
Merit: 500
Lazy Lurker Reads Alot
Well lol, I am glad you tested it, but ... here comes the catch: you do not have the right program.
First of all, none of the current miners use any instruction sets other than SSE2, SSE3 and SSSE3, and the only one I see using SSE4a is minerd, which blows away my Q6600 at 3.2 GHz: that only does 1400 Khash, while the AMD X3 720 at 2.8 GHz does 2800 khash ... just by using SSE4a.
As soon as we get a miner which uses SSE4.1, SSE4.2, AVX and, last but not least, AES and CLMUL, as well as the future instruction sets proposed by AMD (XOP and FMA4) ... I am not sure if AMD put FMAx into this FX yet.
But to make a long story short, the race has yet to be run; if some of the CPU programming gurus out here get their hands on these sets, I fear the outcome might change a lot.
I just ask you to see what the i7-avx version of minerd does ... since I do not have an i7 ... yet.
But for me it was an instant surprise that my old X3 beats my much faster Intel by so much PER CORE, more than double the speed. Now of course these are oldies and do not compete with monsters like the X6 or i7.
So before we burn down the Bulldozer we should see if any programmer will get us the advanced instruction sets like they did on the i7.
If that has been done and the Bulldozer does not beat an Intel 2500, then I will consider the Bulldozer a fail as well. And I still want to see what FMA4 is going to do, since my guess is it is going to make a huge dent in calculation on the CPU.
donator
Activity: 1218
Merit: 1079
Gerald Davis
Nice catch.  Looking back, it appears you are right.  Intel has higher performance per clock per core, so 32KB is likely sufficient.   I was getting confused by total performance (the Phenom II X6's 50% more cores is hard to overcome).

On Fermi thinking about it further I doubt performance would be good enough.   It has enough SP to operate like a 16 "core" processor but that is less than 3x the number of cores that a Phenom II has.  It would also need some very high ALU efficiency (which NVidia isn't well known for when it comes to integer math) because the Tesla has much lower clock speed than Phenom II.

Likely in pure performance the Tesla would outhash Phenom II but  $1400+ pricetag and 225W means it would need insane efficiency (hashes per instruction) to beat Phenom II on either hash/$ or hash/Watt metric. 

On the FPGA angle.... I have no clue but it would be interesting.  The issue is that you would need a lot of cache.  FPGAs tend to have low clocks, especially relative to their cost, so the goal would be to process scrypt in parallel, but that requires ~32KB per parallel pipeline.
hero member
Activity: 546
Merit: 500
Cache can be deceptive.

As I indicated in earlier speculation thread scrypt is VERY L1 cache dependent.  

Phenom II has 64KB of L1 data cache per core.
Bulldozer has 16KB of L1 data cache per integer core.

The big question is would scrypt lookup table fit in the L1 cache of Bulldozer.  The high performance on Phenom II (over Intel chips) indicates 64KB is sufficient L1 cache but is 16KB?

Your benchmark just answered that question.

Very interesting analysis.  Based on the benchmarks your speculation appears to be right.  L1 cache size makes a huge difference.  As another data point, my Core i5 2400 running at 3.1GHz (stock) is ~3 kh/s per core, which is slightly faster clock-for-clock than the X6 (3.33kh/s @ 3.6GHz).  Of course there are differences in instruction sets and architecture, so you cannot directly conclude that scrypt will fit in the Core i processors' 32k L1 cache, but it seems that 32KB is probably the floor for efficient processing.
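
The clock-for-clock comparison can be checked with a few lines of arithmetic. This is only a sketch using the per-core figures quoted in this thread; the dictionary layout and the `khs_per_ghz` helper are made up for illustration, and dividing by clock speed ignores every architectural difference between the chips.

```python
# Per-core hash rates and clock speeds as quoted in the thread (approximate).
cpus = {
    "Core i5 2400": {"khs_per_core": 3.0, "ghz": 3.1},
    "Phenom II X6 1075T": {"khs_per_core": 3.33, "ghz": 3.6},
}


def khs_per_ghz(cpu):
    """Hash rate normalised by clock speed (hypothetical helper)."""
    return cpu["khs_per_core"] / cpu["ghz"]


for name, cpu in cpus.items():
    print(f"{name}: {khs_per_ghz(cpu):.3f} kh/s per GHz per core")
```

On these numbers the i5 comes out a few percent ahead per GHz, which is all "slightly faster clock-for-clock" is claiming.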

If you were going to do serious scrypt mining, the Tesla GPU might not be that bad of an idea.  If you can get 16x better performance for 10x the price of a CPU, you're coming out on top.

The other question is whether any FPGAs support 32KB or more of L1 cache.  Seems like a lot for an FPGA, but I don't know them that well.
donator
Activity: 1218
Merit: 1079
Gerald Davis
From the reviews Bulldozer seemed capable with heavily multithreaded apps in some cases and it boasts much more cache than the anemic cache on the Phenom II.  

Total cache can be deceptive.
As I indicated in earlier speculation thread scrypt is VERY L1 cache dependent.  

While the Bulldozer has more total cache (L1+L2+L3) it has less L1 data cache (L1 cache is divided into discrete data & instruction caches).

Phenom II has 64KB of L1 data cache per core.
Bulldozer has 16KB of L1 data cache per integer core.

The hypothesis I proposed in the speculation thread was that Bulldozer would do better (8 cores vs 6 cores) if scrypt lookup table would fit in the L1 cache of Bulldozer.  Your benchmark just answered that question.

The larger L2 & L3 cache of Bulldozer is immaterial.  There is a 3 clock cycle latency to L2 cache and 20 (IIRC) clock cycle latency to L3 cache.  It would appear that the scrypt lookup tables can't fit in 16KB, thus the CPU is being idled thousands of times per hash waiting for data to SLLLLLLLOOOOOOOOOWWWWWWWWWWWWLLLLLLLLLYYYYYYYYY make its way from L2 -> L1.  L3 cache is likely completely unused for the datasets used by scrypt.

The nice thing is you have shown 16KB of L1 cache is likely insufficient.  We know 64KB is sufficient.   That gives us upper and lower bounds.


The i5 series CPU have 32KB of L1 cache.  Clock for clock they tend to underperform the Phenom II but still do ok.  My guess is that there may be some cache misses but not too many which allows decent performance.

On edit: looks like I was incorrect.  Clock for clock the i3/i5/i7 series outperforms Phenom II.  Phenom II has higher overall performance but that is due to more cores & higher overclock. That would indicate 32KB is sufficient.


The Fermi GF100 series Tesla cards can be configured to use 48KB of L1 cache per SM (streaming multiprocessor)*.  Thus it *may* be possible that scrypt could be GPU accelerated.  Granted, the high cost of Tesla cards makes benchmarking a very expensive test.  The Teslas in Amazon EC2 are low end w/ only 16KB L1 cache so aren't very interesting.

*The 48KB of L1 cache on M2050/M2070/M2090 series Teslas is per SM (streaming multiprocessor), a group of 32 SP.  That likely means the card isn't parallel enough to justify its cost.  448 SP is effectively 16 independent SM but the cost ($1400+) doesn't really justify only a 16x performance increase.

Since you got me thinking I looked up some L1 data cache sizes:
AMD 5xxx/6xxx GPU - 8KB per SIMD (group of 8 SP)
NVidia Fermi based GPU - 16KB per SM (group of 32 SP)
AMD Bulldozer - 16KB per integer core
Intel Core i3/i5/i7 - 32KB per core
NVidia GF100 series Tesla cards - 48KB per SM (group of 32 SP)
Phenom II - 64KB per core
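
To make the bounds concrete, here is a rough sketch that sorts the sizes listed above against what the benchmarks in this thread suggest (16KB likely too small, 32KB likely enough). The thresholds are inferences from this discussion, not measured working-set sizes, and the `verdict` helper and its labels are purely illustrative.

```python
# L1 data cache sizes in KB, copied from the list in this post.
l1_data_cache_kb = {
    "AMD 5xxx/6xxx GPU (per SIMD)": 8,
    "NVidia Fermi based GPU (per SM)": 16,
    "AMD Bulldozer (per integer core)": 16,
    "Intel Core i3/i5/i7 (per core)": 32,
    "NVidia GF100 series Tesla (per SM)": 48,
    "Phenom II (per core)": 64,
}

# Assumed bounds inferred from the thread's benchmarks, not measured values.
LIKELY_TOO_SMALL_KB = 16
LIKELY_SUFFICIENT_KB = 32


def verdict(size_kb):
    """Classify a cache size against the assumed bounds (hypothetical helper)."""
    if size_kb <= LIKELY_TOO_SMALL_KB:
        return "likely too small"
    if size_kb >= LIKELY_SUFFICIENT_KB:
        return "likely sufficient"
    return "unknown"


for part, size_kb in l1_data_cache_kb.items():
    print(f"{part}: {size_kb}KB -> {verdict(size_kb)}")
```

Anything between the two bounds comes out as "unknown", since nothing in the thread has tested a size in that gap.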
hero member
Activity: 546
Merit: 500
By now we've all seen the reviews showing that the new AMD Bulldozer processors really disappoint.  But the reviews naturally did not include any benchmarks relevant to the cryptocurrency mining community.  From the reviews Bulldozer seemed capable with heavily multithreaded apps in some cases and it boasts much more cache than the anemic cache on the Phenom II.  So I wondered if it was possible that the bulldozer may not suck at CPU mining and could in fact provide similar processing power at a lower TDP.  So I ordered an FX-6100 and benched it against my Phenom II X6 1075T.  Both are 6-core and both are similarly priced (I paid $169 for the FX 6100 and $159 for the Phenom II X6).

Test Setup:
* AMD Phenom II X6 1075T overclocked to 3.6GHz
* AMD FX-6100 overclocked to 3.8GHz
* Asus M5A97 AMD FX970 AM3+ motherboard
* Bios 810 (with Bulldozer support), virtualization features enabled
* G.SKILL Ripjaws X Series 8GB (2 x 4GB) DDR3 1333 (PC3 10666)
* Gigabyte 5870OC
* Host OS: Windows 7 Professional x64
* VirtualBox 4.0.14 x64
* Ubuntu 11.04 x64 guest OS
* Guest OS: 2GB RAM, 6 processors, virtualization features enabled

For mining I used ArtForz's CPU miner cloned from git and built with the following configuration (gcc 4.6):

sudo CFLAGS="-march=amdfam10 -O3 -Wall" ./configure

For each test I rebuilt in case there were any differences in instruction sets.  I used the CPU miner to mine Litecoins via the dyndns.org pool.

Results:

AMD Phenom II X6 1075T @ 3.6GHz: ~3.33 kh/s


AMD FX-6100 @ 3.8GHz: 2.10 kh/s


Summary
The X6 is giving 58% better performance at a lower clock speed.  But the FX-6100 is only a 95W TDP part, you say!  Still no good.  I estimate power usage for the FX at ~110W when overclocked to 3.8GHz.  The Phenom II X6 at 3.6GHz I estimate is using ~150W.  So you are saving 40W.  But, if you calculate it, the FX 6100 is giving you 0.019 kh per Watt versus the X6 giving you 0.022 kh/W.  The X6 is giving you 16% higher performance per watt.
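
For anyone who wants to redo the arithmetic with their own numbers, here is a minimal sketch. The hash rates are the benchmark results above and the wattages are my rough estimates, not measurements; the variable and function names are just for illustration.

```python
# Benchmark results from this post; the wattages are estimates, not measurements.
chips = {
    "Phenom II X6 1075T @ 3.6GHz": {"khs": 3.33, "watts": 150.0},
    "FX-6100 @ 3.8GHz": {"khs": 2.10, "watts": 110.0},
}


def khs_per_watt(chip):
    """Hash rate divided by estimated power draw."""
    return chip["khs"] / chip["watts"]


x6 = chips["Phenom II X6 1075T @ 3.6GHz"]
fx = chips["FX-6100 @ 3.8GHz"]

# Relative advantage of the X6 over the FX, as fractions (0.5 means 50% better).
speed_gain = x6["khs"] / fx["khs"] - 1
efficiency_gain = khs_per_watt(x6) / khs_per_watt(fx) - 1

for name, chip in chips.items():
    print(f"{name}: {khs_per_watt(chip):.3f} kh/s per W")
print(f"X6 speed advantage: {speed_gain:.1%}")
print(f"X6 efficiency advantage: {efficiency_gain:.1%}")
```

Swap in measured wall power if you have a meter; the efficiency gap moves a lot with small changes in the wattage estimates.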

So, there you have it.  Unless new software and BIOS tweaks will result in big boosts, stay away from the bulldozer processors for CPU scrypt mining.
