
Topic: Faster SHA-256, MSVC build (Read 15634 times)

lfm
full member
Activity: 196
Merit: 104
August 23, 2010, 12:45:29 AM
#29
For a single point of comparison, my Core 2 Duo system consumes 50 watts while computing just a little over 1000 khash/s. VIA Mini-ITX systems may consume a little less than this (note, entire system, not just the mobo + CPU) so they would be somewhat better, but not hugely so.

I measured my VIA-C7 at the plug at 27 watts, including 2 hard drives. With the SHA256 instruction support I added to bitcoin, it gets 1430 khash/s at 1.8 GHz, so it seems like a pretty good improvement in power efficiency.

The code has improved since my previous post, and my machine now gets about 1800 khash/s at 50 W. Nevertheless, the VIA C7 remains more efficient. In fact, I am considering getting a VIA Nano motherboard; there is even a Mini-ITX model with a full PCIe slot.

The C7 version of bitcoin should work well on the Nano, even though it is 32-bit only, not 64-bit. You might recompile it as 64-bit on a Nano and see if it still works right.
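Using the whole-system, at-the-plug figures quoted in this thread (so the C7 number also includes two hard drives, and the Core 2 Duo baseline is "a little over" 1000 khash/s), the efficiency comparison works out roughly as follows:

```latex
\begin{aligned}
\text{Core 2 Duo (original code)}: &\quad \frac{1000\ \text{khash/s}}{50\ \text{W}} = 20\ \text{khash/s per W} \\
\text{Core 2 Duo (improved code)}: &\quad \frac{1800\ \text{khash/s}}{50\ \text{W}} = 36\ \text{khash/s per W} \\
\text{VIA C7 (SHA256 instructions)}: &\quad \frac{1430\ \text{khash/s}}{27\ \text{W}} \approx 53\ \text{khash/s per W}
\end{aligned}
```

On these numbers the C7 is roughly 1.5 times as efficient as the improved Core 2 Duo build.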
sr. member
Activity: 520
Merit: 253
555
August 22, 2010, 05:47:09 AM
#28
For a single point of comparison, my Core 2 Duo system consumes 50 watts while computing just a little over 1000 khash/s. VIA Mini-ITX systems may consume a little less than this (note, entire system, not just the mobo + CPU) so they would be somewhat better, but not hugely so.

I measured my VIA-C7 at the plug at 27 watts, including 2 hard drives. With the SHA256 instruction support I added to bitcoin, it gets 1430 khash/s at 1.8 GHz, so it seems like a pretty good improvement in power efficiency.

The code has improved since my previous post, and my machine now gets about 1800 khash/s at 50 W. Nevertheless, the VIA C7 remains more efficient. In fact, I am considering getting a VIA Nano motherboard; there is even a Mini-ITX model with a full PCIe slot.
lfm
full member
Activity: 196
Merit: 104
August 21, 2010, 08:37:10 PM
#27
I have a VIA C7 that I'm working to port code over to. Theoretically it won't be faster than multi-core i7/Xeons, but it will be a very fast hashing engine that requires quite low power. From what I've calculated, it would be able to compute about 1500 kh/s for < 100 W. We won't know until I get it set up and running :D

For a single point of comparison, my Core 2 Duo system consumes 50 watts while computing just a little over 1000 khash/s. VIA Mini-ITX systems may consume a little less than this (note, entire system, not just the mobo + CPU) so they would be somewhat better, but not hugely so.

I measured my VIA-C7 at the plug at 27 watts, including 2 hard drives. With the SHA256 instruction support I added to bitcoin, it gets 1430 khash/s at 1.8 GHz, so it seems like a pretty good improvement in power efficiency.
newbie
Activity: 4
Merit: 0
July 29, 2010, 12:38:19 AM
#26
Interestingly, this build even gives my old 32-bit Ubuntu Core Duo a pretty hefty boost. In Wine, no less!
Native client 0.3.3 = 602 kHash/s, MSVC build 0.3.3 (+ Wine) 771 kHash/s.
member
Activity: 84
Merit: 10
July 26, 2010, 12:26:13 PM
#25
For a single point of comparison, my Core 2 Duo system consumes 50 watts while computing just a little over 1000 khash/s. VIA Mini-ITX systems may consume a little less than this (note, entire system, not just the mobo + CPU) so they would be somewhat better, but not hugely so.
VIA claims they have the best performance per watt on the market. With its crypto accelerator, it could be even better for applications like bitcoin.

AMD also has something similar in their Geode processor. It's even slower than VIA, but a complete system built on it consumes about 5 watts. I used to have such a router built on Alix PC hardware.
sr. member
Activity: 520
Merit: 253
555
July 26, 2010, 11:16:32 AM
#24
I have a VIA C7 that I'm working to port code over to. Theoretically it won't be faster than multi-core i7/Xeons, but it will be a very fast hashing engine that requires quite low power. From what I've calculated, it would be able to compute about 1500 kh/s for < 100 W. We won't know until I get it set up and running :D

For a single point of comparison, my Core 2 Duo system consumes 50 watts while computing just a little over 1000 khash/s. VIA Mini-ITX systems may consume a little less than this (note, entire system, not just the mobo + CPU) so they would be somewhat better, but not hugely so.
member
Activity: 61
Merit: 10
July 26, 2010, 10:32:03 AM
#23
Nice! I don't know if it would "pay off" the electricity it uses but it would be a great extra node on the network and low power at that. My next project aims to take this even a step further and lower, right into a chip to crunch and run the program.
member
Activity: 84
Merit: 10
July 26, 2010, 10:22:54 AM
#22
falkenberg,

I have a VIA C7 that I'm working to port code over to. Theoretically it won't be faster than multi-core i7/Xeons, but it will be a very fast hashing engine that requires quite low power. From what I've calculated, it would be able to compute about 1500 kh/s for < 100 W. We won't know until I get it set up and running :D


sgtstein

Sounds cool ;) I also have a VIA processor running my router/BitTorrent stuff 24/7. With your code, I know how to keep it busy while it idles :) Maybe it will even offset the electricity it consumes :D
member
Activity: 61
Merit: 10
July 26, 2010, 09:56:22 AM
#21
falkenberg,

I have a VIA C7 that I'm working to port code over to. Theoretically it won't be faster than multi-core i7/Xeons, but it will be a very fast hashing engine that requires quite low power. From what I've calculated, it would be able to compute about 1500 kh/s for < 100 W. We won't know until I get it set up and running :D


sgtstein
member
Activity: 84
Merit: 10
July 26, 2010, 06:45:53 AM
#20
Hi guys,
What about the crypto engines available on some architectures, e.g. PadLock on VIA processors (yes, it's a very slow processor, but it has PadLock instructions)? As far as I know, SHA-256 is supported in hardware there, so it should be fast. Recent versions of OpenSSL can use the hardware engine, so if OpenSSL were used instead of your own SHA-256 implementation, you could accelerate the program without dealing with low-level details.

Regards,
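Any code path that uses the PadLock hash engine first has to check that the CPU actually has it. Below is a minimal sketch for GCC/Clang on x86. It assumes that, as in the Linux kernel's feature flags, the Centaur extended CPUID leaf 0xC0000001 reports the PadLock Hash Engine (PHE) in EDX bit 10 (present) and bit 11 (enabled). The function names are hypothetical. MSVC would use `__cpuid` from `<intrin.h>`, which has a different signature, instead of `<cpuid.h>`.

```cpp
#include <cpuid.h>   // GCC/Clang: __get_cpuid and the __cpuid macro
#include <cstring>
#include <string>

// Reads the 12-character CPU vendor string from CPUID leaf 0 (EBX, EDX, ECX order).
std::string cpu_vendor() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return std::string();
    char v[12];
    std::memcpy(v, &ebx, 4);
    std::memcpy(v + 4, &edx, 4);
    std::memcpy(v + 8, &ecx, 4);
    return std::string(v, 12);
}

// Hypothetical helper: true if a VIA/Centaur CPU reports the PadLock Hash
// Engine as both present (EDX bit 10) and enabled (EDX bit 11).
bool padlock_phe_available() {
    // Only query the Centaur leaves on Centaur CPUs; other vendors may
    // return unrelated data for these out-of-range leaves.
    if (cpu_vendor() != "CentaurHauls")
        return false;
    unsigned int eax, ebx, ecx, edx;
    __cpuid(0xC0000000, eax, ebx, ecx, edx);  // EAX = highest Centaur leaf
    if (eax < 0xC0000001)
        return false;
    __cpuid(0xC0000001, eax, ebx, ecx, edx);
    const unsigned int phe_present = 1u << 10;
    const unsigned int phe_enabled = 1u << 11;
    return (edx & phe_present) && (edx & phe_enabled);
}
```

On Intel or AMD machines this simply returns false, so a caller can fall back to the software SHA-256 path.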
member
Activity: 70
Merit: 10
July 25, 2010, 11:48:33 PM
#19
I was able to integrate the SHA256 functionality from Crypto++ 5.6.0 into Bitcoin.  This is the fastest SHA256 yet using the SSE2 assembly code.  Since Bitcoin was sending unaligned data to the block hash function, I had to change the MOVDQA instruction to MOVDQU.

I think using the SHA256 functionality from Crypto++ 5.6.0 is the way forward right now.

http://www.filedropper.com/bitcoin-033

Is this the x86 asm? I dumped out the x64 asm and integrated it, and performance has proved to be nothing short of blistering.
legendary
Activity: 1246
Merit: 1016
Strength in numbers
July 25, 2010, 07:55:53 PM
#18
Is it easy for a newb to try this stuff?
newbie
Activity: 53
Merit: 0
July 25, 2010, 06:26:36 PM
#17
Excellent work.

Can you provide patches against current SVN?
newbie
Activity: 17
Merit: 0
July 25, 2010, 06:12:23 PM
#16
I was able to integrate the SHA256 functionality from Crypto++ 5.6.0 into Bitcoin.  This is the fastest SHA256 yet using the SSE2 assembly code.  Since Bitcoin was sending unaligned data to the block hash function, I had to change the MOVDQA instruction to MOVDQU.

I think using the SHA256 functionality from Crypto++ 5.6.0 is the way forward right now.

http://www.filedropper.com/bitcoin-033
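The MOVDQA to MOVDQU change matters because MOVDQA faults unless its memory operand is 16-byte aligned, while MOVDQU accepts any address but could be slower on older CPUs. The sketch below uses SSE2 intrinsics to show the distinction. It is not the Crypto++ code, and the helper names are hypothetical. The other option would be to make bitcoin pass 16-byte-aligned buffers.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// True if p is 16-byte aligned, the precondition for MOVDQA.
inline bool is_aligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Loads 16 bytes from p. _mm_load_si128 typically compiles to MOVDQA and
// faults on a misaligned address; _mm_loadu_si128 typically compiles to
// MOVDQU and works for any address.
inline __m128i load16(const unsigned char* p) {
    if (is_aligned16(p))
        return _mm_load_si128(reinterpret_cast<const __m128i*>(p));
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
}
```

For example, with an `alignas(16)` buffer `buf`, `load16(buf)` takes the aligned path and `load16(buf + 1)` takes the unaligned one.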
newbie
Activity: 53
Merit: 0
July 25, 2010, 04:36:03 AM
#15
Found the culprit. I had left in the /Ob0 option (which disables inline function expansion) from the original makefile, and that caused the abysmal performance I was getting. With proper settings, the VC++ build really is faster. This tinkering with makefiles is a major hassle; I'm going to suggest converting to CMake in another post.

Regarding the SHA-256 function, we can:

1) leave it as it is - requires no effort, but it seems other solutions provide significant performance benefits;

2) adopt the SHA-256 code from later releases of Crypto++, where an asm version is currently available. We have the option to either extract the functionality from the modules and integrate it into the bitcoin source as is currently done, which would not be trivial given all the SHA module's dependencies, or use the complete Crypto++ library as a dependency. No one has done either yet as far as I know, so it is not clear how much faster it would be. More interestingly, I've skimmed the change logs and there are various fixes in Crypto++'s SHA module as well. I'm not certain whether there are any serious problems with the code bitcoin is currently using, though.

3) Integrate code from PolarSSL, like BlackEye did. He claims a 50% khash/s increase with that code.

If we choose either of the last two options, we need to take great care that the hashing functionality is preserved without changing or breaking anything. Unit tests would greatly help here, but for that the SHA-invoking code in bitcoin would need to be extracted into a separate unit that can be tested.

I could do that refactoring, create unit tests for the hashing and provide patches. Then we can more freely experiment with upgrading the sha implementation. Satoshi and/or anyone interested, post your thoughts.
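The core of such unit tests would be known-answer tests against the published FIPS 180 vectors. Below is a minimal, unoptimized reference SHA-256, including the compression function on 64-byte blocks that any faster candidate would replace. It is a sketch for testing purposes and not the bitcoin or Crypto++ code. `sha256_hex` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

namespace {

const std::uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

inline std::uint32_t rotr(std::uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// SHA-256 compression function: mixes one 64-byte block into the state s.
void sha256_transform(std::uint32_t s[8], const unsigned char* block) {
    std::uint32_t w[64];
    for (int i = 0; i < 16; ++i)  // message words are big-endian
        w[i] = (std::uint32_t)block[4 * i] << 24 | (std::uint32_t)block[4 * i + 1] << 16 |
               (std::uint32_t)block[4 * i + 2] << 8 | (std::uint32_t)block[4 * i + 3];
    for (int i = 16; i < 64; ++i) {  // message schedule
        std::uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        std::uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    std::uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4], f = s[5], g = s[6], h = s[7];
    for (int i = 0; i < 64; ++i) {  // 64 rounds
        std::uint32_t t1 = h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
        std::uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    s[0] += a; s[1] += b; s[2] += c; s[3] += d; s[4] += e; s[5] += f; s[6] += g; s[7] += h;
}

}  // namespace

// Full SHA-256 of msg as lowercase hex: pad, then compress each 64-byte block.
std::string sha256_hex(const std::string& msg) {
    std::uint32_t s[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                          0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    std::string buf = msg;
    const std::uint64_t bits = static_cast<std::uint64_t>(msg.size()) * 8;
    buf.push_back(static_cast<char>(0x80));             // padding starts with a 1 bit
    while (buf.size() % 64 != 56) buf.push_back('\0');  // zero-fill to 56 mod 64
    for (int i = 7; i >= 0; --i)                        // 64-bit big-endian bit length
        buf.push_back(static_cast<char>((bits >> (8 * i)) & 0xff));
    for (std::size_t off = 0; off < buf.size(); off += 64)
        sha256_transform(s, reinterpret_cast<const unsigned char*>(buf.data()) + off);
    char hex[65];
    for (int i = 0; i < 8; ++i) std::snprintf(hex + 8 * i, 9, "%08x", static_cast<unsigned>(s[i]));
    return std::string(hex, 64);
}
```

A test then asserts, for example, that `sha256_hex("abc")` equals `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`, and checks any faster block-hash routine against the same vectors.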
newbie
Activity: 17
Merit: 0
July 22, 2010, 02:37:30 PM
#14
Yes, my post clearly says I used the latest BDB, which will update the database format, and that you should back up your database beforehand. I just used the latest production release from the Oracle website, and I didn't see any version requirement in the documentation about compiling Bitcoin. If you look at the BDB release notes, there were plenty of bugs squashed since whatever 4.7.x release Bitcoin is currently using. What was the rationale for using an outdated version for the official release?

edit
Here's a release statically linked against BDB 4.7.25, so there will be no issues with database versions.
http://www.filedropper.com/bitcoin-032_4
member
Activity: 70
Merit: 10
July 22, 2010, 01:22:15 PM
#13
The project file is attached.  You'll need to remove the txt extension and change the paths to your libraries, and compile the Release build of course, not the Debug one.

If you are using the Express edition to compile, make sure to read this thread on msdn.

Bit of a shame you linked it against BerkeleyDB 5; that will break everyone's database if they wish to go back to the stock build.
newbie
Activity: 17
Merit: 0
July 22, 2010, 09:31:56 AM
#12
The project file is attached.  You'll need to remove the txt extension and change the paths to your libraries, and compile the Release build of course, not the Debug one.

If you are using the Express edition to compile, make sure to read this thread on msdn.
newbie
Activity: 53
Merit: 0
July 22, 2010, 08:43:24 AM
#11
I took the plunge, got all the dependencies together and compiled Bitcoin myself to try to get the new hashing in place. It's odd that you say an MSVC build decreases the hashing performance, as I've found it increases it. I'm using Visual Studio 2008 Standard on a 32-bit dual-core machine, so maybe that has something to do with it. I went from ~1000 khash/sec with the build from the Bitcoin website, to ~1350 khash/sec by just compiling the source with Visual Studio 2008, to ~1500 khash/sec with Visual Studio 2008 using the PolarSSL hashing functions.

It's baffling. Do you mind posting the makefile/project you used to build the original source?
newbie
Activity: 17
Merit: 0
July 22, 2010, 08:15:01 AM
#10
A 32-bit build running on a 32-bit system with the new algorithm is definitely faster than the base algorithm. I've run it on 3 different systems and all showed improvement.
member
Activity: 70
Merit: 10
July 22, 2010, 12:43:39 AM
#9
There was an issue with hashing multiple blocks in the binary above.  I've corrected the source for now, but I won't be able to compile a new binary until tomorrow.  Here's the corrected source for anyone who is interested.
http://www.filedropper.com/bitcoin-032_2

I gave it a try using the x64 Intel compiler with full optimization; performance is practically identical to the stock algorithm. In fact, the new algo seems marginally worse.
legendary
Activity: 1596
Merit: 1099
July 21, 2010, 06:15:01 PM
#8
Ideally, bitcoin should be hashing in CPU cacheline-sized, cacheline-aligned chunks (usually 64 bytes).

Also, on modern CPUs, you can issue a "pre-fetch" like this

     while (have bytes)
          pre-fetch(index + 1)
          sha256(index)

which will potentially speed up the operation.

Both Linux and Windows compilers can generate prefetches.  gcc provides builtins for this.
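The pre-fetch loop above can be sketched with GCC's `__builtin_prefetch`, walking the data in 64-byte, cache-line-sized chunks. `process_block` is a hypothetical stand-in for a per-block SHA-256 compression call (a simple fold, not a hash), and whether prefetching helps the small buffers bitcoin hashes would have to be measured. With MSVC the equivalent hint is `_mm_prefetch` from `<xmmintrin.h>`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-block step standing in for a SHA-256 compression call
// (a simple multiplicative fold, NOT a cryptographic hash).
static std::uint64_t process_block(std::uint64_t acc, const unsigned char* block) {
    for (int i = 0; i < 64; ++i)
        acc = acc * 31 + block[i];
    return acc;
}

// Walks data in 64-byte chunks, prefetching the next chunk before
// processing the current one.
std::uint64_t process_all(const unsigned char* data, std::size_t nblocks) {
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < nblocks; ++i) {
        if (i + 1 < nblocks)
            __builtin_prefetch(data + (i + 1) * 64, 0 /* read */, 3 /* keep in cache */);
        acc = process_block(acc, data + i * 64);
    }
    return acc;
}
```

Declaring the input as `alignas(64) unsigned char buf[N]` keeps each chunk on its own cache line, which matches the cacheline-aligned suggestion above.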
newbie
Activity: 17
Merit: 0
July 21, 2010, 05:21:58 PM
#7
Further examination of the source makes it clear that the hashing is done in blocks of 64 bytes.  I was able to hack the SHA256 functions from PolarSSL and got it to do the block hashing just like the current code, with about a 10% increase in speed.  I verified the hashes produced are the same.

I took the plunge, got all the dependencies together and compiled Bitcoin myself to try to get the new hashing in place. It's odd that you say an MSVC build decreases the hashing performance, as I've found it increases it. I'm using Visual Studio 2008 Standard on a 32-bit dual-core machine, so maybe that has something to do with it. I went from ~1000 khash/sec with the build from the Bitcoin website, to ~1350 khash/sec by just compiling the source with Visual Studio 2008, to ~1500 khash/sec with Visual Studio 2008 using the PolarSSL hashing functions.

edit - latest binary 2010-07-22
You can get my build here : http://www.filedropper.com/bitcoin-032_3
You'll need the Visual Studio 2008 redistributable to run this, so if it crashes immediately or complains about an incorrect configuration, you need to install it. It includes the modified sources. I used the latest BDB, which seems to update the database format, so you can't go back to the old client because it can't open the newer database. I suggest you save your database before you try this build if you want to revert later.
member
Activity: 70
Merit: 10
July 20, 2010, 07:30:35 PM
#6
How many bytes does bitcoin typically hash each time?

As far as I can see, it hashes 16 bytes at a time; the number of 16-byte blocks to process is variable. Check main.cpp.
newbie
Activity: 17
Merit: 0
July 20, 2010, 06:50:28 PM
#5
How many bytes does bitcoin typically hash each time?

edit
It appears, if I'm reading the source correctly, that bitcoin does 2 hashes plus some other work and counts it as only 1 hash, which means PolarSSL probably isn't faster after all.
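The cost of one counted hash can be estimated by counting SHA-256 compression calls. SHA-256 appends at least 9 bytes (the 0x80 byte plus a 64-bit length) and rounds up to whole 64-byte blocks, so an $L$-byte message needs $N(L)$ compressions. This assumes that one counted hash is SHA-256 applied twice, first to an 80-byte block header and then to the resulting 32-byte digest:

```latex
\begin{aligned}
N(L) &= \left\lceil \frac{L + 9}{64} \right\rceil \\
N(80) + N(32) &= \left\lceil \frac{89}{64} \right\rceil + \left\lceil \frac{41}{64} \right\rceil = 2 + 1 = 3
\end{aligned}
```

If the first 64 bytes of the header do not change between attempts, the state after the first compression (the "midstate") can be cached. That leaves 2 compressions per attempt, so a benchmark that times a single SHA-256 call is not directly comparable to the khash/s figure.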
member
Activity: 70
Merit: 10
July 20, 2010, 06:37:53 PM
#4
You could try the SHA2 implementation from : http://polarssl.org/

With a simple test in Visual Studio 2008 of a busy loop executing the hash function it was able to hash at 1.5x the rate that bitcoin does.

Really? I'm struggling to see how that implementation could possibly be faster.
newbie
Activity: 17
Merit: 0
July 20, 2010, 06:20:05 PM
#3
You could try the SHA2 implementation from : http://polarssl.org/

With a simple test in Visual Studio 2008 of a busy loop executing the hash function it was able to hash at 1.5x the rate that bitcoin does.
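A "busy loop" measurement like the one described can be sketched as a small harness that calls a hash function repeatedly and converts the elapsed time into khash/s. The harness name and the `toy_hash` workload are hypothetical stand-ins, not PolarSSL or bitcoin code. A fair comparison also needs each candidate to do the same amount of work per counted hash.

```cpp
#include <chrono>
#include <cstdint>

// Calls hash_fn(i) `iterations` times and returns the rate in khash/s.
// Each result is XORed into `sink` so the compiler cannot drop the calls.
template <typename HashFn>
double measure_khash_per_sec(HashFn hash_fn, std::uint64_t iterations, std::uint64_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i)
        sink ^= hash_fn(i);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return static_cast<double>(iterations) / elapsed.count() / 1000.0;
}

// Hypothetical stand-in workload (a 64-bit integer mixer, NOT SHA-256).
inline std::uint64_t toy_hash(std::uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}
```

Usage: `std::uint64_t sink = 0; double rate = measure_khash_per_sec(toy_hash, 1000000, sink);`, then swap in each real implementation wrapped in a function with the same shape.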
founder
Activity: 364
Merit: 7060
July 18, 2010, 05:24:09 PM
#2
OpenSSL doesn't have any interface for doing just the low level raw block hash part of SHA256.  SHA256 begins by wrapping your data in a specially formatted buffer.  Setting up the buffer takes an order of magnitude longer than the actual hashing if you're only hashing one or two blocks like we do.  It's intended that the time is amortised if you were hashing many KB or MB of data.  In BitcoinMiner, we format the buffer once and keep reusing it.
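Formatting the buffer once can be illustrated with the SHA-256 padding rule. For an 80-byte input, the padded message is always 128 bytes (two blocks), with 0x80 at offset 80 and the bit length 640 (0x0280) stored big-endian in the last 8 bytes. The sketch below assumes the standard 80-byte Bitcoin header layout, with the 4-byte little-endian nonce at offset 76. The type and function names are hypothetical, and this is not the actual BitcoinMiner code.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical container for an 80-byte header padded to two SHA-256 blocks.
struct PaddedHeader {
    std::array<unsigned char, 128> buf{};  // zero-initialized
};

// Done once per work unit: copy the header and append SHA-256 padding.
PaddedHeader format_once(const unsigned char (&header)[80]) {
    PaddedHeader p;
    std::memcpy(p.buf.data(), header, 80);
    p.buf[80] = 0x80;  // the single 1 bit that starts the padding
    // Bytes 81..119 stay zero; bytes 120..127 hold the bit length, big-endian.
    const std::uint64_t bit_len = 80 * 8;  // 640 = 0x280
    for (int i = 0; i < 8; ++i)
        p.buf[127 - i] = static_cast<unsigned char>(bit_len >> (8 * i));
    return p;
}

// Done per attempt: overwrite only the 4-byte nonce (little-endian, offset 76).
void set_nonce(PaddedHeader& p, std::uint32_t nonce) {
    for (int i = 0; i < 4; ++i)
        p.buf[76 + i] = static_cast<unsigned char>(nonce >> (8 * i));
}
```

Because the nonce sits in the second 64-byte block, the first block is identical for every attempt and never needs reformatting.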

If you can find SHA256 code that's faster (with MinGW/GCC) than what we've got, that would be really great!  (although, keep licensing in mind)  The one we have is the only one I tried, so there's significant chance for improvement.

When I wrote it more than 2 years ago, there were screaming-hot SHA1 implementations but minimal attention to SHA256. That's a lot of time for them to come up with better stuff. At the time, SHA256 was a lot slower relative to the fastest SHA1 than I thought it should be. Obviously SHA256 should be slower than SHA1 by a certain amount, but not by as much as I saw.

(hope you don't mind I renamed your thread, SHA-256 optimisation is something important that I keep forgetting about)
newbie
Activity: 53
Merit: 0
July 18, 2010, 09:28:50 AM
#1
I've managed to set up dependencies and build bitcoin with MS Visual C++ 2008 Express Edition. I'll give 2010 a try at some time.

There is a custom allocator class in serialize.h, secure_allocator, that fails to build with the non-debug runtime selected. It is my understanding that allocator classes require a template copy constructor; I've attached a small patch that solves the problem.
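The fix can be sketched with a minimal allocator. Containers such as `std::list` rebind the allocator to an internal node type and construct that rebound allocator from the original, which is why the template (converting) copy constructor is needed. `zeroing_allocator` is a hypothetical name. It only mirrors the idea of wiping memory on release and is not bitcoin's `secure_allocator`. The sketch relies on C++11 `std::allocator_traits`; a pre-C++11 toolchain such as Visual C++ 2008 would also need the classic typedefs and a `rebind` member.

```cpp
#include <cstddef>
#include <cstring>
#include <list>
#include <new>
#include <string>
#include <vector>

template <typename T>
struct zeroing_allocator {
    typedef T value_type;

    zeroing_allocator() noexcept {}
    // Converting constructor: needed when a container rebinds the allocator
    // to another type (e.g. std::list's node type) and copies from this one.
    template <typename U>
    zeroing_allocator(const zeroing_allocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        // Wipe before releasing. A plain memset may be optimized away, so a
        // production version should use a wipe the compiler cannot elide.
        std::memset(static_cast<void*>(p), 0, n * sizeof(T));
        ::operator delete(p);
    }
};

// Stateless allocators always compare equal.
template <typename T, typename U>
bool operator==(const zeroing_allocator<T>&, const zeroing_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const zeroing_allocator<T>&, const zeroing_allocator<U>&) noexcept { return false; }
```

With this in place, `std::list<int, zeroing_allocator<int>>` and a `std::basic_string<char, std::char_traits<char>, zeroing_allocator<char>>` both compile; removing the converting constructor makes the `std::list` case fail to build.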

As Satoshi noted elsewhere, the MSVC build is indeed significantly slower khash/s-wise (by more than half) than the prebuilt one (MinGW?), even though I enabled the highest optimization level options and also global optimization with link-time code generation. I find that result strange, since MSVC is not known to have a significantly worse optimizer than GCC. Most probably the problem can be traced to the sha module extracted from Crypto++. I see that Crypto++ SVN has revised versions of the module, including x86/x64 assembly for SHA-256. Using the newer versions would involve reintegrating their dependencies, though. On that note, why aren't we using OpenSSL's SHA-2 hashing functions instead? Since we already use OpenSSL, this would be a better solution than manually maintaining a SHA module from another library.