Topic: anyone tried running with VIA Padlock extensions? (Read 17003 times)

lfm
full member
Activity: 196
Merit: 100
I'll try other algos now, to see if this behaviour is independent of the selected algorithm or not.

The -4way algo does seem to work alright and successfully generated a few coins within a day or so.

It seems there is something wrong with the padlock code for the VIA Nano, at least in 64-bit mode.

I don't think it supports 64-bit mode; it is only coded for the VIA C7 at the moment, and the C7 doesn't have 64-bit support. It should work compiled for 32-bit mode on the Nano even if it is a 64-bit OS. If you want to get involved, the Nano has some extended hash instructions that I think would be useful to speed it up on the Nano. For now I think it would be best to make a separate sha256_nano module and keep sha256_via separate for the C7.

There may still be some problem with sha256_via even on the C7 in 32-bit mode. Not sure yet. I am doing a testnet run but have no definite results yet.
legendary
Activity: 1596
Merit: 1022
It seems there is something wrong with the padlock code for the VIA Nano, at least in 64-bit mode.

Are you using cpuminer?  You need version 0.3.1 for sha256_via fixes.
newbie
Activity: 11
Merit: 0
I'll try other algos now, to see if this behaviour is independent of the selected algorithm or not.

The -4way algo does seem to work alright and successfully generated a few coins within a day or so.

It seems there is something wrong with the padlock code for the VIA Nano, at least in 64-bit mode.
lfm
full member
Activity: 196
Merit: 100
How many khashes per second are you currently getting on a via-padlock?

On a VIA C7 at 1.8 GHz I get 1418 khash/sec on Linux:

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : CentaurHauls
cpu family      : 6
model           : 13
model name      : VIA C7-D Processor 1800MHz
stepping        : 0
cpu MHz         : 1800.000
cache size      : 128 KB
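
A quick way to check whether the kernel sees the hash engine is to look at the flags line of the same file. The sketch below is only an illustration: it assumes the Linux flag names `phe` and `phe_en` (PadLock Hash Engine present and enabled), and the sample text is made up, not taken from the machine above.

```python
def padlock_hash_flags(cpuinfo_text):
    """Return the PadLock hash-engine flags found in /proc/cpuinfo-style text."""
    wanted = {"phe", "phe_en"}  # assumed Linux flag names for the hash engine
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= wanted & set(value.split())
    return found


# Hypothetical sample shaped like /proc/cpuinfo output (not real data)
SAMPLE = "vendor_id\t: CentaurHauls\nflags\t\t: fpu vme rng rng_en ace ace_en phe phe_en\n"

if __name__ == "__main__":
    print(sorted(padlock_hash_flags(SAMPLE)))
```

On a real machine you would pass in the contents of /proc/cpuinfo; an empty result would suggest the hash engine is absent or not enabled.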
newbie
Activity: 11
Merit: 0
I've tried that patch to main.cpp you suggested. Here the output from minerd:

Code:
DBG: found zeroes in hash:
7e242ac3d2f4298e502efd7e4b3677cc287114488e1c77c4a1406cba00000000
HashMeter(0): 7994345 hashes, 1587.98 khash/sec
PROOF OF WORK FOUND?  submitting...
PROOF OF WORK RESULT: false (booooo)

The output from patched bitcoind is this:

Code:
proof-of-work check FAILED...
  hash: 9c581ce97e417b9ea6ffb2502041a46ad740a74567e55bbd636d8944cc552995
target: 0000000045120800000000000000000000000000000000000000000000000000

FYI, I'm using bitcoin on an amd64 Debian unstable machine. Both bitcoind and minerd were compiled natively for amd64.

I'll try other algos now, to see if this behaviour is independent of the selected algorithm or not.
newbie
Activity: 7
Merit: 0
How many khashes per second are you currently getting on a via-padlock?
legendary
Activity: 1596
Merit: 1022
git updated with sha256_via, sha256_4way fixes.
legendary
Activity: 1596
Merit: 1022

jgarzik:

trying your cpu-miner on via:

bug in main program (segmentation violation): needs an extra NULL check for the sparse array in arg parsing
Code:
                      if (algo_names[i] != NULL &&
                            !strcmp(arg, algo_names[i])) {

Good catch.  Applied similar patch.

Thanks for taking a look!
lfm
full member
Activity: 196
Merit: 100
OK, in sha256_via.c also align tmp_hash1 to 128 to avoid the stack clobber.

BTW, I am on a VIA C7, which is less capable than the VIA Nano (e.g. no SSE2 or 64-bit support, and lesser Padlock support).

There was another problem compiling sha256_4way.c on my system: I had to disable some headers that errored out because my compiler had no SSE support, thus:

Code:

#include
#include

#ifdef WANT_SSE2_4WAY

#include
#include
#include
#include "miner.h"

#define NPAR 32


But I got it working eventually, at about the same speed as my old version of the main prog, and it is easier to support.
lfm
full member
Activity: 196
Merit: 100

jgarzik:

trying your cpu-miner on via:

bug in main program (segmentation violation): needs an extra NULL check for the sparse array in arg parsing
Code:
                      if (algo_names[i] != NULL &&
                            !strcmp(arg, algo_names[i])) {
 
Now it is reporting a clobbered stack, but I haven't found the cause yet.
legendary
Activity: 1596
Merit: 1022
Thanks for the info. I'll let the testnet client run for the night. What generate setting should bitcoind have? setgenerate set to true with limit to zero processors, or should setgenerate be set to false?

setgenerate controls the in-client miner.  So, it may be set, or not, as you choose.

These external miners use the 'getwork' JSON-RPC call, which works regardless of the setgenerate setting.
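
For illustration, the request body of such a call can be sketched as below, following the shape of the "sending RPC call" debug line quoted in this thread. The function name and the placeholder data are made up for the example; the sketch only builds the JSON and does not talk to bitcoind.

```python
import json


def getwork_request(data_hex=None, request_id=1):
    """Build the JSON-RPC body for a getwork call.

    With no data the call asks for new work; with a hex data string
    it submits a candidate solution for checking.
    """
    params = [] if data_hex is None else [data_hex]
    return json.dumps({"method": "getwork", "params": params, "id": request_id})


if __name__ == "__main__":
    print(getwork_request())            # ask for work
    print(getwork_request("00" * 128))  # placeholder data, not real work
```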
newbie
Activity: 11
Merit: 0
Thanks for the info. I'll let the testnet client run for the night. What generate setting should bitcoind have? setgenerate set to true with limit to zero processors, or should setgenerate be set to false?
legendary
Activity: 1596
Merit: 1022
You may find this patch to bitcoin helpful:

Code:
diff --git a/main.cpp b/main.cpp
index a1865a4..da85b0d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -3273,8 +3273,11 @@ bool CheckWork(CBlock* pblock, CReserveKey& reservekey)
     uint256 hash = pblock->GetHash();
     uint256 hashTarget = CBigNum().SetCompact(pblock->nBits).getuint256();
 
-    if (hash > hashTarget)
+    if (hash > hashTarget) {
+           printf("proof-of-work check FAILED...\n  hash: %s\ntarget: %s\n",
+                  hash.GetHex().c_str(), hashTarget.GetHex().c_str());
         return false;
+    }
 
     //// debug print
     printf("BitcoinMiner:\n");


This will show the proper, byte-reversed hash, and how close you came to the target.   That is very helpful in verifying whether or not the algorithm is truly working.
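
As a rough sketch of what that comparison involves, the snippet below reverses the byte order of a miner-side hash and compares it against a target as a 256-bit integer. It uses the minerd hash and the bitcoind target quoted elsewhere in this thread, and it assumes minerd's debug line prints the digest in raw byte order (zeroes at the end); the function names are invented for the example.

```python
def display_hash(raw_hex):
    """Reverse the byte order of a hex digest, the way bitcoind prints hashes."""
    return bytes.fromhex(raw_hex)[::-1].hex()


def meets_target(display_hex, target_hex):
    """CheckWork rejects when hash > target, so success means hash <= target."""
    return int(display_hex, 16) <= int(target_hex, 16)


# Values copied from the debug output posted in this thread
MINERD_HASH = "7e242ac3d2f4298e502efd7e4b3677cc287114488e1c77c4a1406cba00000000"
TARGET = "0000000045120800000000000000000000000000000000000000000000000000"

if __name__ == "__main__":
    shown = display_hash(MINERD_HASH)
    print(shown)
    print(meets_target(shown, TARGET))
```

With these inputs the reversed hash starts with 00000000ba6c40a1, which is above the target, so this particular candidate would be rejected even if the hash itself had been computed correctly.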
legendary
Activity: 1596
Merit: 1022
I've tried the cpuminer version 0.2.1 on a VIA Nano machine. I used the "via" algo with a bitcoin running on the testnet. The miner worked, but the results generated seemed to be wrong, debug output pasted in below.
The system is a 64-bit Debian unstable machine with a VIA VB8001 motherboard. It's running a stepping 2 VIA Nano. The kernel seems to apply a workaround for Nanos with that stepping; perhaps something needs to be done in the miner code as well.

Code:
HashMeter(0): 16777216 hashes, 1589.69 khash/sec
DBG: found zeroes in hash:
9ec42e51b34b69fc2f7209f3e334afcfa563d1da21647832cd2b312c00000000
HashMeter(0): 6644792 hashes, 1606.76 khash/sec
PROOF OF WORK FOUND?  submitting...
DBG: sending RPC call:
{"method": "getwork", "params": [ "000000016f643cccfaa9574cd1a3369a23da6452fcf296587e4da572a008520300000001f1071376c66751bede719672dd1e9e3b2a3daeec709fc5bcaede21364748d93a4cf93ea21d05106000000000000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000" ], "id":1}
PROOF OF WORK RESULT: false (booooo)

Actually, it looks like it is working, to me.  As explained in this thread, cpuminer searches for an approximate number of leading zeroes in the hash.

It then submits that hash to bitcoin, for final verification.  Thus, it is normal for cpuminer to find several almost-solutions, before finding a real solution, depending on current difficulty.

The official bitcoin client works this way too -- it stops hashing when a certain number of zeroes appear.  However, it does so silently, whereas cpuminer prints something.
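
The two-stage idea can be sketched like this: a cheap test on the raw digest filters candidates, and only the survivors go to bitcoind for the full target comparison. The four-zero-byte threshold is an assumption read off the debug lines in this thread (hashes ending in 00000000), not taken from the cpuminer source, and the function name is made up.

```python
def looks_promising(raw_hex, zero_bytes=4):
    """Cheap pre-filter: are the last `zero_bytes` bytes of the raw digest zero?"""
    digest = bytes.fromhex(raw_hex)
    return digest[-zero_bytes:] == bytes(zero_bytes)


# Digest copied from the "found zeroes in hash" debug line in this thread
CANDIDATE = "9ec42e51b34b69fc2f7209f3e334afcfa563d1da21647832cd2b312c00000000"

if __name__ == "__main__":
    print(looks_promising(CANDIDATE))   # passes the quick filter
    print(looks_promising("ff" * 32))   # hypothetical digest that does not
```

Passing the filter only means the hash is worth submitting; whether it beats the actual target depends on the current difficulty.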
newbie
Activity: 11
Merit: 0
I've tried the cpuminer version 0.2.1 on a VIA Nano machine. I used the "via" algo with a bitcoin running on the testnet. The miner worked, but the results generated seemed to be wrong, debug output pasted in below.
The system is a 64-bit Debian unstable machine with a VIA VB8001 motherboard. It's running a stepping 2 VIA Nano. The kernel seems to apply a workaround for Nanos with that stepping; perhaps something needs to be done in the miner code as well.

Code:
HashMeter(0): 16777216 hashes, 1589.69 khash/sec
DBG: found zeroes in hash:
9ec42e51b34b69fc2f7209f3e334afcfa563d1da21647832cd2b312c00000000
HashMeter(0): 6644792 hashes, 1606.76 khash/sec
PROOF OF WORK FOUND?  submitting...
DBG: sending RPC call:
{"method": "getwork", "params": [ "000000016f643cccfaa9574cd1a3369a23da6452fcf296587e4da572a008520300000001f1071376c66751bede719672dd1e9e3b2a3daeec709fc5bcaede21364748d93a4cf93ea21d05106000000000000000800000000000000000000000000000000000000000000000000000000000000000000000000000000080020000" ], "id":1}
PROOF OF WORK RESULT: false (booooo)
legendary
Activity: 1596
Merit: 1022
I had not even gotten far enough to determine why your code lacked the midstate caching stuff :)  If you have the hardware (I don't), giving my miner a try would be really helpful.  I don't even have a simple "it works" confirmation on VIA yet.

If you happen to figure out anything interesting, I'll be happy to integrate it and post a new Windows build.

No problem. I'll see if I can find some time to boot a linux live cd and play with it. Otherwise it's running a Windows build right now.

You don't need Linux... there's a Windows build:  http://yyz.us/bitcoin/cpuminer-installer-0.2.zip
member
Activity: 61
Merit: 10
I had not even gotten far enough to determine why your code lacked the midstate caching stuff :)  If you have the hardware (I don't), giving my miner a try would be really helpful.  I don't even have a simple "it works" confirmation on VIA yet.

If you happen to figure out anything interesting, I'll be happy to integrate it and post a new Windows build.

No problem. I'll see if I can find some time to boot a linux live cd and play with it. Otherwise it's running a Windows build right now.
legendary
Activity: 1596
Merit: 1022
Thanks for the inspiration.  After reading this, I added VIA padlock support to my CPU miner.

Cool! Did you ever figure out how to get midstate caching working with it? I thought the C7 was capable of it. I know the Nanos are, but I thought VIA had it somewhere on their site. I'll have to try a Windows build of it.

I had not even gotten far enough to determine why your code lacked the midstate caching stuff :)  If you have the hardware (I don't), giving my miner a try would be really helpful.  I don't even have a simple "it works" confirmation on VIA yet.

If you happen to figure out anything interesting, I'll be happy to integrate it and post a new Windows build.
member
Activity: 61
Merit: 10
Thanks for the inspiration.  After reading this, I added VIA padlock support to my CPU miner.

Cool! Did you ever figure out how to get midstate caching working with it? I thought the C7 was capable of it. I know the Nanos are, but I thought VIA had it somewhere on their site. I'll have to try a Windows build of it.
legendary
Activity: 1596
Merit: 1022
Download 0.3.10 HERE!!

Thanks for the inspiration.  After reading this, I added VIA padlock support to my CPU miner.
member
Activity: 61
Merit: 10
Download 0.3.10 HERE!!

---

Trying to figure out OpenSSL support for VIA padlock functions seems like a quagmire from what I see so far surfing the net.

Yea, completely agree there. I've looked into it as well, lots might need to be changed.
lfm
full member
Activity: 196
Merit: 100
There's a "drivers/crypto/padlock-sha.c" driver implementation in the standard kernel.

How does the openssl speed benchmark compare to bitcoin's khash/s?
Code:
openssl speed -evp sha256

On my Core2Duo E8500, it's:
Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           25568.41k    60726.70k   108968.11k   137848.27k   146604.46k


I was unaware of OpenSSL support. I don't think I have it being used yet. At least I don't see any speed changes no matter what I have tried.

On my 1.8 GHz VIA C7 I currently get

Code:
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256            3379.47k     8061.39k    14412.11k    18119.94k    19447.70k

So it's still kinda slow. I can't tell if this is with VIA support enabled or not.

Trying to figure out OpenSSL support for VIA padlock functions seems like a quagmire from what I see so far surfing the net.
lfm
full member
Activity: 196
Merit: 100
OK, I got a version 0.3.10 bitcoin running on a C7. It currently does about 1430 khash/s on a 1.8 GHz VIA C7.

It's not clear yet if we can get it optimized to do the 2-block hash (1 block pre-hashed) instead of the 3-block hash per nonce attempt. We'll be investigating that.
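
To make the pre-hashing idea concrete, here is a minimal software sketch, not the Padlock code path: the first 64-byte block of the header never changes while the nonce is varied, so its SHA-256 state can be computed once and copied for each attempt. It assumes the usual 80-byte header layout with a 4-byte little-endian nonce at the end; the dummy header and function names are made up for the example.

```python
import hashlib
import struct


def double_sha256(data):
    """Plain reference: SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def make_midstate(header76):
    """Hash the first 64-byte block once; the nonce lives in the second block."""
    return hashlib.sha256(header76[:64])


def hash_with_midstate(midstate, header76, nonce):
    """Finish the first hash from the cached state, then do the second hash."""
    h = midstate.copy()  # reuse the cached first block
    h.update(header76[64:76] + struct.pack("<I", nonce))
    return hashlib.sha256(h.digest()).digest()


HEADER76 = bytes(range(76))  # dummy header without nonce, not real block data

if __name__ == "__main__":
    mid = make_midstate(HEADER76)
    for nonce in range(3):
        full = double_sha256(HEADER76 + struct.pack("<I", nonce))
        assert hash_with_midstate(mid, HEADER76, nonce) == full
    print("midstate results match the full double hash")
```

Whether the Padlock instructions allow resuming from a saved state like this is exactly the open question in the post; the sketch only shows what the optimisation would compute.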
member
Activity: 111
Merit: 10
If I was building dedicated bitminer hardware, I'd start with something like this:
http://www.xilinx.com/products/ipcenter/Fast_SHA-1-SHA-256_MD5_Hashing_cores.htm

full member
Activity: 199
Merit: 696
A 'whole attempt' is actually 3 blocks.  2 blocks of data are hashed to produce a 256-bit result, then that result is padded to make another block and hashed again.  Each block is given the usual 64-round treatment of SHA-256.  The interesting part that generates the coins uses a simplified implementation in C++ rather than OpenSSL.
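
The block count follows from SHA-256 padding, which appends one 0x80 byte and an 8-byte length field and rounds up to a multiple of 64 bytes. A small sketch (function names invented here; the 80-byte header size is the figure given elsewhere in this thread):

```python
def sha256_blocks(message_len):
    """Number of 64-byte blocks SHA-256 processes for a message of this length.

    Padding adds one 0x80 byte and an 8-byte length field, then rounds up
    to a whole block.
    """
    return (message_len + 1 + 8 + 63) // 64


def blocks_per_attempt(header_len=80, digest_len=32):
    """Blocks for hashing the header plus blocks for hashing the first digest."""
    return sha256_blocks(header_len) + sha256_blocks(digest_len)


if __name__ == "__main__":
    print(sha256_blocks(80), sha256_blocks(32), blocks_per_attempt())
```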
member
Activity: 61
Merit: 10
I had been needing a fun system for a home server anyway... So, I just picked up a mobo/CPU combo off Newegg tonight. VIA [email protected]. It will be running with 2GB of RAM on a 16GB CompactFlash disk (for Bitcoin) and a 120GB 2.5" laptop SATA drive for the OS etc. Hopefully I will have this up by the beginning of next week once it gets here and I figure out how to get Linux onto this one. I'll create a thread or continue this one as I build and test for it.
administrator
Activity: 4228
Merit: 8647
Is each "khash/sec" a whole attempt, or each single cycle through SHA256?

It's a whole attempt.
jib
member
Activity: 92
Merit: 10
Thanks for the hashing analysis from a much more experienced perspective!

I'm not "experienced", I'm just another random person who reads stuff on the internet. Please don't trust anything I say :p
newbie
Activity: 52
Merit: 0
Thanks for the hashing analysis from a much more experienced perspective! I am still interested in how well this little processor can do... even if I was off by a factor of about 10, it might still be competitive with much more expensive and energy-intensive desktop processors. I can get about 2100 khash/sec using all 4 cores of my 64-bit machine when the system is otherwise idle, and that certainly makes the fans blow a lot of hot air. I thought it might be possible for VIA to come out ahead because custom circuits (FPGA or ASIC) for some cryptographic functions have in the past proved orders of magnitude faster than general desktop processors or even GPUs.
member
Activity: 111
Merit: 10
Excellent insight into how the SHA256 stages are used.
The machine I quoted above seems to run about 1630 khash/sec, so there's one data point.

If openssl is hashing 512-bit chunks at a throughput rate of 60726.70k bytes per second, that's 485 Mbps.
If bitcoin on the same machine is doing 1,630,000 btc-hashes per second, and each btc-hash is effectively 1536 bits (three 512-bit hash inputs) through the same pipeline, that's a whopping 2503 Mbps.

Is bitcoin really running SHA256 at 5x the speed of openssl?

Is each "khash/sec" a whole attempt, or each single cycle through SHA256?
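
The unit conversion behind that comparison can be written out as a short sketch. The helper names are invented here; the inputs are the 64-byte `openssl speed` column and the 1630 khash/sec figure from this post, and 1536 bits per attempt is the post's own assumption.

```python
def openssl_mbps(kbytes_per_sec):
    """Convert an `openssl speed` figure (1000s of bytes/s) to Mbit/s."""
    return kbytes_per_sec * 1000 * 8 / 1e6


def miner_mbps(khash_per_sec, bits_per_hash=1536):
    """Convert a khash/s rate to Mbit/s of SHA-256 input."""
    return khash_per_sec * 1000 * bits_per_hash / 1e6


if __name__ == "__main__":
    ssl = openssl_mbps(60726.70)   # about 485.8 Mbit/s
    btc = miner_mbps(1630)         # about 2503.7 Mbit/s
    print(round(ssl, 1), round(btc, 1), round(btc / ssl, 1))
```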
jib
member
Activity: 92
Merit: 10
5 Gbit/sec of SHA-256 hashes, which I believe is about 19500 khash/sec

Your calculation seems to be based on 256 bits per hash. I think the 5Gbit/sec refers to the amount of data being hashed, not the size of the hash itself. So instead you should be estimating 80 bytes (640 bits) per hash, which is the size of the block header being hashed. And bitcoin actually uses SHA-256 twice for each hash (using the first result as input to a second SHA-256), so add another 256 bits. This means each hash requires processing 896 bits of input, so at 5Gbit/sec you get 5580 khash/sec.

Although, if we consider how SHA-256 works in more detail, it separates the input data into 512-bit blocks (padded to a whole number of blocks). So for the first hash we process two blocks (1024 bits) and for the second hash we process one block (512 bits), making 1536 bits total. At 5Gbit/sec that gives 3255 khash/sec.

This is ignoring the overhead of initialising the hardware for each SHA-256 we do, which might be significant. The 5Gbit/sec figure is probably for throughput when hashing a long stream of data, not for millions of small hashes. So the actual performance might be much lower.

I guess it's worth experimenting with, but it's almost certainly not going to be as awesome as originally claimed.
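
The three estimates above differ only in how many bits are counted per hash, which a few lines make explicit. This is just the arithmetic from the post, with 5 Gbit/sec taken as the quoted peak figure; the names are made up for the example.

```python
def khash_per_sec(throughput_bits_per_sec, bits_per_hash):
    """Attempts per second, in thousands, at a given raw hashing throughput."""
    return throughput_bits_per_sec / bits_per_hash / 1000


PADLOCK_BPS = 5e9  # VIA's quoted peak figure, as cited in this thread

ESTIMATES = {
    "digest size only (256 bits)": 256,
    "header plus first digest (640 + 256 bits)": 896,
    "three padded 512-bit blocks": 1536,
}

if __name__ == "__main__":
    for label, bits in ESTIMATES.items():
        print(f"{label}: {khash_per_sec(PADLOCK_BPS, bits):.0f} khash/sec")
```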
jib
member
Activity: 92
Merit: 10
I assumed that since bitcoin appears to be built on OpenSSL you would just need to rebuild it from source on your VIA machine with Padlock-aware OpenSSL, but maybe there is more to it.

Bitcoin uses its own SHA256 code, not OpenSSL's. You'd need to modify Bitcoin to be Padlock-aware or to use OpenSSL's code.
newbie
Activity: 52
Merit: 0
There's a "drivers/crypto/padlock-sha.c" driver implementation in the standard kernel.

How does the openssl speed benchmark compare to bitcoin's khash/s?
Code:
openssl speed -evp sha256

On my Core2Duo E8500, it's:
Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           25568.41k    60726.70k   108968.11k   137848.27k   146604.46k

That's a good question. I'm not on my home machine right now so I can't compare khash/s to the OpenSSL benchmark. On the largest block size in that benchmark, it looks like your machine does about 1.2 Gbit/sec (146604.46 * 1000 * 8). So if the VIA chip can reach its full potential it would be about 4 times as fast.

I assumed that since bitcoin appears to be built on OpenSSL you would just need to rebuild it from source on your VIA machine with Padlock-aware OpenSSL, but maybe there is more to it.

I think the most common VIA use right now is in netbooks. I would be pretty amused if a little $350 netbook with this processor could keep up with an i5 or Phenom II. I bet a good CUDA hasher can thrash it, but I wouldn't be surprised if the VIA chip wins on hashes per dollar of hardware and per dollar of electricity.
member
Activity: 61
Merit: 10
While this appears to be a good idea for a low power server that can "keep up with the big boys", I think that it isn't the best method for it. It looks like OpenSSL DOES have a CUDA version floating around out there for linux boxes. That would be a HUGE increase over anything we could do in this hardware. I'm still going to be pursuing both paths for now. It would be awesome having both of my "big" boxes crunching for this.
member
Activity: 61
Merit: 10
Yep, the openssl acceleration is already built into the kernel.

Check out these AES benchmarks on XP. Not sure what the SHA-256 numbers are on *nix.
http://www.logix.cz/michal/devel/padlock/bench.xp
member
Activity: 111
Merit: 10
There's a "drivers/crypto/padlock-sha.c" driver implementation in the standard kernel.

How does the openssl speed benchmark compare to bitcoin's khash/s?
Code:
openssl speed -evp sha256

On my Core2Duo E8500, it's:
Code:
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
sha256           25568.41k    60726.70k   108968.11k   137848.27k   146604.46k
full member
Activity: 199
Merit: 696
We would need to write code to use this hardware, there is no automatic acceleration, btw.
full member
Activity: 199
Merit: 696
That does sound interesting, I might go pick one of those up to experiment.
newbie
Activity: 52
Merit: 0
I understand that the VIA C7 line of x86 processors has "Padlock" cryptographic acceleration CPU instructions built in. OpenSSL can support these extensions too.

I've only been able to find AES benchmarks on Padlock (with some very impressive speedup), but it supports SHA-256 acceleration too. According to VIA's website it can do up to 5 Gbit/sec of SHA-256 hashes, which I believe is about 19500 khash/sec! If you can get even close to that performance it should leave the beefiest multicore 64 bit systems in the dust.

Does anyone here have a VIA processor with these extensions? Does bitcoin automatically pick them up, or do you need to rebuild from source? It seems like this inexpensive processor line should be a real coinspinner!

Sun's Niagara hardware has similar cryptographic acceleration, and I believe there are separate cards you can buy for this purpose too, but I think the VIA line packs a mighty big wallop for a mighty small price. Unfortunately I don't have such a processor myself, but I'd be really interested if someone else manages to get the combination working.