Pages:
Author

Topic: 4 hashes parallel on SSE2 CPUs for 0.3.6 - page 4. (Read 22002 times)

member
Activity: 111
Merit: 10
August 02, 2010, 05:17:07 AM
#28
With the patch above, I was unable to build the test program.  You?
newbie
Activity: 18
Merit: 0
August 02, 2010, 05:00:55 AM
#27
Is it a AMD only optimization perhaps?

Or a 64-bit only optimization.
sr. member
Activity: 308
Merit: 256
August 02, 2010, 04:47:04 AM
#26
Is it a AMD only optimization perhaps?
newbie
Activity: 18
Merit: 0
August 02, 2010, 04:13:11 AM
#25
I have been able to apply the patch against SVN (r121) and I tested it on 2 machines:
  • on an AMD Opteron 2374 HE running x86_64 linux I got a 105% improvement (!)
  • on an Intel Core 2 Duo T7300 running x86_64 linux it was 55% slower compared to the stock version (r121)

The strange thing is that despite the fact that I have been running it on 6 Opterons (i.e. 6x4=24 cores) for 40 hours with an average rate of 51,000 khash/s, I still haven't generated any blocks. The probability of this (no blocks, 40 hours, 51,000 khash/s and diffuculty=244.2) is 0.09% or 1/1098. Are you sure this thing works correctly and that the reported rate is correct? How do I run the included test program?
member
Activity: 111
Merit: 10
August 02, 2010, 02:57:20 AM
#24
I got the patch knitted in, and I think I did it correctly.. wasn't complicated.

Regrettably, the hash rate has decreased by almost half.  I'm down from 2071 (stock build, svn tip) to 1150 khash/sec with the patch.

It's an Intel Xeon 3 GHz, Linux, with these proc flags:
Code:
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl cid cx16 xtpr

Has anyone seen gains?

Did I botch it?  Missing CPU capabilities?  Wrong compiler options?
member
Activity: 111
Merit: 10
August 01, 2010, 08:22:43 PM
#23
No joy against SVN tip here.

Code:
patching file sha256.cpp
patching file main.cpp
Hunk #1 FAILED at 2555.
Hunk #2 FAILED at 2703.
2 out of 2 hunks FAILED -- saving rejects to file main.cpp.rej
patching file makefile.unix
Hunk #1 FAILED at 41.
Hunk #2 FAILED at 52.
Hunk #3 FAILED at 64.
3 out of 3 hunks FAILED -- saving rejects to file makefile.unix.rej
patching file test.cpp

Trying manually now.
newbie
Activity: 10
Merit: 0
newbie
Activity: 3
Merit: 0
August 01, 2010, 06:16:48 AM
#21
care with __attribute__ ((aligned (16))) , it doesn't work with local variable, gcc doesn't align the stack

Maybe gcc doesn't align the stack, but it can (and automatically does) align variables on the stack.
sr. member
Activity: 337
Merit: 263
Patch against SVN. Maybe it'll work now...
Code:
diff --git a/cryptopp/sha256.cpp b/cryptopp/sha256.cpp
new file mode 100644
index 0000000..6735678
--- /dev/null
+++ b/cryptopp/sha256.cpp
@@ -0,0 +1,447 @@
+#include
+#include
+
+#include
+#include
+#include
+
+#define NPAR 32
+
+static const unsigned int sha256_consts[] = {
+ 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, /*  0 */
+ 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
+ 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, /*  8 */
+ 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
+ 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, /* 16 */
+ 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
+ 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, /* 24 */
+ 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
+ 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, /* 32 */
+ 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
+ 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, /* 40 */
+ 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
+ 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, /* 48 */
+ 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
+ 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, /* 56 */
+ 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+};
+
+
+static inline __m128i Ch(const __m128i b, const __m128i c, const __m128i d) {
+ return (b & c) ^ (~b & d);
+}
+
+static inline __m128i Maj(const __m128i b, const __m128i c, const __m128i d) {
+ return (b & c) ^ (b & d) ^ (c & d);
+}
+
+static inline __m128i ROTR(__m128i x, const int n) {
+ return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
+}
+
+static inline __m128i SHR(__m128i x, const int n) {
+ return _mm_srli_epi32(x, n);
+}
+
+/* SHA256 Functions */
+#define BIGSIGMA0_256(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
+#define BIGSIGMA1_256(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))
+#define SIGMA0_256(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ SHR((x), 3))
+#define SIGMA1_256(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ SHR((x), 10))
+
+static inline __m128i load_epi32(const unsigned int x0, const unsigned int x1, const unsigned int x2, const unsigned int x3) {
+ return _mm_set_epi32(x0, x1, x2, x3);
+}
+
+static inline unsigned int store32(const __m128i x, int i) {
+ union { unsigned int ret[4]; __m128i x; } box;
+ box.x = x;
+ return box.ret[i];
+}
+
+static inline void store_epi32(const __m128i x, unsigned int *x0, unsigned int *x1, unsigned int *x2, unsigned int *x3) {
+ union { unsigned int ret[4]; __m128i x; } box;
+ box.x = x;
+ *x0 = box.ret[3]; *x1 = box.ret[2]; *x2 = box.ret[1]; *x3 = box.ret[0];
+}
+
+static inline __m128i SHA256_CONST(const int i) {
+ return _mm_set1_epi32(sha256_consts[i]);
+}
+
+#define add4(x0, x1, x2, x3) _mm_add_epi32(_mm_add_epi32(_mm_add_epi32(x0, x1), x2), x3)
+#define add5(x0, x1, x2, x3, x4) _mm_add_epi32(add4(x0, x1, x2, x3), x4)
+
+#define SHA256ROUND(a, b, c, d, e, f, g, h, i, w)                       \
+ T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+d = _mm_add_epi32(d, T1);                                           \
+T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));                 \
+h = _mm_add_epi32(T1, T2);
+
+#define SHA256ROUND_lastd(a, b, c, d, e, f, g, h, i, w)                       \
+ T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+d = _mm_add_epi32(d, T1);                                           
+//T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));                 
+//h = _mm_add_epi32(T1, T2);
+
+#define SHA256ROUND_last(a, b, c, d, e, f, g, h, i, w)                       \
+ T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));                 \
+h = _mm_add_epi32(T1, T2);
+
+static inline unsigned int swap(unsigned int value) {
+ __asm__ ("bswap %0" : "=r" (value) : "0" (value));
+ return value;
+}
+
+static inline unsigned int SWAP32(const void *addr) {
+ unsigned int value = (*((unsigned int *)(addr)));
+ __asm__ ("bswap %0" : "=r" (value) : "0" (value));
+ return value;
+}
+
+static inline void dumpreg(__m128i x, char *msg) {
+ union { unsigned int ret[4]; __m128i x; } box;
+ box.x = x ;
+ printf("%s %08x %08x %08x %08x\n", msg, box.ret[0], box.ret[1], box.ret[2], box.ret[3]);
+}
+
+#if 1
+#define dumpstate(i) printf("%s: %08x %08x %08x %08x %08x %08x %08x %08x %08x\n", \
+ __func__, store32(w0, i), store32(a, i), store32(b, i), store32(c, i), store32(d, i), store32(e, i), store32(f, i), store32(g, i), store32(h, i));
+#else
+#define dumpstate()
+#endif
+void Double_BlockSHA256(const void* pin, void* pad, const void *pre, unsigned int thash[9][NPAR], const void *init)
+{
+ unsigned int* In = (unsigned int*)pin;
+ unsigned int* Pad = (unsigned int*)pad;
+ unsigned int* hPre = (unsigned int*)pre;
+ unsigned int* hInit = (unsigned int*)init;
+ unsigned int i, j, k;
+
+ /* vectors used in calculation */
+ __m128i w0, w1, w2, w3, w4, w5, w6, w7;
+ __m128i w8, w9, w10, w11, w12, w13, w14, w15;
+ __m128i T1, T2;
+ __m128i a, b, c, d, e, f, g, h;
+  __m128i nonce;
+
+ /* nonce offset for vector */
+ __m128i offset = load_epi32(0x00000003, 0x00000002, 0x00000001, 0x00000000);
+
+
+ for(k = 0; k+ w0 = load_epi32(In[0], In[0], In[0], In[0]);
+ w1 = load_epi32(In[1], In[1], In[1], In[1]);
+ w2 = load_epi32(In[2], In[2], In[2], In[2]);
+ //w3 = load_epi32(In[3], In[3], In[3], In[3]); nonce will be later hacked into the hash
+ w4 = load_epi32(In[4], In[4], In[4], In[4]);
+ w5 = load_epi32(In[5], In[5], In[5], In[5]);
+ w6 = load_epi32(In[6], In[6], In[6], In[6]);
+ w7 = load_epi32(In[7], In[7], In[7], In[7]);
+ w8 = load_epi32(In[8], In[8], In[8], In[8]);
+ w9 = load_epi32(In[9], In[9], In[9], In[9]);
+ w10 = load_epi32(In[10], In[10], In[10], In[10]);
+ w11 = load_epi32(In[11], In[11], In[11], In[11]);
+ w12 = load_epi32(In[12], In[12], In[12], In[12]);
+ w13 = load_epi32(In[13], In[13], In[13], In[13]);
+ w14 = load_epi32(In[14], In[14], In[14], In[14]);
+ w15 = load_epi32(In[15], In[15], In[15], In[15]);
+
+ /* hack nonce into lowest byte of w3 */
+ nonce = load_epi32(In[3], In[3], In[3], In[3]);
+ __m128i k_vec = load_epi32(k, k, k, k);
+ nonce = _mm_add_epi32(nonce, offset);
+ nonce = _mm_add_epi32(nonce, k_vec);
+    w3 = nonce;
+
+ a = load_epi32(hPre[0], hPre[0], hPre[0], hPre[0]);
+ b = load_epi32(hPre[1], hPre[1], hPre[1], hPre[1]);
+ c = load_epi32(hPre[2], hPre[2], hPre[2], hPre[2]);
+ d = load_epi32(hPre[3], hPre[3], hPre[3], hPre[3]);
+ e = load_epi32(hPre[4], hPre[4], hPre[4], hPre[4]);
+ f = load_epi32(hPre[5], hPre[5], hPre[5], hPre[5]);
+ g = load_epi32(hPre[6], hPre[6], hPre[6], hPre[6]);
+ h = load_epi32(hPre[7], hPre[7], hPre[7], hPre[7]);
+
+ SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0);   
+ SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15);
+
+#define store_load(x, i, dest) \
+ w8 = load_epi32((hPre)[i], (hPre)[i], (hPre)[i], (hPre)[i]); \
+ dest = _mm_add_epi32(w8, x);
+
+ store_load(a, 0, w0);
+ store_load(b, 1, w1);
+ store_load(c, 2, w2);
+ store_load(d, 3, w3);
+ store_load(e, 4, w4);
+ store_load(f, 5, w5);
+ store_load(g, 6, w6);
+ store_load(h, 7, w7);
+
+ w8 = load_epi32(Pad[8], Pad[8], Pad[8], Pad[8]);
+ w9 = load_epi32(Pad[9], Pad[9], Pad[9], Pad[9]);
+ w10 = load_epi32(Pad[10], Pad[10], Pad[10], Pad[10]);
+ w11 = load_epi32(Pad[11], Pad[11], Pad[11], Pad[11]);
+ w12 = load_epi32(Pad[12], Pad[12], Pad[12], Pad[12]);
+ w13 = load_epi32(Pad[13], Pad[13], Pad[13], Pad[13]);
+ w14 = load_epi32(Pad[14], Pad[14], Pad[14], Pad[14]);
+ w15 = load_epi32(Pad[15], Pad[15], Pad[15], Pad[15]);
+
+ a = load_epi32(hInit[0], hInit[0], hInit[0], hInit[0]);
+ b = load_epi32(hInit[1], hInit[1], hInit[1], hInit[1]);
+ c = load_epi32(hInit[2], hInit[2], hInit[2], hInit[2]);
+ d = load_epi32(hInit[3], hInit[3], hInit[3], hInit[3]);
+ e = load_epi32(hInit[4], hInit[4], hInit[4], hInit[4]);
+ f = load_epi32(hInit[5], hInit[5], hInit[5], hInit[5]);
+ g = load_epi32(hInit[6], hInit[6], hInit[6], hInit[6]);
+ h = load_epi32(hInit[7], hInit[7], hInit[7], hInit[7]);
+
+ SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0);   
+ SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15);
+
+ /* store resulsts directly in thash */
+#define store_2(x,i)  \
+ w0 = load_epi32((hInit)[i], (hInit)[i], (hInit)[i], (hInit)[i]); \
+ *(__m128i *)&(thash)[i][0+k] = _mm_add_epi32(w0, x);
+
+ store_2(a, 0);
+ store_2(b, 1);
+ store_2(c, 2);
+ store_2(d, 3);
+ store_2(e, 4);
+ store_2(f, 5);
+ store_2(g, 6);
+ store_2(h, 7);
+ *(__m128i *)&(thash)[8][0+k] = nonce;
+ }
+
+}
diff --git a/main.cpp b/main.cpp
index 0239915..50db1a3 100644
--- a/main.cpp
+++ b/main.cpp
@@ -2555,8 +2555,10 @@ inline void SHA256Transform(void* pstate, void* pinput, const void* pinit)
     CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput);
 }
 
+// !!!! NPAR must match NPAR in cryptopp/sha256.cpp !!!!
+#define NPAR 32
 
-
+extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[9][NPAR], const void *init2);
 
 
 void BitcoinMiner()
@@ -2701,108 +2703,128 @@ void BitcoinMiner()
         uint256 hashTarget = CBigNum().SetCompact(pblock->nBits).getuint256();
         uint256 hashbuf[2];
         uint256& hash = *alignup<16>(hashbuf);
+
+        // Cache for NPAR hashes
+        unsigned int thash[9][NPAR] __attribute__ ((aligned (16)));
+
+        unsigned int j;
         loop
         {
-            SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
-            SHA256Transform(&hash, &tmp.hash1, pSHA256InitState);
+          Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState);
 
-            if (((unsigned short*)&hash)[14] == 0)
+          for(j = 0; j+            if (thash[7][j] == 0)
             {
-                // Byte swap the result after preliminary check
-                for (int i = 0; i < sizeof(hash)/4; i++)
-                    ((unsigned int*)&hash)[i] = ByteReverse(((unsigned int*)&hash)[i]);
-
-                if (hash <= hashTarget)
+              // Byte swap the result after preliminary check
+              for (int i = 0; i < sizeof(hash)/4; i++)
+                ((unsigned int*)&hash)[i] = ByteReverse((unsigned int)thash[i][j]);
+
+              if (hash <= hashTarget)
+              {
+                // Double_BlocSHA256 might only calculate parts of the hash.
+                // We'll insert the nonce and get the real hash.
+                //pblock->nNonce = ByteReverse(tmp.block.nNonce + j);
+                //hash = pblock->GetHash();
+
+                /* get nonce from hash */
+                pblock->nNonce = ByteReverse((unsigned int)thash[8][j]);
+                assert(hash == pblock->GetHash());
+
+                //// debug print
+                printf("BitcoinMiner:\n");
+                printf("proof-of-work found  \n  hash: %s  \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str());
+                pblock->print();
+                printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
+                printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str());
+
+                SetThreadPriority(THREAD_PRIORITY_NORMAL);
+                CRITICAL_BLOCK(cs_main)
                 {
-                    pblock->nNonce = ByteReverse(tmp.block.nNonce);
-                    assert(hash == pblock->GetHash());
-
-                        //// debug print
-                        printf("BitcoinMiner:\n");
-                        printf("proof-of-work found  \n  hash: %s  \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str());
-                        pblock->print();
-                        printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
-                        printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str());
-
-                    SetThreadPriority(THREAD_PRIORITY_NORMAL);
-                    CRITICAL_BLOCK(cs_main)
-                    {
-                        if (pindexPrev == pindexBest)
-                        {
-                            // Save key
-                            if (!AddKey(key))
-                                return;
-                            key.MakeNewKey();
-
-                            // Track how many getdata requests this block gets
-                            CRITICAL_BLOCK(cs_mapRequestCount)
-                                mapRequestCount[pblock->GetHash()] = 0;
-
-                            // Process this block the same as if we had received it from another node
-                            if (!ProcessBlock(NULL, pblock.release()))
-                                printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n");
-                        }
+                  if (pindexPrev == pindexBest)
+                  {
+                    // Save key
+                    if (!AddKey(key))
+                      return;
+                    key.MakeNewKey();
+
+                    // Track how many getdata requests this block gets
+                    CRITICAL_BLOCK(cs_mapRequestCount)
+                      mapRequestCount[pblock->GetHash()] = 0;
+
+                    // Process this block the same as if we had received it from another node
+                    if (!ProcessBlock(NULL, pblock.release()))
+                      printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n");
+
                     }
                     SetThreadPriority(THREAD_PRIORITY_LOWEST);
 
                     Sleep(500);
                     break;
                 }
-            }
+                SetThreadPriority(THREAD_PRIORITY_LOWEST);
 
-            // Update nTime every few seconds
-            const unsigned int nMask = 0xffff;
-            if ((++tmp.block.nNonce & nMask) == 0)
+                Sleep(500);
+                break;
+              }
+            }
+          }
+
+          // Update nonce
+          tmp.block.nNonce += NPAR;
+
+          // Update nTime every few seconds
+          const unsigned int nMask = 0xffff;
+          if ((tmp.block.nNonce & nMask) == 0)
+          {
+            // Meter hashes/sec
+            static int64 nTimerStart;
+            static int nHashCounter;
+            if (nTimerStart == 0)
+              nTimerStart = GetTimeMillis();
+            else
+              nHashCounter++;
+            if (GetTimeMillis() - nTimerStart > 4000)
             {
-                // Meter hashes/sec
-                static int64 nTimerStart;
-                static int nHashCounter;
-                if (nTimerStart == 0)
-                    nTimerStart = GetTimeMillis();
-                else
-                    nHashCounter++;
+              static CCriticalSection cs;
+              CRITICAL_BLOCK(cs)
+              {
                 if (GetTimeMillis() - nTimerStart > 4000)
                 {
-                    static CCriticalSection cs;
-                    CRITICAL_BLOCK(cs)
-                    {
-                        if (GetTimeMillis() - nTimerStart > 4000)
-                        {
-                            double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart);
-                            nTimerStart = GetTimeMillis();
-                            nHashCounter = 0;
-                            string strStatus = strprintf("    %.0f khash/s", dHashesPerSec/1000.0);
-                            UIThreadCall(bind(CalledSetStatusBar, strStatus, 0));
-                            static int64 nLogTime;
-                            if (GetTime() - nLogTime > 30 * 60)
-                            {
-                                nLogTime = GetTime();
-                                printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
-                                printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0);
-                            }
-                        }
-                    }
+                  double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart);
+                  nTimerStart = GetTimeMillis();
+                  nHashCounter = 0;
+                  string strStatus = strprintf("    %.0f khash/s", dHashesPerSec/1000.0);
+                  UIThreadCall(bind(CalledSetStatusBar, strStatus, 0));
+                  static int64 nLogTime;
+                  if (GetTime() - nLogTime > 30 * 60)
+                  {
+                    nLogTime = GetTime();
+                    printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
+                    printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0);
+                  }
                 }
-
-                // Check for stop or if block needs to be rebuilt
-                if (fShutdown)
-                    return;
-                if (!fGenerateBitcoins)
-                    return;
-                if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors)
-                    return;
-                if (vNodes.empty())
-                    break;
-                if (tmp.block.nNonce == 0)
-                    break;
-                if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
-                    break;
-                if (pindexPrev != pindexBest)
-                    break;
-
-                pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
-                tmp.block.nTime = ByteReverse(pblock->nTime);
+              }
             }
+
+            // Check for stop or if block needs to be rebuilt
+            if (fShutdown)
+              return;
+            if (!fGenerateBitcoins)
+              return;
+            if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors)
+              return;
+            if (vNodes.empty())
+              break;
+            if (tmp.block.nNonce == 0)
+              break;
+            if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
+              break;
+            if (pindexPrev != pindexBest)
+              break;
+
+            pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
+            tmp.block.nTime = ByteReverse(pblock->nTime);
+          }
         }
     }
 }
diff --git a/makefile.unix b/makefile.unix
index e965287..04dac86 100644
--- a/makefile.unix
+++ b/makefile.unix
@@ -41,7 +41,8 @@ OBJS= \
     obj/rpc.o \
     obj/init.o \
     cryptopp/obj/sha.o \
-    cryptopp/obj/cpu.o
+    cryptopp/obj/cpu.o \
+    cryptopp/obj/sha256.o
 
 
 all: bitcoin
@@ -51,7 +52,7 @@ obj/%.o: %.cpp $(HEADERS)
  g++ -c $(CFLAGS) -DGUI -o $@ $<
 
 cryptopp/obj/%.o: cryptopp/%.cpp
- g++ -c $(CFLAGS) -O3 -DCRYPTOPP_DISABLE_SSE2 -o $@ $<
+ g++ -c $(CFLAGS) -frename-registers -funroll-all-loops -fomit-frame-pointer  -march=native -msse2 -msse3  -ffast-math -O3 -o $@ $<
 
 bitcoin: $(OBJS) obj/ui.o obj/uibase.o
  g++ $(CFLAGS) -o $@ $^ $(WXLIBS) $(LIBS)
@@ -63,6 +64,9 @@ obj/nogui/%.o: %.cpp $(HEADERS)
 bitcoind: $(OBJS:obj/%=obj/nogui/%)
  g++ $(CFLAGS) -o $@ $^ $(LIBS)
 
+test: cryptopp/obj/sha.o cryptopp/obj/sha256.o test.cpp
+ g++ $(CFLAGS) -o $@ $^ $(LIBS)
+
 
 clean:
  -rm -f obj/*.o
diff --git a/test.cpp b/test.cpp
new file mode 100644
index 0000000..a55e972
--- /dev/null
+++ b/test.cpp
@@ -0,0 +1,221 @@
+// Copyright (c) 2009-2010 Satoshi Nakamoto
+// Distributed under the MIT/X11 software license, see the accompanying
+// file license.txt or http://www.opensource.org/licenses/mit-license.php.
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+using namespace std;
+using namespace boost;
+#include "cryptopp/sha.h"
+#include "strlcpy.h"
+#include "serialize.h"
+#include "uint256.h"
+#include "bignum.h"
+
+#undef printf
+
+int FormatHashBlocks(void* pbuffer, unsigned int len)
+{
+ unsigned char* pdata = (unsigned char*)pbuffer;
+ unsigned int blocks = 1 + ((len + 8) / 64);
+ unsigned char* pend = pdata + 64 * blocks;
+ memset(pdata + len, 0, 64 * blocks - len);
+ pdata[len] = 0x80;
+ unsigned int bits = len * 8;
+ pend[-1] = (bits >> 0) & 0xff;
+ pend[-2] = (bits >> 8) & 0xff;
+ pend[-3] = (bits >> 16) & 0xff;
+ pend[-4] = (bits >> 24) & 0xff;
+ return blocks;
+}
+
+using CryptoPP::ByteReverse;
+static int detectlittleendian = 1;
+
+#define NPAR 32
+
+extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[9][NPAR], const void *init2);
+
+using CryptoPP::ByteReverse;
+
+static const unsigned int pSHA256InitState[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
+
+inline void SHA256Transform(void* pstate, void* pinput, const void* pinit)
+{
+ memcpy(pstate, pinit, 32);
+ CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput);
+}
+
+void BitcoinTester(char *filename)
+{
+ printf("SHA256 test started\n");
+
+ struct tmpworkspace
+ {
+ struct unnamed2
+ {
+ int nVersion;
+ uint256 hashPrevBlock;
+ uint256 hashMerkleRoot;
+ unsigned int nTime;
+ unsigned int nBits;
+ unsigned int nNonce;
+ }
+ block;
+ unsigned char pchPadding0[64];
+ uint256 hash1;
+ unsigned char pchPadding1[64];
+ }
+  tmp __attribute__ ((aligned (16)));
+
+ char line[180];
+ ifstream fin(filename);
+ char *p;
+ unsigned long int totalhashes= 0;
+ unsigned long int found = 0;
+ clock_t start, end;
+ unsigned long int cpu_time_used;
+ unsigned int tnonce;
+ start = clock();
+
+ while( fin.getline(line, 180))
+ {
+ string in(line);
+ //printf("%s\n", in.c_str());
+ tmp.block.nVersion       = strtol(in.substr(0,8).c_str(), &p, 16);
+ tmp.block.hashPrevBlock.SetHex(in.substr(8,64));
+ tmp.block.hashMerkleRoot.SetHex(in.substr(64+8,64));
+ tmp.block.nTime          = strtol(in.substr(128+8,8).c_str(), &p, 16);
+ tmp.block.nBits          = strtol(in.substr(128+16,8).c_str(), &p, 16);
+ tnonce = strtol(in.substr(128+24,8).c_str(), &p, 16);
+ tmp.block.nNonce         = tnonce;
+
+ unsigned int nBlocks0 = FormatHashBlocks(&tmp.block, sizeof(tmp.block));
+ unsigned int nBlocks1 = FormatHashBlocks(&tmp.hash1, sizeof(tmp.hash1));
+
+ // Byte swap all the input buffer
+ for (int i = 0; i < sizeof(tmp)/4; i++)
+ ((unsigned int*)&tmp)[i] = ByteReverse(((unsigned int*)&tmp)[i]);
+
+ // Precalc the first half of the first hash, which stays constant
+ uint256 midstate __attribute__ ((aligned(16)));
+ SHA256Transform(&midstate, &tmp.block, pSHA256InitState);
+
+
+ uint256 hashTarget = CBigNum().SetCompact(ByteReverse(tmp.block.nBits)).getuint256();
+ // printf("target %s\n", hashTarget.GetHex().c_str());
+ uint256 hash;
+ uint256 refhash __attribute__ ((aligned(16)));
+
+ unsigned int thash[9][NPAR] __attribute__ ((aligned (16)));
+ int done = 0;
+ unsigned int i, j;
+
+ /* reference */
+ SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
+ SHA256Transform(&refhash, &tmp.hash1, pSHA256InitState);
+ for (int i = 0; i < sizeof(refhash)/4; i++)
+ ((unsigned int*)&refhash)[i] = ByteReverse(((unsigned int*)&refhash)[i]);
+
+ //printf("reference nonce %08x:\n%s\n\n", tnonce, refhash.GetHex().c_str());
+
+ tmp.block.nNonce = ByteReverse(tnonce) & 0xfffff000;
+
+
+ for(;;)
+ {
+
+ Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState);
+
+ for(i = 0; i+ /* fast hash checking */
+ if(thash[7][i] == 0) {
+ // printf("found something... ");
+
+ for(j = 0; j<8; j++) ((unsigned int *)&hash)[j] = ByteReverse((unsigned int)thash[j][i]);
+ // printf("%s\n", hash.GetHex().c_str());
+
+ if (hash <= hashTarget)
+ {
+ found++;
+ if(tnonce == ByteReverse((unsigned int)thash[8][i]) ) {
+ if(hash == refhash) {
+ printf("\r%lu", found);
+ totalhashes += NPAR;
+ done = 1;
+ } else {
+ printf("Hashes do not match!\n");
+ }
+ } else {
+ printf("nonce does not match. %08x != %08x\n", tnonce, ByteReverse(tmp.block.nNonce + i));
+ }
+ break;
+ }
+ }
+ }
+ if(done) break;
+
+ tmp.block.nNonce+=NPAR;
+ totalhashes += NPAR;
+ if(tmp.block.nNonce == 0) {
+ printf("ERROR: Hash not found for:\n%s\n", in.c_str());
+ return;
+ }
+ }
+ }
+ printf("\n");
+ end = clock();
+ cpu_time_used += (unsigned int)(end - start);
+ cpu_time_used /= ((CLOCKS_PER_SEC)/1000);
+ printf("found solutions = %lu\n", found);
+ printf("total hashes = %lu\n", totalhashes);
+ printf("total time = %lu ms\n", cpu_time_used);
+ printf("average speed: %lu khash/s\n", (totalhashes)/cpu_time_used);
+}
+
+int main(int argc, char* argv[]) {
+ if(argc == 2) {
+ BitcoinTester(argv[1]);
+ } else
+ printf("Missing filename!\n");
+ return 0;
+}
sr. member
Activity: 337
Merit: 263
I'm running on the Intel Q6600 2.4Ghz, how shall I get the file to you?
yes. i will look at the assembler code. maybe the compiler did something "wrong".
newbie
Activity: 10
Merit: 0
care with __attribute__ ((aligned (16))) , it doesn't work with local variable, gcc doesn't align the stack
legendary
Activity: 1540
Merit: 1001
I'm running on the Intel Q6600 2.4Ghz, how shall I get the file to you?
sr. member
Activity: 337
Merit: 263
What CPUs are you running it on? Could you send me sha256.o (compiled object of the algorithm)?
full member
Activity: 307
Merit: 102
ahm, let me correct myself: on the quad core linux, I went from ~4400 with svn trunk @ 119 to ~2400 with the patch... not exactly what I hoped for after the success in OSX.

I noticed the same, I went from about 4300 to 2100 when I tested it on Linux.
legendary
Activity: 1540
Merit: 1001
ahm, let me correct myself: on the quad core linux, I went from ~4400 with svn trunk @ 119 to ~2400 with the patch... not exactly what I hoped for after the success in OSX.
legendary
Activity: 1540
Merit: 1001
Had to manually patch, as I'm not using git for bitcoin and 'patch' doesn't munch this format, I guess. Anyway, got almost double speed on the OSX side, (i5 2.4, now ~2400 from ~1400), but my linux on Q6600 quad 2.4Ghz was pumping ~2500 with 0.3.6 (from source) and now, with the patch it's... ~2400. Need I tweak anything to take advantage on this?
sr. member
Activity: 337
Merit: 263
Looks like pastebin.com messes up the patch...
Code:
diff --git a/cryptopp/sha256.cpp b/cryptopp/sha256.cpp
new file mode 100644
index 0000000..15f8be1
--- /dev/null
+++ b/cryptopp/sha256.cpp
@@ -0,0 +1,443 @@
+#include
+#include
+
+#include
+#include
+#include
+
+#define NPAR 32
+
+static const unsigned int sha256_consts[] = {
+ 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, /*  0 */
+ 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
+ 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, /*  8 */
+ 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
+ 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, /* 16 */
+ 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
+ 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, /* 24 */
+ 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
+ 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, /* 32 */
+ 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
+ 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, /* 40 */
+ 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
+ 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, /* 48 */
+ 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
+ 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, /* 56 */
+ 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+};
+
+
+static inline __m128i Ch(const __m128i b, const __m128i c, const __m128i d) {
+ return (b & c) ^ (~b & d);
+}
+
+static inline __m128i Maj(const __m128i b, const __m128i c, const __m128i d) {
+ return (b & c) ^ (b & d) ^ (c & d);
+}
+
+static inline __m128i ROTR(__m128i x, const int n) {
+ return _mm_srli_epi32(x, n) | _mm_slli_epi32(x, 32 - n);
+}
+
+static inline __m128i SHR(__m128i x, const int n) {
+ return _mm_srli_epi32(x, n);
+}
+
+/* SHA256 Functions */
+#define BIGSIGMA0_256(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
+#define BIGSIGMA1_256(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))
+#define SIGMA0_256(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ SHR((x), 3))
+#define SIGMA1_256(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ SHR((x), 10))
+
+static inline __m128i load_epi32(const unsigned int x0, const unsigned int x1, const unsigned int x2, const unsigned int x3) {
+ return _mm_set_epi32(x0, x1, x2, x3);
+}
+
+static inline unsigned int store32(const __m128i x, int i) {
+ union { unsigned int ret[4]; __m128i x; } box;
+ box.x = x;
+ return box.ret[i];
+}
+
+static inline void store_epi32(const __m128i x, unsigned int *x0, unsigned int *x1, unsigned int *x2, unsigned int *x3) {
+ union { unsigned int ret[4]; __m128i x; } box;
+ box.x = x;
+ *x0 = box.ret[3]; *x1 = box.ret[2]; *x2 = box.ret[1]; *x3 = box.ret[0];
+}
+
+static inline __m128i SHA256_CONST(const int i) {
+ return _mm_set1_epi32(sha256_consts[i]);
+}
+
+#define add4(x0, x1, x2, x3) _mm_add_epi32(_mm_add_epi32(_mm_add_epi32(x0, x1), x2), x3)
+#define add5(x0, x1, x2, x3, x4) _mm_add_epi32(add4(x0, x1, x2, x3), x4)
+
+#define SHA256ROUND(a, b, c, d, e, f, g, h, i, w)                       \
+ T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+d = _mm_add_epi32(d, T1);                                           \
+T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));                 \
+h = _mm_add_epi32(T1, T2);
+
+#define SHA256ROUND_lastd(a, b, c, d, e, f, g, h, i, w)                       \
+ T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+d = _mm_add_epi32(d, T1);                                           
+//T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));                 
+//h = _mm_add_epi32(T1, T2);
+
+#define SHA256ROUND_last(a, b, c, d, e, f, g, h, i, w)                       \
+ T1 = add5(h, BIGSIGMA1_256(e), Ch(e, f, g), SHA256_CONST(i), w); \
+T2 = _mm_add_epi32(BIGSIGMA0_256(a), Maj(a, b, c));                 \
+h = _mm_add_epi32(T1, T2);
+
+static inline unsigned int swap(unsigned int value) {
+ __asm__ ("bswap %0" : "=r" (value) : "0" (value));
+ return value;
+}
+
+static inline unsigned int SWAP32(const void *addr) {
+ unsigned int value = (*((unsigned int *)(addr)));
+ __asm__ ("bswap %0" : "=r" (value) : "0" (value));
+ return value;
+}
+
+static inline void dumpreg(__m128i x, char *msg) {
+ union { unsigned int ret[4]; __m128i x; } box;
+ box.x = x ;
+ printf("%s %08x %08x %08x %08x\n", msg, box.ret[0], box.ret[1], box.ret[2], box.ret[3]);
+}
+
+#if 1
+#define dumpstate(i) printf("%s: %08x %08x %08x %08x %08x %08x %08x %08x %08x\n", \
+ __func__, store32(w0, i), store32(a, i), store32(b, i), store32(c, i), store32(d, i), store32(e, i), store32(f, i), store32(g, i), store32(h, i));
+#else
+#define dumpstate()
+#endif
+void Double_BlockSHA256(const void* pin, void* pad, const void *pre, unsigned int thash[8][NPAR], const void *init)
+{
+ unsigned int* In = (unsigned int*)pin;
+ unsigned int* Pad = (unsigned int*)pad;
+ unsigned int* hPre = (unsigned int*)pre;
+ unsigned int* hInit = (unsigned int*)init;
+ unsigned int i, j, k;
+
+ /* vectors used in calculation */
+ __m128i w0, w1, w2, w3, w4, w5, w6, w7;
+ __m128i w8, w9, w10, w11, w12, w13, w14, w15;
+ __m128i T1, T2;
+ __m128i a, b, c, d, e, f, g, h;
+
+ /* nonce offset for vector */
+ __m128i offset = load_epi32(0x00000003, 0x00000002, 0x00000001, 0x00000000);
+
+
+ for(k = 0; k+ w0 = load_epi32(In[0], In[0], In[0], In[0]);
+ w1 = load_epi32(In[1], In[1], In[1], In[1]);
+ w2 = load_epi32(In[2], In[2], In[2], In[2]);
+ w3 = load_epi32(In[3], In[3], In[3], In[3]);
+ w4 = load_epi32(In[4], In[4], In[4], In[4]);
+ w5 = load_epi32(In[5], In[5], In[5], In[5]);
+ w6 = load_epi32(In[6], In[6], In[6], In[6]);
+ w7 = load_epi32(In[7], In[7], In[7], In[7]);
+ w8 = load_epi32(In[8], In[8], In[8], In[8]);
+ w9 = load_epi32(In[9], In[9], In[9], In[9]);
+ w10 = load_epi32(In[10], In[10], In[10], In[10]);
+ w11 = load_epi32(In[11], In[11], In[11], In[11]);
+ w12 = load_epi32(In[12], In[12], In[12], In[12]);
+ w13 = load_epi32(In[13], In[13], In[13], In[13]);
+ w14 = load_epi32(In[14], In[14], In[14], In[14]);
+ w15 = load_epi32(In[15], In[15], In[15], In[15]);
+
+ /* hack nonce into lowest byte of w3 */
+ __m128i k_vec = load_epi32(k, k, k, k);
+ w3 = _mm_add_epi32(w3, offset);
+ w3 = _mm_add_epi32(w3, k_vec);
+
+ a = load_epi32(hPre[0], hPre[0], hPre[0], hPre[0]);
+ b = load_epi32(hPre[1], hPre[1], hPre[1], hPre[1]);
+ c = load_epi32(hPre[2], hPre[2], hPre[2], hPre[2]);
+ d = load_epi32(hPre[3], hPre[3], hPre[3], hPre[3]);
+ e = load_epi32(hPre[4], hPre[4], hPre[4], hPre[4]);
+ f = load_epi32(hPre[5], hPre[5], hPre[5], hPre[5]);
+ g = load_epi32(hPre[6], hPre[6], hPre[6], hPre[6]);
+ h = load_epi32(hPre[7], hPre[7], hPre[7], hPre[7]);
+
+ SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0);   
+ SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15);
+
+#define store_load(x, i, dest) \
+ w8 = load_epi32((hPre)[i], (hPre)[i], (hPre)[i], (hPre)[i]); \
+ dest = _mm_add_epi32(w8, x);
+
+ store_load(a, 0, w0);
+ store_load(b, 1, w1);
+ store_load(c, 2, w2);
+ store_load(d, 3, w3);
+ store_load(e, 4, w4);
+ store_load(f, 5, w5);
+ store_load(g, 6, w6);
+ store_load(h, 7, w7);
+
+ w8 = load_epi32(Pad[8], Pad[8], Pad[8], Pad[8]);
+ w9 = load_epi32(Pad[9], Pad[9], Pad[9], Pad[9]);
+ w10 = load_epi32(Pad[10], Pad[10], Pad[10], Pad[10]);
+ w11 = load_epi32(Pad[11], Pad[11], Pad[11], Pad[11]);
+ w12 = load_epi32(Pad[12], Pad[12], Pad[12], Pad[12]);
+ w13 = load_epi32(Pad[13], Pad[13], Pad[13], Pad[13]);
+ w14 = load_epi32(Pad[14], Pad[14], Pad[14], Pad[14]);
+ w15 = load_epi32(Pad[15], Pad[15], Pad[15], Pad[15]);
+
+ a = load_epi32(hInit[0], hInit[0], hInit[0], hInit[0]);
+ b = load_epi32(hInit[1], hInit[1], hInit[1], hInit[1]);
+ c = load_epi32(hInit[2], hInit[2], hInit[2], hInit[2]);
+ d = load_epi32(hInit[3], hInit[3], hInit[3], hInit[3]);
+ e = load_epi32(hInit[4], hInit[4], hInit[4], hInit[4]);
+ f = load_epi32(hInit[5], hInit[5], hInit[5], hInit[5]);
+ g = load_epi32(hInit[6], hInit[6], hInit[6], hInit[6]);
+ h = load_epi32(hInit[7], hInit[7], hInit[7], hInit[7]);
+
+ SHA256ROUND(a, b, c, d, e, f, g, h, 0, w0);   
+ SHA256ROUND(h, a, b, c, d, e, f, g, 1, w1);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 2, w2);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 3, w3);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 4, w4);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 5, w5);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 6, w6);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 7, w7);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 8, w8);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 9, w9);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 10, w10);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 11, w11);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 12, w12);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 13, w13);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 14, w14);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 15, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 16, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 17, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 18, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 19, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 20, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 21, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 22, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 23, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 24, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 25, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 26, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 27, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 28, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 29, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 30, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 31, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 32, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 33, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 34, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 35, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 36, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 37, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 38, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 39, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 40, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 41, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 42, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 43, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 44, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 45, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 46, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 47, w15);
+
+ w0 = add4(SIGMA1_256(w14), w9, SIGMA0_256(w1), w0);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 48, w0);
+ w1 = add4(SIGMA1_256(w15), w10, SIGMA0_256(w2), w1);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 49, w1);
+ w2 = add4(SIGMA1_256(w0), w11, SIGMA0_256(w3), w2);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 50, w2);
+ w3 = add4(SIGMA1_256(w1), w12, SIGMA0_256(w4), w3);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 51, w3);
+ w4 = add4(SIGMA1_256(w2), w13, SIGMA0_256(w5), w4);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 52, w4);
+ w5 = add4(SIGMA1_256(w3), w14, SIGMA0_256(w6), w5);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 53, w5);
+ w6 = add4(SIGMA1_256(w4), w15, SIGMA0_256(w7), w6);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 54, w6);
+ w7 = add4(SIGMA1_256(w5), w0, SIGMA0_256(w8), w7);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 55, w7);
+ w8 = add4(SIGMA1_256(w6), w1, SIGMA0_256(w9), w8);
+ SHA256ROUND(a, b, c, d, e, f, g, h, 56, w8);
+ w9 = add4(SIGMA1_256(w7), w2, SIGMA0_256(w10), w9);
+ SHA256ROUND(h, a, b, c, d, e, f, g, 57, w9);
+ w10 = add4(SIGMA1_256(w8), w3, SIGMA0_256(w11), w10);
+ SHA256ROUND(g, h, a, b, c, d, e, f, 58, w10);
+ w11 = add4(SIGMA1_256(w9), w4, SIGMA0_256(w12), w11);
+ SHA256ROUND(f, g, h, a, b, c, d, e, 59, w11);
+ w12 = add4(SIGMA1_256(w10), w5, SIGMA0_256(w13), w12);
+ SHA256ROUND(e, f, g, h, a, b, c, d, 60, w12);
+ w13 = add4(SIGMA1_256(w11), w6, SIGMA0_256(w14), w13);
+ SHA256ROUND(d, e, f, g, h, a, b, c, 61, w13);
+ w14 = add4(SIGMA1_256(w12), w7, SIGMA0_256(w15), w14);
+ SHA256ROUND(c, d, e, f, g, h, a, b, 62, w14);
+ w15 = add4(SIGMA1_256(w13), w8, SIGMA0_256(w0), w15);
+ SHA256ROUND(b, c, d, e, f, g, h, a, 63, w15);
+
+ /* store resulsts directly in thash */
+#define store_2(x,i)  \
+ w0 = load_epi32((hInit)[i], (hInit)[i], (hInit)[i], (hInit)[i]); \
+ *(__m128i *)&(thash)[i][0+k] = _mm_add_epi32(w0, x);
+
+ store_2(a, 0);
+ store_2(b, 1);
+ store_2(c, 2);
+ store_2(d, 3);
+ store_2(e, 4);
+ store_2(f, 5);
+ store_2(g, 6);
+ store_2(h, 7);
+ }
+
+}
diff --git a/main.cpp b/main.cpp
index ddc359a..d30d642 100755
--- a/main.cpp
+++ b/main.cpp
@@ -2555,8 +2555,10 @@ inline void SHA256Transform(void* pstate, void* pinput, const void* pinit)
     CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput);
 }
 
+// !!!! NPAR must match NPAR in cryptopp/sha256.cpp !!!!
+#define NPAR 32
 
-
+extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[8][NPAR], const void *init2);
 
 
 void BitcoinMiner()
@@ -2701,108 +2703,123 @@ void BitcoinMiner()
         uint256 hashTarget = CBigNum().SetCompact(pblock->nBits).getuint256();
         uint256 hashbuf[2];
         uint256& hash = *alignup<16>(hashbuf);
+
+        // Cache for NPAR hashes
+        unsigned int thash[8][NPAR];
+
+        unsigned int j;
         loop
         {
-            SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
-            SHA256Transform(&hash, &tmp.hash1, pSHA256InitState);
+          Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState);
 
-            if (((unsigned short*)&hash)[14] == 0)
+          for(j = 0; j+            if (thash[7][j] == 0)
             {
-                // Byte swap the result after preliminary check
-                for (int i = 0; i < sizeof(hash)/4; i++)
-                    ((unsigned int*)&hash)[i] = ByteReverse(((unsigned int*)&hash)[i]);
-
-                if (hash <= hashTarget)
+              // Byte swap the result after preliminary check
+              for (int i = 0; i < sizeof(hash)/4; i++)
+                ((unsigned int*)&hash)[i] = ByteReverse((unsigned int)thash[i][j]);
+
+              if (hash <= hashTarget)
+              {
+                // Double_BlocSHA256 might only calculate parts of the hash.
+                // We'll insert the nonce and get the real hash.
+                //pblock->nNonce = ByteReverse(tmp.block.nNonce + j);
+                //hash = pblock->GetHash();
+
+                pblock->nNonce = ByteReverse(tmp.block.nNonce + j);
+                assert(hash == pblock->GetHash());
+
+                //// debug print
+                printf("BitcoinMiner:\n");
+                printf("proof-of-work found  \n  hash: %s  \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str());
+                pblock->print();
+                printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
+                printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str());
+
+                SetThreadPriority(THREAD_PRIORITY_NORMAL);
+                CRITICAL_BLOCK(cs_main)
                 {
-                    pblock->nNonce = ByteReverse(tmp.block.nNonce);
-                    assert(hash == pblock->GetHash());
-
-                        //// debug print
-                        printf("BitcoinMiner:\n");
-                        printf("proof-of-work found  \n  hash: %s  \ntarget: %s\n", hash.GetHex().c_str(), hashTarget.GetHex().c_str());
-                        pblock->print();
-                        printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
-                        printf("generated %s\n", FormatMoney(pblock->vtx[0].vout[0].nValue).c_str());
-
-                    SetThreadPriority(THREAD_PRIORITY_NORMAL);
-                    CRITICAL_BLOCK(cs_main)
-                    {
-                        if (pindexPrev == pindexBest)
-                        {
-                            // Save key
-                            if (!AddKey(key))
-                                return;
-                            key.MakeNewKey();
-
-                            // Track how many getdata requests this block gets
-                            CRITICAL_BLOCK(cs_mapRequestCount)
-                                mapRequestCount[pblock->GetHash()] = 0;
-
-                            // Process this block the same as if we had received it from another node
-                            if (!ProcessBlock(NULL, pblock.release()))
-                                printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n");
-                        }
-                    }
-                    SetThreadPriority(THREAD_PRIORITY_LOWEST);
-
-                    Sleep(500);
-                    break;
+                  if (pindexPrev == pindexBest)
+                  {
+                    // Save key
+                    if (!AddKey(key))
+                      return;
+                    key.MakeNewKey();
+
+                    // Track how many getdata requests this block gets
+                    CRITICAL_BLOCK(cs_mapRequestCount)
+                      mapRequestCount[pblock->GetHash()] = 0;
+
+                    // Process this block the same as if we had received it from another node
+                    if (!ProcessBlock(NULL, pblock.release()))
+                      printf("ERROR in BitcoinMiner, ProcessBlock, block not accepted\n");
+
+                  }
                 }
-            }
+                SetThreadPriority(THREAD_PRIORITY_LOWEST);
 
-            // Update nTime every few seconds
-            const unsigned int nMask = 0xffff;
-            if ((++tmp.block.nNonce & nMask) == 0)
+                Sleep(500);
+                break;
+              }
+            }
+          }
+
+          // Update nonce
+          tmp.block.nNonce += NPAR;
+
+          // Update nTime every few seconds
+          const unsigned int nMask = 0xffff;
+          if ((tmp.block.nNonce & nMask) == 0)
+          {
+            // Meter hashes/sec
+            static int64 nTimerStart;
+            static int nHashCounter;
+            if (nTimerStart == 0)
+              nTimerStart = GetTimeMillis();
+            else
+              nHashCounter++;
+            if (GetTimeMillis() - nTimerStart > 4000)
             {
-                // Meter hashes/sec
-                static int64 nTimerStart;
-                static int nHashCounter;
-                if (nTimerStart == 0)
-                    nTimerStart = GetTimeMillis();
-                else
-                    nHashCounter++;
+              static CCriticalSection cs;
+              CRITICAL_BLOCK(cs)
+              {
                 if (GetTimeMillis() - nTimerStart > 4000)
                 {
-                    static CCriticalSection cs;
-                    CRITICAL_BLOCK(cs)
-                    {
-                        if (GetTimeMillis() - nTimerStart > 4000)
-                        {
-                            double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart);
-                            nTimerStart = GetTimeMillis();
-                            nHashCounter = 0;
-                            string strStatus = strprintf("    %.0f khash/s", dHashesPerSec/1000.0);
-                            UIThreadCall(bind(CalledSetStatusBar, strStatus, 0));
-                            static int64 nLogTime;
-                            if (GetTime() - nLogTime > 30 * 60)
-                            {
-                                nLogTime = GetTime();
-                                printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
-                                printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0);
-                            }
-                        }
-                    }
+                  double dHashesPerSec = 1000.0 * (nMask+1) * nHashCounter / (GetTimeMillis() - nTimerStart);
+                  nTimerStart = GetTimeMillis();
+                  nHashCounter = 0;
+                  string strStatus = strprintf("    %.0f khash/s", dHashesPerSec/1000.0);
+                  UIThreadCall(bind(CalledSetStatusBar, strStatus, 0));
+                  static int64 nLogTime;
+                  if (GetTime() - nLogTime > 30 * 60)
+                  {
+                    nLogTime = GetTime();
+                    printf("%s ", DateTimeStrFormat("%x %H:%M", GetTime()).c_str());
+                    printf("hashmeter %3d CPUs %6.0f khash/s\n", vnThreadsRunning[3], dHashesPerSec/1000.0);
+                  }
                 }
-
-                // Check for stop or if block needs to be rebuilt
-                if (fShutdown)
-                    return;
-                if (!fGenerateBitcoins)
-                    return;
-                if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors)
-                    return;
-                if (vNodes.empty())
-                    break;
-                if (tmp.block.nNonce == 0)
-                    break;
-                if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
-                    break;
-                if (pindexPrev != pindexBest)
-                    break;
-
-                pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
-                tmp.block.nTime = ByteReverse(pblock->nTime);
+              }
             }
+
+            // Check for stop or if block needs to be rebuilt
+            if (fShutdown)
+              return;
+            if (!fGenerateBitcoins)
+              return;
+            if (fLimitProcessors && vnThreadsRunning[3] > nLimitProcessors)
+              return;
+            if (vNodes.empty())
+              break;
+            if (tmp.block.nNonce == 0)
+              break;
+            if (nTransactionsUpdated != nTransactionsUpdatedLast && GetTime() - nStart > 60)
+              break;
+            if (pindexPrev != pindexBest)
+              break;
+
+            pblock->nTime = max(pindexPrev->GetMedianTimePast()+1, GetAdjustedTime());
+            tmp.block.nTime = ByteReverse(pblock->nTime);
+          }
         }
     }
 }
diff --git a/makefile.unix b/makefile.unix
index 597a0ea..8fb0aa6 100755
--- a/makefile.unix
+++ b/makefile.unix
@@ -45,7 +45,8 @@ OBJS= \
     obj/rpc.o \
     obj/init.o \
     cryptopp/obj/sha.o \
-    cryptopp/obj/cpu.o
+    cryptopp/obj/cpu.o \
+ cryptopp/obj/sha256.o
 
 
 all: bitcoin
@@ -58,18 +59,20 @@ obj/%.o: %.cpp $(HEADERS) headers.h.gch
  g++ -c $(CFLAGS) -DGUI -o $@ $<
 
 cryptopp/obj/%.o: cryptopp/%.cpp
- g++ -c $(CFLAGS) -O3 -DCRYPTOPP_DISABLE_SSE2 -o $@ $<
+ g++ -c $(CFLAGS) -frename-registers -funroll-all-loops -fomit-frame-pointer  -march=native -msse2 -msse3  -ffast-math -O3 -o $@ $<
 
 bitcoin: $(OBJS) obj/ui.o obj/uibase.o
  g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(WXLIBS) $(LIBS)
 
-
 obj/nogui/%.o: %.cpp $(HEADERS)
  g++ -c $(CFLAGS) -o $@ $<
 
 bitcoind: $(OBJS:obj/%=obj/nogui/%)
  g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(LIBS)
 
+test: cryptopp/obj/sha.o cryptopp/obj/sha256.o test.cpp
+   g++ $(CFLAGS) -o $@ $(LIBPATHS) $^ $(WXLIBS) $(LIBS)
+
 
 clean:
  -rm -f obj/*.o
diff --git a/test.cpp b/test.cpp
new file mode 100755
index 0000000..7cab332
--- /dev/null
+++ b/test.cpp
@@ -0,0 +1,237 @@
+// Copyright (c) 2009-2010 Satoshi Nakamoto
+// Distributed under the MIT/X11 software license, see the accompanying
+// file license.txt or http://www.opensource.org/licenses/mit-license.php.
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+using namespace std;
+using namespace boost;
+#include "cryptopp/sha.h"
+#include "strlcpy.h"
+#include "serialize.h"
+#include "uint256.h"
+#include "bignum.h"
+
+#undef printf
+ template
+T* alignup(T* p)
+{
+ union
+ {   
+ T* ptr;
+ size_t n;
+ } u;
+ u.ptr = p;
+ u.n = (u.n + (nBytes-1)) & ~(nBytes-1);
+ return u.ptr;
+}
+
+int FormatHashBlocks(void* pbuffer, unsigned int len)
+{
+ unsigned char* pdata = (unsigned char*)pbuffer;
+ unsigned int blocks = 1 + ((len + 8) / 64);
+ unsigned char* pend = pdata + 64 * blocks;
+ memset(pdata + len, 0, 64 * blocks - len);
+ pdata[len] = 0x80;
+ unsigned int bits = len * 8;
+ pend[-1] = (bits >> 0) & 0xff;
+ pend[-2] = (bits >> 8) & 0xff;
+ pend[-3] = (bits >> 16) & 0xff;
+ pend[-4] = (bits >> 24) & 0xff;
+ return blocks;
+}
+
+using CryptoPP::ByteReverse;
+static int detectlittleendian = 1;
+
+#define NPAR 32
+
+extern void Double_BlockSHA256(const void* pin, void* pout, const void *pinit, unsigned int hash[8][NPAR], const void *init2);
+
+using CryptoPP::ByteReverse;
+
+static const unsigned int pSHA256InitState[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
+
+inline void SHA256Transform(void* pstate, void* pinput, const void* pinit)
+{
+ memcpy(pstate, pinit, 32);
+ CryptoPP::SHA256::Transform((CryptoPP::word32*)pstate, (CryptoPP::word32*)pinput);
+}
+
+void BitcoinTester(char *filename)
+{
+ printf("SHA256 test started\n");
+
+ struct tmpworkspace
+ {
+ struct unnamed2
+ {
+ int nVersion;
+ uint256 hashPrevBlock;
+ uint256 hashMerkleRoot;
+ unsigned int nTime;
+ unsigned int nBits;
+ unsigned int nNonce;
+ }
+ block;
+ unsigned char pchPadding0[64];
+ uint256 hash1;
+ unsigned char pchPadding1[64];
+ };
+ char tmpbuf[sizeof(tmpworkspace)+16];
+ tmpworkspace& tmp = *(tmpworkspace*)alignup<16>(tmpbuf);
+
+
+ char line[180];
+ ifstream fin(filename);
+ char *p;
+ unsigned long int totalhashes= 0;
+ unsigned long int found = 0;
+ clock_t start, end;
+ unsigned long int cpu_time_used;
+ unsigned int tnonce;
+ start = clock();
+
+ while( fin.getline(line, 180))
+ {
+ string in(line);
+ //printf("%s\n", in.c_str());
+ tmp.block.nVersion       = strtol(in.substr(0,8).c_str(), &p, 16);
+ tmp.block.hashPrevBlock.SetHex(in.substr(8,64));
+ tmp.block.hashMerkleRoot.SetHex(in.substr(64+8,64));
+ tmp.block.nTime          = strtol(in.substr(128+8,8).c_str(), &p, 16);
+ tmp.block.nBits          = strtol(in.substr(128+16,8).c_str(), &p, 16);
+ tnonce = strtol(in.substr(128+24,8).c_str(), &p, 16);
+ tmp.block.nNonce         = tnonce;
+
+ unsigned int nBlocks0 = FormatHashBlocks(&tmp.block, sizeof(tmp.block));
+ unsigned int nBlocks1 = FormatHashBlocks(&tmp.hash1, sizeof(tmp.hash1));
+
+ // Byte swap all the input buffer
+ for (int i = 0; i < sizeof(tmp)/4; i++)
+ ((unsigned int*)&tmp)[i] = ByteReverse(((unsigned int*)&tmp)[i]);
+
+ // Precalc the first half of the first hash, which stays constant
+ uint256 midstatebuf[2];
+ uint256& midstate = *alignup<16>(midstatebuf);
+ SHA256Transform(&midstate, &tmp.block, pSHA256InitState);
+
+
+ uint256 hashTarget = CBigNum().SetCompact(ByteReverse(tmp.block.nBits)).getuint256();
+ // printf("target %s\n", hashTarget.GetHex().c_str());
+ uint256 hash;
+ uint256 hashbuf[2];
+ uint256& refhash = *alignup<16>(hashbuf);
+
+ unsigned int thash[8][NPAR];
+ int done = 0;
+ unsigned int i, j;
+
+ /* reference */
+ SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
+ SHA256Transform(&refhash, &tmp.hash1, pSHA256InitState);
+ for (int i = 0; i < sizeof(refhash)/4; i++)
+ ((unsigned int*)&refhash)[i] = ByteReverse(((unsigned int*)&refhash)[i]);
+
+ //printf("reference nonce %08x:\n%s\n\n", tnonce, refhash.GetHex().c_str());
+
+ tmp.block.nNonce = ByteReverse(tnonce) & 0xfffff000;
+
+
+ for(;;)
+ {
+
+ Double_BlockSHA256((char*)&tmp.block + 64, &tmp.hash1, &midstate, thash, pSHA256InitState);
+
+ for(i = 0; i+ /* fast hash checking */
+ if(thash[7][i] == 0) {
+ // printf("found something... ");
+
+ for(j = 0; j<8; j++) ((unsigned int *)&hash)[j] = ByteReverse((unsigned int)thash[j][i]);
+ // printf("%s\n", hash.GetHex().c_str());
+
+ if (hash <= hashTarget)
+ {
+ found++;
+ if(tnonce == ByteReverse(tmp.block.nNonce + i) ) {
+ if(hash == refhash) {
+ printf("\r%lu", found);
+ totalhashes += NPAR;
+ done = 1;
+ } else {
+ printf("Hashes do not match!\n");
+ }
+ } else {
+ printf("nonce does not match. %08x != %08x\n", tnonce, ByteReverse(tmp.block.nNonce + i));
+ }
+ break;
+ }
+ }
+ }
+ if(done) break;
+
+ tmp.block.nNonce+=NPAR;
+ totalhashes += NPAR;
+ if(tmp.block.nNonce == 0) {
+ printf("ERROR: Hash not found for:\n%s\n", in.c_str());
+ return;
+ }
+ }
+ }
+ printf("\n");
+ end = clock();
+ cpu_time_used += (unsigned int)(end - start);
+ cpu_time_used /= ((CLOCKS_PER_SEC)/1000);
+ printf("found solutions = %lu\n", found);
+ printf("total hashes = %lu\n", totalhashes);
+ printf("total time = %lu ms\n", cpu_time_used);
+ printf("average speed: %lu khash/s\n", (totalhashes)/cpu_time_used);
+}
+
+int main(int argc, char* argv[]) {
+ if(argc == 2) {
+ BitcoinTester(argv[1]);
+ } else
+ printf("Missing filename!\n");
+ return 0;
+}
legendary
Activity: 1596
Merit: 1091
hmm...I wasn't able to apply the patch (I'm a noobie). Here's the command I ran from bitcoin-0.3.6/src # patch < XN1JDb53.txt

Output:

1 out of 1 hunk ignored
(Stripping trailing CRs from patch.)
patching file main.cpp
Hunk #1 FAILED at 2555.
Hunk #2 FAILED at 2701.
2 out of 2 hunks FAILED
(Stripping trailing CRs from patch.)
patching file makefile.unix
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 58.

It definitely does not apply to the SVN trunk.  Maybe tcatm could post the main.cpp itself?
sr. member
Activity: 337
Merit: 263
the mean client would send all generated bitcoins to a certain address Wink

@em3rgent0rder: i don't know why it fails, but it should be easy to patch it manually...
sr. member
Activity: 434
Merit: 251
youtube.com/ericfontainejazz now accepts bitcoin
hmm...I wasn't able to apply the patch (I'm a noobie). Here's the command I ran from bitcoin-0.3.6/src # patch < XN1JDb53.txt

Output:

1 out of 1 hunk ignored
(Stripping trailing CRs from patch.)
patching file main.cpp
Hunk #1 FAILED at 2555.
Hunk #2 FAILED at 2701.
2 out of 2 hunks FAILED
(Stripping trailing CRs from patch.)
patching file makefile.unix
Hunk #1 FAILED at 45.
Hunk #2 FAILED at 58.

What's the proper command to type into linux?  Or do you have linux binaries?
Pages:
Jump to: