
Topic: bitcoin generation broken in 0.3.8? (64-bit) (Read 11419 times)

lfm
full member
Activity: 196
Merit: 100
August 12, 2010, 06:19:55 AM
#37
In recognition of the sleuthiness exhibited by lfm and ArtForz, I will send them the first coins that my healed 64-bit Linux machine discovers.
Nice work guys.  (lfm, post up an address, eh)

Note significant clues were also supplied by lachesis and tcatm if you're feeling generous. :-)
member
Activity: 111
Merit: 10
My Linux 64-bit machine is provably healed, and all 11,000 khash/sec are being used For Good at last.

ArtForz, lfm, you have incoming coin. :)
full member
Activity: 307
Merit: 101
In recognition of the sleuthiness exhibited by lfm and ArtForz, I will send them the first coins that my healed 64-bit Linux machine discovers.
Nice work guys.  (lfm, post up an address, eh)

It would be great if some future version of bitcoin/bitcoind had an internal self-test, just to confirm that the hashing is working.  As more folks try different options for specialized hardware, it seems all we have today is the indeterminate "difficulty 1" point-to-point test.  A "known good" hardcoded hash seed could make this better.

ArtForz actually made a patch to check up on BitcoinMiner(), http://pastebin.com/YGUcqPYK
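
A minimal sketch of what such a self-test could look like, assuming the hasher can be reached through a plain function pointer. This is not ArtForz's patch and not code from the Bitcoin sources: `CompressFn`, `sha256KnownAnswerTest`, `requireWorkingSha256` and `brokenCompress` are hypothetical names, and the in-place signature differs from the real `SHA256Transform`. It runs the standard one-block SHA-256 test vector for "abc" (from FIPS 180-4) through the function under test and compares the result with the published digest.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// SHA-256 round constants (FIPS 180-4).
static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2};

static inline uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// Hypothetical hook type: update `state` in place with one 16-word block.
typedef void (*CompressFn)(uint32_t* state, const uint32_t* block);

// Plain reference compression function, used here as the "known good" hasher.
void sha256Compress(uint32_t* state, const uint32_t* block) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++) w[i] = block[i];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint32_t e = state[4], f = state[5], g = state[6], h = state[7];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}

// Stand-in for a miscompiled hasher that returns without touching the state.
void brokenCompress(uint32_t*, const uint32_t*) {}

// Known-answer test: the padded one-block message "abc" must hash to the
// digest published in FIPS 180-4.
bool sha256KnownAnswerTest(CompressFn fn) {
    uint32_t state[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                         0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    uint32_t block[16] = {0};
    block[0] = 0x61626380;   // 'a' 'b' 'c' followed by the 0x80 padding byte
    block[15] = 0x00000018;  // message length in bits (24)
    static const uint32_t expected[8] = {0xba7816bf, 0x8f01cfea, 0x414140de, 0x5dae2223,
                                         0xb00361a3, 0x96177a9c, 0xb410ff61, 0xf20015ad};
    fn(state, block);
    return std::memcmp(state, expected, sizeof(expected)) == 0;
}

// Hypothetical startup check. A release build would report an error instead,
// since assert() is compiled out under NDEBUG.
void requireWorkingSha256(CompressFn fn) {
    assert(sha256KnownAnswerTest(fn) && "SHA-256 self-test failed");
}
```

A check along these lines takes one block's worth of hashing at startup, so the "it just slows you down" objection would not really apply.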
lfm
full member
Activity: 196
Merit: 100
In recognition of the sleuthiness exhibited by lfm and ArtForz, I will send them the first coins that my healed 64-bit Linux machine discovers.
Nice work guys.  (lfm, post up an address, eh)

here: 1HKXYYPCzQptzJsaq2nt8xUgsWNVFRfJWD
or here:  75.158.131.108

Quote
It would be great if some future version of bitcoin/bitcoind had an internal self-test, just to confirm that the hashing is working.  As more folks try different options for specialized hardware, it seems all we have today is the indeterminate "difficulty 1" point-to-point test.  A "known good" hardcoded hash seed could make this better.

Ya a couple "test vectors" at runtime might be nice but then again if they're hacking they might hack out the test vectors cuz "they just slow ya down!" (logical or not)

If there was a well defined hook for the SHA variants it might make it less likely the test code would get hacked out. Not exactly sure how a hook should be designed to work for all the players out there. Stuff like GPU codes and SSE codes where several nonces might be tried in parallel need to be supported.
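
One possible shape for such a hook, as a rough sketch only. Everything here is hypothetical: `HashHookRegistry`, `BatchHashFn`, `ScalarHashFn` and the toy functions are made up for illustration, and a toy 32-bit function stands in for the real double SHA-256. The idea is that a backend hands over a batch function (several nonces per call, as GPU or SSE code would want), and the registry checks it against a scalar reference before accepting it, so the test lives in the registration path rather than in code a backend author can drop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hook types. A scalar reference hashes one nonce; a backend
// hashes `count` consecutive nonces starting at `firstNonce` into `out`.
using ScalarHashFn = std::function<uint32_t(uint32_t nonce)>;
using BatchHashFn = std::function<void(uint32_t firstNonce, std::size_t count, uint32_t* out)>;

class HashHookRegistry {
public:
    explicit HashHookRegistry(ScalarHashFn reference) : reference_(std::move(reference)) {}

    // Accept a backend only if it agrees with the reference on a few batches.
    bool registerBackend(const std::string& name, std::size_t lanes, BatchHashFn fn) {
        assert(lanes > 0);
        std::vector<uint32_t> out(lanes, 0);
        const uint32_t starts[] = {0u, 1u, 0x12345678u};
        for (uint32_t first : starts) {
            fn(first, lanes, out.data());
            for (std::size_t i = 0; i < lanes; i++) {
                if (out[i] != reference_(first + static_cast<uint32_t>(i))) return false;
            }
        }
        backends_.push_back(Backend{name, lanes, std::move(fn)});
        return true;
    }

    std::size_t backendCount() const { return backends_.size(); }

private:
    struct Backend {
        std::string name;
        std::size_t lanes;
        BatchHashFn fn;
    };
    ScalarHashFn reference_;
    std::vector<Backend> backends_;
};

// Toy stand-in for the real hash; always odd for nonce 0, so never zero there.
uint32_t toyReference(uint32_t nonce) { return (nonce ^ 0x9e3779b9u) * 2654435761u; }

// A backend that computes every lane correctly.
void goodBatch(uint32_t firstNonce, std::size_t count, uint32_t* out) {
    for (std::size_t i = 0; i < count; i++) out[i] = toyReference(firstNonce + static_cast<uint32_t>(i));
}

// A backend that silently does nothing, like the miscompiled transform.
void brokenBatch(uint32_t, std::size_t, uint32_t*) {}
```

With this layout the number of lanes is the backend's own business; the registry only cares that each lane matches the scalar result for its nonce.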

member
Activity: 111
Merit: 10
In recognition of the sleuthiness exhibited by lfm and ArtForz, I will send them the first coins that my healed 64-bit Linux machine discovers.
Nice work guys.  (lfm, post up an address, eh)

It would be great if some future version of bitcoin/bitcoind had an internal self-test, just to confirm that the hashing is working.  As more folks try different options for specialized hardware, it seems all we have today is the indeterminate "difficulty 1" point-to-point test.  A "known good" hardcoded hash seed could make this better.
founder
Activity: 364
Merit: 3077
I uploaded 0.3.8.1 for Linux with re-built 64-bit.  I ran a difficulty 1 test with it and it has generated blocks.

https://bitcointalksearch.org/topic/version-0381-update-for-linux-64-bit-765

Download:
http://sourceforge.net/projects/bitcoin/files/Bitcoin/bitcoin-0.3.8/bitcoin-0.3.8.1-linux.tar.gz/download
newbie
Activity: 53
Merit: 0
Yet another argument for cmake or similar.

It's a fact.

I intend to eventually sort out all building hassles and have a reliable build procedure across all win/unix 32/64-bit platform combinations.

On that note, I'm currently tackling the x64 build for Windows and notice that for 64-bit MSVC, the X86_SHA256_HashBlocks function is deferred to an external definition that is not present in the project. In the original CryptoPP library it seems to be in a separate asm module. I wonder how people are building x64 on Windows. Are they setting defines to use the C-source SHA version?
sr. member
Activity: 308
Merit: 252
I found that SSE2 only added a slight speedup, about 2%, which didn't seem worth the incompatibility. 
I do see the point, because on a non-SSE2 machine the client would crash?
Quote
It doesn't look to me like Crypto++ could be deciding whether to use SSE2 at runtime.  There's one place where it detects SSE2 for deciding some block count parameter, but the SSE2 stuff is all #ifdef at compile time and I can't see how that would switch at runtime.  Maybe I'm not looking in the right place.

Should we enable SSE2 in all the makefiles?  It seems like we must in case someone compiles with 64-bit.

I will recompile the 64-bit part of the Linux 0.3.8 release.
Depends on which is easier really. If you enable something that is going to break compatibility, the only people who will feel the pain are the non-technical users of the client. From their perspective, it just doesn't work for some reason.

I think some build notes would be better, for example: (If compiling on a 64-bit system, be sure to do this part).

Having one branch of source that cross-compiles across multiple operating systems is always tricky business.  :D
founder
Activity: 364
Merit: 3077
I found that SSE2 only added a slight 2% speedup, which didn't seem worth the incompatibility.  I was trying to take the safer option.

It doesn't look to me like Crypto++ could be deciding whether to use SSE2 at runtime.  There's one place where it detects SSE2 for deciding some block count parameter, but the SSE2 stuff is all #ifdef at compile time and I can't see how that would switch at runtime.  Maybe I'm not looking in the right place.

Should we enable SSE2 in all the makefiles?  It seems like we must in case someone compiles with 64-bit.

I will recompile the 64-bit part of the Linux 0.3.8 release.
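
For comparison, a minimal sketch of what a runtime switch could look like, as opposed to the compile-time #ifdef described above. This is not how Crypto++ does it; `cpuHasSse2`, `selectTransform` and the two stand-in transforms are hypothetical names, and the CPU check relies on the GCC/Clang builtin `__builtin_cpu_supports`, which is only used when building for x86 with one of those compilers. Both code paths are compiled in, and the choice is made once on the machine the binary actually runs on.

```cpp
#include <cassert>
#include <cstdint>

typedef void (*TransformFn)(uint32_t* state, const uint32_t* data);

// Stand-ins for the two real implementations; both do the same toy update
// so the example stays short.
void transformPortable(uint32_t* state, const uint32_t* data) { state[0] += data[0]; }
void transformSse2(uint32_t* state, const uint32_t* data) { state[0] += data[0]; }

// Ask the CPU at runtime. On other compilers or architectures this sketch
// simply reports "no" and falls back to the portable path.
bool cpuHasSse2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

struct Selected {
    const char* name;
    TransformFn fn;
};

// Pick an implementation once at startup; there is always a usable fallback.
Selected selectTransform() {
    Selected s;
    if (cpuHasSse2()) {
        s.name = "sse2";
        s.fn = transformSse2;
    } else {
        s.name = "portable";
        s.fn = transformPortable;
    }
    assert(s.fn != nullptr);
    return s;
}
```

The trade-off is that the SSE2 path still has to be compiled with SSE2 enabled for that one translation unit, which is a build-system concern this sketch does not cover.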
full member
Activity: 210
Merit: 100
I guess that flag was put in for old 32 bit machines that might not have sse2. Unfortunately there is no such thing as a 64 bit x86_64 without sse2 so the conditional compilation produced an empty body which did exactly nothing.
Yet another argument for cmake or similar.
lfm
full member
Activity: 196
Merit: 100
The 32 bit Linux build seems ok for those who don't care to try to build it themselves. It's only a few percent slower than the 64 when built right.

I guess that flag was put in for old 32 bit machines that might not have sse2. Unfortunately there is no such thing as a 64 bit x86_64 without sse2 so the conditional compilation produced an empty body which did exactly nothing.
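
To illustrate how conditional compilation can leave an empty body, here is a small sketch. The macros `DEMO_IS_X64` and `DEMO_DISABLE_SSE2` and both functions are hypothetical stand-ins; this is not the actual Crypto++ source, only the general pattern being described: one branch exists for 32-bit, one for SSE2, and the combination "64-bit with SSE2 disabled" matches neither.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical build switches standing in for the real flags.
#define DEMO_IS_X64 1
#define DEMO_DISABLE_SSE2 1

// With both switches set, neither branch is compiled and the function
// returns without touching `state`.
void demoTransform(uint32_t* state, const uint32_t* data) {
#if !DEMO_IS_X64
    state[0] += data[0];  // portable path, only built for 32-bit
#elif !DEMO_DISABLE_SSE2
    state[0] += data[0];  // SSE2 path, only built when SSE2 is allowed
#endif
    (void)state;
    (void)data;
}

// Safer layout: the portable path is the unconditional fallback, so every
// flag combination compiles to something that does the work.
void guardedTransform(uint32_t* state, const uint32_t* data) {
#if DEMO_IS_X64 && !DEMO_DISABLE_SSE2
    state[0] += data[0];  // SSE2 path
#else
    state[0] += data[0];  // portable fallback
#endif
}

typedef void (*TransformFn)(uint32_t* state, const uint32_t* data);

// Helper for the demonstration: does the function modify its state at all?
bool changesState(TransformFn fn) {
    assert(fn != nullptr);
    uint32_t state[1] = {1};
    const uint32_t data[1] = {2};
    fn(state, data);
    return state[0] != 1;
}
```

Here `changesState(demoTransform)` comes out false and `changesState(guardedTransform)` true. The compiler raises no error for the first function, which is why the problem only showed up as miners that never found a block.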
full member
Activity: 210
Merit: 100
Just had a "fun" gdb session with the official 0.3.8 linux 64 bit binary on debian sid.
Same bug, output state on return from SHA256Transform always == initial state.
So... did someone really generate a block running the official 0.3.8 64 bit linux binary (the one in bin/64)?
Oh, and from a quick glance at the svn changelog, that bug probably has been there since r118 = 0.3.6.
Oh, and who THE FUCK thought stripping the official binary was a good idea? I just wasted half an hour hunting down SHA256Transform in a disassembler.
Nice work mate! Time to PM Satoshi.

Sent you ฿5.01 for your troubles.
sr. member
Activity: 308
Merit: 252
Just had a "fun" gdb session with the official 0.3.8 linux 64 bit binary on debian sid.
Same bug, output state on return from SHA256Transform always == initial state.
So... did someone really generate a block running the official 0.3.8 64 bit linux binary (the one in bin/64)?
Oh, and from a quick glance at the svn changelog, that bug probably has been there since r118 = 0.3.6.
Oh, and who THE FUCK thought stripping the official binary was a good idea? I just wasted half an hour hunting down SHA256Transform in a disassembler.


Not on 64bit linux, just 32bit and 64bit everything else in between (Linux/Windows)

I didn't notice until I checked the generation dates against the release dates of the software.

If you don't strip the debug symbols the binary is 8 times its normal size.
sr. member
Activity: 406
Merit: 253
Just had a "fun" gdb session with the official 0.3.8 linux 64 bit binary on debian sid.
Same bug, output state on return from SHA256Transform always == initial state.
So... did someone really generate a block running the official 0.3.8 64 bit linux binary (the one in bin/64)?
Oh, and from a quick glance at the svn changelog, that bug probably has been there since r118 = 0.3.6.
Oh, and who THE FUCK thought stripping the official binary was a good idea? I just wasted half an hour hunting down SHA256Transform in a disassembler.
sr. member
Activity: 337
Merit: 263
I don't really know what it changes. I wonder why the define was made in the first place. The code should be able to determine whether it's running on an SSE2 CPU or not. I discovered it while comparing my 4xSSE2 hash function to the original one in 0.3.6, so it really means that no 64bit build since 0.3.6 was able to generate any coins. Maybe satoshi can tell us why he put the define in the makefile.
sr. member
Activity: 308
Merit: 252
Did you compile it yourself? There's a define in makefile.unix. Something like -DCRYPTOPP_DISABLE_SSE2. Remove that. Else the SHA256_Transform will return the initstate. I had the same problem when switching to 0.3.6.
This works for the 64bit builds, strange that it has no effect on the 32bit builds? Should it be disabled for 32bit builds as well?
sr. member
Activity: 308
Merit: 252
After testing the 32bit and 64bit builds, I can confirm what was already posted here: the problem just affects the 64bit builds. When I go back and trace what's being produced, it's the 64bit Linux machines (the 64bit Windows machines don't appear to have this problem) that aren't producing any coin since the latest releases. All of the 32bit builds (stock or custom) appear to be just fine.
sr. member
Activity: 308
Merit: 252
This seems to be a different problem. The blocks do not seem to be "stuck" on my systems. The getinfo shows them up to date

It seems the sha256 code is not getting built right for linux 64. Not sure if/how it could work on some and not on others.

There is one behavior that I've noticed.

Block sync stalling.

When the client (be it Windows or Linux) has been running for a few days, *sometimes* they get stuck in block limbo. What happens is your client keeps trying to solve the current block and loses track of the entire block system. So as time goes by, other blocks are solved and your PC is still stuck on the same block.

...

I'm starting to wonder if the two are connected now since the hashing seems to get stuck, maybe that's why it gets stuck on a block.
sr. member
Activity: 308
Merit: 252
@lachesis: 404 not found

@knightmb: What about a fixed CentOS build?
Should be simple enough for CentOS. I was going to test it out first to see if those suffered the same issue, though it seems like all the builds would suffer this issue if the compile flag is what makes the difference.
sr. member
Activity: 252
Merit: 250
@lachesis: 404 not found

@knightmb: What about a fixed CentOS build?
sr. member
Activity: 308
Merit: 252
Did you compile it yourself? There's a define in makefile.unix. Something like -DCRYPTOPP_DISABLE_SSE2. Remove that. Else the SHA256_Transform will return the initstate. I had the same problem when switching to 0.3.6.

That seems to be the root of the problem. I think even the bundled binary for Linux 64 in 0.3.8 was compiled wrong then.

I just tested it out, I get the same results. Repeating values with stock, more random values with the flag modification. Probably needs to be fixed for the next release, though I'm not sure how mine are generating anything then?
full member
Activity: 210
Merit: 100
That seems to be the root of the problem. I think even the bundled binary for Linux 64 in 0.3.8 was compiled wrong then.
That appears to fix it for me too.

EDIT:
I've uploaded corrected Linux builds to http://www.alloscomp.com/bitcoin/binaries/release-r123-2010-08-08/. Enjoy!
lfm
full member
Activity: 196
Merit: 100
Did you compile it yourself? There's a define in makefile.unix. Something like -DCRYPTOPP_DISABLE_SSE2. Remove that. Else the SHA256_Transform will return the initstate. I had the same problem when switching to 0.3.6.

That seems to be the root of the problem. I think even the bundled binary for Linux 64 in 0.3.8 was compiled wrong then.

sr. member
Activity: 337
Merit: 263
Did you compile it yourself? There's a define in makefile.unix. Something like -DCRYPTOPP_DISABLE_SSE2. Remove that. Else the SHA256_Transform will return the initstate. I had the same problem when switching to 0.3.6.
lfm
full member
Activity: 196
Merit: 100
This seems to be a different problem. The blocks do not seem to be "stuck" on my systems. The getinfo shows them up to date

It seems the sha256 code is not getting built right for linux 64. Not sure if/how it could work on some and not on others.

There is one behavior that I've noticed.

Block sync stalling.

When the client (be it Windows or Linux) has been running for a few days, *sometimes* they get stuck in block limbo. What happens is your client keeps trying to solve the current block and loses track of the entire block system. So as time goes by, other blocks are solved and your PC is still stuck on the same block.

...

sr. member
Activity: 308
Merit: 252
There is one behavior that I've noticed.

Block sync stalling.

When the client (be it Windows or Linux) has been running for a few days, *sometimes* they get stuck in block limbo. What happens is your client keeps trying to solve the current block and loses track of the entire block system. So as time goes by, other blocks are solved and your PC is still stuck on the same block. The problem is, if you stop and restart the client, it just picks up where it left off on the same stuck block. The only way I've seen to fix this is to delete the block chain so that the client will re-download it.

I've had to clear out a few block chains for a few servers because of this, I'll notice they are stuck on block 70,000 for example, while the rest of the network is working on block 70,839 for example.

That might explain why you go weeks without winning a block with a 24/7 PC going.

The server farm I had running before was spitting out blocks all day, but now that I'm just down to a few dozen servers and the difficulty is way up, I barely see about 3 or 4 blocks a day spread across all of them. But I know the 0.3.8 version is working like it should, except for the occasional block freeze.
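
A rough sketch of how a stall like this might be spotted automatically instead of by eye. This is not something the client does as far as this thread says; `looksStalled` and its parameters are hypothetical, and the peer heights are assumed to come from wherever the operator can get them (for example, by comparing the block count each of their own nodes reports). It compares the local height against the median of the others and flags a gap larger than a tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true when the local block height trails the median height seen
// elsewhere by more than `tolerance` blocks. The vector is taken by value
// because nth_element reorders it.
bool looksStalled(int localHeight, std::vector<int> peerHeights, int tolerance) {
    assert(tolerance >= 0);
    if (peerHeights.empty()) return false;  // nothing to compare against
    std::size_t mid = peerHeights.size() / 2;
    std::nth_element(peerHeights.begin(), peerHeights.begin() + mid, peerHeights.end());
    return peerHeights[mid] - localHeight > tolerance;
}
```

Using the median rather than the maximum keeps one misreporting node from triggering a false alarm. With the figures from the post, a node at 70,000 against others around 70,839 would be flagged with any small tolerance.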
lfm
full member
Activity: 196
Merit: 100
OK, chatting with lachesis in IRC, he tried this and I get the same result. We added some prints in main.cpp at the SHA calls, like so:

Code:
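        loop  // the source's macro for an endless for (;;)
        {
            // First pass: hash the second 64-byte chunk of the block, starting from midstate.
            SHA256Transform(&tmp.hash1, (char*)&tmp.block + 64, &midstate);
            printf("mid hash =\n");
            for (int i = 0; i < 8; i++)
              printf(" %08x", ((unsigned *)&tmp.hash1)[i]);
            printf("\n");
            // Second pass: hash the first result, starting from the SHA-256 initial state.
            SHA256Transform(&hash, &tmp.hash1, pSHA256InitState);
            printf("full hash =\n");
            for (int i = 0; i < 8; i++)
              printf(" %08x", ((unsigned *)&hash)[i]);
            printf("\n");

            if (((unsigned short*)&hash)[14] == 0)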

and then in the log we get:

mid hash =
 6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19
full hash =
 6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19
mid hash =
 6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19
full hash =
 6a09e667 bb67ae85 3c6ef372 a54ff53a 510e527f 9b05688c 1f83d9ab 5be0cd19

repeating! The hash call isn't doing anything!
(he maybe got a different repeating value, I don't know)
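
The repeating value in that log, 6a09e667 bb67ae85 ... 5be0cd19, is the standard SHA-256 initial state, so the output is just the starting state handed back unchanged. A small sketch of a check for exactly that symptom follows. The names (`transformLooksLikeNoOp`, `stubNoOp`, `stubMixing`) are hypothetical, and the in-place function signature is an assumption that differs from the three-argument `SHA256Transform` in the snippet above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed hook shape: update `state` in place from one 16-word block.
typedef void (*TransformFn)(uint32_t* state, const uint32_t* block);

// Standard SHA-256 initial hash values (FIPS 180-4).
static const uint32_t kSha256Init[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                                        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};

// Returns true if the transform hands the initial state back unchanged,
// which is the symptom shown in the log above.
bool transformLooksLikeNoOp(TransformFn fn) {
    assert(fn != nullptr);
    uint32_t state[8];
    uint32_t block[16];
    std::memcpy(state, kSha256Init, sizeof(state));
    for (int i = 0; i < 16; i++) block[i] = 0x01010101u * static_cast<uint32_t>(i + 1);
    fn(state, block);
    return std::memcmp(state, kSha256Init, sizeof(state)) == 0;
}

// Stand-in for the miscompiled transform: does nothing.
void stubNoOp(uint32_t*, const uint32_t*) {}

// Stand-in for any transform that actually mixes the block into the state.
void stubMixing(uint32_t* state, const uint32_t* block) {
    for (int i = 0; i < 8; i++) state[i] ^= block[i] + 1;
}
```

This only catches the "did nothing at all" failure. It says nothing about whether a transform that does change the state computes the right digest; that needs a known-answer test.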
full member
Activity: 210
Merit: 100
I'm having a similar problem with my own compile of the SVN trunk. I just reverted to the stock builds to test and see what happens. I should try setting up my own network of two nodes with my builds.
lfm
full member
Activity: 196
Merit: 100
I haven't had any problems with 0.3.8 release, generated some coin on windows and Linux clients (32/64)bit just fine this week.

I am wondering if it's something odd in the way I have my systems set up or ??? just various plain Ubuntu installs so far as I know.
sr. member
Activity: 406
Merit: 253
At ~ 8Mhash/s combined not generating a block at 1.0 diff in over 1h is pretty unlikely.
As in, under 0.1% probability unlikely (50% is ~6 min, 1% ~42 min)

Forgot to mention, he's running a custom compile of stock 0.3.8 sources.
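
The arithmetic behind those figures can be sketched as follows, under two assumptions that are not spelled out in the thread: finding blocks is treated as a Poisson process, and difficulty 1 is taken to need about 2^32 hashes per block on average (the usual approximation). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Average seconds between blocks for a given hash rate and difficulty,
// using the approximation of difficulty * 2^32 hashes per block.
double expectedSecondsPerBlock(double hashesPerSecond, double difficulty) {
    assert(hashesPerSecond > 0 && difficulty > 0);
    return difficulty * 4294967296.0 / hashesPerSecond;
}

// Probability of finding no block at all within `seconds`.
double probNoBlock(double hashesPerSecond, double difficulty, double seconds) {
    assert(seconds >= 0);
    return std::exp(-seconds / expectedSecondsPerBlock(hashesPerSecond, difficulty));
}

// How long one has to wait before the chance of still having found
// nothing drops to `p` (for example 0.5 or 0.01).
double secondsUntilProbNoBlock(double hashesPerSecond, double difficulty, double p) {
    assert(p > 0 && p < 1);
    return -std::log(p) * expectedSecondsPerBlock(hashesPerSecond, difficulty);
}
```

Calling `secondsUntilProbNoBlock(8e6, 1.0, 0.5)` and `secondsUntilProbNoBlock(8e6, 1.0, 0.01)` and dividing by 60 lands close to the roughly 6 and 42 minutes quoted above, and `probNoBlock(8e6, 1.0, 90 * 60)` for the 90-minute run is far below 0.1%.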
lfm
full member
Activity: 196
Merit: 100
I haven't had any problems with 0.3.8 release, generated some coin on windows and Linux clients (32/64)bit just fine this week.

I guess I want to know how long I need to run 8000 khash/s at difficulty 1.0 to have any reasonable evidence of a problem. ArtForz said it should tell me in an hour or so.
legendary
Activity: 1246
Merit: 1014
Strength in numbers
The difficulty went up again just after 3.8 was out. I have not gotten any with 4000khash in about 15 days, so 90 minutes of 8000k doesn't mean much. I'm sure we'd know if 3.8 wasn't generating any blocks.


This is on an isolated test net with just two machines, connections = 1, difficulty = 1


Oh, that is odd then, that would be like not finding one in over 500 hours, possible I guess, but getting out there
sr. member
Activity: 308
Merit: 252
I haven't had any problems with 0.3.8 release, generated some coin on windows and Linux clients (32/64)bit just fine this week.
lfm
full member
Activity: 196
Merit: 100
The difficulty went up again just after 3.8 was out. I have not gotten any with 4000khash in about 15 days, so 90 minutes of 8000k doesn't mean much. I'm sure we'd know if 3.8 wasn't generating any blocks.

maybe only a problem with 64 bit linux or something

lfm
full member
Activity: 196
Merit: 100
The difficulty went up again just after 3.8 was out. I have not gotten any with 4000khash in about 15 days, so 90 minutes of 8000k doesn't mean much. I'm sure we'd know if 3.8 wasn't generating any blocks.


This is on an isolated test net with just two machines, connections = 1, difficulty = 1
legendary
Activity: 1246
Merit: 1014
Strength in numbers
The difficulty went up again just after 3.8 was out. I have not gotten any with 4000khash in about 15 days, so 90 minutes of 8000k doesn't mean much. I'm sure we'd know if 3.8 wasn't generating any blocks.
lfm
full member
Activity: 196
Merit: 100
I was starting to wonder when my systems seemed to quit generating coins if there was something going on. They went from about 1 block / day to none in a week.

ArtForz in IRC suggested I run an isolated test net with two nodes only connected to each other, with an empty wallet dir. I took a couple of quad core systems and set them up. They have produced no blocks in about 90 minutes now while hashing at a combined rate of over 8000 khash/sec. Is this evidence of a problem yet or is it more bad luck?

The systems are Linux 64 bit one Intel quad q6600 and one AMD quad phenom II 940.
