Topic: Why would bitcoind not receive new blocks?

legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
April 12, 2016, 09:56:45 AM
#13
So, hammering bitcoind with getblocktemplate (GBT) calls definitely makes it unhappy.  Has anyone done any benchmarking on this?  It would be great to see how different hardware specs hold up against different rates of GBT calls.
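To get numbers like the ones asked for here, one rough approach is to time repeated RPC calls from a script. The following is only a minimal sketch, assuming Python 3, RPC enabled on bitcoind's default mainnet port 8332, and placeholder credentials; `make_rpc_caller` and `benchmark` are hypothetical helpers, not part of any Bitcoin tool. Newer Bitcoin Core releases require a request object such as `{"rules": ["segwit"]}` for getblocktemplate, which you would pass as `params`.

```python
import base64
import json
import statistics
import time
import urllib.request


def make_rpc_caller(url, user, password, timeout=30.0):
    """Return a function that sends one JSON-RPC 1.0 request to bitcoind."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def call(method, params=None):
        body = json.dumps({"jsonrpc": "1.0", "id": "bench",
                           "method": method, "params": params or []}).encode()
        req = urllib.request.Request(url, data=body, headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        })
        # urlopen raises HTTPError for non-200 replies and URLError on timeouts.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            reply = json.loads(resp.read())
        if reply.get("error"):
            raise RuntimeError(reply["error"])
        return reply["result"]

    return call


def benchmark(call_once, n_calls, interval=0.0):
    """Time n_calls invocations of call_once(); return latency stats in seconds."""
    latencies = []
    for _ in range(n_calls):
        start = time.perf_counter()
        call_once()
        latencies.append(time.perf_counter() - start)
        if interval:
            time.sleep(interval)  # e.g. 5.0 to mimic a pool polling every 5 seconds
    return {
        "calls": n_calls,
        "min": min(latencies),
        "median": statistics.median(latencies),
        "max": max(latencies),
    }

# Example usage (hypothetical credentials):
#   rpc = make_rpc_caller("http://127.0.0.1:8332/", "rpcuser", "rpcpassword")
#   print(benchmark(lambda: rpc("getblocktemplate"), n_calls=20, interval=5.0))
```

Running the same script on each machine, and watching whether the median and max latencies grow as time since the last block increases, would give a rough hardware comparison.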
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
I'm still researching this to give a better bug report.

It definitely boils down to calling getblocktemplate.  I know there were a bunch of issues around memory (see jtoomin's posts about it), but that's not the problem I'm having here.  The problem I'm seeing is that at some point - which almost always corresponds to a long time between blocks on the network - the coin daemon just gets stuck.

For example, I pulled one of the bitcoin daemons out of the pool and suddenly it behaves perfectly normally.  Block updates come through fine, and CPU usage is virtually nil.  Sure, you could argue that I should reduce the number of calls to getblocktemplate, but that doesn't solve the problem; it merely avoids it.  If there's a limitation on calling getblocktemplate, I haven't seen it documented anywhere.  It's not like I'm hardware or bandwidth limited, either, unless I missed a notice somewhere stating that you need 16 cores, 64G of RAM and a 1 Gbit network to successfully call getblocktemplate on the bitcoin daemon every 5 seconds.

As it is, I've tested this on machines (bare metal and VPS) ranging from 4 cores / 4G RAM to 8 cores / 32G RAM.  On the 4 core / 4G RAM server, the effect was noticeable almost immediately: any time network blocks took longer than a few minutes, the daemon would enter this zombie-like stuck state.  On the 8 core / 16G RAM server, it would usually take around 45 minutes to an hour without a new block before it got stuck.
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Sounds like something you should post on GitHub[1]. It might be a bug, it might not be, but chances are higher that someone there has an idea of what you can try to get closer to a solution.

[1] https://github.com/bitcoin/bitcoin/issues
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
Some more information...

When the bitcoin daemon gets "stuck", I notice that CPU usage remains constant.  For example, one of the servers has a quad-core Xeon processor (effectively 8 CPUs).  It's a dedicated server with 16G RAM and a 256G SSD running Debian 8.3.  As time passes between blocks, the CPU usage steadily creeps up until a new block is received, at which point it drops.  If too long passes between blocks (e.g. the last block, which took nearly an hour), the CPU usage levels out and the bitcoin daemon exhibits the "stuck" behavior.  Once that point is reached, no new blocks are received.  Any bitcoin-cli call either fails with an error saying it couldn't parse the reply from the server, or waits quite a few seconds before returning the results.

Restarting the bitcoin daemon immediately resolves the problem.  The CPU usage drops to nearly zero.  This cycle continuously repeats itself.

At no point does the server use more than 8G of RAM total out of the 16G available.  It doesn't even touch the swap (which is 4G).

I see this exact behavior on every single bitcoin daemon.  I have tried daemons compiled from source and those precompiled and installed via the bitcoin repository.  I see it on VPS instances as well as bare metal.
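Until the root cause is found, the symptoms described above (slow or failing bitcoin-cli calls, and a height that stops advancing while other nodes move on) could at least be detected automatically. This is a minimal sketch, assuming you periodically record the local height (e.g. from `getblockcount`), the height of a known-good reference node such as server 1, and how long the local RPC call took; `Sample` and `looks_stuck` are hypothetical names and the thresholds are arbitrary. It only detects the state; acting on it (for example with the restart that the post says clears the problem) is a workaround, not a fix.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    """One watchdog observation; how you collect these is up to you."""
    timestamp: float             # seconds since the epoch
    local_height: Optional[int]  # local getblockcount, or None if the call failed
    reference_height: int        # height reported by a known-good node (e.g. server 1)
    rpc_latency: float           # seconds the local RPC call took


def looks_stuck(samples: Iterable[Sample], max_lag_blocks: int = 1,
                max_bad_seconds: float = 600.0, max_latency: float = 5.0) -> bool:
    """Return True once 'bad' samples span at least max_bad_seconds without a break.

    A sample is bad if the RPC call failed, took longer than max_latency, or the
    local height trails the reference by more than max_lag_blocks. A single good
    sample resets the clock, so normal propagation delay is tolerated.
    """
    bad_since = None
    for s in sorted(samples, key=lambda x: x.timestamp):
        bad = (s.local_height is None
               or s.rpc_latency > max_latency
               or s.reference_height - s.local_height > max_lag_blocks)
        if not bad:
            bad_since = None
            continue
        if bad_since is None:
            bad_since = s.timestamp
        if s.timestamp - bad_since >= max_bad_seconds:
            return True
    return False
```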
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
Other than the OS, bitcoind is the only thing running on server 2.  The bitcoin daemon is acting as a redundant node for my pool's stratum server, so RPC calls to create blocks are coming from the stratum server.  Also, I've noticed the outgoing traffic is absolutely absurd - and am likely going to have to shut down this server because of it (over 225G of data in the past couple hours).
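For scale, if "a couple hours" means two hours and G means 10^9 bytes, 225G works out to about 31 MB/s, or roughly 250 Mbit/s sustained. A minimal sketch for measuring the rate directly, assuming you sample bitcoind's `getnettotals` RPC twice (its result includes `totalbytessent` and `timemillis`); `upload_rate` is a hypothetical helper:

```python
def upload_rate(sample_a, sample_b):
    """Average outbound rate between two getnettotals results.

    Each sample is the dict returned by getnettotals. Returns a tuple of
    (bytes_per_second, megabits_per_second).
    """
    sent = sample_b["totalbytessent"] - sample_a["totalbytessent"]
    elapsed = (sample_b["timemillis"] - sample_a["timemillis"]) / 1000.0
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    bytes_per_second = sent / elapsed
    return bytes_per_second, bytes_per_second * 8 / 1e6
```

If most of that traffic turns out to be peers downloading historical blocks, Bitcoin Core 0.12 introduced a `-maxuploadtarget` option (a target in MiB per 24 hours); check `bitcoind -help` on your build before relying on it.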
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Odd indeed. It sounds like server 2 is busy (as indicated by the wait of a few seconds). Could it be busy with something else that is keeping it from syncing?
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
Restarting the bitcoin daemon on server 2 caused exactly the same result as I posted originally: it quickly synchronized up the missing blocks.

This behavior of getting "stuck", for lack of a better term, is completely baffling to me.
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
Alright... so it's happening right at this moment.  I have bitcoin daemons running on two servers.  On server 1, the daemon is fully synchronized.  On server 2, it is a few blocks behind.  Server 1 is connected to server 2 and vice-versa via addnode= in the bitcoin.conf file.

Every now and then, when I run the bitcoin-cli getinfo command on server 2, I get the following result:
Code:
bitcoin-cli getinfo
error: couldn't parse reply from server

When it does successfully execute, I see this:

Code:
bitcoin-cli getinfo
{
  "version": 120000,
  "protocolversion": 70012,
  "blocks": 406654,
  "timeoffset": 0,
  "connections": 17,
  "proxy": "",
  "difficulty": 166851513282.7772,
  "testnet": false,
  "relayfee": 0.00005000,
  "errors": ""
}

Running exactly that same thing on server 1:

Code:
bitcoin-cli getinfo
{
  "version": 120000,
  "protocolversion": 70012,
  "blocks": 406659,
  "timeoffset": 0,
  "connections": 53,
  "proxy": "",
  "difficulty": 166851513282.7772,
  "testnet": false,
  "relayfee": 0.00005000,
  "errors": ""
}

I also notice that on server 2, the bitcoin-cli getinfo command takes a few seconds to respond.

This makes absolutely no sense to me.  Why would the bitcoin daemon on server 2 be behind that on server 1?
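The side-by-side comparison above is easy to script. A minimal sketch, assuming you capture the raw `bitcoin-cli getinfo` output from each server as text; `parse_getinfo` and `compare_nodes` are hypothetical helpers. Note that `getinfo` was deprecated and later removed in newer Bitcoin Core releases, where `getblockcount` or `getblockchaininfo` report the height instead.

```python
import json
from typing import Optional


def parse_getinfo(text: str) -> Optional[dict]:
    """Parse getinfo output; None if it isn't JSON (e.g. an error line)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def compare_nodes(local_text: str, reference_text: str) -> dict:
    """Summarize how far the local node trails a reference node."""
    local = parse_getinfo(local_text)
    reference = parse_getinfo(reference_text)
    if local is None or reference is None:
        return {"ok": False, "reason": "unparseable getinfo output"}
    return {
        "ok": True,
        "lag_blocks": reference["blocks"] - local["blocks"],
        "local_connections": local["connections"],
        "reference_connections": reference["connections"],
    }
```

Fed the two outputs posted above, it reports a lag of 5 blocks (406659 - 406654) with 17 versus 53 connections; it quantifies the gap but says nothing about its cause.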
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
Thanks for the update, shorena.  Thankfully I haven't seen that behavior again recently on my nodes, so maybe it was indeed just a bunch of bad nodes I was connected to.  If I do encounter it again, I'll certainly check whether the nodes I'm connected to are up to date.  I added your node, so the next time I restart the coin daemons, they'll connect to you.
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Do your peers have the blocks you don't?
That's a good question.  I can't imagine that the 80-odd peers I'm typically connected to don't have those blocks, but it's certainly worth investigating the next time it happens.

I try to connect to decent peers; however, nobody's perfect, I suppose.  Is there a good, up-to-date list of known well-behaving peers somewhere?

I don't think so. I used to keep a short list to help people sync faster, but the last version is already months old. You can try adding mine (188.68.53.44:8333), though.

Also, I'm always connected to Matt's relay network, so wouldn't that be notifying me of blocks quickly?

Looks like it's having problems -> https://bitcointalksearch.org/topic/m.14160508
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
Do your peers have the blocks you don't?
That's a good question.  I can't imagine that the 80-odd peers I'm typically connected to don't have those blocks, but it's certainly worth investigating the next time it happens.

I try to connect to decent peers; however, nobody's perfect, I suppose.  Is there a good, up-to-date list of known well-behaving peers somewhere?  Also, I'm always connected to Matt's relay network, so wouldn't that be notifying me of blocks quickly?
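One way to answer "do your peers have the blocks you don't?" the next time it happens is to compare each peer's `synced_headers` field from `getpeerinfo` (in Bitcoin Core this is the best block height the node knows that peer has, or -1 if unknown) against the local height from `getblockcount`. A minimal sketch, with `peers_ahead` as a hypothetical helper; check `bitcoin-cli help getpeerinfo` on your version for the exact fields:

```python
def peers_ahead(peers, local_height):
    """Return (addr, synced_headers) for peers known to have blocks beyond local_height.

    `peers` is the list returned by getpeerinfo. Peers reporting -1 (unknown)
    are skipped. Results are sorted with the furthest-ahead peer first.
    """
    ahead = []
    for p in peers:
        height = p.get("synced_headers", -1)
        if height > local_height:
            ahead.append((p.get("addr", "?"), height))
    return sorted(ahead, key=lambda item: item[1], reverse=True)
```

If this lists many peers while the local height stays put, the peers do have the blocks and the problem more likely sits with the node itself; an empty list would suggest the peers are not announcing them either.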
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Do your peers have the blocks you don't?
legendary
Activity: 1344
Merit: 1024
Mine at Jonny's Pool
So, I've got a few bitcoin daemons running on multiple operating systems.  Things typically go smoothly; however, every now and then one of them will simply stop getting new blocks.  Even though it is fully connected to the network and is responding to queries fine, it's just not caught up.  I could understand if it were a few seconds back because of block propagation, but sometimes it's many minutes and multiple blocks behind.

The process itself isn't frozen, the debug log shows nothing amiss - typical advertise local messages, connections coming in, etc.  Yes, the proper ports are opened.  The CPU isn't spiked.  There's plenty of free RAM.  It's like it just decides to forget about the fact that it's supposed to get new blocks.  If I restart the process, it immediately gets the blocks it was behind.

Any thoughts?