Topic: Starting preliminary 0.94 testing - "Headless fullnode" - page 4. (Read 15290 times)

newbie
Activity: 10
Merit: 0
PLEASE checkout the ffreeze branch and help test!  Once we get a little bit more testing we'll do a semi-official testing release with a proper bug bounty!

After successful sync and several successful restarts, I run into the following error. This is happening with the Bitcoin Core 0.11 fully synced and running fine in the background.

Code:
➜  git log
commit 083fc5b (HEAD -> ffreeze, origin/ffreeze)
Author: Alan Reiner
Date:   Fri Aug 7 19:34:08 2015 -0400

    Added download script to get offline bundle deps

Code:
➜  uname -a
Linux qertoip 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Code:
➜  ~ python /ArmoryQt.py
(ERROR) ArmoryQt.py:1329 - 4 attempts to load blockchain failed.  Remove mempool.bin.
(ERROR) ArmoryQt.py:1334 - File mempool.bin does not exist. Nothing deleted.
-INFO  - 1439063351: (BlockUtils.cpp:873) blkfile dir: /home/qertoip/.bitcoin/blocks
-INFO  - 1439063351: (BlockUtils.cpp:874) lmdb dir: /home/qertoip/.armory/databases
-INFO  - 1439063351: (lmdb_wrapper.cpp:446) Opening databases...
-INFO  - 1439063351: (BlockUtils.cpp:1230) Executing: doInitialSyncOnLoad
"sni-qt/7281" WARN  21:49:11.804 void StatusNotifierItemFactory::connectToSnw() Invalid interface to SNW_SERVICE
-INFO  - 1439063351: (BlockUtils.cpp:1314) Total number of blk*.dat files: 317
-INFO  - 1439063351: (BlockUtils.cpp:1315) Total blockchain bytes: 42,434,701,647
-INFO  - 1439063351: (BlockUtils.cpp:1823) Reading headers from db
(ERROR) announcefetch.py:312 - Could not verify data in signed message block
Traceback (most recent call last):
  File "/home/qertoip/Projects/BitcoinArmory/announcefetch.py", line 304, in __runFetchSequence
    sig, msg = readSigBlock(digestData)
  File "/home/qertoip/Projects/BitcoinArmory/jasvet.py", line 589, in readSigBlock
    name = r.split(BEGIN_MARKER)[1].split(DASHX5)[0]
IndexError: list index out of range
-INFO  - 1439063355: (BlockUtils.cpp:1849) Found 369016 headers in db
-DEBUG - 1439063355: (Blockchain.cpp:214) Organizing chain w/ rebuild
-WARN  - 1439063356: (BlockUtils.cpp:1343) --- Fetching SSH summaries for 346 registered addresses
-INFO  - 1439063356: (BlockUtils.cpp:1356) Left off at file 316, offset 71970929
-INFO  - 1439063356: (BlockUtils.cpp:1359) Reading headers and building chain...
-INFO  - 1439063356: (BlockUtils.cpp:1360) Starting at block file 316 offset 71970929
-INFO  - 1439063356: (BlockUtils.cpp:1362) Block height 368994
-INFO  - 1439063356: (BlockUtils.cpp:345) parsing headers in file 316
-DEBUG - 1439063357: (Blockchain.cpp:214) Organizing chain w/ rebuild
-INFO  - 1439063358: (BlockUtils.cpp:1399) Looking for first unrecognized block
-INFO  - 1439063358: (BlockUtils.cpp:1403) Updating Headers in DB
-INFO  - 1439063358: (BlockUtils.cpp:1677) Loading block data... file 316 offset 71970929
-INFO  - 1439063358: (BlockUtils.cpp:395) reading blocks from file 316
-INFO  - 1439063358: (BlockUtils.cpp:1417) Wrote blocks to DB in 0.102265s
-WARN  - 1439063358: (BlockUtils.cpp:1113) Scanning from 368991 to 368996
-ERROR - 1439063358: (BlockWriteBatcher.cpp:359) Header heigh&dup is not in BLKDATA DB
-ERROR - 1439063358: (BlockWriteBatcher.cpp:359) Header heigh&dup is not in BLKDATA DB
-ERROR - 1439063358: (BlockWriteBatcher.cpp:360) (368994, 0)
-ERROR - 1439063358: (BlockWriteBatcher.cpp:360) (368992, 0)
-ERROR - 1439063358: (BlockWriteBatcher.cpp:359) Header heigh&dup is not in BLKDATA DB
-ERROR - 1439063358: (BlockWriteBatcher.cpp:360) (368993, 0)
-ERROR - 1439063358: (BlockWriteBatcher.cpp:359) Header heigh&dup is not in BLKDATA DB
-ERROR - 1439063358: (BlockWriteBatcher.cpp:360) (368991, 0)
-ERROR - 1439063358: (BlockWriteBatcher.cpp:2175) hit interruption marker from pull threads
-INFO  - 1439063358: (BlockUtils.cpp:1457) checking scan integrity
-WARN  - 1439063358: (BlockUtils.cpp:1462) Top scanned block does not match top block header
-WARN  - 1439063358: (BlockUtils.cpp:1525) Issue is benign, moving on
-ERROR - 1439063358: (BDM_mainthread.cpp:430) BDM thread failed: bad block meta value

(python:7281): Gtk-CRITICAL **: IA__gtk_progress_configure: assertion 'value >= min && value <= max' failed

If you need anything else please let me know.
legendary
Activity: 3430
Merit: 3080
Nothing installed that measures disk bandwidth, unless there's something in standard Gnome Debian that does. Using an SSD for the databases, not more than 2/3rds full. The drive responds as it ever has to other workloads.
legendary
Activity: 3794
Merit: 1375
Armory Developer
What's your disk bandwidth at?
legendary
Activity: 3430
Merit: 3080
Just got to the point with re-building supernode Db where the scanning begins. I think the threading sync is a little wrong for my platform somehow, range of CPU usage is 5-15%, and I state it that way because it's very bursty. Still getting long estimates for a complete scan (1.5 days currently).
jje
newbie
Activity: 1
Merit: 0
got 0.94 working on fedora 22. very enthused about the smaller database. hats off to the devs!
legendary
Activity: 3794
Merit: 1375
Armory Developer
Forgot to mention the DB format has changed quite a lot, you are better off getting rid of that older DB and starting fresh.

What could be accounting for low CPU usage (20%) and a 2 days estimate scan time?

Eventually it all comes down to your drive's bandwidth. One in-depth optimization (that I skipped this time) would be to modify the DB engine to line up all newly written data sequentially. Another big optimization I skipped would be to fragment the spentness DB. The history DB is easy to fragment into smaller subsets, so you can throw a thread at each of them while keeping the subset per thread fairly small. It speeds up searches and reduces the effort to realign data within each subset (as opposed to one massive DB).

The spentness DB, however, is one single block and is thus written to by a single thread. LMDB only allows a single writer per DB and a single transaction per thread (which is a common-sense approach to a transactional DB design: 1 writer, unlimited readers). It takes quite an effort to split that data in a way that lets multiple threads write several subsets concurrently. History is keyed by address, so it's pretty simple to break it down into groups of addresses. Spentness is keyed by block height & transaction index & txout index, so breaking down the subset is more complicated. You need to fragment it in a way where you can still resolve arbitrary searches without more context than a transaction hash and txout id.
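To make the keying difference concrete, here is a minimal sketch, not Armory's actual code. The function names, the SHA-256 based shard choice, the field widths in the spentness key and the `tx_locations` lookup table are all assumptions made for illustration. It shows why address-keyed history splits into shards trivially, while a spentness lookup that starts from a transaction hash and txout id first has to be translated into a (height, tx index, txout index) key.

```python
import hashlib
import struct


def history_shard(address: bytes, shard_count: int) -> int:
    """Pick a history shard from the address alone (hypothetical scheme)."""
    digest = hashlib.sha256(address).digest()
    return int.from_bytes(digest[:4], "big") % shard_count


def spentness_key(height: int, tx_index: int, txout_index: int) -> bytes:
    """Pack a spentness key; the 4/2/2 byte widths are assumed, not Armory's."""
    # Big-endian packing makes byte-wise key order match numeric order.
    return struct.pack(">IHH", height, tx_index, txout_index)


def resolve_spentness_key(tx_locations: dict, tx_hash: bytes, txout_index: int) -> bytes:
    """Turn (tx hash, txout id) into a spentness key via an extra lookup.

    tx_locations is a hypothetical map: tx hash -> (height, tx index).
    """
    height, tx_index = tx_locations[tx_hash]
    return spentness_key(height, tx_index, txout_index)
```

The address alone is enough to find the right history shard. The spentness side needs the extra hash-to-location step before any shard could be chosen, which is the complication described above.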

I have a good idea of how to implement that, and this change (which I would apply to the history DB as well) would allow for crazy fast scanning on SSD and moderately fast scanning on HDD. However, it is massive, requires some dynamic parameters that will add a lot of interlocking in certain corner cases, and I had to wrap this version up at some point.

Keep in mind that the target use case for supernode is still a medium to large server meant to run as the backend to a web service like bc.info or as a bootstrap server for litenodes. We don't have the time nor the resources to get supernode working on HDD. I'm not sure this will ever be a target hardware for supernode either, although I may err on that path for pure hobbyist satisfaction.
legendary
Activity: 3430
Merit: 3080
After splitting it, I scanned the history in 1h30 and built balance in 5min.

The good part, besides the speed boost, is robustness. Since the 2 are now separate, I added an option in supernode to run only the balance tallying part for quick fixing a damaged DB. It's called "Rescan SSH". Should fix the DB in 5~20min depending on the machine.

PS: There still is room for some very significant optimization, but I've concluded they are out of the scope of this release.

What could be accounting for low CPU usage (20%) and a 2 days estimate scan time? The number of addresses Armory scans? One thing I perhaps should have mentioned is that I did not delete the database folder from my previous experience with supernode (which ended midway through a troubled tx scan, once you hinted at supernode changes a few weeks back).
legendary
Activity: 3794
Merit: 1375
Armory Developer
With supernode, I'm now getting no thread toggling (1.5 days to scan history). Logging for the thread toggling got removed from headless too, and yet that mode clearly multithreads the scanning workload.

That was way too verbose anyway. After a bit of profiling, it seems thread toggling is pointless. Better off setting all processes to max thread count (as returned by std::thread::hardware_concurrency()). There already is a RAM ceiling coded in, so the different parts of the scan (reading data, parsing, serializing, writing) cannot get ahead of one another. In this case it's simpler to max out the thread count for each and every one of them and let the OS sort things out.

Each part waits on the next one through mutexes and condition variables, so all these threads are sleeping until they're allowed to work again. No harm done and it squeezes out as much CPU time as possible. On the other hand, toggling is a pain to tune properly. With the current toggler, a mainnet fullnode scan takes me ~8m30. With all thread counts maxed out it takes just short of 5m.
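As a rough illustration of that design (a sketch only, not the BlockWriteBatcher code), the snippet below runs every stage with the maximum thread count and lets bounded queues do the throttling. Assumptions: `run_pipeline`, `max_pending` and the `STOP` marker are hypothetical names, `os.cpu_count()` stands in for std::thread::hardware_concurrency(), and the bounded `queue.Queue` (which blocks on a mutex and condition variables internally) stands in for the RAM ceiling.

```python
import os
import queue
import threading

STOP = object()  # end-of-stream marker handed from stage to stage


def run_pipeline(items, stages, max_pending=8):
    """Push items through every stage, using the max thread count per stage."""
    n_threads = os.cpu_count() or 1
    # Bounded queues act as the ceiling: put() blocks while a queue is full,
    # so no stage can run far ahead of the one after it.
    queues = [queue.Queue(maxsize=max_pending) for _ in range(len(stages) + 1)]

    def feed():
        for item in items:
            queues[0].put(item)
        for _ in range(n_threads):
            queues[0].put(STOP)  # one marker per worker of the first stage

    def work(stage, inbox, outbox):
        while True:
            item = inbox.get()  # sleeps until upstream has produced something
            if item is STOP:
                outbox.put(STOP)  # pass exactly one marker downstream
                return
            outbox.put(stage(item))

    threads = [threading.Thread(target=feed)]
    for i, stage in enumerate(stages):
        for _ in range(n_threads):
            threads.append(threading.Thread(
                target=work, args=(stage, queues[i], queues[i + 1])))
    for t in threads:
        t.start()

    # Drain the last queue until every worker of the last stage has finished.
    results, stops_seen = [], 0
    while stops_seen < n_threads:
        item = queues[-1].get()
        if item is STOP:
            stops_seen += 1
        else:
            results.append(item)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(range(20), [parse, serialize])` with two hypothetical stage functions returns the 20 processed items. Output order is not preserved, since several workers per stage run concurrently. No per-stage thread count needs tuning: idle workers simply block on their queue.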

Quote
I like the "resume initialising from blockfile xxx" behaviour, serious productivity boost when testing supernode.

A lot changed there. The DB is now write-ahead only. The previous version would modify earlier entries to mark spent TxOuts. Now it always writes ahead and keeps spentness in a dedicated DB. This speeds up the process a lot (fewer rewrites) and guarantees that the DB can overwrite data by starting at the top of the last properly committed batch with no risk of corrupting the dataset.
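A toy model of that idea, under stated assumptions: the class below is hypothetical and keeps everything in memory, whereas the real DB is LMDB-backed. TxOut records are only ever appended, spentness lives in its own append-only list, and recovery drops whatever sits above the last committed batch before resuming from there.

```python
class WriteAheadStore:
    """In-memory sketch of an append-only TxOut store with separate spentness."""

    def __init__(self):
        self.txouts = []          # append-only: (batch, txout_key, value)
        self.spentness = []       # append-only: (batch, txout_key)
        self.last_committed = -1  # highest batch known to be fully written

    def write_batch(self, batch, new_txouts, spent_keys):
        # Earlier records are never modified; spends go to the separate list.
        self.txouts += [(batch, key, value) for key, value in new_txouts]
        self.spentness += [(batch, key) for key in spent_keys]

    def commit(self, batch):
        self.last_committed = batch

    def recover(self):
        """Drop uncommitted records and return the batch to resume from."""
        self.txouts = [r for r in self.txouts if r[0] <= self.last_committed]
        self.spentness = [r for r in self.spentness if r[0] <= self.last_committed]
        return self.last_committed + 1

    def unspent(self):
        spent = {key for _, key in self.spentness}
        return {key: value for _, key, value in self.txouts if key not in spent}
```

Because committed records are never rewritten, a crash in the middle of a batch can only leave extra data above the commit point, which the resume step discards or overwrites.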

The one thing that did get corrupted a lot was the balance and transaction count for each address. There's a whole new section of code to handle that now, independently of scanning history. You need context to compute balance, since you are tallying the effect of each TxOut and TxIn for each address. The previous version of supernode tallied balance while scanning history. If the DB failed to resume in the exact same state as before it crashed, there was a decent chance at least one balance got corrupted, and that meant rescanning from scratch.

This version separates the 2 processes entirely. It first scans history, then computes balances. This simplifies and speeds up a lot of code. First of all, keeping track of balance at all times creates a lot of rewrites: every time an address appears in a batch, you need to pull the existing balance from the DB, update it and write it back. Before splitting the 2 processes, 0.94 scanned supernode in 4h30. After splitting it, I scanned the history in 1h30 and built balance in 5min.

The good part, besides the speed boost, is robustness. Since the 2 are now separate, I added an option in supernode to run only the balance tallying part for quick fixing a damaged DB. It's called "Rescan SSH". Should fix the DB in 5~20min depending on the machine.
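The split can be pictured with a small sketch. This is illustrative only: `scan_history`, `tally_balances` and the `(address, txid, delta)` entry layout are assumptions, not Armory's data model. The first pass only appends history entries, and the second derives balance and transaction count from that history alone, so it can be rerun by itself, which is roughly what a balance-only rescan amounts to.

```python
from collections import defaultdict


def scan_history(blocks):
    """Pass 1: record every TxOut/TxIn effect per address, append-only.

    blocks is an iterable of (height, entries); each entry is a hypothetical
    (address, txid, delta) tuple, with delta negative for a spend.
    """
    history = defaultdict(list)
    for height, entries in blocks:
        for address, txid, delta in entries:
            history[address].append((height, txid, delta))
    return history


def tally_balances(history):
    """Pass 2: compute (balance, tx count) per address from history alone."""
    summary = {}
    for address, events in history.items():
        balance = sum(delta for _, _, delta in events)
        tx_count = len({txid for _, txid, _ in events})
        summary[address] = (balance, tx_count)
    return summary
```

If the tallies are ever suspect, calling `tally_balances(history)` again rebuilds them without touching the history, instead of rescanning from scratch.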

PS: There still is room for some very significant optimization, but I've concluded they are out of the scope of this release.
legendary
Activity: 3430
Merit: 3080
In standard (headless?) mode, I'm getting block Db rebuilds every quit/restart. No apparent errors in logs, either in the build or the restarts. Db rebuilt/rescanned using latest commit. Using 0.11 for Core.

Fixed.

Confirmed.

With supernode, I'm now getting no thread toggling (1.5 days to scan history). Logging for the thread toggling got removed from headless too, and yet that mode clearly multithreads the scanning workload. I like the "resume initialising from blockfile xxx" behaviour, serious productivity boost when testing supernode.
legendary
Activity: 3794
Merit: 1375
Armory Developer
Fixed. Just a dumb omission on my part.
legendary
Activity: 3430
Merit: 3080
In standard (headless?) mode, I'm getting block Db rebuilds every quit/restart. No apparent errors in logs, either in the build or the restarts. Db rebuilt/rescanned using latest commit. Using 0.11 for Core.

Fixed.

Will try it out shortly.

Here's something from a crash I got in supernode mode:

Code:
-WARN  - 1438705827: (BlockWriteBatcher.cpp:505) Finished applying blocks up to 40000
-WARN  - 1438705827: (BlockWriteBatcher.cpp:505) Finished applying blocks up to 42500
-WARN  - 1438705827: (BlockWriteBatcher.cpp:505) Finished applying blocks up to 47500
-WARN  - 1438705828: (BlockWriteBatcher.cpp:505) Finished applying blocks up to 52500
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2621) Readjusting thread count:
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2622) 0 readers
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2623) 4 workers
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2624) 4 writers
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2625) 1 old reader count
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2626) 4 old worker count
-WARN  - 1438705828: (BlockWriteBatcher.cpp:2627) 4 old writer count
-WARN  - 1438705828: (BlockWriteBatcher.cpp:505) Finished applying blocks up to 55000
Floating point exception

Could try it again using gdb if that helps.
legendary
Activity: 3794
Merit: 1375
Armory Developer
In standard (headless?) mode, I'm getting block Db rebuilds every quit/restart. No apparent errors in logs, either in the build or the restarts. Db rebuilt/rescanned using latest commit. Using 0.11 for Core.

Fixed.
legendary
Activity: 3794
Merit: 1375
Armory Developer
In standard (headless?) mode, I'm getting block Db rebuilds every quit/restart

Will look into it.
legendary
Activity: 3430
Merit: 3080
In standard (headless?) mode, I'm getting block Db rebuilds every quit/restart. No apparent errors in logs, either in the build or the restarts. Db rebuilt/rescanned using latest commit. Using 0.11 for Core.
legendary
Activity: 3794
Merit: 1375
Armory Developer
New commit, calling out the testers =P

Most of the changes are stability and supernode improvements.
legendary
Activity: 3430
Merit: 3080
https://bitcointalksearch.org/topic/m.11911942

The 0.94 release makes use of multi-threading to improve syncing performance, but multi-threading is a notoriously difficult discipline from which to iron out the kinks. Your enthusiasm (as well as that displayed by others in the thread) will be very useful when ATI do put testing builds out; getting this sort of code as solid as can be is important to qualify for release quality.
full member
Activity: 147
Merit: 100
Do you like fire? I'm full of it.
Much good news around here. When do we start preliminarily testing the build cross-platform too? I would very much love to dive right in with a 0.9.4 Win x64 build despite whatever remaining uncertainties, and I doubt I'd be the only one.
sr. member
Activity: 260
Merit: 251
Fix has been checked in. Give it a go.

Works like a champ. Thank you very much.
sr. member
Activity: 255
Merit: 250
Senior Developer - Armory
current revision of ffreeze (hash b627160) won't build on debian 7.8 32-bit:

Quote
g++  -Icryptopp -Imdb -DUSE_CRYPTOPP -D__STDC_LIMIT_MACROS -I/usr/include/python2.7 -I/usr/include/python2.7 -std=c++11 -O2 -pipe -fPIC -c ScrAddrObj.cpp
ScrAddrObj.cpp: In member function 'void ScrAddrObj::purgeZC(const std::set&)':
ScrAddrObj.cpp:160:45: error: invalid initialization of non-const reference of type 'TxRef&' from an rvalue of type 'TxRef'
ScrAddrObj.cpp:175:46: error: invalid initialization of non-const reference of type 'TxRef&' from an rvalue of type 'TxRef'
make[1]: *** [ScrAddrObj.o] Error 1

I've reverted to 90586da for the time being which does still compile.

Confirmed. I have a fix but will check with goatpig before it gets checked in.

Fix has been checked in. Give it a go.
sr. member
Activity: 255
Merit: 250
Senior Developer - Armory
current revision of ffreeze (hash b627160) won't build on debian 7.8 32-bit:

Quote
g++  -Icryptopp -Imdb -DUSE_CRYPTOPP -D__STDC_LIMIT_MACROS -I/usr/include/python2.7 -I/usr/include/python2.7 -std=c++11 -O2 -pipe -fPIC -c ScrAddrObj.cpp
ScrAddrObj.cpp: In member function 'void ScrAddrObj::purgeZC(const std::set&)':
ScrAddrObj.cpp:160:45: error: invalid initialization of non-const reference of type 'TxRef&' from an rvalue of type 'TxRef'
ScrAddrObj.cpp:175:46: error: invalid initialization of non-const reference of type 'TxRef&' from an rvalue of type 'TxRef'
make[1]: *** [ScrAddrObj.o] Error 1

I've reverted to 90586da for the time being which does still compile.

Confirmed. I have a fix but will check with goatpig before it gets checked in.