
Topic: Quiz: Are you a Satoshi client guru developer? (Read 987 times)

legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Rather than quizzes, I've been working on enumerating some programming exercises for people to do if they want to start Bitcoin programming.   They mainly consist of "Write code to open the blk*.dat files and.... (1) Count the number of blocks (2) Count the number of transactions (3) Verify the proof-of-work and merkle root for each block (4) Identify all the different types of TxOut scripts and address strings associated with them (5) Find the orphan blocks, (6) Calculate the number of UTXO in the pruned blockchain (7) etc".  With references to the on-the-wire bytemaps and the Protocol Specification.

I'm sure I could come up with some cool quizzes, though not specifically about the Bitcoin-Qt code... more about Bitcoin in general and its quirks ("count the endianness switches needed for this process on an LE machine: ...").  But I'm not sure how useful that is (not that threads like this need to be highly efficient in their usefulness).  But I think it would be neat to have something more elaborate like above to give people concrete ways to get into Bitcoin dev.
I guess you forgot about: (0) - I dare you to build this client yourself.
IMO, this would rule out like 90% of candidates :)

Yup, 90% of developers probably don't have the skill and patience to become a real Bitcoin developer.  If they can't reverse engineer bytemaps and understand UTXO lists, then they're certainly not going to figure out how to deal with reorgs, and the plethora of other things that need special attention and care to get right.
legendary
Activity: 2053
Merit: 1356
aka tonikt
Rather than quizzes, I've been working on enumerating some programming exercises for people to do if they want to start Bitcoin programming.   They mainly consist of "Write code to open the blk*.dat files and.... (1) Count the number of blocks (2) Count the number of transactions (3) Verify the proof-of-work and merkle root for each block (4) Identify all the different types of TxOut scripts and address strings associated with them (5) Find the orphan blocks, (6) Calculate the number of UTXO in the pruned blockchain (7) etc".  With references to the on-the-wire bytemaps and the Protocol Specification.

I'm sure I could come up with some cool quizzes, though not specifically about the Bitcoin-Qt code... more about Bitcoin in general and its quirks ("count the endianness switches needed for this process on an LE machine: ...").  But I'm not sure how useful that is (not that threads like this need to be highly efficient in their usefulness).  But I think it would be neat to have something more elaborate like above to give people concrete ways to get into Bitcoin dev.
I guess you forgot about: (0) - I dare you to build this client yourself.
IMO, this would rule out like 90% of candidates :)
legendary
Activity: 1232
Merit: 1094
Rather than quizzes, I've been working on enumerating some programming exercises for people to do if they want to start Bitcoin programming.   

I was thinking that a nice open source project would be to build a node validator.  This would not have to be fast (or capable of handling 1000+ connections).

You would start the validator and then start your node-under-test with checkpoints disabled and an addnode=localhost:12345 (is there a "disable blacklist" option?).

The validator would listen on that port and when it gets a connection, it would make a 2nd connection to the node-under-test.

The validator would feed it transactions and blocks and see how it responds on the 2nd connection.  If the node-under-test forwards a bad tx or fails to forward a valid one, then it would fail that test.

It could go through a sequence and test all functionality that a node is supposed to perform.  It could also check the "protocol guidelines", to make sure the node acts "socially responsibly".

Things like bandwidth limiting of forwarding transactions could be an issue.  However, in many cases, receiving the inv message (tx or block) would be sufficient to indicate if the node planned to forward it.
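To make the "watch for the inv on the 2nd connection" idea concrete, here is a rough Python sketch of the message framing a validator would need. It assumes the mainnet magic bytes and the usual 24-byte message header (magic, 12-byte command, length, 4-byte double-SHA256 checksum). The helper names (build_message, parse_inv, was_announced) are made up for illustration, and the inv helpers only handle fewer than 253 entries so the count fits in one byte.

```python
import hashlib
import struct

MAGIC = bytes.fromhex("f9beb4d9")  # assumption: mainnet network magic
MSG_TX, MSG_BLOCK = 1, 2           # inventory type codes


def checksum(payload: bytes) -> bytes:
    """First 4 bytes of SHA256(SHA256(payload))."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def build_message(command: str, payload: bytes) -> bytes:
    """Frame a payload with the 24-byte P2P message header."""
    name = command.encode("ascii").ljust(12, b"\x00")
    return MAGIC + name + struct.pack("<I", len(payload)) + checksum(payload) + payload


def parse_message(raw: bytes):
    """Return (command, payload), raising ValueError on a malformed frame."""
    magic, name, length, check = struct.unpack_from("<4s12sI4s", raw)
    payload = raw[24:24 + length]
    if magic != MAGIC or len(payload) != length or checksum(payload) != check:
        raise ValueError("bad message frame")
    return name.rstrip(b"\x00").decode("ascii"), payload


def build_inv(items) -> bytes:
    """items: list of (type, 32-byte hash). Sketch only: fewer than 253 entries."""
    if len(items) >= 0xFD:
        raise ValueError("sketch handles fewer than 253 entries")
    return bytes([len(items)]) + b"".join(struct.pack("<I32s", t, h) for t, h in items)


def parse_inv(payload: bytes):
    """Return the list of (type, hash) entries in an inv payload."""
    count = payload[0]
    if count >= 0xFD:
        raise ValueError("sketch handles fewer than 253 entries")
    return [struct.unpack_from("<I32s", payload, 1 + 36 * i) for i in range(count)]


def was_announced(raw_messages, inv_type: int, item_hash: bytes) -> bool:
    """True if any captured message is an inv that lists the given item."""
    for raw in raw_messages:
        command, payload = parse_message(raw)
        if command == "inv" and (inv_type, item_hash) in parse_inv(payload):
            return True
    return False
```

A test case would then feed a tx to the node-under-test on the first connection, collect whatever arrives on the second, and pass if was_announced() matches the expected outcome for that tx.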

One of the tests could be to see if the node limits bandwidth per client.

Once finished, it would display a list of tests and the result.

The benefit is that it would be a way for new clients to be tested in a systematic way.  It wouldn't matter what language the new client was programmed in, since the test uses TCP connections.

Another project would be a "gatekeeper" for miners. 

This would be a piece of software that only forwards inv messages for transactions and blocks which are forwarded by a group of test nodes.

The setup would be

Internet <----> passthrough node <----> test nodes <----> gatekeeper <----> mining node

The passthrough node would connect to the internet.  It would guarantee that each of the test nodes gets the exact same ordering of blocks, but do no checking other than that.

When a new block is found, it would be processed by the test nodes and when verified, they would send an inv to the gatekeeper.  The gatekeeper would only request the block, when the last test node sends an inv for it.  It then pulls it from one of them and forwards it (unchecked) to the mining node.

This means that blocks and transactions which aren't accepted by all test nodes wouldn't be sent to the miner.
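The "only forward once the last test node has sent an inv" rule is simple enough to sketch in a few lines of Python. This is only the bookkeeping, with no networking; the Gatekeeper class, its method names and the node labels are invented for illustration.

```python
class Gatekeeper:
    """Tracks which test nodes have announced (inv'd) each item."""

    def __init__(self, test_nodes):
        self.test_nodes = frozenset(test_nodes)
        self.announced = {}     # item hash -> set of nodes that sent an inv
        self.forwarded = set()  # items already released to the mining node

    def on_inv(self, node, item_hash) -> bool:
        """Record an inv from a test node.

        Returns True exactly once per item: when the last test node has
        announced it, i.e. when it is time to fetch and forward the item.
        """
        if node not in self.test_nodes:
            raise ValueError("inv from unknown node")
        seen = self.announced.setdefault(item_hash, set())
        seen.add(node)
        if seen == self.test_nodes and item_hash not in self.forwarded:
            self.forwarded.add(item_hash)
            return True
        return False

    def disagreements(self):
        """Items announced by some, but not all, test nodes so far."""
        return {h for h, seen in self.announced.items() if seen != self.test_nodes}
```

In a real deployment disagreements() would need a timeout before it means anything, since a node that is merely slow looks the same as a node that rejected the block.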

The test nodes could be the latest versions of the reference client and a few other popular clients.

This slows things down for the mining node. 

The passthrough could act as a switch and normally forward all messages to the mining node directly.  If the gatekeeper detects disagreement between the test nodes, then it would order the passthrough node to stop forwarding to the mining node.  It would also send an email/text to the admin.

This prevents the miner from following or building on a fork which is not accepted by some of the other clients.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
Rather than quizzes, I've been working on enumerating some programming exercises for people to do if they want to start Bitcoin programming.   They mainly consist of "Write code to open the blk*.dat files and.... (1) Count the number of blocks (2) Count the number of transactions (3) Verify the proof-of-work and merkle root for each block (4) Identify all the different types of TxOut scripts and address strings associated with them (5) Find the orphan blocks, (6) Calculate the number of UTXO in the pruned blockchain (7) etc".  With references to the on-the-wire bytemaps and the Protocol Specification.

I'm sure I could come up with some cool quizzes, though not specifically about the Bitcoin-Qt code... more about Bitcoin in general and its quirks ("count the endianness switches needed for this process on an LE machine: ...").  But I'm not sure how useful that is (not that threads like this need to be highly efficient in their usefulness).  But I think it would be neat to have something more elaborate like above to give people concrete ways to get into Bitcoin dev.
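As a starting point for exercises (1) and (3), a minimal Python sketch might look like the following. It assumes mainnet block files, where each record is the 4-byte network magic, a 4-byte little-endian length, and then the raw block. All function names here are made up for illustration, and extracting the txids that merkle_root() needs (transaction parsing) is left as part of the exercise.

```python
import hashlib
import io
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # assumption: mainnet blk*.dat files


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def iter_raw_blocks(stream):
    """Yield each raw block (80-byte header + transactions) from a block-file stream."""
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8 or prefix[:4] != MAINNET_MAGIC:
            return  # end of data, or padding after the last record
        (size,) = struct.unpack("<I", prefix[4:])
        block = stream.read(size)
        if len(block) < size:
            return  # truncated final record
        yield block


def count_blocks(raw: bytes) -> int:
    """Exercise (1) on an in-memory copy of a block file."""
    return sum(1 for _ in iter_raw_blocks(io.BytesIO(raw)))


def bits_to_target(bits: int) -> int:
    """Expand the 4-byte "compact" difficulty field into the full target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def check_pow(header: bytes) -> bool:
    """Header hash, read as a little-endian integer, must not exceed the target."""
    (bits,) = struct.unpack_from("<I", header, 72)
    return int.from_bytes(double_sha256(header), "little") <= bits_to_target(bits)


def merkle_root(txids):
    """txids: non-empty list of 32-byte hashes in internal (not displayed) byte order."""
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [double_sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# The genesis block header: version, prev hash, merkle root, time, bits, nonce.
GENESIS_HEADER = bytes.fromhex(
    "01000000" + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49" + "ffff001d" + "1dac2b7c"
)
```

Note the endianness switch already hiding in here: the hash that block explorers display is double_sha256(header) with the bytes reversed.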
legendary
Activity: 1232
Merit: 1094
Uh is that the protocol, or a different serialization?
staff
Activity: 4284
Merit: 8808
BTW: The code in serialize.h states about "Variable-length integers": "Every integer has exactly one encoding".
This is mistaken.
Fun list!  Though on this point you're incorrect, or at least it's debatable. The code in question is generic and doesn't only work on fixed-length types. The encoding is non-redundant, but the current code doesn't bother to prevent overflow, and I believe this is actually an oversight. Perhaps it should be changed to take a maximum size so that data in the range 0-255 could be encoded without overhead.  This encoding is used for the ultraprune databases, not external IO.

Why another format?  There are 6676408 TX outs: increasing each one by only 1 byte would increase the working set size by 3%. Since this serialization is used only internally there aren't a lot of ecosystem costs from it... but the space savings matter. Be glad we don't have a full range coder in there. :P
hero member
Activity: 555
Merit: 654
If you think you are, then you should be able to tell the difference between all the possibilities in each category, without looking at the source code.

What's the difference between:

1. Arbitrary length integers:

A. CAutoBN_CTX (hint: bignum.h)
B. BIGNUM (hint: bignum.h)
C. CBigNum (hint: bignum.h)
D. base_uint  (hint: uint256.h)

2. Representation of long integers:

A. "Compact" representation of CBigNum (used for difficulty storage)
B. Hex representation of CBigNum (not used in production code)
C. Base(b) representation of CBigNum (only used for debug messages)
D. Script Stack representation of CBigNum OR Serialized CBigNum (hint: CBigNum.setvch in bignum.h)
E. Script CastToBigNum() representation of CBigNum (hint: script.cpp: CastToBigNum())
F. OpenSSL mpi representation

3. Representation of sizes:

A. var_int or "Compact size"  (hint: serialize.h)
B. "Variable-length integers"  (hint: serialize.h)
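For reference on item 3.A, here is a small Python sketch of the "compact size" layout as described in the Protocol Specification: one byte below 0xFD, otherwise a marker byte followed by a 2-, 4- or 8-byte little-endian value. The function names and the strict flag are made up for illustration; the flag just shows where a decoder could refuse padded (non-minimal) encodings, which the layout alone does not rule out.

```python
import struct

# marker byte -> (struct format, smallest value that actually needs this width)
_WIDE_FORMS = {0xFD: ("<H", 0xFD), 0xFE: ("<I", 0x10000), 0xFF: ("<Q", 0x100000000)}


def encode_compact_size(n: int) -> bytes:
    """Shortest compact-size encoding of n (0 <= n < 2**64)."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def decode_compact_size(data: bytes, strict: bool = False):
    """Return (value, bytes_consumed). With strict=True, reject padded encodings."""
    first = data[0]
    if first < 0xFD:
        return first, 1
    fmt, minimum = _WIDE_FORMS[first]
    (value,) = struct.unpack_from(fmt, data, 1)
    if strict and value < minimum:
        raise ValueError("non-minimal compact size")
    return value, 1 + struct.calcsize(fmt)
```

For example, b"\xfd\x01\x00" decodes to 1 in the lenient mode even though the shortest encoding of 1 is the single byte 0x01.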


These are real interpretation problems I had to face while digging into the Satoshi source code.

BTW: The code in serialize.h states about "Variable-length integers": "Every integer has exactly one encoding".
This is mistaken. When you decode a var-length integer into an arbitrary fixed length type, an infinite number of encodings refer to the same number.
E.g.: decoding into a uint32: "0x80 0x80 0x80 0x80 0x00" = "0x80 0x80 0x80 0x80 0x80 0x00"  = "0x80 .... 0x80 0x00"
Please no more malleability!!! I wonder why it was necessary to invent a new encoding anyway.
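For anyone who wants to check the example above, here is a minimal Python sketch of the scheme as I understand it: 7 bits per byte, most significant group first, high bit set on every byte except the last, and one subtracted per continuation byte so that no two byte strings overlap. The bits parameter is my own addition for illustration; it emulates decoding into a fixed-width unsigned type that silently wraps on overflow.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer; bytes are built last-to-first, then reversed."""
    out = []
    while True:
        out.append((n & 0x7F) | (0x80 if out else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1
    return bytes(reversed(out))


def decode_varint(data: bytes, bits=None) -> int:
    """Decode one integer from data.

    bits=None uses unbounded integers; bits=32 emulates wrapping uint32 arithmetic.
    """
    mask = (1 << bits) - 1 if bits else None
    n = 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if mask is not None:
            n &= mask
        if not byte & 0x80:
            return n
        n += 1
        if mask is not None:
            n &= mask
    raise ValueError("truncated varint")


short = bytes([0x80, 0x80, 0x80, 0x80, 0x00])
longer = bytes([0x80, 0x80, 0x80, 0x80, 0x80, 0x00])

# With unbounded integers the two byte strings are different numbers...
assert decode_varint(short) != decode_varint(longer)
# ...but once the shifts wrap at 32 bits they collide.
assert decode_varint(short, bits=32) == decode_varint(longer, bits=32)
```

So the "exactly one encoding" claim holds as long as the decoder never overflows, and breaks only when over-long input is accepted into a type that is too small for it.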

I encourage Bitcoin pros to write their own quizzes and share them with the community.

Best regards, Sergio.