Sorry for the slow response, Cryptonomist.
1) Is it correct to assume that the instance of the Tracker class in p2pool/util/forrest.py is the actual share chain?
That's one part of the relevant code. The actual tracker for the share chain is the OkayTracker class, which inherits from forest.Tracker and is instantiated as node.tracker. I think OkayTracker's code is more relevant and interesting.
By the way, it's forest.py with one 'r' (as in a bunch of trees), not forrest.py (as in the author's name).
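If you want to sanity-check that relationship from a Python shell (with p2pool's dependencies installed), something like this should work -- the class names here are from memory, so adjust if your copy differs:

```python
# Quick check of the class relationship (names from memory; verify against your tree):
from p2pool.util import forest   # the generic Tracker lives here
from p2pool import data          # OkayTracker (the share-chain tracker) lives here

print(issubclass(data.OkayTracker, forest.Tracker))  # should print True
```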
2) Can someone suggest a way to get the time between the "Time first seen" and the addition to the share chain.
I would suggest printing out the difference between the time first seen and time.time() at the end of
data.py:OkayTracker.attempt_verify(). That seems like useful information for everyone. If you put it under the
--bench switch and submit it as a PR to 1mb_segwit, I'd likely merge it. Don't worry about it if you're not good with git/github, as it's not a big deal either way.
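To be concrete, here's roughly the kind of thing I mean -- treat it as a sketch, since I'm writing the attribute names from memory (check what your branch actually calls the first-seen timestamp and how --bench is exposed):

```python
import time

def log_verify_latency(time_first_seen, share_hash):
    # How long the share took from "first seen" to passing attempt_verify().
    latency = time.time() - time_first_seen
    print('share %064x verified %.3f s after first seen' % (share_hash, latency))

# Then near the end of data.py:OkayTracker.attempt_verify(), guarded by the --bench
# flag, call something like log_verify_latency(share.time_seen, share.hash).
# ('time_seen' and 'hash' are from memory; verify the attribute names in your copy.)
```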
3) the flow of a share between the moment the node detects its existence and the final addition of it to the share chain is not very clear to me.
Yeah, that code is kinda spaghettified. It might help to insert a "raise" somewhere and then run it so you can get a printout of the stack trace at that point.
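If you'd rather not crash the node to get that trace, Python's traceback module can print the call stack and keep going -- drop something like this into whatever function you're curious about (e.g. attempt_verify or the p2p share handler):

```python
import traceback

def show_where_we_are(label):
    # Print the current call stack without raising, so the node keeps running.
    print('--- call stack at %s ---' % label)
    traceback.print_stack()

# e.g. call show_where_we_are('attempt_verify') at the top of the function you care about
```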
Quick from-memory version:
- The stuff in data.py (the BaseShare class and its child classes) gets called during share object instantiation and deserialization.
- When p2p.py receives a serialized share over the wire, it deserializes it and turns it into an object, then asks the node object in node.py what to do with it.
- node.py passes it along to the node.tracker object and asks the tracker whether it fits in the share chain; if it does, node.tracker adds it to node.tracker.verified.
- The next time node.tracker.think() is run (which is probably immediately afterward), node.tracker may choose to use that new share for constructing work to be sent to miners.
- That causes work.py:get_work() to generate a new stratum job (using data.py:*Share.generate_transaction() to make a coinbase transaction stub and block header), which gets passed via bitcoin/worker_interface.py and bitcoin/stratum.py to the mining hardware.
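If it helps, here's a toy model of that hand-off. None of these are the real p2pool classes or signatures -- it just compresses the sequence of responsibilities into something you can read top to bottom:

```python
class Share(object):
    """Stand-in for the data.py share classes, built during deserialization."""
    def __init__(self, wire_bytes):
        self.raw = wire_bytes  # the real code parses header, gentx metadata, tx refs, ...

class Tracker(object):
    """Stand-in for data.py:OkayTracker -- decides whether a share fits the chain."""
    def __init__(self):
        self.verified = []
    def attempt_verify(self, share):
        self.verified.append(share)      # the real code checks PoW, parent links, etc.
    def think(self):
        return self.verified[-1] if self.verified else None  # pick the best tip

class Node(object):
    """Stand-in for node.py -- glue between p2p, the tracker, and work generation."""
    def __init__(self):
        self.tracker = Tracker()
    def handle_share(self, share):
        self.tracker.attempt_verify(share)
        best = self.tracker.think()
        return self.get_work(best)       # stands in for work.py:get_work()
    def get_work(self, best_share):
        return {'stratum_job_built_on': best_share}  # sent on via worker_interface/stratum

# p2p.py's role: receive bytes, build the Share object, hand it to the node:
node = Node()
print(node.handle_share(Share(b'...')))
```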
4a) Under the rules in the main p2pool network the shares are 100kb. So after 300 seconds on average the shares will have a total of 1mb transactions, and after 600 seconds on average the bitcoin blocks would be 2mb. Is this correct?
Sorta. It's a limit of 100 kB of *new* transactions per share. The serialized size of the share itself is much smaller than that, since transactions are referred to by hash instead of included in full; the serialized size of the candidate block that the share represents is much larger, since it includes the old (reused) transactions as well as the new ones.
If the transaction that puts it over 100 kB is 50 kB in size, and has 51 kB of new transactions preceding it, then only 51 kB of transactions get added. If some of the old transactions from previous shares have been removed from the block template and replaced with other transactions, then those old transactions don't get included in the new share and your share (and candidate block) size goes down.
In practice, the candidate block sizes grow slower than 100 kB per share. I haven't checked very thoroughly how much slower, but in the one instance that I followed carefully it took around 25 shares to get to 1 MB instead of 10 shares.
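For illustration, the accounting rule looks roughly like this. This is a sketch of the rule as described above, not the actual code -- the real selection logic in work.py/data.py also deals with fees, the old-transaction references, and other details:

```python
NEW_TX_LIMIT = 100 * 1000  # bytes of *new* transactions allowed per share

def select_new_txs(template_txs, seen_in_recent_shares):
    """template_txs: list of (tx_hash, size_in_bytes) from the current block template."""
    new_hashes, new_bytes = [], 0
    for tx_hash, size in template_txs:
        if tx_hash in seen_in_recent_shares:
            continue  # "old" tx: gets a compact reference, doesn't count against the limit
        if new_bytes + size > NEW_TX_LIMIT:
            break     # e.g. 51 kB already added, next tx is 50 kB -> stop at 51 kB
        new_hashes.append(tx_hash)
        new_bytes += size
    return new_hashes, new_bytes
```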
4b) The hash of the header of the bitcoin block contains the merkle tree of the transactions the block contains. ... How can the transactions of several shares be added to get for example after 300 seconds 1 mb of transactions in a bitcoin block.
The hash of a share *is equal to* the hash of the corresponding bitcoin block header. The share structure includes a hash of all p2pool-specific metadata embedded into the coinbase transaction (search for gentx in data.py). The share has two different serializations: the long serialization (which is exactly equal to the block serialization, and which only includes the hash of the share-specific metadata), and the short serialization (which includes the block header plus the share-specific metadata such as the list of hashes of new transactions, the 2-byte or 3-byte reference links for the old transactions, the share difficulty, timestamps, etc.). Any synced p2pool node can recreate the long serialization from the short serialization, data in the last 200 shares of the share chain, and the hash:full_tx map in the node's known_txs_var.
The transactions aren't "added". If a transaction has been included in one of the last 200 shares in the share chain, then a share can reference that share using the number of steps back in the share chain (1 byte) and the index of the transaction within that share (1 or 2 bytes). These transactions -- "old" transactions -- do not count toward the 100 kB limit. If a transaction has not been included before, then the share will reference this transaction using its full 32-byte hash, and counts its full size (e.g. 448 bytes) against the 100 kB limit. Both types of references are committed into the metadata hash in the gentx, so both are immutable and determined at the time the stratum job is sent to the mining hardware.
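A sketch of those two reference forms (illustrative only -- the real serialization types are in data.py):

```python
def encode_tx_ref(tx_hash, recent_share_tx_lists):
    """recent_share_tx_lists: the tx-hash lists of the last ~200 shares, newest first."""
    for shares_back, tx_hashes in enumerate(recent_share_tx_lists, 1):
        if tx_hash in tx_hashes:
            # "old" tx: ~1 byte for how many shares back, 1-2 bytes for the index
            return ('old', shares_back, tx_hashes.index(tx_hash))
    # "new" tx: the full 32-byte hash goes into the share, and the tx's full size
    # counts against the 100 kB new-transaction limit
    return ('new', tx_hash)
```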
https://github.com/jtoomim/p2pool/blob/9692d6e8f9980b057ae67e8970353be3411fe0fe/p2pool/data.py#L156
My code currently has a soft limit of 1000 kB (instead of 100 kB or 50 kB) on new transactions per share, but unlike the p2pool master branch, this is not enforced at the consensus layer, so anyone can modify their code to exceed this limit without consequences from should_punish_reason().