the "nonce" presented for the final hashing against the difficulty function is presented as a full-width collision nonce plus two n-bit offsets, where your table size (and collision requirement) is 2n. If you don't update the table for every hash, evicting old values and inserting new ones, the things stored in the table will get too old to be within the range of offsets.
I probably didn't understand your trick. How will the PoW be verified on small nodes?
Okay... What's visible in the block is the nonce presented in the block header, which, when the block header is hashed, yields a hash that meets the difficulty target. You take *that* nonce and deconstruct it into a (64-bit) base nonce and two (smaller, depends on table size) offsets. The 64-bit base nonce is the first of the three momentum nonces, the base nonce minus the first offset is the second momentum nonce, and the base nonce plus the second offset is the third momentum nonce (and yes, I have made the nonce field in the header much wider to accommodate real memory-hard algorithms).
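To make that concrete, here's a rough sketch of the deconstruction in Python. The 64-bit base width comes from the description above; the 26-bit offset width, the field order, and the wraparound arithmetic are assumptions for illustration, not the actual layout:

```python
OFFSET_BITS = 26                         # assumed width of each offset field (table of 2^26)
OFFSET_MASK = (1 << OFFSET_BITS) - 1
MASK64      = (1 << 64) - 1

def split_presented_nonce(presented: int):
    """Deconstruct the wide presented nonce into the three momentum nonces."""
    base    = presented & MASK64                               # low 64 bits: base nonce
    off_sub = (presented >> 64) & OFFSET_MASK                  # offset subtracted from the base
    off_add = (presented >> (64 + OFFSET_BITS)) & OFFSET_MASK  # offset added to the base
    return base, (base - off_sub) & MASK64, (base + off_add) & MASK64
```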
Next you take the header and hash it three times - once with each of the three momentum nonces, instead of the original presented nonce. If the resulting hashes have the required degree of collision, then the presented nonce is valid. Meaning it is one of the very few possible presentable nonces among billions that satisfy the triplet property required by momentum hashing.
Nothing that's hard for a small node to check; it has to do a hash, check the difficulty, separate the fields of the presented nonce, do one addition and one subtraction, then do three more hashes and compare the results for collision.
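A minimal sketch of that verification path, with SHA-256 standing in for whatever memory-hard hash the real scheme uses; the difficulty target, collision width, and offset width are all assumed values for illustration:

```python
import hashlib

DIFFICULTY_TARGET = 1 << 230   # assumed target: roughly the top 26 bits of the block hash must be zero
COLLISION_BITS    = 26         # assumed collision requirement (table of 2^26 targets)
OFFSET_BITS       = 26         # assumed width of each offset field
OFFSET_MASK       = (1 << OFFSET_BITS) - 1
MASK64            = (1 << 64) - 1

def momentum_hash(header: bytes, nonce: int) -> int:
    # One 64-bit momentum nonce hashed with the header.
    return int.from_bytes(hashlib.sha256(header + nonce.to_bytes(8, "little")).digest(), "little")

def verify(header: bytes, presented: int) -> bool:
    # 1. One hash of header + presented nonce, checked against the difficulty target.
    block_hash = int.from_bytes(
        hashlib.sha256(header + presented.to_bytes(15, "little")).digest(), "little")
    if block_hash >= DIFFICULTY_TARGET:
        return False
    # 2. Separate the fields; one subtraction and one addition give the three momentum nonces.
    base    = presented & MASK64
    off_sub = (presented >> 64) & OFFSET_MASK
    off_add = (presented >> (64 + OFFSET_BITS)) & OFFSET_MASK
    nonces  = (base, (base - off_sub) & MASK64, (base + off_add) & MASK64)
    # 3. Three more hashes, compared for the required degree of collision
    #    (here: agreement on the low COLLISION_BITS bits).
    h = [momentum_hash(header, n) & ((1 << COLLISION_BITS) - 1) for n in nonces]
    return h[0] == h[1] == h[2]
```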
The trick is that because the two offsets are limited in width, the three momentum nonces must all be found in a narrow subrange of the 64-bit search space. That means that, once you have populated your table with two collisions per hash target, you can't just start hashing at top speed and present whatever's in the table along with the new value as a triplet; if you do, your hashing will rapidly overshoot the subrange, and you won't be able to encode the triplets you find because the offsets you'd need are too big to fit in the limited offset bit width.
Instead, you have to keep updating your table. Every time you find a new nonce, you have to evict the oldest nonce found in its table location (the one nearest, or past, the trailing edge of the subrange you're currently searching) and replace it with the new nonce (which is on the absolute leading edge of that subrange). That way, when you find the next nonce whose hash matches that target, you don't have the too-old entry in your table (now past the trailing edge of the subrange) preventing you from forming a valid presentable nonce. And this means that you have to do a table read and a table write with every new nonce, instead of just a table read. Writes are much slower, and they mean that hashing can't be parallelized much because of table contention.
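A rough sketch of that mining loop, under the same assumed field widths as above. A Python dict stands in for the fixed-size table, and the triplet-forming details are my guess at one way to do it; the point to notice is that every candidate nonce costs a table read plus an eviction-and-insert, not just a read:

```python
import hashlib

COLLISION_BITS = 26                  # assumed: table of 2^26 hash targets, two entries each
OFFSET_BITS    = 26                  # assumed: each offset field is 26 bits wide
MAX_OFFSET     = (1 << OFFSET_BITS) - 1

def momentum_hash(header: bytes, nonce: int) -> int:
    # SHA-256 stands in for the real memory-hard hash.
    return int.from_bytes(hashlib.sha256(header + nonce.to_bytes(8, "little")).digest(), "little")

def mine(header: bytes, start: int = 0, count: int = 1 << 32):
    """Yield candidate presented nonces; the caller still hashes header + candidate
    against the difficulty target, as in the verification sketch above."""
    table = {}                       # collision target -> the two most recent nonces hitting it
    for nonce in range(start, start + count):
        target = momentum_hash(header, nonce) & ((1 << COLLISION_BITS) - 1)   # table read
        stored = table.get(target, [])
        # Entries past the trailing edge of the subrange can no longer be encoded
        # as offsets from any base at or ahead of the current nonce; drop them.
        stored = [n for n in stored if nonce - n <= 2 * MAX_OFFSET]
        if len(stored) == 2:
            oldest, middle = stored                  # ascending: nonces only grow
            off_sub = middle - oldest                # offset subtracted from the base
            off_add = nonce - middle                 # offset added to the base
            if off_sub <= MAX_OFFSET and off_add <= MAX_OFFSET:
                # Pack base nonce + two offsets into one wide presented nonce.
                yield middle | (off_sub << 64) | (off_add << (64 + OFFSET_BITS))
        # Table write: evict the oldest entry for this target, insert the new nonce.
        table[target] = (stored + [nonce])[-2:]
```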
Another potential point of failure is the hash function used in the Merkle tree.
Oh? I haven't heard of anything dodgy or suspect about the Merkle tree implementation. What's better about something else?