I think I've found a good way to solve the replay protection thing and another issue at the same time.
Currently, p2pool weights each share by its difficulty alone. But the expected revenue a share contributes to the pool is proportional to its difficulty times the block reward (subsidy plus fees) of the block it would have created. This means that shares with low fees get overcompensated and shares with high fees get undercompensated, so the optimal strategy for most p2pool miners is to make small shares/blocks and hope that the rest of the pool irrationally mines big blocks.
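To make the misalignment concrete, here's a toy example with made-up numbers (12.5 BTC subsidy, two equal-difficulty shares):

    # Two shares of equal difficulty: one for a near-empty block, one for
    # a full block that collected 0.5 BTC in fees. (Illustrative numbers.)
    SUBSIDY = 12.5  # block subsidy, in BTC

    shares = [
        {'name': 'empty block', 'difficulty': 1000, 'fees': 0.0},
        {'name': 'full block',  'difficulty': 1000, 'fees': 0.5},
    ]

    total_difficulty = sum(s['difficulty'] for s in shares)
    for s in shares:
        revenue = s['difficulty'] * (SUBSIDY + s['fees'])    # expected revenue contributed
        payout = s['difficulty'] / float(total_difficulty)   # fraction paid under current rules
        print('%s: revenue %.0f, paid %.0f%% of the pool' % (s['name'], revenue, payout * 100))

Both shares get paid 50% of the pool even though the full-block share contributed 4% more expected revenue; the empty-block miner free-rides on everyone else's fees.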
This misaligned incentive can be fixed by weighting shares by their difficulty times the block reward they would have generated. Pretty simple. As it affects payment calculations, which need to be precise down to the satoshi, it's a completely backwards-incompatible change, but that's something I want for this fork anyway. Yay!
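In code, the new weight is basically a one-liner (field names here are mine, not p2pool's actual share structure):

    def share_weight(share):
        # Weight for payout purposes: difficulty times the total block
        # reward (subsidy + fees) the share would have paid out. Keeping
        # these in integer satoshis keeps the payout math exact.
        return share['difficulty'] * (share['subsidy'] + share['fees'])

    # Old behavior, for comparison: weight == share['difficulty'], which
    # pays every share the same regardless of the fees it collected.

All of the downstream payout code then divides by the sum of these weights instead of the sum of difficulties.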
I'm trying to figure out the implementation now. These revenue calculations are done with a SkipList implementation so that p2pool doesn't have to add up every single share's weight every time it assigns new work, and it's taking me a while to understand the code well enough to change it without breaking everything. But I think it will be working soon. Once I've got the calculations correct, I'll deploy them along with a share version change to v64 (just to give the legacy chain room for plenty of upgrades if it continues to be used), and then I think the code will be ready for everyone else to use.
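For intuition about what that SkipList is doing: the core requirement is an index that answers "total weight of the last N shares" in O(log n) instead of O(n). Here's a compact stand-in using a Fenwick tree, just to illustrate the idea (to be clear, this is not p2pool's implementation):

    class WeightIndex:
        # Cumulative share-weight queries in O(log n). (Fenwick tree used
        # here as a stand-in for p2pool's SkipList.)
        def __init__(self, size):
            self.tree = [0] * (size + 1)

        def add(self, i, weight):
            # Record the weight of the share at height i.
            i += 1
            while i < len(self.tree):
                self.tree[i] += weight
                i += i & -i

        def weight_up_to(self, i):
            # Total weight of shares[0..i], inclusive.
            i += 1
            total = 0
            while i > 0:
                total += self.tree[i]
                i -= i & -i
            return total

    # Weight of the last n shares ending at height tip:
    #   index.weight_up_to(tip) - index.weight_up_to(tip - n)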
After that, I might take a break from coding on p2pool, or I might continue for another week or two. If I continue, I have some ideas for optimizations that could improve p2pool's performance and orphan rates. They fall into two categories: CPU performance, and network performance.
For CPU performance, almost all of p2pool's CPU time is spent packing and unpacking shares and transactions between Python objects and serialized bytestreams. I can probably accelerate this by rewriting the hot Python code to be faster, simplifying the Python data structures for transactions, and/or adding Cython support. If I go the Cython route, I can make it so that p2pool still runs without Cython by stripping the static typing information from the code, but if you have Cython and a C compiler installed (or if you downloaded a binary package), it will use C-compiled versions of a couple of the most performance-critical modules. This should give about the same CPU performance benefit as PyPy (or better) without PyPy's 2 GB memory overhead. Improving CPU performance could speed up share propagation and new-work generation, reducing orphan/DOA rates and making revenue allocation more fair. It would also make running a p2pool node cheaper and help p2pool scale.
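For the Cython route, the no-Cython fallback can be as simple as an import guard; a sketch, with a hypothetical compiled-module name:

    # Use the Cython-compiled serializer if it was built locally or shipped
    # in a binary package; otherwise fall back to the pure-Python module.
    # ('pack_fast' is a hypothetical name; the pattern is what matters.)
    try:
        from p2pool.util import pack_fast as pack
    except ImportError:
        from p2pool.util import pack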
For network performance, the current code has a lot of room for improvement. I'd like to add a new network message that sends a list of transaction hashes to a p2pool peer and asks them to download the corresponding transactions from their bitcoind if possible. This would let p2pool peers get most of their transactions over localhost TCP instead of the p2pool network, which should save a lot of bandwidth; I'd expect it to reduce traffic by more than half, since p2pool is a lot more aggressive about forgetting transactions than bitcoind is. Another improvement would be a shorthash scheme like the ones in Xthin and Compact Blocks for share transmission (there's a sketch of the idea below).

Once those two optimizations are in place, it should be practical for nodes to send the template of the share they're working on mining *before* it gets mined, and then send only the nonce values and miner address once a share is found, at least to one's immediate peers. If I get that done, I could also teach nodes to encode the message to each peer as a diff between the share being sent and the recipient's current work-in-progress, which would let a 1 MB share with 95% new transactions be sent with only a dozen kilobytes or so of one-way traffic in the latency-critical path. After that, maybe I could play with bloom filters or IBLTs and and and...
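As a rough illustration of the shorthash idea: Compact Blocks (BIP 152) derives a per-block SipHash-2-4 key and sends 6-byte short IDs instead of 32-byte txids. The sketch below substitutes truncated salted SHA256 for SipHash just to show the shape of the scheme; all names are mine, not p2pool's:

    import hashlib

    def short_id(txid, salt, length=6):
        # Stand-in for BIP 152's SipHash-2-4 short IDs: salt the txid with
        # a per-share value so collisions can't be precomputed, then
        # truncate. 6 bytes keeps accidental collisions rare at mempool sizes.
        return hashlib.sha256(salt + txid).digest()[:length]

    def encode_share_txs(txids, salt):
        # Sender: 6 bytes per transaction on the wire instead of 32.
        return [short_id(txid, salt) for txid in txids]

    def match_share_txs(short_ids, known_txids, salt):
        # Receiver: resolve short IDs against transactions it already has
        # (from its bitcoind or from earlier shares); anything unresolved
        # gets requested explicitly.
        table = dict((short_id(txid, salt), txid) for txid in known_txids)
        matched = [table.get(sid) for sid in short_ids]
        missing = [i for i, txid in enumerate(matched) if txid is None]
        return matched, missing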
Anyway, first things first. Gonna try to get the properly weighted share rewards done in the next couple of days, and then things should be ready for others to join in on the fork fun. (By the way, my fork is over 1 PH/s of non-rented miners now.)