I'm looking into the memory consumption issue now. Here's some data when running CPython:
Of the first 100 shares in the share chain, the average memory consumption (using pympler.asizeof) is 57 kB per share, with most shares between 20 kB and 100 kB.
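For anyone who wants to reproduce the measurement: pympler's asizeof gives the deep (recursive) size of an object. A minimal runnable sketch is below; the FakeShare class is a made-up stand-in, since the real measurement walked the first 100 shares of the live share chain through p2pool's tracker.

```python
from pympler.asizeof import asizeof

# FakeShare is a stand-in for a real share object, with roughly the right
# shape: a big list of small ints plus a list of 32-byte hashes.
class FakeShare(object):
    def __init__(self):
        self.share_info = {'transaction_hash_refs': list(range(1800)),
                           'new_transaction_hashes': [b'\x00' * 32] * 200}

sizes = [asizeof(FakeShare()) for _ in range(100)]
print('avg %.1f kB per share' % (sum(sizes) / 1000.0 / len(sizes)))
```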
Of this 57.3 kB average, 43.5 kB on average is taken up by share_info.transaction_hash_refs. (See the background paragraph at the end if you want to know what that does.) Each of these ints takes 24 bytes of RAM in CPython, even though each value only needs 1 or 2 bytes. It might be possible to encode these numbers as an array of uint8 and/or uint16 values instead of regular Python integers, which would shrink this data by about 10-15x, and the full share by a little more than 3x, without any loss of functionality.
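The stdlib array module already does this kind of packing. A minimal sketch of the idea (the ref values here are made up; the real ones are flattened (share index, tx index) pairs):

```python
from array import array
from pympler.asizeof import asizeof

refs = list(range(2, 2000))      # stand-in values; real refs fit in 1-2 bytes each

boxed  = list(refs)              # one ~24-byte int object plus an 8-byte pointer per value
packed = array('H', refs)        # 'H' = unsigned 16-bit: 2 bytes per value, one buffer

print(asizeof(boxed), asizeof(packed))
```

Values that fit in one byte could use array('B') instead for another 2x on those.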
On the other hand, the list of new transaction hashes averaged 7 kB per share. These are 32-byte values, but each one takes 64 bytes of Python memory. We might be able to save some memory here by using an array of strings (or just one really long string), but that would be inconvenient and would only save about 2x on that variable, so it's probably not worth it.
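The "one really long string" version would look something like this (note p2pool actually keeps hashes as integers, not byte strings, which is part of why this is inconvenient; hash_list here is fake data):

```python
import os

hash_list = [os.urandom(32) for _ in range(200)]   # fake 32-byte tx hashes

packed = b''.join(hash_list)                       # one allocation instead of 200 objects

def get_hash(i):
    # Slice the i-th hash back out on demand.
    return packed[32 * i : 32 * (i + 1)]

assert get_hash(5) == hash_list[5]
```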
Another option is to just forget all of the transaction data once a share is more than 200 shares away from the head of the chain, and reload it from disk if it's requested. This would probably be more work, and might open up some DoS vulnerabilities if I'm not careful with the code (someone could mine a share that requires us to reload 200 shares off disk, and share parsing is hella slow right now), but it would probably reduce memory consumption by around 20x if done well.
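Structurally that would be a prune-and-reload cache along these lines. Everything here is hypothetical (CHAIN_DEPTH, load_share_from_disk, the class itself); the real change would live inside p2pool's tracker code:

```python
CHAIN_DEPTH = 200

def load_share_from_disk(share_hash):
    # Placeholder: the real version would reparse the stored share,
    # which is the slow, DoS-sensitive step mentioned above.
    raise NotImplementedError

class TxDataCache(object):
    def __init__(self):
        self.tx_data = {}                    # share_hash -> tx data, recent shares only

    def prune(self, recent_hashes):
        # Forget tx data for anything outside the most recent 200 shares.
        keep = set(recent_hashes[:CHAIN_DEPTH])
        for h in list(self.tx_data):
            if h not in keep:
                del self.tx_data[h]

    def get(self, share_hash):
        if share_hash not in self.tx_data:
            # Slow path: reload from disk. This is where rate-limiting
            # would be needed to blunt the DoS concern.
            self.tx_data[share_hash] = load_share_from_disk(share_hash)
        return self.tx_data[share_hash]
```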
I can't do the same tests on pypy, since sys.getsizeof() and asizeof() don't work there, unfortunately.
Background on how p2pool's share format handles transactions: for each transaction in a share/block, p2pool checks whether that transaction's hash was encoded into one of the 200 previous shares in the share chain. P2pool then encodes a reference in share_info.transaction_hash_refs as a pair: the share number (where 0 is this share, 1 is this share's parent, 2 is the grandparent, etc.) and the index of that transaction in that share's share_info.new_transaction_hashes. If the transaction hash wasn't in a previous share, p2pool also sticks it into this share's share_info.new_transaction_hashes. When sent over the network, these numbers are encoded as var_ints, so it's usually 1 byte for the share index and 1-2 bytes for the transaction index, plus 32 bytes for each new hash.
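A toy illustration of that scheme (not p2pool's actual code; encode_tx_refs and its arguments are made up for clarity):

```python
def encode_tx_refs(tx_hashes, ancestors):
    # ancestors[0] is the parent's new_transaction_hashes, ancestors[1] the
    # grandparent's, and so on, up to 200 deep.
    refs = []          # flattened (share index, tx index) pairs
    new_hashes = []    # becomes this share's new_transaction_hashes
    for h in tx_hashes:
        for i, prev in enumerate(ancestors):
            if h in prev:
                refs += [i + 1, prev.index(h)]   # i+1: 1 = parent, 2 = grandparent, ...
                break
        else:
            new_hashes.append(h)
            refs += [0, len(new_hashes) - 1]     # 0 = this share itself
    return refs, new_hashes

refs, new = encode_tx_refs([b'aa', b'bb'], [[b'bb'], [b'cc']])
assert refs == [0, 0, 1, 0] and new == [b'aa']
```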
Edit: Yeah, the array of uint16 thing seems to work pretty well, at least on CPython. 2170 MB -> 489 MB = 4.44x reduction.
Edit 2: Seems to benefit pypy (5.7.1) even more. 5310 MB -> 785 MB = 6.76x reduction. Now I just need to make sure the new code can successfully mine shares...
Edit 3: Seems to mine shares just fine. Neat.
https://github.com/jtoomim/p2pool/commits/lowmem for now, but I'll merge it into 1mb_hardforked once it's been tested for more than a few minutes.