So, I'm currently rebuilding the block index with txindex enabled, and once that's done I'll rerun the code so I'll have all of the data.
Obvious, but easier than having to deal with removing characters below ' ' or above '~' and the problems they can cause in scripts and output.
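To illustrate the trade-off, here is a minimal Python sketch of both approaches. It assumes the coinbase scriptSig is already available as raw bytes (extracting it from the block data isn't shown), and the function names and sample bytes are hypothetical.

```python
def printable_coinbase(raw: bytes, replacement: str = ".") -> str:
    """Keep bytes from ' ' (0x20) through '~' (0x7E) and replace every other byte."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else replacement for b in raw)


def hex_coinbase(raw: bytes) -> str:
    """Hex-encode the whole coinbase, so no byte can upset a script or terminal."""
    return raw.hex()


# Made-up sample bytes, not taken from a real block.
sample = bytes.fromhex("03a0bb0d") + b"/P2SH/" + b"\x00\xff"
print(printable_coinbase(sample))  # ..../P2SH/..
print(hex_coinbase(sample))        # 03a0bb0d2f503253482f00ff
```

Filtering keeps the tags readable but throws information away (you can't tell which byte a '.' replaced), while hex is lossless but you have to decode it again to spot a pool's tag.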
Blockchain.info has this (incomplete) list of the coinbase strings pools use:
https://github.com/blockchain/Blockchain-Known-Pools
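Once you have a list like that, matching blocks to pools is basically a substring search over each coinbase. A rough Python sketch follows; the tag table is made up for illustration (it is not the repository's data or file format), and `identify_pool` and `tally` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical tag table for illustration; the real entries are in the
# Blockchain-Known-Pools repository and are not reproduced here.
KNOWN_TAGS: Dict[str, str] = {
    "/ExamplePool/": "Example Pool",
    "/OtherPool/": "Other Pool",
}


def identify_pool(coinbase: bytes, tags: Dict[str, str]) -> Optional[str]:
    """Return the pool name for the first known tag found in the coinbase, else None."""
    # Non-ASCII bytes become U+FFFD, so they can never match an ASCII tag.
    text = coinbase.decode("ascii", errors="replace")
    # Check longer tags first so a specific tag wins over a shorter one it contains.
    for tag in sorted(tags, key=len, reverse=True):
        if tag in text:
            return tags[tag]
    return None


def tally(coinbases: Iterable[bytes], tags: Dict[str, str]) -> Counter:
    """Count blocks per pool, lumping unmatched coinbases under 'unknown'."""
    return Counter(identify_pool(cb, tags) or "unknown" for cb in coinbases)


samples = [b"\x03\x01/ExamplePool/", b"mined by /OtherPool/", b"\x00\x01\x02"]
print(tally(samples, KNOWN_TAGS))
```

Anything that lands in "unknown" is a candidate for a tag missing from the list.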
I had one many years ago (last time I updated it was Sep-2012) ... but half the pools in it don't exist any more
Now for a stupid question (and I only say it's stupid because I've been mining on p2pool for well over a year and don't know the answer myself): is p2pool the only pool to use "/P2SH/" in the coinbase? I swear I saw it in another pool's coinbase... maybe I'm wrong and that really is the p2pool identifier. I guess I'll find out when I run my code and do some checks.
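One possible check, sketched in Python: collect every coinbase containing "/P2SH/" and count which other slash-delimited tags appear alongside it. This assumes pools mark themselves with "/Name/"-style tags, which is a common convention rather than a rule; the regex, the helper name `co_occurring_tags` and the sample data are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: pool tags are slash-delimited runs of printable, non-space ASCII.
# The lookahead lets adjacent tags such as "/P2SH/Pool/" share a slash.
TAG_RE = re.compile(rb"(?=(/[\x21-\x2e\x30-\x7e]+/))")


def co_occurring_tags(coinbases: Iterable[bytes], marker: bytes = b"/P2SH/") -> Counter:
    """Count the other slash-delimited tags seen in coinbases that contain `marker`."""
    counts: Counter = Counter()
    for cb in coinbases:
        if marker not in cb:
            continue  # only look at blocks that carry the marker
        for tag in TAG_RE.findall(cb):
            if tag != marker:
                counts[tag.decode("ascii")] += 1
    return counts


samples = [b"\x03/P2SH/ExamplePool/", b"/P2SH/", b"/OtherPool/"]
print(co_occurring_tags(samples))  # Counter({'/ExamplePool/': 1})
```

If real data shows other pools' tags next to "/P2SH/", that would suggest p2pool isn't the only one using it; an empty result wouldn't prove the opposite, since untagged blocks can't be attributed this way.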