If by DAG you mean Directed Acyclic Graph, I fail to see how that solves, or even addresses, the problem -ck was referring to.
The problems with p2pool performance are that (a) miners have to switch work frequently, which causes switching losses (these are particularly bad for Antminers, due to bad firmware, but are also problematic for high-latency connections and for p2pool nodes with slow CPUs), and (b) the amount of variance a small miner experiences grows with the time between shares. If you have each miner switch work once per share, these two issues pull in opposite directions: a shorter share time means more switching losses, a longer share time means more variance, and you have to pick a share frequency that balances the two.
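To put rough numbers on how the share interval affects a small miner's variance, here is a minimal sketch. It assumes share finds are Poisson-distributed, a payout window of fixed duration (3 days is used as an illustrative figure), and a miner with 0.1% of the pool's hashrate. The function name and all parameters are hypothetical, not part of p2pool.

```python
import math


def relative_share_spread(hashrate_fraction: float,
                          window_seconds: float,
                          share_interval_seconds: float) -> float:
    """Relative spread (std dev / mean) of a miner's share count in the window.

    Assumes a Poisson model: the miner expects
    hashrate_fraction * window_seconds / share_interval_seconds shares,
    and a Poisson count with mean m has std dev sqrt(m).
    """
    expected_shares = hashrate_fraction * window_seconds / share_interval_seconds
    return 1.0 / math.sqrt(expected_shares)


WINDOW = 3 * 24 * 60 * 60  # assumed 3-day payout window, in seconds
SMALL_MINER = 0.001        # assumed 0.1% of the pool's hashrate

spread_30s = relative_share_spread(SMALL_MINER, WINDOW, 30)
spread_1s = relative_share_spread(SMALL_MINER, WINDOW, 1)
print(f"30 s shares: {spread_30s:.3f}, 1 s shares: {spread_1s:.3f}")
```

Under these assumptions, going from 30 second shares to 1 second shares shrinks the relative spread by a factor of sqrt(30), or about 5.5.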
However, there is no need to have miners switch work once per share. Bitcoin's blocks and transactions need to be serialized in a canonical ordering in order to prevent double-spending or transaction conflicts, but with p2pool there's no equivalent need for each share to come one after another. All that a share needs to do is prove that work was done in a way that benefits other p2pool users, which means that it (a) accurately pays out to other p2pool users and (b) is working on the correct block height at any given time.
So instead of arranging the shares as a chain of single-parent single-child links, we can come up with more imaginative arrangements. My favorite is where the first share for a new block (the switcher) gets a revenue bonus, and that switcher refers to not one parent share but as many parent shares as you know of, and the size of the switcher's bonus is proportional to the number of parents it has. Once one or more switchers have been found, the other shares (the fillers) refer to a single switcher as the parent. If there are multiple competing switchers to build off of, then the miners will want to choose the switcher that has the most fillers already laid on top of it, as those fillers will give the next switcher the greatest revenue bonus on the next block. Thus, consensus quickly emerges around one switcher, and everyone goes happily on their way. Miners only have to switch work twice per block, and you can get a 1 second (or faster) share time instead of the 30 seconds we tolerate now.
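The switcher/filler rules above can be sketched roughly as follows. This is only an illustration, not p2pool code: the `Share` class, the field names, the tie-breaking rule, and the bonus units are all made up for the example, and the share list is assumed to cover a single block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Share:
    share_id: str
    kind: str           # "switcher" (first share on a new block) or "filler"
    parents: List[str]  # switcher: every known parent share; filler: one switcher


def filler_count(shares: List[Share], switcher_id: str) -> int:
    """Count the fillers that name the given switcher as their parent."""
    return sum(1 for s in shares
               if s.kind == "filler" and s.parents == [switcher_id])


def choose_switcher(shares: List[Share]) -> Optional[Share]:
    """Pick the switcher with the most fillers already built on it."""
    switchers = [s for s in shares if s.kind == "switcher"]
    if not switchers:
        return None
    # Ties go to the earliest switcher in the list (an arbitrary choice here).
    return max(switchers, key=lambda s: filler_count(shares, s.share_id))


def switcher_bonus(switcher: Share, bonus_per_parent: int = 1) -> int:
    """Bonus proportional to the number of parents, in hypothetical units."""
    return bonus_per_parent * len(switcher.parents)


# Two competing switchers for the same block; S1 has attracted more fillers.
shares = [
    Share("S1", "switcher", ["f1", "f2", "f3"]),
    Share("S2", "switcher", ["f1", "f2"]),
    Share("a", "filler", ["S1"]),
    Share("b", "filler", ["S1"]),
    Share("c", "filler", ["S2"]),
]
best = choose_switcher(shares)
print(best.share_id, switcher_bonus(best))  # prints "S1 3"
```

A miner following this rule builds its fillers on S1, which in turn makes S1 even more attractive to everyone else, so the choice converges quickly.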
Since the miners don't have to switch work as often, the pool software doesn't have to do as much work (and only needs to recompute rewards twice per block), which means that Python's performance issues will become less important.
The GHOST idea is basically that you reward uncles fractionally in the share chain. This should also work, but you'd have to tweak it a bit versus e.g. Ethereum's version of GHOST if you wanted to allow for a significantly lower work switching rate than share rate, and it would probably be difficult to get it to perform as well as the odd/even height scheme described above.