I'm sure one of the reasons centralised pools are more popular is simply that the profit motive pushes them to a higher level of professionalism, and miners respond to that.
Otherwise known as the "I like the interface" position.
The main issue to me is the competitiveness of the hardware needed to run a p2pool node. I believe that increased usage would be easier to encourage if something could be done to slim down p2pool's resource overheads, but it's difficult to see how the disk space and disk access requirements could be improved (at least with the present sharechain design). The memory and CPU requirements could be reduced within the current design, but the obvious solution would sacrifice the ease of platform portability currently enjoyed by using the Python runtime. And obviously a C or C++ rewrite would be a massive job, particularly if it were to embrace the same range of platforms, a problem Python already solves.
I guess there is some respite in the form of improvements to bitcoind's memory usage with the -disablewallet configuration option, coming soon / already available in a testing branch, but I can't help wondering whether the simple passage of time, and the progress it brings, might be most decisive. Did Pieter Wuille's custom ECDSA re-implementation ever get merged? That sort of thing will matter more when the upper limit on the block size inevitably changes, in whatever way is eventually decided upon.
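For anyone who wants to try the wallet-less mode on a build that includes it, it's a one-line switch, either on the command line or in bitcoin.conf (a sketch, assuming your build recognises the option):

```shell
# Run bitcoind without loading the wallet, to cut memory usage
# (only works on builds that include the -disablewallet option).
bitcoind -disablewallet -daemon

# Or make it permanent by adding this line to bitcoin.conf:
# disablewallet=1
```

A p2pool node only needs the node/relay side of bitcoind, not the wallet, so nothing p2pool relies on should be affected.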
I'm thinking along the lines of being able to easily adapt low-cost computing devices into p2pool nodes. A change to the block size may force the disk space and performance requirements out of the feasibility zone, but every other requirement is unlikely to become so unwieldy. In just a few years, the typical low-cost computing device in the Arduino/RasPi mould may be more than capable of meeting all the performance characteristics that good operation of the p2pool/bitcoind setup requires (taking into account the balance of increasing transaction frequency, downward revision in processing requirements per transaction, and the upward direction of the processing capabilities of the latest ARM designs). At the present time, though, some form of high-performance desktop machine just has to be used as a p2pool node; there is no real alternative if you want to make the most of your hashrate.
The reason for the low-cost device angle is obvious: all new mining devices either include or rely on a computing device with a networking controller (Ethernet or WiFi) and at least enough performance to run unoptimised builds of the mining software (until the developers can get a device to test and code against). We may reach a stage where running unoptimised miner code/drivers with as much comfort margin as possible becomes the norm, so that manufacturers begin to choose over-specified devices as the more prudent option (which could even help drive down the unit cost of these sorts of low-cost computing devices). Once the mining code for a given device is optimised, the newly vacated headroom could be leveraged for running p2pool. Can it be done now, with the current version of Python and its memory management, on our current generation of mining ASICs? No is the answer. Can native p2pool (and bitcoind) builds be practicably produced for the processor architecture of every possible low-cost computer used as a mining controller? I expect no is the answer to that question too.
But there must be some opportunity to leverage the processing controllers that inevitably form a part of nearly every typical miner that rolls out of the manufacturers' doors. Maybe then someone might be (more necessarily) motivated to work on a shiny-tastic web interface.