You're definitely right, there is another ongoing issue: pushpool gets overloaded periodically (for a few seconds every few minutes). I've raised the file descriptor limit and every other restriction I can think of to work around this, but have been unsuccessful.
So, unable to scale a single pushpool, I ended up spinning up two pushpools and load balancing them with nginx. I made that change ~6 months ago. Now we're back to one pushpool or the other getting overloaded periodically. With the load balancing, the overloaded pushpool gets removed from the rotation and you get sent to the pushpool that is still answering. If you're submitting work, and that work came from the other pushpool, then the current pushpool doesn't recognize it and flags it as invalid.
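To make the failure mode above concrete, here is a toy sketch in Python. It is not pushpool or nginx code: the `Pool` class, `pick_backend`, and the work ID "w1" are all made up for illustration, and the only assumption is the one described above, that each pushpool only recognizes work it issued itself.

```python
class Pool:
    """Hypothetical stand-in for one pushpool; it only knows work it issued."""

    def __init__(self, name):
        self.name = name
        self.issued = set()   # work IDs handed out by this instance only
        self.healthy = True   # False while the instance is overloaded

    def get_work(self, work_id):
        self.issued.add(work_id)
        return work_id

    def submit(self, work_id):
        # Work issued by a different instance is not recognized here.
        return "accepted" if work_id in self.issued else "unknown-work"


def pick_backend(pools, preferred):
    """Use the preferred pool if it is healthy, else the first healthy one."""
    if pools[preferred].healthy:
        return pools[preferred]
    return next(p for p in pools if p.healthy)


pools = [Pool("A"), Pool("B")]

# The miner fetches work while pool A is answering.
work = pick_backend(pools, 0).get_work("w1")

# Pool A gets overloaded and drops out of the rotation; the submit lands on B.
pools[0].healthy = False
result_failover = pick_backend(pools, 0).submit(work)

# Once A is back, the same submit would have been fine.
pools[0].healthy = True
result_normal = pick_backend(pools, 0).submit(work)

print(result_failover, result_normal)
```

The point of the sketch is only that the reject comes from the failover itself, not from anything the miner did wrong.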
Oddly enough, pushpool has memcache functionality and both pushpools are pointed at the same memcache. I thought initially this was so that you could run a bunch of them and have them share the work between them, but clearly that is not the case. I'm not really sure what pushpool is using memcache for.
As you digest all this, you're probably wondering why I can't just add a third pushpool to the balancing. The problem is that in order to make sure you always get sent to the same backend pushpool (because your work is invalid if you don't), I had to configure the balancing to be by IP address. And, naturally, since we're behind a DDoS service, 80% of our traffic comes from... the same IP address. Ugh. So even with the two pushpools, one takes like 80% of the traffic and I have no way to split it out beyond that. I need like 3 DDoS services with each one running a pushpool behind 'em.
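A rough sketch of why a third backend doesn't help with IP-based balancing, in Python. The hash here is an arbitrary stand-in (nginx's ip_hash keys on the first three octets of an IPv4 address and uses its own hash function), and the addresses and the 80/20 split are illustrative numbers based on the estimate above, not measured data.

```python
import hashlib
from collections import Counter


def backend_for(ip, n_backends):
    """Map a client IP to a backend index (stand-in hash, not nginx's)."""
    digest = hashlib.md5(ip.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_backends


def max_share(requests, n_backends):
    """Fraction of all requests that land on the busiest backend."""
    counts = Counter(backend_for(ip, n_backends) for ip in requests)
    return max(counts.values()) / len(requests)


# Hypothetical traffic: 80 requests arrive via the DDoS service's single IP,
# 20 come from distinct direct clients.
requests = ["203.0.113.10"] * 80 + [f"198.51.100.{i}" for i in range(20)]

share_two = max_share(requests, 2)
share_three = max_share(requests, 3)

print(f"busiest backend with 2 pools: {share_two:.0%}")
print(f"busiest backend with 3 pools: {share_three:.0%}")
```

Whatever hash is used, every request from the one big IP maps to the same backend, so the busiest pushpool carries at least 80% of the load no matter how many backends are added.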
Firstly, things do seem to be running much smoother this morning.
Thank you for your explanation above. I have to admit I had to read it twice, but your explanation is very clear. If pushpool keeps a running log of what work it has issued, and that log is not being shared in the memcache, where is it? Do both running pushpools share a common database where information about each worker's contributed shares, etc. is maintained? And is the log of work issued also a part of that database? If it is, might this be a database issue?
One other thing: if the memcache is not for sharing information about what work has been issued, should both pushpools be pointing at the same cache, or should they have separate memory blocks?
Forgive my ignorance here, because I really don't know. I've just found in my line of work that it's good to bounce ideas off each other, and sometimes even a wrong idea triggers a right solution.
Edit: I don't know if this really means much, but I noticed something today when I was monitoring the situation. When I went to the My Stats tab, it seemed to hang. While it was hanging I checked one of my rigs, and the unknown work rejects were happening. I had been running at below 0.5 percent stales up to that point; suddenly I was at 20 percent (although my miner calls them unknown work). The My Stats tab has to access the database to display. Interesting coincidence?