I see that the connection to bonuspool was first lost at 06:41:35, but the switch to the backup pool happens almost 40s later, with the GPUs and BFLs idling in between. This happens repeatedly and makes the total hash rate about 20% lower with bonuspool than with e.g. ABCpool. I was assuming that the pool scheduling ensures there is always some work available, i.e. that work is prefetched and ready whenever a device needs fresh work.
Is this a bug or a feature? If it's a bug, would a more verbose log help you isolate the problem? (cgminer is the latest git source, BTW.)
No, you are correct. It needs to see that a pool has been unresponsive for a full minute before switching pools. The problem with falling back to backup work is that it can't tell it has completely, utterly run out of work until nothing at all has come in for a demonstrable amount of time. If it keeps getting work from the backup pools, it can't really tell that the primary pool has failed. There is some crossover, but it has to detect that things have truly gone idle with nothing to do before saying "fuck this pool, it's dead".
edit: if a pool is that unreliable, you really have to consider whether it's worth mining there or not.
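A rough sketch of the rule described above, written in Python purely for illustration (cgminer itself is C and its code does not look like this). A pool is only declared dead after it has delivered nothing for a full minute, and work arriving from a backup pool does not reset the primary's clock. The names Pool, note_work, is_idle, select_pool and the POOL_IDLE_TIMEOUT constant are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical constant: "unresponsive for a full minute" from the post.
POOL_IDLE_TIMEOUT = 60.0  # seconds


@dataclass
class Pool:
    name: str
    last_work_time: float  # time of the last work unit received from THIS pool

    def note_work(self, now: float) -> None:
        # Only work from this pool refreshes its own timestamp.
        self.last_work_time = now

    def is_idle(self, now: float) -> bool:
        # Declared dead only after a full timeout with nothing coming in.
        return now - self.last_work_time >= POOL_IDLE_TIMEOUT


def select_pool(pools: List[Pool], now: float) -> Optional[Pool]:
    """Failover order: return the first (highest-priority) pool not yet idle."""
    for pool in pools:
        if not pool.is_idle(now):
            return pool
    return None


primary = Pool("bonuspool", last_work_time=0.0)
backup = Pool("backup", last_work_time=0.0)
backup.note_work(35.0)
print(select_pool([primary, backup], 35.0).name)  # primary still selected
```

The cost shows up directly: until the timeout expires, select_pool keeps returning the silent primary even though the backup is answering, and that window is where devices can sit idle.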
Would it be too aggressive towards pools to request more work than you actually work on? My layman's approach would be: I know I have e.g. 4 GH/s of hashing power and will need new work once a second. As soon as work is given to a device, immediately ask the primary pool for more (instead of waiting for a device to do so), and if it does not respond within 600 ms, ask the backup pool(s). The cost of such a proactive approach would be that you ask for more work than you can handle, but you would not have to wait as long to see that a pool is dead.
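The proactive scheme proposed above could look roughly like the following. This is a hypothetical Python sketch, not cgminer code: fetch_with_fallback, Prefetcher and the fetch callables are invented names, each fetch stands in for one work request to a pool, and the 600 ms timeout is the figure from the post.

```python
import concurrent.futures as cf
from collections import deque
from typing import Callable, Deque, List, Optional

PRIMARY_TIMEOUT = 0.6  # seconds; the 600 ms suggested in the post

Fetch = Callable[[], str]  # hypothetical: returns one unit of work from a pool

_executor = cf.ThreadPoolExecutor(max_workers=4)


def fetch_with_fallback(primary: Fetch, backups: List[Fetch],
                        timeout: float = PRIMARY_TIMEOUT) -> Optional[str]:
    """Ask the primary pool first; if it has not answered within `timeout`,
    try each backup pool in order. Returns None if no pool delivers."""
    future = _executor.submit(primary)
    try:
        return future.result(timeout=timeout)
    except cf.TimeoutError:
        pass  # primary is slow: leave its request running, its answer is dropped
    except Exception:
        pass  # primary failed outright (e.g. connection refused)
    for backup in backups:
        try:
            return backup()
        except Exception:
            continue  # this backup failed too, try the next one
    return None


class Prefetcher:
    """Keeps a small queue of ready work and asks for the next unit as soon
    as a device takes one, instead of waiting for the queue to run dry."""

    def __init__(self, primary: Fetch, backups: List[Fetch],
                 timeout: float = PRIMARY_TIMEOUT):
        self.primary = primary
        self.backups = backups
        self.timeout = timeout
        self.queue: Deque[str] = deque()

    def _fetch(self) -> Optional[str]:
        return fetch_with_fallback(self.primary, self.backups, self.timeout)

    def take(self) -> Optional[str]:
        # Hand out queued work if there is any, otherwise fetch on demand.
        work = self.queue.popleft() if self.queue else self._fetch()
        # Proactive refill. Done synchronously here for simplicity; a real
        # miner would do this in a background thread so devices never wait.
        refill = self._fetch()
        if refill is not None:
            self.queue.append(refill)
        return work
```

Note the trade-off mentioned in the post: when the primary is merely slow, its late reply is thrown away, so the pool is asked for more work than actually gets used.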
As for my cgminer version: you're just too fast/active. It is from 24h ago and includes the pool scheduling improvements you added for 2.4.0. I can't keep pulling at the speed you add improvements.