KDB Restart NOW
Getting a resource problem - all is OK so far but I can see it causing problems shortly if I don't give it a kick.
Mining is unaffected - just the web site will show lotsa zeros and '?' while restarting.
Mine on.
Unfortunately the pool dropped all miners, so everyone initially failed over from 14:08 until 14:13.
Not long after this the pool crashed - and in the crash it wrote a corrupt worker status record to the reload file. KDB couldn't handle the corrupt record and crashed as well, and kept crashing until I found and removed the corrupt record from the reload file - which required stopping the pool while I did that.
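For illustration only, here's a minimal sketch of the kind of sanity check that can locate a structurally bad record in a line-based reload file. The separator and checks are assumptions - the actual CKDB reload record format is not reproduced here.

```c
/* Hypothetical sketch: flag structurally bad records in a reload file.
 * Assumes one record per line with tab-separated fields; the real CKDB
 * reload format differs, so treat this as an illustration only. */
#include <stdio.h>
#include <string.h>

#define MAX_LINE 65536

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s reloadfile\n", argv[0]);
		return 1;
	}
	FILE *fp = fopen(argv[1], "r");
	if (!fp) {
		perror("fopen");
		return 1;
	}
	char buf[MAX_LINE];
	long lineno = 0;
	while (fgets(buf, sizeof(buf), fp)) {
		lineno++;
		size_t len = strlen(buf);
		/* A record that doesn't end in '\n' was cut off mid-write */
		if (len == 0 || buf[len - 1] != '\n') {
			printf("line %ld: truncated record\n", lineno);
			continue;
		}
		/* Expect at least one field separator in a valid record */
		if (!strchr(buf, '\t'))
			printf("line %ld: missing field separator\n", lineno);
	}
	fclose(fp);
	return 0;
}
```

Running it over the reload file would print the line numbers of any suspect records, which is roughly the sort of check needed to find and remove a single corrupt entry by hand.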
The console log says all is OK, but I'll rerun the shifts on the backup server and make sure they match (and correct any differences) before any more payouts.
Sorry about the long mining outage - but all is OK again finally.
Unfortunately I've had to apply a quick (resource) patch to KDB, so I've had to restart KDB again.
Mining is still (currently) all OK.
The web site will be back up once KDB has finished restarting.
Unfortunately the resource issue continued: the long KDB reloads meant it got further out of sync, with each failed reload compounding the issue.
After shutting down a few things that mining can function without, and after yet another ckpool crash, KDB was finally able to catch up and all was OK again at 17:05 UTC.
One important point to note, however, is that no shares were lost in all this.
As per normal, a KDB restart scans back to before the point it had reached when it stopped, then runs forward using the reload files. In this case that was all OK: other than the original corruption, nothing was lost - all share sequence numbers were accounted for correctly.
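For the curious, here's a minimal sketch of the kind of sequence accounting described above, assuming the share sequence numbers have already been pulled out of the replayed records - the real CKDB bookkeeping is more involved than this.

```c
/* Hypothetical sketch: verify a replayed range of share sequence numbers
 * is complete - no gaps, no duplicates - regardless of arrival order. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>

/* Returns true if seqs[0..n-1] covers first..first+n-1 exactly once each */
static bool seq_complete(const uint64_t *seqs, size_t n, uint64_t first)
{
	bool *seen = calloc(n, sizeof(bool));
	if (!seen)
		return false;
	bool ok = true;
	for (size_t i = 0; i < n; i++) {
		uint64_t off = seqs[i] - first;
		if (seqs[i] < first || off >= n || seen[off]) {
			ok = false;	/* out of range or duplicate */
			break;
		}
		seen[off] = true;
	}
	if (ok) {
		for (size_t i = 0; i < n; i++) {
			if (!seen[i]) {
				ok = false;	/* gap: a share was never replayed */
				break;
			}
		}
	}
	free(seen);
	return ok;
}

int main(void)
{
	/* Example: shares 100..104 replayed out of order but complete */
	uint64_t seqs[] = {102, 100, 104, 101, 103};
	printf("all shares accounted for: %s\n",
	       seq_complete(seqs, 5, 100) ? "yes" : "no");
	return 0;
}
```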
Of course everyone was failed over to their backup pools for quite a while during this, but any shares submitted have been accounted for.
Sorry for the long delay fixing the problems, but everything is finally OK again - and of course the web site is back up and functioning normally.
As I mentioned above (in red) that I would, I reran the shifts on the backup server from before to after the problem occurred.
I reprocessed, on the backup server, all shifts from after "a9wsh rem" to before "aa8bz rem".
The stats of the results on both servers were exactly the same, so all was OK - as the console said and as I expected.