Hey everyone, well that's it; we completed and survived our second trial run. I can't believe it, nothing actually caught on fire, wow.
Not that we didn't have a few hiccups, but overall it was reasonably stable and I think a successful run. Also, the guys testing the system have been amazing. Thank you all for trying it, and especially to everyone who reported bugs.
Here are my observations on the system at this point. The overall code seems to work very well. The mining also works, which is great news. I did identify more serious issues in the performance of the network stack. At this point, I think that it is our Achilles heel. We actually had many nodes on the network during the last 24 hours; it was a popular run! Since this is a p2p network, it's impossible to know how many joined, but on the nodes that I put up I saw the connection pools fill up completely. And this is where the problems happened. I noticed that under heavy load, the network chokes.
The good news is that I know why. A design choice that seemed like a good idea 10 months ago is proving to be a bad idea under load: launching a thread per gossip message. The threads end up filling the workflow queue, and on weaker machines like containers, the CPU chokes. The node then starts skipping message frames and all hell breaks loose with its peers, since we start having timeouts and losing messages. This hurts the sync, and some peers lose connection entirely.
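To make the problem concrete, here is a minimal sketch of one common alternative to thread-per-message: a fixed pool of workers reading from a bounded queue. This is not our actual code and not necessarily what the refactor will look like. The names (GossipWorkerPool, submit, max_pending) are made up for illustration, and dropping messages when the queue is full is just one possible policy.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


class GossipWorkerPool:
    """Hypothetical sketch: a fixed number of threads drain a bounded queue,
    instead of spawning one thread per incoming gossip message."""

    def __init__(self, handler, workers=4, max_pending=1000):
        self._handler = handler
        self._queue = queue.Queue(maxsize=max_pending)
        self._lock = threading.Lock()
        self.dropped = 0
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]
        for t in self._threads:
            t.start()

    def submit(self, message):
        """Queue a message without blocking. Returns False if it was dropped."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            # Assumed policy: shed load rather than let work pile up.
            with self._lock:
                self.dropped += 1
            return False

    def _run(self):
        while True:
            message = self._queue.get()
            try:
                if message is _STOP:
                    return
                self._handler(message)
            except Exception:
                # A failing handler must not kill the worker thread.
                pass
            finally:
                self._queue.task_done()

    def close(self):
        """Wait for queued messages to be handled, then stop all workers."""
        self._queue.join()
        for _ in self._threads:
            self._queue.put(_STOP)
        for t in self._threads:
            t.join()
```

With this shape, the number of threads stays constant no matter how many messages arrive, and the queue size puts a hard cap on pending work. Whether to drop, block, or disconnect a noisy peer when the queue is full is a design decision that depends on the protocol.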
The other good news is that this is the only major issue that I spotted. The rest are minor bugs that we can fix in time. So, at this moment, here is how I see the schedule happening. I will need to retire from public life starting tomorrow for 3 full days to refactor the network stack and eliminate these issues. You might not hear much from us, because I need to concentrate, otherwise I won't get anything done. It might be silent, but that's because we are working hard to improve the code.
Then we will launch a third and last trial run of 24h next Wednesday evening to test the new stack under network load. Depending on the results, we will reevaluate and decide if we are ready to move on to a full week to test for better longevity.
Voilà. Before I close and disappear for 3 days, I want to add the following:
A VERY, VERY sincere thank you to everyone who trusted us, installed and tried out our node. It's really amazing that you were part of this and we REALLY appreciate it. And more than anything, we appreciate the trust that you extended to us, and we want to live up to it. So yes, thanks.
See you soon.