FINAL UPDATE: The pool will close. All balances will be PAID once they are mature. I WILL release a solo miner.
If you change your mind, I'm happy to help a bit. I think you've taken some really nice steps forward with your architecture thus far, and it'd be cool to see a new pool design succeed (particularly if you were, in the long run, interested in open-sourcing parts of the protocol so that others could build upon it). Getting rid of the hacked-up mix of binary gloop and stuff-over-HTTP would be great.
A good place to start is here:
http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/
http://pic.dhe.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=%2Fcom.ibm.websphere.edge.doc%2Fedge%2Fcp%2Fadmingd45.htm
A few settings are important. Tuning the FIN timeout (tcp_fin_timeout, which controls how long orphaned connections linger in FIN-WAIT-2) will reduce your susceptibility to grade-school-level TCP attacks such as those that were happening over the last few days. But don't set it to 1 second like those guys did; 10 seconds should be adequate.
echo "1024 61000" > /proc/sys/net/ipv4/ip_local_port_range
echo "10" > /proc/sys/net/ipv4/tcp_fin_timeout
echo 32768 > /proc/sys/fs/file-max
syncookies should already be enabled, but if they're not:
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
And check this page's suggestion for using iptables to rate-limit inbound SYN floods:
http://www.liquidcomm.net/news/tech_tips/linux_os/how-to-manage-a-ddos-or-dos-attempt-directed-at-your-linux-server.html
It's very possible that the crash you're seeing from ConnectSocketDirectly is still related to running out of file descriptors or something very similar. General server scalability tuning might make it disappear.
http://www.nateware.com/linux-network-tuning-for-2013.html
That page has a few more, particularly the per-user open file limits. Don't bother with the congestion window, rmem and similar buffer settings in there; your server isn't aiming for high TCP throughput on a single connection.
Doing it all via sysctl config in /etc/sysctl.conf is the most straightforward way to have your changes persist after a reboot.
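The sysctl settings above are system-wide; the per-process open-file limit (RLIMIT_NOFILE, what ulimit -n shows) is separate and often defaults to a soft limit of 1024. A minimal sketch, assuming a Linux build, of a hypothetical helper raise_nofile_limit() that lifts the soft limit up to the hard limit at startup, which an unprivileged process is allowed to do. Raising the hard limit itself still needs root or limits.conf.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Hypothetical helper (not part of the pool or bitcoin code): raise this
// process's soft open-file limit up to its hard limit.
// Returns the resulting soft limit, or -1 on error.
// Assumes Linux, where the RLIMIT_NOFILE hard limit is always finite.
long raise_nofile_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        std::perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur < rl.rlim_max) {
        // Any process may raise its soft limit as far as the hard limit.
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
            std::perror("setrlimit");
            return -1;
        }
    }
    return static_cast<long>(rl.rlim_cur);
}
```

Once the soft limit is above 1024, descriptor numbers of 1024 and higher become possible, so any code path that still uses select() on those descriptors is exposed to the FD_SETSIZE problem.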
You were right. This is the issue:
/usr/include/linux/posix_types.h:#define __FD_SETSIZE 1024
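Basically, once a process has about 1024 file descriptors open, new sockets get descriptor numbers of 1024 or higher, and calling FD_SET() on one of them before select() writes past the end of the fixed-size fd_set: a buffer overflow.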
And this is exactly what ConnectSocketDirectly does in netbase.cpp on line 359:
int nRet = select(hSocket + 1, NULL, &fdset, NULL, &timeout);
The issue boils down to the bitcoin code using the default value of __FD_SETSIZE as defined in posix_types.h.
But you can't blame the bitcoin devs; they never imagined it would open more than 1024 sockets.
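FD_SET() does no bounds checking of its own, so the overflow happens silently. A minimal sketch of a hypothetical guard, safe_fd_set(), that refuses out-of-range descriptors instead of corrupting memory; it only makes the failure visible and does not lift the 1024 limit.

```cpp
#include <sys/select.h>

// Hypothetical guard (not in bitcoin's netbase.cpp): fd_set is a fixed-size
// bitmap with FD_SETSIZE bits, so a descriptor number outside
// [0, FD_SETSIZE) must never be passed to FD_SET().
bool safe_fd_set(int fd, fd_set* set) {
    if (fd < 0 || fd >= FD_SETSIZE) {
        return false;  // caller must fail the connection or fall back to poll()
    }
    FD_SET(fd, set);
    return true;
}
```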
Tomorrow I might implement a fix for this and start the pool again. I still have all your IPs in the firewall settings.
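One common way to remove the limit is to wait on the nonblocking connect() with poll() instead of select(), since poll() takes an array of descriptors and has no FD_SETSIZE ceiling. A minimal sketch, assuming a POSIX/Linux build, of a hypothetical helper WaitForConnect(); it is an illustration of the technique, not the actual bitcoin patch.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <cerrno>

// Hypothetical replacement for the select() wait in ConnectSocketDirectly.
// Call it after connect() on a nonblocking socket returned EINPROGRESS.
// Returns 0 if the connection succeeded, otherwise an errno value
// (ETIMEDOUT if nothing happened within timeout_ms).
int WaitForConnect(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLOUT;  // the socket becomes writable when connect() finishes
    pfd.revents = 0;

    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);  // retry on signal (restarts the full timeout)

    if (n == 0) return ETIMEDOUT;
    if (n < 0) return errno;

    // Writable does not mean success: read the pending socket error.
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0) return errno;
    return err;
}
```

In the select() version the wait is bounded by a struct timeval; here the same timeout is passed in milliseconds, and descriptor numbers above 1024 work unchanged.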