I do get the disconnects on my USB miners, so I don't think it has anything to do with hash rate. I generally only see the problem on the sha256 port (but I don't use scrypt). I wonder what the specs of the server are. There are a lot of wallets running on the server, and a lot of stratum client connections to serve. That means a large number of active threads, and even with a lot of cores there is going to be a large run queue. Also, all those wallets are going to use a lot of RAM, so the system may be swapping. I've only just started learning about wallet RPC calls, but I have noticed that they can sometimes be slow to respond, and that's with only a handful of wallets running.
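One rough way to check the swapping theory on Linux is to sample the kernel's cumulative swap counters (`pswpin`/`pswpout` in `/proc/vmstat`) twice, a few seconds apart: if either has gone up, pages are being swapped in or out. This is a minimal Python sketch under that assumption, not anything the pool is known to run, and the function names are made up for illustration.

```python
import os


def parse_swap_counters(vmstat_text: str) -> dict:
    """Pull the cumulative swap-in/swap-out page counters out of /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def read_swap_counters(path: str = "/proc/vmstat") -> dict:
    """Read the live counters (Linux only); returns {} if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_swap_counters(f.read())


def is_swapping(before: dict, after: dict) -> bool:
    """True if either counter increased between the two samples."""
    return any(after.get(k, 0) > before.get(k, 0) for k in ("pswpin", "pswpout"))
```

Taking one sample, sleeping 5 seconds, taking another and passing both to `is_swapping` does the check; `vmstat 5` from the shell shows the same thing in its `si`/`so` columns.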
So it sounds like crackfoo may be right that the stratum processes need to handle the situation more gracefully.
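As a hypothetical illustration of "more gracefully": a stratum could treat a slow or failed wallet RPC as transient and retry with exponential backoff instead of giving up on the miner's connection. A minimal Python sketch; `call_with_retry` and its parameters are invented for the example, and this is not how the zpool stratum is known to work.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call fn() and retry on transient errors with exponential backoff.

    Waits base_delay, then 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for n in range(attempts):
        try:
            return fn()
        except retry_on:
            if n == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(base_delay * (2 ** n))
    raise AssertionError("unreachable")
```

In a stratum, `fn` would be something like a lambda wrapping the wallet's `getblocktemplate` request (the RPC helper it would call is hypothetical). Capping the attempts keeps one sick wallet from tying up a thread forever.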
EDIT: And I suppose a lot of fast coins require more frequent getblocktemplate calls?
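A back-of-the-envelope way to look at the getblocktemplate question, as a hypothetical Python estimate. It assumes the stratum asks each wallet for a fresh template once per new block and also on a fixed refresh timer (the 30 s default is an assumption, not a known zpool setting), and it ignores any overlap between the two, so it over-counts slightly.

```python
from typing import Sequence


def gbt_calls_per_minute(block_times_s: Sequence[float], refresh_s: float = 30.0) -> float:
    """Rough estimate of getblocktemplate calls per minute across all coins.

    Each coin contributes one call per new block (60 / block time) plus one
    call per refresh tick (60 / refresh_s); overlap between the two is ignored.
    """
    return sum(60.0 / bt + 60.0 / refresh_s for bt in block_times_s)
```

With a ~600 s coin like bitcoin and a hypothetical 30 s coin, `gbt_calls_per_minute([600, 30], refresh_s=30)` comes to 6.1 calls a minute, and the fast coin accounts for 4 of them, so a pool with many fast coins multiplies that load.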
FURTHER EDIT: It's over 15 years since I was analysing performance issues on an 8-CPU server running hundreds of processes. The system was only using 80-90% CPU, so it took quite a bit of convincing to get them to spend a few $million on upgrading to a 24-CPU server. The performance issues were solved on the new box. It just goes to show that on a complex, interdependent system, processing capacity can be hit without seeing 100% CPU usage. They were big boxes in those days!
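Basic queueing theory gives one way to see why 80-90% CPU can already mean "full". In the single-server M/M/1 model the mean time a request spends in the system is R = S / (1 - ρ), where S is the service time and ρ the utilisation. That model is a simplification swapped in here for illustration (a real server has many CPUs and bursty load), sketched in Python:

```python
def mm1_response_time(service_time: float, utilisation: float) -> float:
    """Mean time a request spends in an M/M/1 system (queueing + service).

    R = S / (1 - rho), where S is the mean service time and rho is the
    fraction of time the server is busy. Only valid for 0 <= rho < 1.
    """
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_time / (1.0 - utilisation)


if __name__ == "__main__":
    # Service time normalised to 1: an idle box answers in 1 time unit.
    for rho in (0.5, 0.8, 0.9, 0.95):
        print(f"{rho:.0%} busy -> {mm1_response_time(1.0, rho):.1f}x the idle response time")
```

At 80% busy the model gives 5x the idle response time and at 90% it gives 10x, which fits the experience of a box that "only" showed 80-90% CPU but had already run out of capacity.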
When the stratum process makes an RPC call from user space, the request has to be handled by a system thread in the kernel before it is served by the wallet's user threads, and the response is then sent back through another system thread to be picked up again by the stratum's user threads. On a system with a large run queue, each of those hand-offs has to wait its turn to be scheduled, so the round trip may take some time.
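The round trip described above can be roughly measured with a stand-in: two threads in one process talking over a socket pair, where every send and receive is a system call and the "wallet" thread has to be scheduled before it can reply. A minimal Python sketch; it is not a real JSON-RPC call, and comparing the result on an idle box against a busy one is the interesting part.

```python
import socket
import threading
import time


def measure_round_trip(n: int = 1000) -> float:
    """Average seconds for a 1-byte request/response between two threads.

    Each send/recv crosses into the kernel, and the other thread must be
    scheduled before it can answer, so a long run queue stretches this time.
    """
    client, server = socket.socketpair()

    def echo() -> None:
        # Stand-in for the wallet: read one byte, write it straight back.
        for _ in range(n):
            data = server.recv(1)
            server.sendall(data)

    t = threading.Thread(target=echo)
    t.start()
    start = time.perf_counter()
    for _ in range(n):
        client.sendall(b"x")  # "stratum" sends a request...
        client.recv(1)        # ...and blocks until the reply arrives
    elapsed = time.perf_counter() - start
    t.join()
    client.close()
    server.close()
    return elapsed / n
```

Running `measure_round_trip()` while the machine is otherwise idle, then again under load, gives a feel for how much scheduling delay each hop adds.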
zpool started off on one machine and is now running on 3. The stratum server runs only the stratum and front end/db processes, on an 8-core Xeon with 32 GB RAM and SSDs. The coins are split across the other 2 machines, each a 12-core Xeon with 64 GB RAM. They're just not on SSDs, simply because bitcoin is an ass of a blockchain and the SSDs weren't big enough to be comfortable, especially in a RAID configuration. None of them swap at all, and they carry a load of ~0.4, with ~15-20% CPU usage and ~30% RAM usage. All are housed in the same data center.
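To put the quoted load of ~0.4 in context, the load average can be divided by the number of CPUs: values well below 1.0 per CPU suggest threads are not queuing for CPU time. A minimal Python sketch with hypothetical helper names, Unix only. On Linux the load average also counts threads blocked in uninterruptible disk I/O, and `os.cpu_count()` counts logical CPUs, so a 12-core Xeon with hyperthreading would report 24.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Load average divided by CPU count; near or above 1.0 suggests queuing."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def current_load_per_core() -> float:
    """1-minute load average per logical CPU on this host (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)
```

`load_per_core(0.4, 12)` is about 0.03, which makes a long run queue on the coin servers look unlikely, at least on average.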
The ulimits have been raised way up to 102400, and a couple of the stratum processes use ~50k of that limit.
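Assuming the ulimit in question is the open-files limit (`nofile`), which every stratum client socket counts against, a process can check its own limit and usage along these lines. A Python sketch with hypothetical helper names; the limit lookup is Unix only and the `/proc` count is Linux only (and includes the descriptor used to list the directory).

```python
import os
import resource
from typing import Tuple


def nofile_limits() -> Tuple[int, int]:
    """Soft and hard limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid: str = "self") -> int:
    """Approximate number of open file descriptors, from /proc/<pid>/fd."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def headroom(used: int, limit: int) -> float:
    """Fraction of the limit still free."""
    return 1.0 - used / limit
```

`headroom(50_000, 102_400)` is about 0.51, so on that assumption those stratum processes still have roughly half of the limit free.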
They appear to handle it without issue... but I wouldn't be surprised if there is something going on with the threading...