I had similar estimates in a different thread and did some further digging to get a more solid understanding.
Bottom line is, if the BFLS is not throttling, you will get some long term hash rate of (system clock - 14) MH/s, that is a device with 864MHz firmware will deliver 850MH/s at most. The loss is taken for the communication to/from device. Converting it down based on a nonce time of ~5.2s, the idle time is around 80ms per cycle. That matches the real numbers for cgminer operation, that are:
- baudrate of 9600 => ~1ms (937.5 us exactly) to read/write a single character
- submitting work: write ">>>>>>>>[32 bytes midstate][last 12 bytes of block header]>>>>>>>>" => 56ms
- polling for shares: after work submission, first wait 4.5 seconds, then poll every 10ms by writing "ZFX" => 8ms average delay
- reading shares: read "NONCE-FOUND:1234ABCD,2468EFAB,1111BBBB" => (11 + #nonces * 9) ms => 18.7ms average
One could maybe implement some pre-fetching of work and caching of nonces functionality to add the potential 12MH/s - but at the same time maybe the FPGAs need this small idle time to cool down a little bit.
Yes, I also wouldn't worry too much about tweaking this inefficiency out. If a non-throttling 816 unit throttles with 832 firmware (a 2% increase), then that same 816 unit would likely throttle if the 80ms idle time was eliminated.