I just don't care what you think...you are not very good at it...toodles.
I have Ufasoft running on 10 CPUs (with the owners' consent, I might add; I let family and close friends pay me for computer maintenance in coins generated during idle time). All systems are Core 2 Duo / Quad class CPUs, and that little cluster averages 35 to 50 MH/s in total (and that is being very optimistic). If you extrapolate from those numbers, you would have to have it running on well over 400 computers, or at the very least hundreds, if you assume they are mining constantly...
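To spell out the extrapolation, here is a quick sketch in C. The function names are mine and purely illustrative. The only inputs are the figures above: 10 machines producing 35 to 50 MH/s in total.

```c
#include <assert.h>

/* Average rate of one machine, given the total for a cluster.
 * Hypothetical helper, for illustration only. */
double per_machine_mhs(double cluster_mhs, int machines)
{
    assert(machines > 0); /* avoid dividing by zero */
    return cluster_mhs / machines;
}

/* Total rate of a fleet of identical machines (also hypothetical). */
double fleet_mhs(double per_machine, int machines)
{
    assert(machines >= 0);
    return per_machine * machines;
}
```

With my numbers, `per_machine_mhs(35.0, 10)` is 3.5 and `per_machine_mhs(50.0, 10)` is 5.0, so 400 machines of this class would come to `fleet_mhs(3.5, 400)` = 1400 up to `fleet_mhs(5.0, 400)` = 2000 MH/s, and only if every one of them mined around the clock.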
While I do agree with you to some degree about Ufasoft being faster, with a little tweaking of the C code to take advantage of more optimizations, some of the bottlenecks in work passing and hashing could be removed from cgminer so that it reaches beyond Ufasoft's hash rate. On my own PC, my Core 2 Quad will reach up to 16.1 Mhash/s with Ufasoft and then drop to the upper 14s. In cgminer, each core will reach up to 4.7 Mhash/s but will slowly level out at 3.9 because of the rate at which data is passed to the hashing algorithm. With that said, Ufasoft's hash rate likes to fluctuate drastically while mining, whereas cgminer's fluctuates initially with rather high numbers and then levels out as the bottlenecks in some of the code are reached.
This is because of cgminer's heavy dependence on SSE2 instructions alone, which are not implemented so well on many (or most) processors. To demonstrate this, the C code uses xmmintrin.h, which only allows instructions up to SSE2 to be used regardless of the CPU's capabilities, and it is written to use only those. I don't know what Ufasoft uses, but cgminer could be much more than it is if it were rewritten. For example, once the C code is compiled, over half of the computations are done on individual doublewords instead of packing them all into double quadwords beforehand and computing up to 4 doublewords at the same time. Though packed computation is the eventual result, it comes too little, too late and slows the entire process down.
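To illustrate what I mean by working on one doubleword at a time versus four packed into one 128-bit register, here is a minimal sketch. This is not cgminer's code, and every function and macro name in it is my own invention. It takes a single SHA-256 step (the small sigma 0 function from the message schedule) and computes it both ways, using the SSE2 integer intrinsics from emmintrin.h. It assumes an x86 compiler with SSE2 available and an array length that is a multiple of 4.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 integer intrinsics (x86 only) */
#include <stddef.h>
#include <stdint.h>

/* Rotate one 32-bit doubleword right by n bits (0 < n < 32). */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 small sigma 0, computed on one doubleword at a time. */
static uint32_t sigma0_scalar(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

/* SSE2 has no packed rotate, so build one from two shifts and an OR.
 * A macro keeps the shift counts compile-time constants. */
#define ROTR32X4(x, n) \
    _mm_or_si128(_mm_srli_epi32((x), (n)), _mm_slli_epi32((x), 32 - (n)))

/* The same sigma 0, applied to four packed doublewords at once. */
static __m128i sigma0_sse2(__m128i x)
{
    return _mm_xor_si128(_mm_xor_si128(ROTR32X4(x, 7), ROTR32X4(x, 18)),
                         _mm_srli_epi32(x, 3));
}

/* Scalar path: one doubleword per loop iteration. */
void sigma0_array_scalar(const uint32_t *in, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sigma0_scalar(in[i]);
}

/* Packed path: four doublewords per loop iteration. */
void sigma0_array_sse2(const uint32_t *in, uint32_t *out, size_t n)
{
    assert(n % 4 == 0); /* sketch only handles whole groups of four */
    for (size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        _mm_storeu_si128((__m128i *)(out + i), sigma0_sse2(v));
    }
}

/* Returns 1 if both paths agree on a small set of test inputs. */
int simd_matches_scalar(void)
{
    uint32_t in[16], a[16], b[16];
    for (uint32_t i = 0; i < 16; i++)
        in[i] = (i + 1) * 0x9E3779B9u; /* arbitrary bit patterns */
    sigma0_array_scalar(in, a, 16);
    sigma0_array_sse2(in, b, 16);
    for (size_t i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both paths give identical results; the packed one just does it in a quarter of the loop iterations. In a miner the four lanes would typically hold the same step for four different nonces, so the data has to be packed before the rounds start rather than partway through.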
I will say this for cgminer, though: its efficiency is off the charts compared to other miners. But without an efficient implementation of the hashing behind it, that efficiency comes at too high a cost.
I will edit this post with the disassembled code I'm referencing once I return home to my own PC in 2 or 3 days.