GPUs are massively parallel and extremely good at computations that don't require moving large amounts of data or making large numbers of decisions. CPUs are only modestly parallel and extremely good at computations that do require moving large amounts of data and making large numbers of decisions. Hashing, as it happens, requires neither moving much data nor making many decisions. So GPUs win.
A GPU is like a CPU with 1,000 very limited cores.
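Concretely, the whole job per attempt is just the little loop below (a rough Python sketch with a made-up header and difficulty target): hash a handful of bytes, compare, move on. No big data to shuttle around and only one branch at the end, which is why a thousand dumb cores can each run their own copy of it.

```python
import hashlib

header = b"example block header"   # made-up stand-in for a real block header
target = 2 ** 240                  # made-up difficulty target

# The entire workload: hash, compare, repeat. No data shuffling, and the only
# "decision" is the final comparison -- which is why it maps so well to GPU cores.
for nonce in range(1_000_000):
    digest = hashlib.sha256(hashlib.sha256(
        header + nonce.to_bytes(4, "little")).digest()).digest()
    if int.from_bytes(digest, "big") < target:
        print("share found at nonce", nonce)
        break
```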
Ahhh. That explains quite a bit. The loss, then, would be incurred by the deserialization step. Like an ADC, it must introduce bottlenecks that demand processing speed proportional to the volume of data being hashed. Likewise, emulating a CPU on a GPU would force the opposite issue: like a DAC, I imagine it would have to shift every bit it moved into registers to address a virtual UART implemented for a serial interface. I'm just guessing here, so don't shoot me.
I would assume these virtualization apps have virtual UARTs for parallel-to-serial interfacing and vice versa. Now (if I'm right so far), I have to wonder how differently FPGAs work and whether they are more amenable to improvisation. I'm trying to see how a mobile device (or older computer) could be given the best environment to churn out hashes, better than one with no particular hash-optimizing code (virtual or otherwise), yet still be efficient at running regular OS/application code while addressing the system bus, allocating memory dynamically, handling interrupts, and so on.
If we talk in terms of, say, a virtual UART dedicated to this optimization, are we still throwing good machine cycles after bad? Keep in mind, of course, that the code in service of this task has the system memory at its disposal (I think). Whereas my basic on-board 3D video has a piddling 128 MB of DRAM, my whole system has a couple of gigabytes. With good, tight, clean code that dynamically pulls the required memory into service, don't I have more room for flexible parallel assignment, like a processor that grows extra process threads or something? Even an expensive video card with a gigabyte of on-board memory is limited to its capacity. If you implement the GPU in memory (unless I'm mistaken), then each instance of its parallel processing would not require a new parallel image of it in memory, only an implementation of the internal clock and I/O addressing the virtual UART as quick as its little legs will carry it. OK, so my CPU is only 3.7 GHz and my GPU is far worse at one-point-something, but surely I can squeeze a bit more processing power from a 3-4:1 ratio of clock speed, especially if I have almost ten times the RAM to dispose of in idle time.
You see what I'm saying? Even if the CPU is nothing like the GPU in terms of hardware performance, I may still have much more CPU power and RAM for modelling (perhaps even somewhat more SMP) than could be attained by plain CPU processing alone, if only we isolate the bottleneck and virtualize only the components/functionality that give the GPU its advantage. A dynamic VGPU might chew up available system resources and spit out an SMP implementation of scalable/adaptable clocks, I/Os and virtual UARTs, so that even granting my piddly video card may be better in hardware design, my virtual GPU may come out somewhat ahead given the much higher CPU clock speed and the disposable memory available to its registers on demand. Efficient is something the CPU may not be (for hashing), but in terms of being resource-rich, having dynamically delegable functionality and all-round versatility, it seems like an undervalued resource. The challenge may be virtualizing no more of the GPU architecture than needed to repurpose a reasonably fast CPU to outperform a relatively slow GPU (or no GPU). I can afford the overheads if putting a graphics card into my machine is not an option and the VGPU is basically a targeted optimization of GPU architecture that could interface with the CPU and perform better than the CPU trying to crunch hashes all on its own.
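To put the "grows extra process threads" idea in concrete terms, here's a minimal sketch (again with a made-up header and target) of splitting the same double-SHA256 grind across however many cores the CPU has. It's nothing like a real VGPU, just the plainest possible way of putting every core and the system RAM to work:

```python
import hashlib
from multiprocessing import Pool, cpu_count

HEADER = b"example block header"   # made-up stand-in for a real block header
TARGET = 2 ** 240                  # made-up difficulty target

def grind(bounds):
    """Search one independent slice of the nonce space on one core."""
    start, end = bounds
    hits = []
    for nonce in range(start, end):
        digest = hashlib.sha256(hashlib.sha256(
            HEADER + nonce.to_bytes(4, "little")).digest()).digest()
        if int.from_bytes(digest, "big") < TARGET:
            hits.append(nonce)
    return hits

if __name__ == "__main__":
    cores = cpu_count()
    step = 500_000
    # One slice per core; the slices never need to talk to each other.
    slices = [(i * step, (i + 1) * step) for i in range(cores)]
    with Pool(cores) as pool:
        for hits in pool.map(grind, slices):
            print(hits)
```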
My understanding of how a GPU performs hash calculations or draws polygons, compared to ordinary CPU processing, calls on an analogy (perhaps a less than satisfactory one) with the difference between bitmap graphics (grid plotting and pixel rendering) and vector graphics. Bitmaps were only ever intended to describe pixels and their attributes one by one as they are scanned to the display, whereas vector graphics use compact mathematical descriptions of lines, planes and the relationships between points (geometry), plus minimal differences between frames of reference. OK, so a CPU can calculate vectors and do the same tricks, but a GPU (in my naive understanding) was designed to describe the contents of a given chunk of video memory by those geometrical relationships and the differences between frames, and to render the result in the form the dumb electronics will accept (i.e. scan lines and fixed values for each pixel). To calculate that, it only needs to know the relatively small differences between each frame and send the information that is relevant. Given the typical similarity between frames, the difference can be subtracted and only the changes refreshed on each pass of the scan line.
As I understand it, this approach of describing only what has changed is particularly favorable to description by vector relationships, reducing the data to so many lines, angles, points and the relationships between them (geometry). You can, as I understand it, describe a point, plane or polygon far more efficiently in terms of vectors. Typical video images contain lots of objects moving (or perspective shifting) incrementally, so while their position or perspective changes, their general consistency is preserved. The share of pixels in each frame that must actually be adjusted is usually a much smaller fraction than redrawing the whole screen, so their description in a multidimensional array is simplified to the minimal set of pixels to be re-rendered and the incremental degree of change, rather than absolute whole values. In vector terms this may be trivial, except for the colour encoding, shading and ray-tracing dimensions; as I understand it, modern cards have a fair bit of separate dedicated circuitry and chips to handle that extra layer of contextual complexity. They really are marvelous and clever devices on close inspection. Not that CPUs and computers in general aren't, but this clever complementarity, so well devised to model and render complex 3D fields of vision beyond the resolution required by human eyesight, is just a marvel to behold.
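To illustrate the kind of "only send what changed" bookkeeping I mean, here's a toy sketch (plain Python, tiny made-up 4x4 "frames" of grey values, nothing to do with how a real card actually does it) that subtracts one frame from the previous one and keeps only the pixels that differ:

```python
# Two made-up 4x4 "frames" of grey values; only a few pixels move between them.
previous = [[0, 0, 0, 0],
            [0, 9, 9, 0],
            [0, 9, 9, 0],
            [0, 0, 0, 0]]
current  = [[0, 0, 0, 0],
            [0, 0, 9, 9],
            [0, 0, 9, 9],
            [0, 0, 0, 0]]

# Keep only (row, column, new value) for pixels that actually changed.
delta = [(r, c, current[r][c])
         for r in range(len(current))
         for c in range(len(current[r]))
         if current[r][c] != previous[r][c]]

print(delta)                             # [(1, 1, 0), (1, 3, 9), (2, 1, 0), (2, 3, 9)]
print(len(delta), "of 16 pixels need refreshing")
```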
Comprehending the complexity of what is being drawn on the screen gives some sense of the magnitude of the information processing that modern computing deals with, and the clever ways used to compress away and minimize the processing of redundant information. Yet that is but a fraction of the processing power of a modest desktop box. It's beguiling to think of the amount of information being processed by the whole machine and all those lines of code, with all the conditional tests being tried, ports being read, I/O accessed, buffers read and written, while registers are peeked, poked and mov'ed. People may look at their screen and not only take it all for granted, but assume that what they see up there IS the program and IS about all that is happening.
Anyhow, I digress. Now I have to wonder about the future potential of field-programmable gate arrays and how differently they work. The potential for mobile devices to have dedicated processors, optimized to adapt to whatever processes presently demand the most resources (whether CPU-like, GPU-like or otherwise), is an interesting line of inquiry, costly as it may be in R&D. I wonder whether an FPGA could allow a device to be just as efficient playing CPU as GPU depending on present needs, dynamically adjusting its provisions to suit each process accordingly. I hear some FPGAs are down below $300 now. I also wonder how the massively parallel processing of hypervisors like KVM might affect the potential of a VGPU implementation. The potential I'm investigating, BTW, is not so much for individual/personal use as for aggregation and SMP over networks. I can almost imagine bitcoin being farmed (like existing mining pools), but with many, many CPUs on participating mobile devices all sharing the hashing work of a cloud-based VGPU server, deployed over a VPN and using a hypervisor like KVM for load balancing. The question of viability then has to account for the performance leverage afforded to the whole system, given that memory constraints could be handled by the data center serving the VGPU while the parallel hash power is handled by the hypervisor. Would CPU hashing still be more efficient in a massively parallel cloud of clients than some kind of virtual appliance emulating the essential features of a GPU? Interesting possibilities, methinks. Anyhow, thanks for the info. You can learn so much just by asking stupid questions of clever people.
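Just to pin down what I mean by "many CPUs sharing the hashing work of a server", here's a toy sketch of the split, with local threads standing in for the mobile clients and a queue standing in for the network/VPN (the header, target and chunk sizes are all invented):

```python
import hashlib
import queue
import threading

HEADER = b"example block header"   # invented stand-in for real work from the pool
TARGET = 2 ** 240                  # invented difficulty target
CHUNK = 250_000                    # nonces handed out per work request

work = queue.Queue()
results = queue.Queue()

# The "server" side: carve the nonce space into chunks for clients to claim.
for i in range(16):
    work.put((i * CHUNK, (i + 1) * CHUNK))

def client(name):
    """Stand-in for one mobile device: pull a chunk, grind it, report back."""
    while True:
        try:
            start, end = work.get_nowait()
        except queue.Empty:
            return
        for nonce in range(start, end):
            digest = hashlib.sha256(hashlib.sha256(
                HEADER + nonce.to_bytes(4, "little")).digest()).digest()
            if int.from_bytes(digest, "big") < TARGET:
                results.put((name, nonce))

threads = [threading.Thread(target=client, args=(f"phone-{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

while not results.empty():
    print("share found:", results.get())
```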