Alright, guys, I'll lay this down.
It's been a while since I've been actively mining, for a number of reasons (mainly an iPhone app I'm in the process of writing).
I would like to work on a custom miner project centered on mining Litecoins, but as I'm still 'new' to cryptocurrencies when it comes to how the P2P information channels work (and, let's face it, I'm not a cryptographer or cipher specialist, either), I would like some help. I'm a hardware engineer - hardware I understand, live, and breathe; software, not as much.
I will say this, and I'll bold it so after all the information y'all don't forget the potential gold in this: if anyone is interested in helping me with this project beyond simple advice and actually wants to participate/contribute/work to develop this idea, I will make all the hardware plans an open-source project if you can help me write the software side of things.
Obviously, Scrypt differs from the SHA-256 hash in being memory-hard; it revolves around memory-intensive operations that are exceedingly difficult to implement on ASICs or FPGAs at a price end-users could justify. Because of that, CPUs are generally used, since they have onboard 'L' caches as well as access to MUCH more memory.
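To put a rough number on "memory-hard", here is a minimal Python sketch. It assumes Litecoin's scrypt parameters are N=1024, r=1, p=1 with a 32-byte output and the 80-byte block header used as both password and salt; the function names and the all-zero header are just placeholders of mine, and `hashlib.scrypt` needs a Python build linked against a reasonably recent OpenSSL.

```python
import hashlib

# Assumed Litecoin scrypt parameters: N=1024, r=1, p=1, 32-byte output.
N, R, P, DKLEN = 1024, 1, 1, 32


def scratchpad_bytes(n: int, r: int) -> int:
    """Size of scrypt's main working buffer: 128 * r * n bytes."""
    return 128 * r * n


def scrypt_pow_hash(header: bytes) -> bytes:
    """Hash a block header, using the header as both password and salt."""
    return hashlib.scrypt(header, salt=header, n=N, r=R, p=P, dklen=DKLEN)


if __name__ == "__main__":
    dummy_header = bytes(80)  # placeholder: all-zero header, not real chain data
    print(scratchpad_bytes(N, R), "bytes of scratch memory per hash in flight")
    print(scrypt_pow_hash(dummy_header).hex())
```

Under those assumptions every hash being computed needs its own 128 KiB scratchpad, so the interesting hardware question is how many of those buffers you can keep in fast memory at once.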
Being a processor hardware designer primarily, I've wanted to mess with very-high-speed parallel arrays for a while, but couldn't find chips that met the specs I was looking for (face it, we're all picky in some regard), until I stumbled upon the PS3's chipset.
As you might know, the Cell processor used in the PS3 is a menace. It's a heterogeneous 9-core beast that works at 3.2GHz and power levels that make your i7s look like a money hole. Data on this processor is EXTREMELY hard to find if you're not a 6-figure-income engineer or a firm that specializes in server design, because while the PS3 was the impetus for its design, IBM is looking at it as a cash cow, and not through end-user sale. You can't buy these chips directly. You can't get a datasheet. You can't even get ballouts or pinouts for this thing.
I'm self-admittedly very butt-headed about my work, however, and won't take no for an answer when it comes to certain things, so I landed on a simple solution: reballed chips, or chips salvaged from defunct PS3s, of which eBay has no shortage. I've already reconstructed the package design from reballing stencils and photos, and I'm pulling the pinouts and schematic information off service manuals.
If you aren't at least intrigued by the implications of custom, open-source hardware involving this beast, you probably should be. Looking at the architecture and IEEE texts shows that this processor is particularly suited to high-speed cryptography and deciphering. With each chip nominally consuming under 20W of power, you could build an array of 20 of these, and if bussed correctly with external hardware, you'd have access to 150-180 physical cores for under 400W.
My idea is to put the main processors and necessary RAM on a 2-to-a-DIMM module configuration, with multiple sockets on a mainboard. That way, if one CPU goes under, it doesn't jeopardize the whole unit, and modules (and thus, processing horsepower) can be added on to increase throughput.
There's a LOT more to be done than that, but I feel it's a good operating concept.
So, here are my questions, although this isn't all of them:
1. How is data communication achieved? We treat mining clients like black-boxes. That is, we don't worry about what is actually going on under the hood or how it is pulling data from the server and processing it to feed it back. How is this process achieved? The first step in building, quite frankly, one bitchin' Bitcoin/Litecoin rig is figuring out how to get the data out of the clouds and into the hardware. It's doable by people because things like cgminer are around, although I've found that looking at another person's source usually isn't helpful if you're not experienced with the coding methodologies of that person. I find it to be like a signature. Can anyone shed light on how to accomplish this? I'm not afraid to admit I don't have much of a clue about what is actually going on here.
2. This may seem like a n00b question, but HOW is the process of 'mining' achieved? My cryptography/decryption exploits to date have centered around high-speed arrays designed to do nothing more than the good ol' brute-forcing. Obviously, with things such as SHA-256 and large-key-size algos, if we tried to brute-force them, we'd all be here waiting on results until our bones turn to dust. How is a 'block' generated? What is the actual process of contributing one's part to the mining effort via a server?
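On question 1, the closest thing to a starting point I can offer is the Stratum protocol that many pools speak: as far as I can tell it is newline-delimited JSON over a plain TCP socket. The Python sketch below only builds the messages and sends nothing; the worker name, password, job ID, and hex values are made-up placeholders, and the exact parameters a given pool expects are an assumption on my part.

```python
import json


def stratum_request(msg_id: int, method: str, params: list) -> bytes:
    """Encode one Stratum request as a single newline-terminated JSON line."""
    message = {"id": msg_id, "method": method, "params": params}
    return (json.dumps(message) + "\n").encode("utf-8")


# Hypothetical worker credentials, for illustration only.
WORKER, PASSWORD = "myworker.1", "x"

# 1. Announce ourselves to the pool.
subscribe = stratum_request(1, "mining.subscribe", ["custom-miner/0.1"])
# 2. Log the worker in.
authorize = stratum_request(2, "mining.authorize", [WORKER, PASSWORD])
# 3. After the pool pushes a "mining.notify" job, report a found share:
#    worker name, job ID, extranonce2, ntime, nonce (all placeholder values).
submit = stratum_request(
    3, "mining.submit", [WORKER, "job-id", "00000000", "504e86ed", "b2957c02"]
)

if __name__ == "__main__":
    for line in (subscribe, authorize, submit):
        print(line.decode().rstrip())
```

If that picture is right, the hardware never talks to the network at all: a small host program holds the socket, turns each job into headers, and hands those to the compute array.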
I'm not coming into this question unprepared; I've done some looking into it, but the easy-to-find answers are clear as mud.
To operate these things at 100% of possible efficiency, at least part of it probably needs to be done in ASM (assembly) - that is, the actual algo processing needs to be in assembly so we are ABSOLUTELY sure we are getting the most optimized code for the device. Advanced microprocessors aren't much use if they're operating at less than their maximum efficiency.
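For what it's worth, here is my rough mental model of question 2 so far, as a Python sketch that may well be wrong in the details. It assumes an 80-byte header whose last 4 bytes are a little-endian nonce, Litecoin-style scrypt parameters (N=1024, r=1, p=1), and a hash that counts as a hit when, read as a little-endian 256-bit integer, it is at or below a target. The all-zero header prefix and the very easy target are placeholders, not real network values.

```python
import hashlib
import struct


def pow_hash(header: bytes) -> bytes:
    """scrypt the header with itself as salt (assumed Litecoin parameters)."""
    return hashlib.scrypt(header, salt=header, n=1024, r=1, p=1, dklen=32)


def search(header76: bytes, target: int, max_nonce: int):
    """Try nonces 0..max_nonce-1; return (nonce, hash) for the first hit, else None."""
    for nonce in range(max_nonce):
        # The nonce fills the final 4 bytes of the 80-byte header.
        header = header76 + struct.pack("<I", nonce)
        digest = pow_hash(header)
        # A hit is a hash whose little-endian integer value is <= target.
        if int.from_bytes(digest, "little") <= target:
            return nonce, digest
    return None


if __name__ == "__main__":
    fake_prefix = bytes(76)   # placeholder for version, prev hash, merkle root, time, bits
    easy_target = 1 << 252    # roughly 1 hash in 16 qualifies; real targets are far lower
    print(search(fake_prefix, easy_target, 1000))
```

So it is still good ol' brute-forcing, just aimed at a threshold rather than one exact answer: nobody is inverting the hash, and the inner loop is the only part that would need to live in hand-tuned assembly.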
Is there anyone who either can help me answer those questions or would want to turn this into a project? If I can get people to tackle this with me, we get it done faster, and we all win.