You're right. Powering this type of ASIC is a real challenge. However, there are solutions available for CPUs and GPUs - typically, these are complex multi-phase systems with sophisticated feedback systems carefully tuned for the precise application. While you can buy the voltage controller ASIC off-the-shelf, you still need the inductors, switches, and compensation network, as well as a suitable circuit design.
Yup, and the smallest mistakes can provide all sorts of hilarity. That's part of the reason I prefer using R(ds) for overload sensing. DCR is great for sensing power through the choke, but that RC circuit can be affected by noise and component layout, and it doesn't tell you if the FETs are cutting through. But who wants to spend the money for an op-amp limiting circuit on both sides of the push/pull? Not to mention the fun that happens when the voltage drop from the left side of the chip to the right side of the chip (500 amps, 0.6 volts...) causes problems.
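For anyone following along, here's roughly what R(ds) overload sensing boils down to: read the voltage across the FET while it's on, divide by its on-resistance, and compare against a trip point. This is only a sketch. The 2 milliohm on-resistance, the 0.4%/°C temperature coefficient, and the 40 A trip point are made-up illustrative numbers, not from any datasheet, and the function names are hypothetical.

```python
def rds_on(r_25c_ohm, temp_c, tempco_per_c=0.004):
    """Linear approximation of FET on-resistance vs. junction temperature.

    ASSUMPTION: 0.4 %/degC is an illustrative tempco, not a datasheet value.
    """
    return r_25c_ohm * (1.0 + tempco_per_c * (temp_c - 25.0))


def sensed_current(v_ds, r_25c_ohm, temp_c):
    """Estimate drain current from the voltage measured across the on FET."""
    return v_ds / rds_on(r_25c_ohm, temp_c)


def overloaded(v_ds, r_25c_ohm, temp_c, trip_a):
    """True if the estimated current exceeds the overload trip point."""
    return sensed_current(v_ds, r_25c_ohm, temp_c) > trip_a


# Same 100 mV reading, hypothetical 2 milliohm FET, 40 A trip point
for temp_c in (25.0, 125.0):
    amps = sensed_current(0.100, 0.002, temp_c)
    print(f"{temp_c:5.1f} C: {amps:.1f} A, overload={overloaded(0.100, 0.002, temp_c, 40.0)}")
```

The same V(ds) reading maps to a different current hot versus cold, so a real design would compensate for temperature or pick the trip point with that drift in mind.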
The off-the-shelf stuff is great for testing, but very expensive to use for mass production. And in Alpha's case I wonder how stable it is when half the engines are running light power pulls while farting with the memory and the other half are pulling full current for the math engines. With that happening at random across the die, of course, this is a tougher problem. So they ran everything at full current and now the chips overload the power supplies. Yep.
The latest spins of the board appear to use Altera monolithic DC-DC converters - an impressive ASIC with integrated switches and an integrated inductor! Again, this suggests there is no will or capability to build a DC-DC solution in-house. The Altera converters are even more expensive - $19 each in 1k quantities - with 1 converter needed per viper ASIC. The other issue is that they seem desperate to keep the cost down: they've said that the viper ASIC needs 15 A @ 1.2 V, so it's rather optimistic to use a monolithic DC-DC converter with a maximum current rating of 15 A.
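The headroom arithmetic on those numbers is quick to sketch. The 15 A load, 1.2 V rail, 15 A converter rating, and $19 price are the figures quoted above; the 80% derating factor is purely my assumption (a common rule of thumb, not something from the converter's datasheet), and the function name is hypothetical.

```python
# Figures quoted in the post above
load_current_a = 15.0     # stated viper ASIC draw
rail_voltage_v = 1.2      # core rail
rated_current_a = 15.0    # converter maximum current rating
unit_price_usd = 19.0     # per converter in 1k quantities

# ASSUMPTION: run the converter at no more than 80% of its rating
derating = 0.8


def converter_margin(load_a, rated_a, derating_factor):
    """Return headroom against the raw rating and against a derated limit."""
    usable_a = rated_a * derating_factor
    return {
        "headroom_a": rated_a - load_a,              # margin vs. absolute max
        "usable_a": usable_a,                        # derated current budget
        "shortfall_a": max(0.0, load_a - usable_a),  # how far over budget
    }


margin = converter_margin(load_current_a, rated_current_a, derating)
load_power_w = load_current_a * rail_voltage_v

print(f"load power:        {load_power_w:.1f} W")
print(f"headroom vs. max:  {margin['headroom_a']:.1f} A")
print(f"derated budget:    {margin['usable_a']:.1f} A")
print(f"shortfall:         {margin['shortfall_a']:.1f} A")
print(f"converter cost:    ${unit_price_usd:.2f} per ASIC")
```

With the load equal to the rating there is no margin at all against the absolute maximum, and under the assumed derating the converter is short by a few amps before any transient or temperature effects are considered.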
Yeah. There are a lot of weird pressures and unbridled optimism in some of these mining firms. This is complex shit, and happy pony magic doesn't work well when the 600 amp IGBT goes foom. Building this kind of stuff right takes months/years, not "wave wand and it works in a week".
Ah well. They fucked up.