I'm sorry if I sound a little cynical about all this, but Hashfast's end-of-2013 announcements seem a bit odd, to put it mildly.
Firstly, they claim chip performance of 'up to' 664GH/sec. Real engineers don't do 'up to'; they quote maximum and minimum figures and specify under what conditions each is valid.
Secondly, they say this performance test was "conducted by running a single GN die directly from benchtop power supplies, as opposed to powering it through the module". I'm assuming by die they mean one of the 4 functional blocks. Why not use the actual system power supplies? Something wrong with them?
Thirdly, because "This approach allows us to obtain data about what the ASIC itself can do, without having to make subjective estimates regarding the efficiency of the power supply on the module. However, doing things this way also has its own set of disadvantages. For example, the reason we are “only” able to announce a top speed of 664 Ghash per chip is purely because that’s the point at which we ran out of power to put through the chip." If that's so, then their chip with all four dies running will draw 664GH x 0.67W/GH ≈ 445 watts, all from a silicon area of 664/2 = 332 mm2 (using their figure of 2GH/mm2 of silicon). That's about 1.34 W/mm2, and frankly it's impossible to cool: you would need a heatsink with a NEGATIVE thermal resistance to keep the die junction temperature below 75 degrees C. As far as I'm aware, none exist.
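For what it's worth, here's that arithmetic as a quick Python sketch. The ambient temperature and the junction-to-case thermal resistance are my own assumptions (not Hashfast figures), but any plausible values lead to the same conclusion:

```python
# Back-of-envelope check of the Hashfast numbers.
# Claimed figures: 664 GH/s, 0.67 W/GH, 2 GH/s per mm^2 of silicon.
# t_ambient and theta_jc below are ASSUMPTIONS, not Hashfast data.

hashrate_ghs = 664.0        # claimed chip performance, GH/s
w_per_gh = 0.67             # claimed efficiency, W per GH/s
gh_per_mm2 = 2.0            # claimed hashing density, GH/s per mm^2

power_w = hashrate_ghs * w_per_gh            # ~445 W
die_area_mm2 = hashrate_ghs / gh_per_mm2     # 332 mm^2
power_density = power_w / die_area_mm2       # ~1.34 W/mm^2

t_junction_max = 75.0   # deg C, the junction limit quoted above
t_ambient = 25.0        # deg C -- assumption
theta_jc = 0.15         # deg C/W junction-to-case -- assumed, typical for a large die

# Junction-to-ambient thermal resistance the whole cooling chain must achieve:
theta_ja_required = (t_junction_max - t_ambient) / power_w   # ~0.112 C/W

# Budget left for the heatsink once the package itself is accounted for:
theta_heatsink = theta_ja_required - theta_jc                # negative

print(f"{power_w:.0f} W over {die_area_mm2:.0f} mm^2 = {power_density:.2f} W/mm^2")
print(f"required junction-to-ambient: {theta_ja_required:.3f} C/W")
print(f"heatsink budget after theta_jc: {theta_heatsink:.3f} C/W")
```

With those assumptions, the heatsink's share of the thermal budget comes out below zero, which is the 'negative thermal resistance' point above.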
Can't they afford a decent PSU?
Anyone else care to add their observations?
It could be argued that neither of your assumptions is safe...
First, since HF were testing just one die in isolation (presumably with the others turned off), they were specifically benchmarking one die, and the result should be treated more as academic interest than a marketing statistic. It's an interesting and exciting number, but it ignores the reality of the power supply, the thermal characteristics of operating 4 dies concurrently in the same package, and the thermal limits of the package and cooling system - all of which would be a different scenario with all four dies turned on.
It's a very exciting marketing statistic, but by testing one die in isolation and then presumably multiplying the result by four... does that still count as a legitimate benchmark for what the system is capable of? I say YES, provided all four dies, when run together, can also achieve the same number... but conversely, if that can't be achieved with all four dies, then testing one die in isolation and multiplying by four could be argued to be an artificial performance metric of academic interest only. Much the same as a 4-core Intel CPU: turn off 3 of the cores and the remaining core will run much faster than when all 4 are running together.
Of course, they could redesign the substrate and package and put just one die in each package... that would allow them a board redesign, with 4 chips in a Baby Jet instead of one big one (which follows the Bitmine argument that using multiple smaller chips may achieve a better outcome than using fewer bigger chips).
So: is it valid to measure the performance of just one die, then multiply the result by 4 to give what the total of 4 dies would/could have done (in a perfect world where they had infinite power and cooling available), when those 4 dies in the same package, run together, may not be able to achieve the same result? See the sketch below. And, I should stress, if it CAN... that'd be awesome and extremely impressive!
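To make that concrete, here's a trivial sketch of the two readings of the benchmark. Every number in it is hypothetical, purely to illustrate why an x4 extrapolation may be optimistic:

```python
# Illustration only -- ALL numbers here are hypothetical, not Hashfast data.
single_die_ghs = 166.0               # one die benchmarked alone (664 / 4)
naive_total = 4 * single_die_ghs     # the x4 extrapolation: 664 GH/s

# If all four dies share one package's power and thermal budget, each die
# may have to run slower. Suppose (hypothetically) the shared budget only
# supports 60% of what four unconstrained dies would draw:
shared_budget_fraction = 0.60        # pure assumption, for illustration
constrained_total = naive_total * shared_budget_fraction

print(f"naive x4 extrapolation:  {naive_total:.0f} GH/s")
print(f"budget-limited estimate: {constrained_total:.0f} GH/s")
```

If the real packaged part can close that gap and hit the full x4 number, great... but until that's demonstrated, the single-die result is a ceiling, not a floor.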
Then there's the issue of the two stats - the two data points. The 664GH performance claim and the 0.67W/GH power consumption are two separate stats. Hashfast didn't link them together; you did (incorrectly assuming they were measured under the same conditions). HF didn't claim that when they were running at 664GH they were ALSO only consuming 0.67W/GH... though that'd be simply fantastic if true! Those two are independent stats, and it's safer to assume that the 0.67W/GH ultra-low-power achievement was probably reached at a lower voltage setting than the one at which the die hit the 664GH (/4) equivalent performance.
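There's a physical reason to suspect that, too: dynamic CMOS power scales roughly with V^2 x F while hash rate scales only with F, so W/GH goes roughly as the square of the core voltage. A quick sketch (both voltage values are hypothetical):

```python
# Dynamic CMOS power ~ C * V^2 * F, hash rate ~ F, so W/GH ~ V^2.
# Both voltages below are hypothetical, just to show the scaling.
v_fast = 0.90   # V -- hypothetical voltage for the top-speed run
v_eco = 0.70    # V -- hypothetical lower setting for the efficiency figure

scale = (v_eco / v_fast) ** 2
print(f"W/GH at {v_eco} V is ~{scale:.2f}x the W/GH at {v_fast} V")
```

So a chip could quite plausibly post its best W/GH figure and its best GH/s figure in two different runs, at two different voltages.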
Also, as they themselves identified, the tests were run off a bench power supply, without the inefficiencies of the DC/DC converters or the limits of ATX power supplies... so it's an isolated measure of performance. It's testing the die on its own, not testing the dies in situ in the system as it will be supplied. Of course we'd love to know what the die will do on its own, but the more important statistic for us as customers is what the chip will do on a production board, with production power supplies and production cooling. Exciting as it is to hear what it can do (with the wind behind it) in the lab, connected to a bench power supply and isolated from the other dies, that's a special-case scenario that isn't necessarily representative of real-world use.
Heck, if they want to quote even higher performance numbers (quite legitimately), they should be pouring liquid nitrogen down a tube directly onto the die for the ultimate in cooling - the way PC overclockers do it. But bear in mind that Intel never makes claims based on the performance the overclocking teams hit, as they're using extreme methods that aren't available to regular customers.
Hi Aerobatic, good of you to take the time to reply. Like I said before, I have no interest in Hashfast other than that I'd like to see them fulfill their obligations to their paying (or rather paid-up) customers, and I'm sure said customers would agree. So to me it's odd to trumpet what they 'may' be able to achieve rather than supplying tracking numbers for what they have dispatched, especially when those numbers don't add up - from an engineering point of view.
I'd like to address your points in more detail, but it's 9 pm in the UK just now and I'm just about to watch Sherlock Holmes (the new series) on BBC. Don't know if you get it where you live, but it's well worth a watch on the iPlayer. So I'll reply tomorrow, but one thing I can confirm to you is that the watts/GH figure remains constant no matter what clock speed is used. It's generated from the equation:
watts per (hash/sec) = (Ng * Pg * F) / (Nc * F), where:

Ng = number of gates switching per clock cycle - a design constant which depends upon the pipeline stage architecture
Pg = average switching power per gate per MHz - a silicon process constant; about 0.6 nanowatts per MHz for most LP 28nm processes
F = clock frequency (variable)
Nc = number of cores (pipelines) in the device; each core produces one result (hash) out of the pipeline every clock cycle (pipeline latency is ignored, as it's irrelevant in practice), so hash rate = Nc * F

Or, to simplify: P = Ng*Pg/Nc. F cancels out, i.e. frequency is irrelevant. Static device power is also ignored here, as it's relatively low in comparison.
If your pipeline design is very efficient, P goes down; an inefficient design pushes it up.
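If it helps, here's the same equation as a quick Python sketch. Ng and Nc are made-up values (picked so the answer lands near Hashfast's quoted 0.67 W/GH); Pg is the 0.6 nanowatts/MHz figure above:

```python
# Watts per GH/s from the equation above: (Ng * Pg * F) / (Nc * F).
# Ng and Nc are HYPOTHETICAL values; Pg is the LP 28nm figure quoted above.

def watts_per_ghs(ng, pg_nw_per_mhz, nc, f_mhz):
    """Power (W) divided by hash rate (GH/s) for a pipelined hasher."""
    power_w = ng * pg_nw_per_mhz * 1e-9 * f_mhz   # total switching power, W
    hashrate_ghs = nc * f_mhz * 1e-3              # Nc hashes/clock; MHz -> GH/s
    return power_w / hashrate_ghs

Ng = 4_500_000   # gates switching per clock -- hypothetical design constant
Pg = 0.6         # nW per MHz per gate (typical LP 28nm, as above)
Nc = 4           # cores (pipelines) -- hypothetical

# Identical answer at any clock frequency: F cancels out.
for f in (200.0, 400.0, 800.0):
    print(f"{f:5.0f} MHz -> {watts_per_ghs(Ng, Pg, Nc, f):.3f} W per GH/s")
```

Run it and you get 0.675 W per GH/s at every frequency, which is the point: efficiency is set by the design and the process, not by the clock.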
Hope this helps.