
Topic: $0.98/Mh/s scrypt at 6.4W, on an ancient 50nm process? Plausible?

newbie
Activity: 54
Merit: 0
I showed the paper to an ASIC designer and the reply was:

"Dual-rail domino logic is glitchy and fickle as f**k, and needs only the slightest excuse (e.g. you put a domino module next to another one) not to work. Clockless logic is interesting, but it's virtually impossible to debug it once you've got your silicon.  You could build the chips, and spend a decade debugging and still have no idea what went wrong"


That guy must be a moron. People have been designing domino logic for 20+ years. It has some special requirements, but those are not hard to meet if you know what you are doing.
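For anyone unfamiliar with the terms in the quote: in dual-rail logic each bit travels on two wires, and a stage is precharged to a "spacer" state (both wires low) before it evaluates. Because a valid result raises exactly one of the two wires, the arrival of that result doubles as a "done" signal, which is what lets a pipeline run without a clock. Below is a rough behavioral sketch in Python, an illustration of the general technique and not taken from the paper. It models only logic levels, not the charge-sharing, noise and timing effects that make real domino circuits demanding; the names `encode`, `decode`, `and2` and `done` are hypothetical.

```python
from typing import Optional, Tuple

# A dual-rail signal uses two wires (t, f):
#   (0, 0) = spacer/null, the state left by precharge
#   (1, 0) = logical 1, (0, 1) = logical 0, (1, 1) = illegal
Rail = Tuple[int, int]
SPACER: Rail = (0, 0)


def encode(bit: int) -> Rail:
    return (1, 0) if bit else (0, 1)


def decode(sig: Rail) -> Optional[int]:
    """Return the logical bit, or None while the signal is still a spacer."""
    if sig == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    if sig == SPACER:
        return None
    return sig[0]


def and2(a: Rail, b: Rail) -> Rail:
    # Only AND/OR on the rails (no inversion), so an output rail can only
    # rise while inputs rise: the monotonicity that domino stages rely on.
    t = a[0] & b[0]  # true rail: both inputs are 1
    f = a[1] | b[1]  # false rail: at least one input is 0
    # Note: f may rise before b has arrived; strictly delay-insensitive
    # designs add input-completeness checks to avoid that early completion.
    return (t, f)


def done(sig: Rail) -> bool:
    # Completion detection: the result is valid once either rail has risen.
    return (sig[0] | sig[1]) == 1


if __name__ == "__main__":
    print(done(and2(SPACER, SPACER)))  # False: still precharged
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, decode(and2(encode(x), encode(y))))
```

The single-rail world needs a clock to say "the result is ready now"; here the `done` check plays that role, which is also why such circuits are hard to probe once they are in silicon.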

I know eldentyrell, and this paper is legit. I'm proud of him; this is an excellent result, and he should be proud of it. This kind of thing is real engineering; it's so nice to see this kind of result instead of the uninspired standard-cell crapola that everyone else churns out.

full member
Activity: 129
Merit: 100
I showed the paper to an ASIC designer and the reply was:

That is not too surprising.  Most self-identified "ASIC designers" can't handle domino design (it is a full-custom technique and not supported by "ASIC tools").  Haters gonna hate.

Intel, AMD, NVidia, and ATI all used domino logic exclusively for the critical path of all of their ~50nm-node chips.  Apple still uses it at 16nm, Intel has given conflicting information, and I don't know about the other two.
member
Activity: 85
Merit: 10
Miner and technician
I showed the paper to an ASIC designer and the reply was:

"Dual-rail domino logic is glitchy and fickle as f**k, and needs only the slightest excuse (e.g. you put a domino module next to another one) not to work. Clockless logic is interesting, but it's virtually impossible to debug it once you've got your silicon.  You could build the chips, and spend a decade debugging and still have no idea what went wrong"

member
Activity: 85
Merit: 10
Miner and technician
I don't design ASICs, but I've tinkered with FPGAs a bit.

The optimisations listed in the paper all appear plausible. The omission of DRAM refresh is clever; I seem to recall having heard of a similar trick in the distant past, but I may be mistaken.
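On why skipping refresh can work: scrypt's ROMix step (RFC 7914) writes each entry of its scratchpad V exactly once, then reads entries back in a data-dependent order within the same hash, after which the whole scratchpad is thrown away. No value therefore has to outlive one hash (at most 2N mixing steps), so if a hash finishes well inside the memory cells' retention time, refresh is unnecessary. The Python sketch below is my own illustration of that access pattern, not the paper's circuit: SHA-256 is a hypothetical stand-in for the real BlockMix (Salsa20/8), and the index rule is a simplified stand-in for Integerify, so the output is not a real scrypt hash.

```python
import hashlib
from typing import List, Optional, Tuple


def mix(block: bytes) -> bytes:
    # HYPOTHETICAL stand-in for scrypt's BlockMix (which is built on
    # Salsa20/8). SHA-256 is used only so this sketch runs; 32 bytes in, 32 out.
    return hashlib.sha256(block).digest()


def romix_lifetimes(x: bytes, n: int) -> Tuple[bytes, List[int], int]:
    """Run a ROMix-shaped loop over N entries and report, for every entry
    of the scratchpad V that gets read, how many mix steps separate its
    write from its last read."""
    v: List[bytes] = []
    written_at: List[int] = []
    last_read: List[Optional[int]] = [None] * n
    step = 0

    # Phase 1: fill V sequentially; each entry is written exactly once.
    for _ in range(n):
        v.append(x)
        written_at.append(step)
        x = mix(x)
        step += 1

    # Phase 2: data-dependent reads. The index rule is a simplified,
    # HYPOTHETICAL stand-in for scrypt's Integerify(X) mod N.
    for _ in range(n):
        j = int.from_bytes(x[:8], "little") % n
        last_read[j] = step
        x = mix(bytes(a ^ b for a, b in zip(x, v[j])))
        step += 1

    lifetimes = [last_read[i] - written_at[i]
                 for i in range(n) if last_read[i] is not None]
    return x, lifetimes, step


if __name__ == "__main__":
    n = 1024  # Litecoin's scrypt parameter N
    _, lifetimes, total = romix_lifetimes(bytes(32), n)
    print(f"{total} mix steps; longest any V entry had to survive: "
          f"{max(lifetimes)} steps")
```

Whatever the data, the longest lifetime is bounded by 2N - 1 steps, because every entry is written in phase 1 and every read happens before phase 2 ends.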

Domino logic is a technique for ultra-fast, reduced-power, reduced-die-size logic design. It has a lot of pitfalls and is difficult to use and time-consuming to design. However, if your design has one particular circuit that is slower than all the others and limits your clock speed (the so-called critical path), then its benefits may be worth the effort. Salsa (the Salsa20/8 core inside scrypt) makes extremely heavy use of 32-bit addition, so it's not surprising that the addition is the critical path.
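To put a number on "extremely heavy use of addition": the Salsa20/8 core that scrypt uses (RFC 7914) runs 4 double rounds of 8 quarter-rounds, each quarter-round doing four 32-bit additions, then adds the input back in with 16 more additions, for 144 additions per call. The Python sketch below follows the structure of the RFC's reference routine; the addition counter is my own instrumentation, added for illustration.

```python
from typing import List, Tuple

MASK = 0xFFFFFFFF  # keep everything in 32 bits


def rotl(v: int, n: int) -> int:
    # 32-bit rotate left
    return ((v << n) | (v >> (32 - n))) & MASK


def salsa20_8(inp: List[int]) -> Tuple[List[int], int]:
    """Salsa20/8 core as used by scrypt. Returns the 16 output words and
    the number of 32-bit additions performed."""
    x = list(inp)
    adds = 0

    def qr(a: int, b: int, c: int, d: int) -> None:
        nonlocal adds
        x[b] ^= rotl((x[a] + x[d]) & MASK, 7)
        x[c] ^= rotl((x[b] + x[a]) & MASK, 9)
        x[d] ^= rotl((x[c] + x[b]) & MASK, 13)
        x[a] ^= rotl((x[d] + x[c]) & MASK, 18)
        adds += 4

    for _ in range(4):  # 8 rounds = 4 double rounds
        # column round
        qr(0, 4, 8, 12); qr(5, 9, 13, 1); qr(10, 14, 2, 6); qr(15, 3, 7, 11)
        # row round
        qr(0, 1, 2, 3); qr(5, 6, 7, 4); qr(10, 11, 8, 9); qr(15, 12, 13, 14)

    out = [(x[i] + inp[i]) & MASK for i in range(16)]
    adds += 16
    return out, adds


if __name__ == "__main__":
    _, n_adds = salsa20_8(list(range(16)))
    print(n_adds)  # 144
    # With r=1, scrypt calls Salsa20/8 4*N times per hash; Litecoin uses N=1024.
    print(n_adds * 4 * 1024)  # 589824
```

So for Litecoin's parameters that is roughly 590,000 dependent 32-bit additions per hash, not counting the PBKDF2-SHA256 wrapper, and every one of them sits on a carry chain. That is why a faster adder pays off so directly.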

The key thing about ASIC design is this: it's one thing to design a digital circuit on an FPGA and port it to an ASIC. It's another thing entirely when you have enough time and enough experts to analyse the circuit in depth, hand-draw the most critical parts, and apply a variety of other clever tricks.

That difference between careful expert design and rapid-turnaround design is what you see between Bitfury's 55 nm ASIC and KNC's 20 nm ASIC, which have almost identical cost and performance.
full member
Activity: 129
Merit: 100
The links you quoted are in relation to SHA256, not Scrypt.

The PDF link is the one that is quoted and it talks only about Scrypt.

I think you've mixed up the designer's past accomplishments (SHA256 FPGA) and current project (Scrypt ASIC).
sr. member
Activity: 542
Merit: 250
The links you quoted are in relation to SHA256, not Scrypt. I don't think anyone has attempted a Scrypt FPGA, simply because the ASIC route is significantly faster and more efficient. The only coins people are using FPGA units on now are X11 and Blake256 algorithm-based coins.
full member
Activity: 129
Merit: 100

Wondering if people can provide any insight on how credible this is.  The figures are pretty out there, but on the other hand the guy (or gal?) behind it still holds the record for fastest publicly-released mining bitstream, verified by everyone who ran it in the FPGA mining days.  Dunno.  Credible?  Claims to use on-chip/same-die DRAM with no refresh (!?) since the data doesn't need to stick around long:

Quote
At foundry-recommended voltage the silicon needed to generate 1Mh/s costs $0.98 and consumes 6.4W. The circuit can be undervolted to consume as little as 1.44J/Mh (W/Mh/s) at 252kh/s. There is plenty of headroom for overvolting. Bear in mind all of these numbers are for 50-70nm processes; on a 28nm process the power is cut roughly in half and the price per Mh/s drops by a factor of 4-8.  All performance numbers use real inputs and full taped-out layout parasitics. Quoted performance figures do not include any time-memory tradeoff (TMTO).

Full pdf is here.
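Some back-of-the-envelope arithmetic on the quoted figures, as a Python sketch. A watt per Mh/s is the same as a joule per Mh, which is why the quote writes "J/Mh (W/Mh/s)". Two assumptions here are mine, not the paper's: the undervolted power line takes 252 kh/s to be the throughput at which 1.44 J/Mh was measured, and the 28nm line simply applies the quote's own rough "power halved, price cut 4-8x" scaling.

```python
from typing import Tuple

# Figures quoted in the post (50-70nm process).
COST_USD_PER_MHS = 0.98     # silicon cost per Mh/s at foundry-recommended voltage
WATTS_PER_MHS = 6.4         # W per Mh/s, i.e. J per Mh
UNDERVOLT_J_PER_MH = 1.44   # best quoted efficiency when undervolted
UNDERVOLT_RATE_MHS = 0.252  # 252 kh/s, the rate quoted alongside 1.44 J/Mh


def power_watts(j_per_mh: float, rate_mhs: float) -> float:
    # (J / Mh) * (Mh / s) = J / s = W
    return j_per_mh * rate_mhs


def projection_28nm() -> Tuple[float, Tuple[float, float]]:
    """Apply the quote's rough 28nm scaling: power ~halved, $/Mh/s cut 4-8x.
    Returns (W per Mh/s, (cheapest, priciest) USD per Mh/s)."""
    watts = WATTS_PER_MHS / 2
    cost_range = (COST_USD_PER_MHS / 8, COST_USD_PER_MHS / 4)
    return watts, cost_range


if __name__ == "__main__":
    print(f"nominal: {power_watts(WATTS_PER_MHS, 1.0):.2f} W for 1 Mh/s")
    print(f"undervolted: {power_watts(UNDERVOLT_J_PER_MH, UNDERVOLT_RATE_MHS):.3f} W "
          f"at 252 kh/s (assumed throughput)")
    w, (lo, hi) = projection_28nm()
    print(f"28nm estimate: ~{w:.1f} W and ${lo:.3f}-${hi:.3f} per Mh/s")
    print(f"undervolting efficiency gain: {WATTS_PER_MHS / UNDERVOLT_J_PER_MH:.1f}x")
```

Undervolting buys roughly 4.4x in J/Mh, but at a fraction of the hashrate. That is the usual trade: cheaper electricity per hash against more silicon per Mh/s.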