SHA256d IC design question - page 2.

alh

legendary

Activity: 1846

Merit: 1052

Quote from: ?? on ??

Quote from: the_electronrancher on January 08, 2018, 03:38:10 PM

Are you by any chance a marketing guy?

No.

Ph.D in engineering. I worked in process R&D for Intel for over a decade before I escaped.

Given your experience and education, why would you start asking questions here? I am lost as to what you are seeking from a truly random collection of folks here.....

Entropy-uc

hero member

Activity: 756

Merit: 501

Quote from: alh on January 08, 2018, 06:26:38 PM

I am not a semiconductor guy, but just my discussions with folks that are suggest that the "design rules" and associated tool chains change on a regular basis as the node size shrinks. What that means to me is that the "rules and tools" for a 40nm process don't work for a 28nm process which don't work at 16nm. I think that means that your idea of developing with "cheap" process and then shrinking down won't work since the Fab for 16nm can't use "masks" from a 40nm process. Voltages are all wrong, leakage current and a whole host of things that don't manifest at 40nm become hugely important at 16nm. I expect the testing and packaging also changes.

I know that very well.

The arrangement of transistor gates required to implement a double SHA-256 hash would not change. The implementation would change for the target process node and Fab.

By demonstrating that your transistor level design is correct, the risk is dramatically reduced. It's not zero because, as you say, the implementation at each node would require a unique design layout.

Would it really move the needle on the costs to implement a bitcoin hash chip? I can't say for certain. I can tell you that a surface discussion with somebody exposed to semi design won't give you a valid answer because they are thinking in terms of tool chains and standard cells that decouple you from the transistor level by several layers. It simply isn't industry standard practice. But results delivered by Bitfury make it clear it's the only way to be competitive in the crypto mining space.

Moderator's note: This post was edited by frodocooper to remove a nested quote.

alh

legendary

Activity: 1846

Merit: 1052

Quote from: Entropy-uc on January 07, 2018, 03:34:58 PM

That's the whole point of doing the transistor design as open hardware. You eliminate the biggest barrier to entry by putting the transistor layout into the public domain. There still would only be a handful of folks that would go to masks at 10 nm or lower nodes, but they would be forced to keep pricing competitive because there are dozens of entities capable of entering the market.

I am sure you could find faculty that would find this a worthwhile project, and you could easily fund a few spins at an 8 inch 64 nm fab for under $1M. That's 50 BTC. I paid that much as a bounty to fix my FPGA supplier's garbage code back in 2012!

I am not a semiconductor guy, but just my discussions with folks that are suggest that the "design rules" and associated tool chains change on a regular basis as the node size shrinks. What that means to me is that the "rules and tools" for a 40nm process don't work for a 28nm process which don't work at 16nm. I think that means that your idea of developing with "cheap" process and then shrinking down won't work since the Fab for 16nm can't use "masks" from a 40nm process. Voltages are all wrong, leakage current and a whole host of things that don't manifest at 40nm become hugely important at 16nm. I expect the testing and packaging also changes.

the_electronrancher

jr. member

Activity: 112

Merit: 4

Are you by any chance a marketing guy?

Entropy-uc

hero member

Activity: 756

Merit: 501

Quote from: the_electronrancher on January 07, 2018, 06:34:29 PM

Well, if you still have 50BTC you want to throw at it I have a couple of layout guys who will moonlight doing it. Do the first tapeout at MOSIS, then buy a mask set once it's verified. At that point, if you want to open source the GDS you're free to do so.

It would take a lot more than that. Form a 501(c), build a project and test plan and publish a budget. Identify a qualified team that's committed to moving forward if the needed resources are available, and the key milestones where funds are needed.

With that it really wouldn't be hard to raise the funds. Whether it's via angel investors for a start up, or a kickstarter approach with a public domain solution as the end point would be up to you.

the_electronrancher

jr. member

Activity: 112

Merit: 4

Well, if you still have 50BTC you want to throw at it I have a couple of layout guys who will moonlight doing it. Do the first tapeout at MOSIS, then buy a mask set once it's verified. At that point, if you want to open source the GDS you're free to do so.

Entropy-uc

hero member

Activity: 756

Merit: 501

Quote from: the_electronrancher on January 07, 2018, 02:58:07 PM

Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.

Quote from: Entropy-uc on January 07, 2018, 01:48:32 PM

An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now. There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

This I think is the tough part. Bitcoin has changed from a cool open-source environment to ultra-greed mode. Those who have the ability to do this design certainly aren't going to want to do it for free and see some other Chinese or Russian shop take the design, kill them on manufacturing cost so the original project creators get pushed out of business, and then the takers become the next Bitmain on the originator's backs.

That's the whole point of doing the transistor design as open hardware. You eliminate the biggest barrier to entry by putting the transistor layout into the public domain. There still would only be a handful of folks that would go to masks at 10 nm or lower nodes, but they would be forced to keep pricing competitive because there are dozens of entities capable of entering the market.

I am sure you could find faculty that would find this a worthwhile project, and you could easily fund a few spins at an 8 inch 64 nm fab for under $1M. That's 50 BTC. I paid that much as a bounty to fix my FPGA supplier's garbage code back in 2012!

the_electronrancher

jr. member

Activity: 112

Merit: 4

Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.

Quote from: Entropy-uc on January 07, 2018, 01:48:32 PM

An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now. There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

This I think is the tough part. Bitcoin has changed from a cool open-source environment to ultra-greed mode. Those who have the ability to do this design certainly aren't going to want to do it for free and see some other Chinese or Russian shop take the design, kill them on manufacturing cost so the original project creators get pushed out of business, and then the takers become the next Bitmain on the originator's backs.

Entropy-uc

hero member

Activity: 756

Merit: 501

Quote from: investorpgroovy on January 07, 2018, 02:28:02 AM

I would never settle for just borderline aspergers on my engineering teams when I can hire full on aspies instead.

Actually you want engineers with a balance if it's any sort of a team. With full on cases they will only be effective with a strong manager who can command respect on a technical level. There's an amusing article on medium from a few months back talking about an example of this; it was something like 'We fired our best programmer and it was the smartest thing I ever did'.

Semi design isn't my area of expertise. But as an outsider I don't understand why it wouldn't be feasible to have the transistor level layout be done in a platform independent way. That would then allow for design debug to be completed at a low cost node for less than $1M, then you could focus on building at the expensive node with confidence there won't be a catastrophe. If that approach is feasible I don't really see why the whole thing couldn't be done in an open source fashion. An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now. There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

I don't think there's much promise in pursuing other algorithms. Basically ether's algo is the only one without existing silicon and a chance to survive long term. I guarantee you there are people working on it already.

the_electronrancher

jr. member

Activity: 112

Merit: 4

Hashes per cycle means number of hash cores. Each core is an unrolled sha engine, so you continuously feed data into the front end, and finished hashes come out the back end and you check result to see what difficulty result a particular nonce generated.

There is delay in filling the engine, but once it's full they all give one hash per clock as each clock starts a new hash on the front end, and spits out a finished one on the back end.

So you naturally want to stuff as many copies of the engine in as your little power supply lines can handle.
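
A minimal sketch of the fill-then-one-per-clock behaviour described above. It is only a timing model and not anyone's actual chip design: the stage count is a made-up parameter, and `simulate_pipeline` is a hypothetical helper name used here for illustration.

```python
def simulate_pipeline(num_stages, num_nonces):
    """Model an unrolled core as a shift register of `num_stages` stages.

    Returns a list of (clock, nonce) pairs recording the clock on which
    each nonce's finished hash leaves the back end.
    """
    stages = [None] * num_stages   # index 0 is the front end, -1 the back end
    finished = []
    next_nonce = 0
    clock = 0
    while len(finished) < num_nonces:
        clock += 1
        leaving = stages[-1]                      # result at the back end
        # Feed a new nonce into the front end while any remain.
        incoming = next_nonce if next_nonce < num_nonces else None
        if incoming is not None:
            next_nonce += 1
        stages = [incoming] + stages[:-1]         # everything advances one stage
        if leaving is not None:
            finished.append((clock, leaving))
    return finished


if __name__ == "__main__":
    # Hypothetical 8-stage pipeline, 5 nonces: nothing comes out while the
    # pipe fills, then one finished hash appears on every clock.
    for clock, nonce in simulate_pipeline(8, 5):
        print(f"clock {clock}: hash for nonce {nonce} done")
```

In this model the total time for K nonces is the stage count plus K clocks, so the fill delay stops mattering once the nonce stream is long.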

investorpgroovy

newbie

Activity: 58

Merit: 0

I would never settle for just borderline aspergers on my engineering teams when I can hire full on aspies instead.

My main objective was to try to figure out whether there was a way to get something to market quickly enough to challenge the incumbent players. The thing that bothers me specifically with Bitmain is that they unfairly mine, driving up difficulty, then release the parts into the market.

I started out in DRAM, so I know that once you get to the point where you need to do transistor level layout it's a lot harder to jump into a new market essentially from scratch... and it's become clear to me that there is no shortcut in terms of buying the IP from a defunct company, as everything has advanced so much since anyone with a reasonably fast chip has been in the market... a better focus would be "breaking" a different algo.

That being said, it's beyond my scope of knowledge, but I wonder if there is a more efficient way to go about hashing fundamentally.

Bitmain BM1382 calculates 63 hashes per clock cycle (Hz) and BM1384 calculates 55 hashes per clock cycle.
BitFury's BF756C55 is claimed to have 756 cores for about 11.6 hashes per clock cycle.
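
To put the hashes-per-clock figures above in context, here is a rough sketch of the arithmetic. Hash rate is hashes per clock times clock frequency. The 250 MHz clock below is purely an assumed placeholder (no clock speed is given in this thread), and the function names are hypothetical.

```python
def hashrate_ghs(hashes_per_clock: float, clock_mhz: float) -> float:
    """Hash rate in GH/s = hashes per clock cycle x clock frequency."""
    return hashes_per_clock * clock_mhz * 1e6 / 1e9


def clocks_per_hash_per_core(cores: int, hashes_per_clock: float) -> float:
    """How many clocks one core spends per hash, given the chip-wide rate."""
    return cores / hashes_per_clock


if __name__ == "__main__":
    ASSUMED_CLOCK_MHZ = 250.0  # assumption for illustration only
    for name, hpc in [("BM1382", 63), ("BM1384", 55), ("BF756C55", 11.6)]:
        rate = hashrate_ghs(hpc, ASSUMED_CLOCK_MHZ)
        print(f"{name}: {hpc} hashes/clock -> {rate:.2f} GH/s at {ASSUMED_CLOCK_MHZ} MHz")

    # Using only the numbers quoted above for the BitFury part:
    ratio = clocks_per_hash_per_core(756, 11.6)
    print(f"BF756C55: about {ratio:.1f} clocks per hash per core")
```

Taken at face value, 756 cores yielding 11.6 hashes per clock works out to roughly 65 clocks per hash per core, so those cores would not each be delivering one hash per clock. That is an inference from the quoted numbers, not a statement about the actual design.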

NotFuzzyWarm

legendary

Activity: 3822

Merit: 2703

Evil beware: We have waffles!

It is mainly all about efficient layout of the signal paths between the cores and coms. Like the Cray super computers proved decades ago, using very short and direct pathways with minimal reliance on multiple layers has a very dramatic effect on speed and power consumption. Standard Foundry IP blocks only care about functions and not optimum I/O speed between the blocks.

the_electronrancher

jr. member

Activity: 112

Merit: 4

Borderline aspergers, lol.

I'd like to learn a little more about this transistor level implementation, I'm having a hard time picturing what could reasonably be exploded or minimized in the hash core. Xor? It's just flops and wiring otherwise, I would be surprised if the flop was exploded, but maybe - if you have any links to check out, it would be an interesting read.

Entropy-uc

hero member

Activity: 756

Merit: 501

Quote from: investorpgroovy on January 04, 2018, 10:47:53 PM

So I had assumed that all these guys were using standard IP libraries. If what you're saying is true, then it's obvious why no one has picked up the IP from a defunct firm and run with it.

Based on your comment I found a few details... seems like you have a good point

The BFL at 28nm was, I guess, 400 GH/s at 0.27 J/GH or 1600 GH/s at 0.76 J/GH, but the layout size was massive compared to Bitmain/Bitfury. It seems like, in addition to using standard libraries, they had a significantly different design methodology.

The Bitfury at 28nm was at around 0.2 J/GH and supposedly the 16nm is 0.1 J/GH.
The Bitmain 1385 is listed at 0.18 J/GH in 16nm at the slowest speed (21 GH/s).

Interestingly Global Foundries (formerly the AMD fab) fabbed the BFL device

I think Jensen (CEO) of Nvidia already has plans to build specialized mining "GPUS" for ether.

So far the publicly stated intention has been to offer mining GPUs that don't have video outputs. The sole purpose is to prevent miners from dumping their used GPU gear onto the market when the inevitable crash comes and they can't mine profitably.

Global Foundries operates on a standard contract fab model so it's not really surprising that they built the BFL devices.

I don't think the transistor level design requirement is that big of a barrier. Bitfury did it by himself on a kitchen table over the course of a year. The problem is you won't find a design house willing to work that way. They have their tool sets and their work flows and they aren't going to diverge from it. So you will need to buy your own set of design tools and find a team of borderline Asperger's cases to do the transistor design.

Somebody should really fund a Professor to do the design work under an open hardware license. Once the transistor design for SHA256 is done you just have to bring that into the fab's design tools and optimize for placement. Conductor losses are becoming dominant at these process nodes so that is where the biggest optimizations will be found.

investorpgroovy

newbie

Activity: 58

Merit: 0

So I had assumed that all these guys were using standard IP libraries. If what you're saying is true, then it's obvious why no one has picked up the IP from a defunct firm and run with it.

Based on your comment I found a few details... seems like you have a good point

The BFL at 28nm was, I guess, 400 GH/s at 0.27 J/GH or 1600 GH/s at 0.76 J/GH, but the layout size was massive compared to Bitmain/Bitfury. It seems like, in addition to using standard libraries, they had a significantly different design methodology.

The Bitfury at 28nm was at around 0.2 J/GH and supposedly the 16nm is 0.1 J/GH.
The Bitmain 1385 is listed at 0.18 J/GH in 16nm at the slowest speed (21 GH/s).

Interestingly Global Foundries (formerly the AMD fab) fabbed the BFL device

I think Jensen (CEO) of Nvidia already has plans to build specialized mining "GPUS" for ether.
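
As a rough cross-check of the efficiency figures quoted above: since J/GH times GH/s gives J/s, multiplying the two yields power draw in watts. The sketch below just does that multiplication on the numbers from this post, which are themselves unverified, and `power_watts` is a hypothetical helper name.

```python
def power_watts(joules_per_gh: float, hashrate_ghs: float) -> float:
    """Power draw in watts: (J/GH) x (GH/s) = J/s = W."""
    return joules_per_gh * hashrate_ghs


if __name__ == "__main__":
    # (label, J/GH, GH/s) as quoted in the post; not independently checked.
    quoted = [
        ("BFL 28nm, low setting", 0.27, 400),
        ("BFL 28nm, high setting", 0.76, 1600),
        ("Bitmain 1385 16nm, slowest", 0.18, 21),
    ]
    for label, j_per_gh, ghs in quoted:
        print(f"{label}: {power_watts(j_per_gh, ghs):.2f} W")
```

On those figures the BFL part would be dissipating on the order of 100 W at its lower setting, against a few watts per chip for the Bitmain 1385.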

Entropy-uc

hero member

Activity: 756

Merit: 501

I believe the Hashfast design ended up in the hands of their silicon integrator, which did the design work in the first place. Similarly, Terrahash went bankrupt, so their design IP ended up in play and is probably held by someone who assigns it a modest value. BFL - I don't know what became of them at all, but they had a design.

The catch is that all of these designs were laid out using VHDL to standard cell libraries.

Bitfury clearly demonstrated that laying out at the transistor level gave massive advantage for power efficiency. His 64 nm chip performed better than the 28 nm generation. It was also buggy as hell.

I don't think it's feasible to deliver a power competitive design with standard cells. You will need to start with transistor level design of an unrolled hashing core. From there, it's likely there are power optimizations that are possible. The design will likely need to be optimized thermally as well, to limit hot spots.

Delivering a working SHA256 hash core isn't all that hard. Being competitive from a power efficiency standpoint will be difficult. I doubt it's practical to expect you will be within 20% of Bitmain on a given node until your 3rd or 4th generation.

Good luck, but I think you will find your money would be better invested convincing a major player like AMD or NVIDIA to develop a solution.

the_electronrancher

jr. member

Activity: 112

Merit: 4

I didn't say simple, I said there are not many unique ways to make an optimized sha core to give the ideal performance of one hash per clock once pipelined. In my opinion, only one.

You're probably going to want to license the pll from the foundry, but the logic would be built from gates. Some company built a hashcore they licensed out back in the 0.13 days, not sure of the name but it doesn't seem worth the money.

Pm me the part number of that asic project that you mentioned, I'd like to take a look

investorpgroovy

newbie

Activity: 58

Merit: 0

I mean I am just generally exploring to try and find the lowest cost, shortest time to market. I'm not a huge fan of hiring someone to reverse engineer something if the other option is just to start with Icarus.

investorpgroovy

newbie

Activity: 58

Merit: 0

Sure everyone uses FPGAs for design..

To clarify a bit more: the issue I would worry about with going from FPGA is that I don't know how it was designed. If we started with Icarus and, for example, they designed it using the "free" IP that Xilinx offers you, you start adding on to the development process time, not to mention timing issues and whatnot.

The last project where I acquired an FPGA design with the plan to convert quickly to ASIC ended up being a nightmare: it went a year beyond schedule and had crazy licensing fees (it was a complex design with 4 ARM cores, 3 levels of on-die cache and so on). It was successful in the end, but it wasn't easy.

Here is a guide to the type of issues I am talking about: http://www.onsemi.com/pub/Collateral/HBD872-D.PDF

So I am not a designer myself and I suspect these designs are very simple. So let me ask: are you basically saying these chips are so simple that I don't need to worry about CPU core/memory timing type issues and 3rd party licensing?

the_electronrancher

jr. member

Activity: 112

Merit: 4

If you have deep pockets, you can get chipworks to reverse any chip you want. Start with a larger geometry to save yourself some bucks.

But adapting verilog from fpga to asic is not difficult. I can assure you that many digital designs are prototyped in fpga and then synthesized into asic.

I would say that other than asicboost, the architecture changes have been small in the last few generations - it's an unrolled loop, pipelined to give one result per clock. There's really only one answer there, I would expect that everyone's design is very similar for this important part.
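
For readers who want to see the function that such an unrolled core computes, here is a minimal software sketch of SHA256d (SHA-256 applied twice) with a nonce scan. It uses Python's standard `hashlib` and says nothing about how any ASIC lays the logic out. The all-zero 76-byte header prefix and the very easy target are placeholders chosen for illustration, and `scan_nonces` is a hypothetical helper name.

```python
import hashlib
import struct


def sha256d(data: bytes) -> bytes:
    """Double SHA-256: hash the data, then hash the 32-byte digest again."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def scan_nonces(header76: bytes, target: int, start: int, count: int):
    """Try `count` nonces starting at `start`.

    Appends each 32-bit little-endian nonce to the 76-byte header prefix,
    hashes the resulting 80 bytes, and returns the first nonce whose digest,
    read as a little-endian 256-bit integer, is <= target. Returns None if
    no nonce in the range qualifies.
    """
    for nonce in range(start, start + count):
        header = header76 + struct.pack("<I", nonce)
        value = int.from_bytes(sha256d(header), "little")
        if value <= target:
            return nonce
    return None


if __name__ == "__main__":
    placeholder_header = bytes(76)   # not a real block header
    easy_target = 1 << 248           # roughly 1 in 256 digests qualify
    found = scan_nonces(placeholder_header, easy_target, 0, 10_000)
    print("first qualifying nonce:", found)
```

A hardware core does the same work with the loop body flattened into combinational rounds and pipeline registers, so each clock takes in a fresh nonce instead of iterating in software.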
