
Topic: SHA256d IC design question - page 2. (Read 957 times)

alh
legendary
Activity: 1843
Merit: 1050
January 09, 2018, 04:15:00 AM
#23
Are you by any chance a marketing guy?

No.

Ph.D in engineering.  I worked in process R&D for Intel for over a decade before I escaped.

Given your experience and education, why would you start asking questions here? I am lost as to what you are seeking from a truly random collection of folks here.
hero member
Activity: 756
Merit: 501
January 09, 2018, 02:42:39 AM
#22
I am not a semiconductor guy, but just my discussions with folks that are suggest that the "design rules" and associated tool chains change on a regular basis as the node size shrinks. What that means to me is that the "rules and tools"  for a 40nm process don't work for a 28nm process which don't work at 16nm. I think that means that your idea of developing with "cheap" process and then shrinking down won't work since the Fab for 16nm can't use "masks" from a 40nm process. Voltages are all wrong, leakage current and a whole host of things that don't manifest at 40nm become hugely important at 16nm. I expect the testing and packaging also changes.

I know that very well.

The arrangement of transistor gates required to implement a double SHA-256 hash would not change.  The implementation would change for the target process node and Fab.

By demonstrating that your transistor level design is correct, the risk is dramatically reduced.  It's not zero because, as you say, the implementation at each node would require a unique design layout.

Would it really move the needle on the costs to implement a bitcoin hash chip?  I can't say for certain.  I can tell you that a surface discussion with somebody exposed to semi design won't give you a valid answer because they are thinking in terms of tool chains and standard cells that decouple you from the transistor level by several layers.  It simply isn't industry standard practice.  But results delivered by Bitfury make it clear it's the only way to be competitive in the crypto mining space.



Moderator's note: This post was edited by frodocooper to remove a nested quote.
alh
legendary
Activity: 1843
Merit: 1050
January 08, 2018, 07:26:38 PM
#21

That's the whole point of doing the transistor design as open hardware.  You eliminate the biggest barrier to entry by putting the transistor layout into the public domain.  There still would only be a handful of folks that would go to masks at 10 nm or lower nodes, but they would be forced to keep pricing competitive because there are dozens of entities capable of entering the market.

I am sure you could find faculty that would find this a worthwhile project, and you could easily fund a few spins at an 8 inch 64 nm fab for under $1M.  That's 50 BTC.  I paid that much as a bounty to fix my FPGA supplier's garbage code back in 2012!

I am not a semiconductor guy, but just my discussions with folks that are suggest that the "design rules" and associated tool chains change on a regular basis as the node size shrinks. What that means to me is that the "rules and tools"  for a 40nm process don't work for a 28nm process which don't work at 16nm. I think that means that your idea of developing with "cheap" process and then shrinking down won't work since the Fab for 16nm can't use "masks" from a 40nm process. Voltages are all wrong, leakage current and a whole host of things that don't manifest at 40nm become hugely important at 16nm. I expect the testing and packaging also changes.

jr. member
Activity: 112
Merit: 4
January 08, 2018, 04:38:10 PM
#20
Are you by any chance a marketing guy?
hero member
Activity: 756
Merit: 501
January 08, 2018, 12:12:30 AM
#19
Well, if you still have 50BTC you want to throw at it I have a couple of layout guys who will moonlight doing it.  Do the first tapeout at MOSIS, then buy a mask set once it's verified.  At that point, if you want to open source the GDS you're free to do so.

It would take a lot more than that.  Form a 501(c), build a project and test plan and publish a budget.  Identify a qualified team that's committed to moving forward if the needed resources are available, and the key milestones where funds are needed.

With that it really wouldn't be hard to raise the funds.  Whether it's via angel investors for a start up, or a kickstarter approach with a public domain solution as the end point would be up to you.
jr. member
Activity: 112
Merit: 4
January 07, 2018, 07:34:29 PM
#18
Well, if you still have 50BTC you want to throw at it I have a couple of layout guys who will moonlight doing it.  Do the first tapeout at MOSIS, then buy a mask set once it's verified.  At that point, if you want to open source the GDS you're free to do so.
hero member
Activity: 756
Merit: 501
January 07, 2018, 04:34:58 PM
#17
Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.

 An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now.  There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

This I think is the tough part.  Bitcoin has changed from a cool open-source environment to ultra-greed mode.  Those who have the ability to do this design certainly aren't going to want to do it for free and see some other Chinese or Russian shop take the design, kill them on manufacturing cost so the original project creators get pushed out of business, and then the takers become the next Bitmain on the originator's backs.



That's the whole point of doing the transistor design as open hardware.  You eliminate the biggest barrier to entry by putting the transistor layout into the public domain.  There still would only be a handful of folks that would go to masks at 10 nm or lower nodes, but they would be forced to keep pricing competitive because there are dozens of entities capable of entering the market.

I am sure you could find faculty that would find this a worthwhile project, and you could easily fund a few spins at an 8 inch 64 nm fab for under $1M.  That's 50 BTC.  I paid that much as a bounty to fix my FPGA supplier's garbage code back in 2012!
jr. member
Activity: 112
Merit: 4
January 07, 2018, 03:58:07 PM
#16
Your idea about starting at a larger node is a good one, you would certainly want to debug on a cheap process.

 An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now.  There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

This I think is the tough part.  Bitcoin has changed from a cool open-source environment to ultra-greed mode.  Those who have the ability to do this design certainly aren't going to want to do it for free and see some other Chinese or Russian shop take the design, kill them on manufacturing cost so the original project creators get pushed out of business, and then the takers become the next Bitmain on the originator's backs.

hero member
Activity: 756
Merit: 501
January 07, 2018, 02:48:32 PM
#15
I would never settle for just borderline aspergers on my engineering teams when I can hire full on aspies instead.



Actually you want engineers with a balance if it's any sort of a team.  With full on cases they will only be effective with a strong manager who can command respect on a technical level.  There's an amusing article on Medium from a few months back talking about an example of this; it was something like 'We fired our best programmer and it was the smartest thing I ever did'.

Semi design isn't my area of expertise.  But as an outsider I don't understand why it wouldn't be feasible to have the transistor level layout be done in a platform independent way.  That would then allow for design debug to be completed at a low cost node for less than $1M, then you could focus on building at the expensive node with confidence there won't be a catastrophe.  If that approach is feasible I don't really see why the whole thing couldn't be done in an open source fashion.  An open sourced transistor layout for SHA-256 would seriously break open the whole competitive oligopoly that exists now.  There are plenty of folks with double digits millions from bitcoin at this point that would see the benefit, so fund raising should be feasible.

I don't think there's much promise in pursuing other algorithms.  Basically ether's algo is the only one without existing silicon and a chance to survive long term.  I guarantee you there are people working on it already.

 
jr. member
Activity: 112
Merit: 4
January 07, 2018, 12:10:39 PM
#14
Hashes per cycle means number of hash cores.  Each core is an unrolled sha engine, so you continuously feed data into the front end, and finished hashes come out the back end and you check result to see what difficulty result a particular nonce generated.

There is delay in filling the engine, but once it's full they all give one hash per clock as each clock starts a new hash on the front end, and spits out a finished one on the back end.

So you naturally want to stuff as many copies of the engine in as your little power supply lines can handle.  :)
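
The fill delay and the one-hash-per-clock steady state described above can be illustrated with a small software model. This is only a sketch under stated assumptions: `STAGES = 128` assumes one pipeline stage per SHA-256 round across two passes, the 76-byte header prefix is a dummy, and the names `run_pipeline` and `sha256d` are made up for illustration. It models the timing only, not how any real ASIC computes the rounds.

```python
import hashlib
import struct
from collections import deque

# Assumption for illustration: one pipeline stage per SHA-256 round,
# 64 rounds per pass, two passes for the double hash.
STAGES = 128


def sha256d(data: bytes) -> bytes:
    """Double SHA-256 of the input bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def run_pipeline(prefix: bytes, nonces, stages: int = STAGES):
    """Return a list of (clock, nonce, digest), one entry per finished hash.

    Timing model only: the digest is computed in software when the nonce
    enters, then carried through `stages` slots, one slot per clock.
    """
    pipe = deque([None] * (stages - 1))
    feed = iter(nonces)
    clock = 0
    pending = 0
    results = []
    while True:
        nonce = next(feed, None)
        if nonce is None and pending == 0:
            break  # nothing left to feed and the pipeline has drained
        clock += 1
        if nonce is not None:
            job = (nonce, sha256d(prefix + struct.pack("<I", nonce)))
            pending += 1
        else:
            job = None  # bubble: the front end has nothing new to start
        pipe.appendleft(job)  # a new hash starts at the front end
        done = pipe.pop()     # whatever sat in the last stage comes out
        if done is not None:
            pending -= 1
            results.append((clock, *done))
    return results


dummy_prefix = bytes(76)  # placeholder for the first 76 header bytes
finished = run_pipeline(dummy_prefix, range(5))
for clock, nonce, digest in finished:
    print(clock, nonce, digest[::-1].hex())
```

In this model the first result appears on clock `STAGES`, and every clock after that delivers one more, which is the behaviour the post describes.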
newbie
Activity: 58
Merit: 0
January 07, 2018, 03:28:02 AM
#13
I would never settle for just borderline aspergers on my engineering teams when I can hire full on aspies instead.

My main objective was to try and figure out if there was a way to get something to market quickly enough to challenge the incumbent players. The thing that bothers me specifically with Bitmain is that they unfairly mine, driving up difficulty, then release the parts into the market.

I started out in DRAM, so I know that once you get to the point that you need to do transistor level layout it's a lot harder to jump into a new market essentially from scratch... and it's become clear to me that there is no shortcut in terms of buying the IP from a defunct company, as everything has advanced so much since anyone with a reasonably fast chip has been in the market... a better focus would be "breaking" a different algo.

That being said, it's beyond my scope of knowledge, but I wonder if there is a more efficient way to go about hashing fundamentally.

Bitmain  BM1382 calculates 63 hashes per clock cycle (Hz) and BM1384 calculates 55 hashes per clock cycle.
BitFury's BF756C55 is claimed to have 756 cores for about 11.6 hashes per clock cycle.
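
A rough way to read these hashes-per-clock figures, as a sketch only: hash rate is hashes per clock times clock frequency, and cores divided by hashes per clock gives how many clocks each core spends per hash. The 250 MHz clock below is a hypothetical value chosen for illustration, not a spec for any of these chips, and the function names are made up.

```python
def hashrate_ghs(hashes_per_clock: float, clock_mhz: float) -> float:
    """Hash rate in GH/s: hashes per clock times clocks per second."""
    return hashes_per_clock * clock_mhz / 1000.0


def clocks_per_hash(cores: int, hashes_per_clock: float) -> float:
    """Average clocks one core spends on one hash."""
    return cores / hashes_per_clock


ASSUMED_CLOCK_MHZ = 250.0  # hypothetical clock, for illustration only

# Hashes-per-clock figures as quoted in the post above.
for chip, hpc in [("BM1382", 63), ("BM1384", 55), ("BF756C55", 11.6)]:
    rate = hashrate_ghs(hpc, ASSUMED_CLOCK_MHZ)
    print(f"{chip}: {rate:.2f} GH/s at {ASSUMED_CLOCK_MHZ:.0f} MHz")

# 756 cores producing about 11.6 hashes per clock in total
print(f"BF756C55: {clocks_per_hash(756, 11.6):.1f} clocks per hash per core")
```

Taking the quoted numbers at face value, each BF756C55 core would finish a hash roughly every 65 clocks, so those cores would not be one-hash-per-clock unrolled engines. That reading is an inference from the quoted figures, not a confirmed design detail.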
legendary
Activity: 3612
Merit: 2506
Evil beware: We have waffles!
January 05, 2018, 04:03:14 PM
#12
It is mainly all about efficient layout of the signal paths between the cores and coms. Like the Cray super computers proved decades ago, using very short and direct pathways with minimal reliance on multiple layers has a very dramatic effect on speed and power consumption. Standard Foundry IP blocks only care about functions and not optimum I/O speed between the blocks.
jr. member
Activity: 112
Merit: 4
January 05, 2018, 02:47:44 PM
#11
Borderline aspergers, lol.

I'd like to learn a little more about this transistor level implementation, I'm having a hard time picturing what could reasonably be exploded or minimized in the hash core.  Xor?  It's just flops and wiring otherwise, I would be surprised if the flop was exploded, but maybe - if you have any links to check out, it would be an interesting read.
hero member
Activity: 756
Merit: 501
January 05, 2018, 01:57:37 AM
#10
So I had assumed that all these guys were using standard IP libraries. If what you're saying is true, then it's obvious why no one has picked up the IP from a defunct firm and run with it.

Based on your comment I found a few details... seems like you have a good point.

The BFL at 28nm was I guess 400 GH/s at 0.27 J/GH or 1600 GH/s at 0.76 J/GH, but the layout size was massive compared to Bitmain/Bitfury. It seems like in addition to using standard libraries they had a significantly different design methodology.

The Bitfury at 28nm was at around 0.2 J/GH and supposedly the 16nm is 0.1 J/GH.
The Bitmain 1385 is listed at 0.18 J/GH in 16nm at the slowest speed (21 GH/s).

Interestingly Global Foundries (formerly the AMD fab) fabbed the BFL device

I think Jensen (CEO) of Nvidia already has plans to build specialized mining "GPUS" for ether.

So far the public intention has been to offer mining GPUs that don't have video outputs.  The sole purpose is to prevent miners from dumping their used GPU gear onto the market when the inevitable crash comes and they can't mine profitably.

Global Foundries operates on a standard contract fab model so it's not really surprising that they built the BFL devices.

I don't think the transistor level design requirement is that big of a barrier.  Bitfury did it by himself on a kitchen table over the course of a year.  The problem is you won't find a design house willing to work that way.  They have their tool sets and their work flows and they aren't going to diverge from it.  So you will need to buy your own set of design tools and find a team of borderline Asperger's cases to do the transistor design.  

Somebody should really fund a Professor to do the design work under an open hardware license.  Once the transistor design for SHA256 is done you just have to bring that into the fab's design tools and optimize for placement.  Conductor losses are becoming dominant at these process nodes so that is where the biggest optimizations will be found.
newbie
Activity: 58
Merit: 0
January 04, 2018, 11:47:53 PM
#9
So I had assumed that all these guys were using standard IP libraries. If what you're saying is true, then it's obvious why no one has picked up the IP from a defunct firm and run with it.

Based on your comment I found a few details... seems like you have a good point.

The BFL at 28nm was I guess 400 GH/s at 0.27 J/GH or 1600 GH/s at 0.76 J/GH, but the layout size was massive compared to Bitmain/Bitfury. It seems like in addition to using standard libraries they had a significantly different design methodology.

The Bitfury at 28nm was at around 0.2 J/GH and supposedly the 16nm is 0.1 J/GH.
The Bitmain 1385 is listed at 0.18 J/GH in 16nm at the slowest speed (21 GH/s).
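
For a sense of scale, efficiency in J/GH multiplied by hash rate in GH/s gives power in watts, since (J/GH) x (GH/s) = J/s. A minimal sketch using only the figures quoted above; the labels "low setting" and "high setting" are added here for readability, and none of the numbers are independently verified.

```python
def power_watts(hashrate_ghs: float, joules_per_gh: float) -> float:
    """Power draw in watts: (GH/s) * (J/GH) = J/s = W."""
    return hashrate_ghs * joules_per_gh


# (label, GH/s, J/GH) as quoted in the post; labels are illustrative.
quoted = [
    ("BFL 28nm, low setting", 400, 0.27),
    ("BFL 28nm, high setting", 1600, 0.76),
    ("Bitmain 1385 16nm, slowest speed", 21, 0.18),
]

for name, ghs, j_per_gh in quoted:
    print(f"{name}: {power_watts(ghs, j_per_gh):.2f} W")
```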

Interestingly Global Foundries (formerly the AMD fab) fabbed the BFL device

I think Jensen (CEO) of Nvidia already has plans to build specialized mining "GPUS" for ether.
hero member
Activity: 756
Merit: 501
January 04, 2018, 03:21:20 PM
#8
I believe the Hashfast design ended up in the hands of their silicon integrator, which did the design work in the first place. Similarly, Terrahash went bankrupt, so their design IP ended up in play and is probably held by someone who assigns it a modest value.  BFL - I don't know what became of them at all, but they had a design.

The catch is that all of these designs were laid out using VHDL to standard cell libraries.

Bitfury clearly demonstrated that laying out at the transistor level gave massive advantage for power efficiency.  His 64 nm chip performed better than the 28 nm generation.  It was also buggy as hell.

I don't think it's feasible to deliver a power competitive design with standard cells.  You will need to start with transistor level design of an unrolled hashing core.  From there, it's likely there are power optimizations that are possible.  The design will likely need to be optimized thermally as well, to limit hot spots.

Delivering a working SHA256 hash core isn't all that hard.  Being competitive from a power efficiency standpoint will be difficult.  I doubt it's practical to expect you will be within 20% of Bitmain on a given node until your 3rd or 4th generation.

Good luck, but I think you will find your money would be better invested convincing a major player like AMD or NVIDIA to develop a solution.
jr. member
Activity: 112
Merit: 4
January 04, 2018, 10:20:37 AM
#7
I didn't say simple, I said there are not many unique ways to make an optimized sha core to give the ideal performance of one hash per clock once pipelined. In my opinion, only one.

You're probably going to want to license the pll from the foundry, but the logic would be built from gates.  Some company built a hashcore they licensed out back in the 0.13 days, not sure of the name but it doesn't seem worth the money.

Pm me the part number of that asic project that you mentioned, I'd like to take a look

newbie
Activity: 58
Merit: 0
January 04, 2018, 03:50:37 AM
#6
I mean I am just generally exploring to try and find the lowest cost and shortest time to market. Not a huge fan of hiring someone to reverse engineer something if the other option is just to start with Icarus.
newbie
Activity: 58
Merit: 0
January 04, 2018, 03:44:35 AM
#5
Sure everyone uses FPGAs for design..

To clarify a bit more: the issue I would worry about with going from FPGA is that I don't know how it was designed. If we started with Icarus and, for example, they designed it using the "free" IP that Xilinx offers, you start adding to the development time... not to mention timing issues and whatnot.

The last project where I acquired an FPGA design with the plan to convert quickly to ASIC ended up being a nightmare. It went a year beyond schedule and had crazy licensing fees (it was a complex design with 4 ARM cores, 3 levels of on-die cache and so on). It was successful in the end, but it wasn't easy.

Here is a guide to the type of issues I am talking about: http://www.onsemi.com/pub/Collateral/HBD872-D.PDF

So I am not a designer myself and I suspect these designs are very simple... so let me ask, are you basically saying these chips are so simple I don't need to worry about CPU core/memory timing type issues and 3rd party licensing?
jr. member
Activity: 112
Merit: 4
January 03, 2018, 05:33:01 PM
#4
If you have deep pockets, you can get Chipworks to reverse any chip you want.  Start with a larger geometry to save yourself some bucks.

But adapting verilog from fpga to asic is not difficult.  I can assure you that many digital designs are prototyped in fpga and then synthesized into asic.

I would say that other than asicboost, the architecture changes have been small in the last few generations - it's an unrolled loop, pipelined to give one result per clock.  There's really only one answer there, I would expect that everyone's design is very similar for this important part.
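
For readers who want to see which loop gets unrolled, here is a plain software sketch of standard SHA-256, checked against Python's `hashlib`. It is not anyone's ASIC design. The helper names (`compress`, `sha256`, `sha256d`) are chosen for illustration, and the constants are derived from the first 64 primes as the standard defines them rather than typed in as a table.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """First n primes by trial division."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(x):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Fractional bits of cube roots (round constants) and square roots (initial state).
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [math.isqrt(p << 64) & MASK for p in _primes(8)]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression: message schedule plus 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    # The loop that hardware unrolls: each iteration would become one stage.
    for t in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    """Pad the message, then run the compression function over each block."""
    padded = msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
    padded += struct.pack(">Q", len(msg) * 8)
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


def sha256d(msg: bytes) -> bytes:
    return sha256(sha256(msg))


header = bytes(range(80))  # dummy 80-byte input, not a real block header
expected = hashlib.sha256(hashlib.sha256(header).digest()).digest()
print(sha256d(header) == expected)
```

An 80-byte input pads to two 64-byte blocks in the first pass and the 32-byte digest pads to one block in the second, so a double hash costs three calls to `compress` of 64 rounds each in this sketch.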


