
Topic: FPGA development board "Lancelot" - accept bitstream developer's orders. - page 19. (Read 101887 times)

hero member
Activity: 592
Merit: 501
We will stand and fight.
got 4 samples today.... Cheesy
newbie
Activity: 8
Merit: 0
subscribed
count me in for 4 or 5 boards  Cool
sr. member
Activity: 266
Merit: 251
Hmm so bitfury might have quite a bit of incentive in this FPGA vs ASIC discussion Smiley
https://bitcointalksearch.org/topic/m.915037
(weapon of choice? Cheesy)

https://bitcointalksearch.org/topic/m.925049

That's for ngzhang and all folks - I've run some comparisons of FPGA vs ASIC HardCopy...
Artix-7 allows tricks like the LX150 does, unlike Cyclone V though. So by following that thread,
you can understand why claims about 28nm being obsolete are questionable.
sr. member
Activity: 448
Merit: 250
One interesting thing that I have researched - a single-bit design. I.e. instead of carry chains you use one D flip-flop for the carry and one D flip-flop for the result. It would then require 32 times fewer wires for the W-expander, and it allows constant optimization. This would be the smallest CORE, but with one IF - IF you are capable of designing long digital delay lines (i.e. like SRL16 in Spartan) within the chip. I know that it is quite doable. But nobody I contacted can work at that level, and it is unlikely you will find such a cell in a library from TSMC, for example. This carefully designed thing can beat everything and raise calculation speed to the silicon maximum. I doubt, however, that there are many _developers_ who would even understand what I wrote here, and zero who could do it not just in theory but with a more or less guaranteed result in hardware.

A carryless adder: 32 bits in + 32 bits in results in 64 bits out, in a non-canonical "2 output bits per output bit" representation?
The problem with this approach is that it's not compatible with the XOR operation, nor with rotate and shift operations.
So, yes, while you can build a large multiplier that way, converting the result to a canonical representation as the final step,
you cannot build SHA-256 that way. I have investigated it, and it's not possible.

If you have been talking about something else entirely, I apologize.

Not exactly. I meant the case where you do the addition in 32 clocks.... one bit per clock edge. So one D-trigger holds the output, and another D-trigger holds the carry, which is fed back to the adder on the next clock.

So you get ONE wire instead of 32 wires for the fully unrolled round expander. The design is still pipelined.

But you need really long and compact shift registers without access to internal bits, of course. These are required to do the rotation operations (basically by delaying all variables in the calculation by 32 clocks, but using different delays for the RORs). And then really long delays for the W round (that would be a 224-bit delay line and a 256-bit delay line).

I've tried to experiment with this using BRAMs - it is nice - when you have 32 rounds of round expander around a single BRAM :-)
but actually static RAM is nowhere near the efficiency and density of such shift registers implemented in silicon.

As this register would work only dynamically, you basically need only a capacitor to hold the bit and a circuit to charge the next capacitor on the clock pulse. It will not work at slow clocks then, of course. And as far as I know it is extremely hard to implement such a circuit in silicon (basically because I have not spoken with elite ASIC developers, indeed).

If that approach would save 3-4 times the transistor count compared to a series of flip-flops, then the design would shine :-)


Ah, I see what you mean.
Maybe a less radical approach, adding 4 bits per clock, would be more practical. A 4-bit adder fits inside one slice.
I'll think about it...

>one capacitor to hold bit

In the old days, you could buy such chips: called a CCD, or charge-coupled device.
Steve Wozniak based the video memory of the Apple I on such a device. It not only stored all the characters in the video buffer (1024 bytes IIRC), but generated the video signal as well, as its contents rotated constantly. A new character would be inserted by breaking the loop for a moment.
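[Editor's note] The one-bit-per-clock adding scheme discussed above can be modeled in software. The sketch below is illustrative only (not either poster's actual RTL): one variable stands in for the D flip-flop holding the carry, one for the result shift chain.

```python
def bit_serial_add(a: int, b: int, width: int = 32) -> int:
    """Add two words one bit per clock, LSB first, like the D-trigger scheme."""
    carry = 0                 # D flip-flop holding the carry, fed back each clock
    result = 0
    for clk in range(width):  # one clock edge per bit
        abit = (a >> clk) & 1
        bbit = (b >> clk) & 1
        result |= (abit ^ bbit ^ carry) << clk           # sum bit shifts out
        carry = (abit & bbit) | (carry & (abit | bbit))  # full-adder carry
    return result             # final carry dropped: addition is mod 2**width

# The RORs would be realized in hardware by delay lines of different lengths;
# this model only shows the adder itself.
assert bit_serial_add(0xFFFFFFFF, 1) == 0  # wraps mod 2**32, as SHA-256 needs
```

The 224-bit and 256-bit delay lines mentioned for the W round would then just be longer versions of the same shift chain, with taps only at the ends.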
sr. member
Activity: 448
Merit: 250

Example (3-bit inputs instead of 32-bit inputs):
Let's calculate (A+B) XOR (C+D)
A=5, B=7, C=7, D=7
A carry adder yields A+B=4 (3-bit result, 12 mod 8) and C+D=6 (14 mod 8); 4 XOR 6 is 2.
A carryless adder yields A+B=10|01|10 (noncanonical binary, one carry|sum pair per bit) and C+D=10|10|10 (noncanonical binary).
XORing those while still in noncanonical binary yields 00|11|00.
Canonizing that yields 110 (canonical binary) = 6 (decimal), not 2.

In other words, the XOR operation is not compatible with a non-canonical binary representation.
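[Editor's note] The 3-bit example can be checked mechanically. This sketch (illustrative, not code from the thread) keeps each output bit as a (carry, sum) pair, XORs pairwise, then canonizes:

```python
def carryless_add(a, b, width=3):
    """Half-add each bit position independently: no carry propagation."""
    return [(((a >> i) & 1) & ((b >> i) & 1),   # carry bit at position i
             ((a >> i) & 1) ^ ((b >> i) & 1))   # sum bit at position i
            for i in range(width)]              # LSB first

def canonize(pairs, width=3):
    """Fold (carry, sum) pairs back into an ordinary binary number."""
    return sum((s << i) + (c << (i + 1))
               for i, (c, s) in enumerate(pairs)) % (1 << width)

def xor_pairs(p, q):
    """XOR two noncanonical results pairwise -- the step that goes wrong."""
    return [(c1 ^ c2, s1 ^ s2) for (c1, s1), (c2, s2) in zip(p, q)]

A, B, C, D = 5, 7, 7, 7
correct = ((A + B) % 8) ^ ((C + D) % 8)                                 # = 2
broken = canonize(xor_pairs(carryless_add(A, B), carryless_add(C, D)))  # = 6
assert (correct, broken) == (2, 6)  # XOR on the noncanonical form is wrong
```

Canonizing each sum *before* the XOR gives the right answer, which is exactly the carry propagation the carryless representation was meant to avoid.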
sr. member
Activity: 266
Merit: 251




we are trying to solve the power problems and heat dissipation in multiple ways; if they work, I guarantee I will tell you (in private). Because I admire your detailed introduction to your design - we are really doing the same thing, but I had no plan to share it (before).

Well, possibly we can meet and discuss it. Because it is unlikely that it would be EXACTLY the same design, very unlikely. As for the design I made - I know it is not absolutely the best one that could be done, and there is room for improvement. However, the effort required is not justified, especially with the epic failure of powering that logic inside. Also, the whole path of the design's evolution is even more interesting than the design itself; I have, for example, an interesting approach for a parallel design with the W-expander built around DSP48s. I originally aimed for about a 350-370 MHz clock - so this IS DEFINITELY the point of failure, and if I relaxed the clock and targeted about 300 MHz it _could_ be implemented more efficiently. There is also an interesting possibility of mixing parallel computation and serial rolled computation in 1:3 etc. (that's what I had before).

about the ASIC design: a 90nm ASIC can run a 32-bit adder over 7GHz, but you need an "elite research group" instead of some "bad engineer group". But you see, in our design a SHA-2 core is really small and simple; this architecture is relatively easy to optimize. The smaller, the better. I am nearly sure that placing 200+ 128-cycle hash cores is better than the current 80+ 64-cycle cores (maybe this is our next design).

one way to resist a 51% attack is to increase the total hash speed fast. Now we need to find a way. I think a mass of small mining ASICs (in the public's hands) is a good choice.

Yes, making a Bitcoin ASIC available from different suppliers is a nice idea. But someone has to invest funds into it. And it seems the community has no interest in investing, say, 10% of owned BTC to finish this up... About the "elite research group" - that's exactly what I mentioned.... Anything you can get for, say, $500k to produce an ASIC would be unlikely to be "elite"... I suppose that AMD, Nvidia, Intel and the military consume the resources of elite research groups at much higher rates than a single investor could afford. And if an elite group would do an ASIC at high cost, why concentrate efforts on a backwards 90nm process? It should be AT LEAST 45nm then... As this would rock... And a true ASIC of course, not things like structured ASICs or FPGA hardcopies.

Keep in mind, you have to add 11 TH/s to get anywhere near a 51% attack, and at that point you would be mining ~3600 Bitcoin per day. If you are generating that much, it is actually in your best interest to *not* attack the network, and let someone else develop ASIC as the price increases because demand will remain the same but supply will slow down.

I know a number of people talk about what will happen when the reward halves, but what would happen if a large investor developed ASIC to control a significant stake of the Bitcoin network? Wouldn't it essentially be the same result if difficulty doubled as if the reward halved?

a better way is mining for themselves on ordinary days, and doing an accurate attack when some large transfer is processing.....  Cheesy

well, it is pretty doable (getting to 51% for bitcoins) the following way (also a point for FPGAs in the current period vs ASICs) -
I have a request for video transcoding at large scale - people inquire whether, with the cost of these chips, I can beat installations of servers/GPUs for that. The typical need is to transcode video files from formats:

(S)VCD (Super Video CD);
DVD, including encrypted DVD;
MPEG-1/2 (ES/PS/PES/VOB);
AVI file format;
MOV/MP4 format; Ogg/OGM files; Matroska.

codecs:

MPEG-1 (VCD) and MPEG-2 (SVCD/DVD/DVB) video;
MPEG-4 ASP including DivX and Xvid;
MPEG-4 AVC aka H.264;
DV video;
MJPEG, AVID, VCR2, ASV2;
FLI/FLC.

into:

MP4 H264 AAC

at bitrates:

'1080p' 4M (720p < height <= 1080p)
'720p' 2M (480p < height <= 720p)
'480p' 1M (height = 480p)
'480p-' 512k (360p < height < 480p)
'360p'  512k (height = 360p)
'360p-' 256k (240p < height < 360p)
'240p'  164k (height <= 240p)

that's for flash tube web sites...

So if the farm can do this work - and process multiple petabytes of video more efficiently than on their own server hardware or on purchased cloud capacity - then they will definitely be willing to invest more, as it is not only bitcoin-targeted then, and when ASICs come into play this farm would still be useful for other computations.

The problem is that supporting all of those codecs is a ton of work. And designing a more or less universal board for computation is a tough part as well. But it would open more financing for FPGAs - say, on demand it transcodes videos, and then while idle it calculates bitcoin. If it later became possible to run rendering there, it would be even more beneficial; however, that is an even bigger ton of work, and it is quite difficult to estimate the feasibility of FPGA vs GPU for rendering.

If this is doable - such a farm could be a nice step towards ASIC development for the bitcoin world, while investments into it would be well secured and much less risky.


sr. member
Activity: 266
Merit: 251
So the real contest is time; 28nm vs ASICs.

no, i mean:

130nm ASICs will fuck 28 nm FPGAs to shit.  Cheesy

Haha, yes I know they will, but if the 28nms come out before the ASICs they will at least have a chance of entering the market Grin

no, i think 28nm FPGAs will never have chance.

too many things will happen in 2013 and 2014.

These things are what I'm excited to see. I need to get some sleep now!

About 28 nm FPGAs' chances... I've roughly calculated the translation of an FPGA design into an ASIC. For example, if my design were translated, it would come to approximately an 8.7 million transistor count. That is comparable to the Pentium II design, so what we have with Spartan-6 at 0.045 um (45 nm) is what you could get if you squeezed hard into a 0.35 um ASIC.

But squeezing a design that hard into an ASIC would be difficult, as many errors would happen along the way. It is LIKELY that builders of ASICs would squeeze in a more or less simple VHDL design, which would give approx. 3 times worse performance, and then start gradually improving the technology with about 3-4 month iterations on each try. I think ngzhang understands well what design mistakes mean - when it works in simulators but not in hardware - once it gets to ASIC production.

So back to the issue of ASIC vs FPGA - I suppose that a 45 nm FPGA of the Spartan-6 class is like a 0.18 um ASIC.
Then a 28 nm FPGA like Artix-7 could be like a 0.112 um ASIC (if just scaled linearly, but I suppose it is more comparable to 90 nm, because it has a CARRY CHAIN IN EVERY SLICE, AND I HAVE A ROUND DESIGN THAT USES THIS FACT AND WHICH IS 20% SMALLER; Spartan-6 is really bad with its Slice X stuff).
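[Editor's note] The node-equivalence arithmetic above is linear: the "45 nm Spartan-6 is like a 0.18 um ASIC" estimate fixes a 4x penalty factor, which then puts 28 nm at 0.112 um. A trivial check of that arithmetic (the 4x factor is the post's own assumption, not a measured constant):

```python
FPGA_PENALTY = 180 / 45  # assumed linear factor: 45 nm FPGA ~ 0.18 um ASIC

def asic_equivalent_nm(fpga_node_nm):
    """ASIC feature size an FPGA design roughly corresponds to, under the
    linear-scaling assumption above."""
    return fpga_node_nm * FPGA_PENALTY

assert asic_equivalent_nm(45) == 180  # Spartan-6 class ~ 0.18 um
assert asic_equivalent_nm(28) == 112  # Artix-7 ~ 0.112 um, as stated above
```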

Then there is an interesting thing about FPGA prices. They will fall if volumes become bulk. This is why I am insisting on making FPGA-based products better, with better pricing - to make them at least competitive against ASICs.

Also - the cost of chip production for a vendor like Xilinx or Altera is not much more than the silicon cost.... So producing Spartan-6 or Virtex-7 does not make much difference in raw material / labor cost. If they wanted, they could sell, say, a 6.8 billion transistor chip for $60-70 instead of $1k-$2k for specific needs, and still earn profits. And this is a huge risk for ASIC builders. Such a chip would indeed be very powerful and would definitely blow away a low-end 90 nm ASIC solution. And this is what could happen - Xilinx or Altera could simply lower prices for some specific application of their chip to take a share of this market. But that will only happen, of course, if there is a more or less significant sales volume, say if we all together get to levels of 10k chips per month.

So there is no "cheap and secure" entry into the ASIC world. Those who go with 90 nm will still compete with FPGAs. And it is only a matter of how FPGA sales and production are organized, whether FPGA device vendors would have expenses so high that they could not resist such an ASIC.

The killing solution, however, would be to get a 28-nm chip with a SIMPLE design right from the first run. It is doable, I believe, for about $4 - $6 million. But I doubt that someone would invest that today. At some point it will happen, of course. I already quoted multiple companies about an ASIC when I did the FPGA-based design, and typically 90nm with investments of about $500k could blow away Spartans, but would find it hard to compete against 28-nm.

So, please comment - is this just a hobby for you, or would you like to stand head-to-head with the upcoming ASICs or not?

Why do you think that 28-nm would not compete?

You have probably worked the most on bitstream design to date as well. I am interested to hear your point of view - where did I make a mistake?






our design is still fixing some small bugs. I will talk with you about the design after it is fully completed. At that time we will know if we can solve the problems that you have.

This is interesting. If you manage to get it working at the clocks that TRCE reports - it would be interesting indeed, so we could improve. Because I doubt that our designs could be similar; maybe we can reach even higher speeds by combining the techniques used. If you have the right equipment around - check the power. I expect you have faster prototype PCB delivery, so maybe several designs should be tried to actually deliver the necessary power into the Spartan. For me such experiments are quite difficult, as I typically have to wait 3 weeks before getting a PCB of the required quality to solder a Spartan onto.

ASIC design is much more complex than FPGA design. Simple synthesis will not work.
Why did I say a 130nm ASIC can defeat a 28nm FPGA? Because a 32-bit adder in a 130nm ASIC can operate over 3GHz, and a 3-input 32-bit adder can easily run over 1GHz. I really doubt a 28nm FPGA will run a 3-input adder over 600MHz, maybe only 500MHz.
A 130nm ASIC is really cheap now, but we must find a professional team to do this work; their salaries and their company's management costs will add up to a lot. That is the thing that stuck me. Otherwise a small mining chip would cost only $1/ea if you build 100K of them.
I mean, taking risks on an ASIC just for mining (and earning bitcoins for profit) is unreasonable, but I think there are some who want to push forward bitcoin applications and resist a 51% attack from a Bank of America, and they would consider paying the bill. If it succeeds, selling 100K of these 1GH/s small chips would multiply the total hashing speed by 10.
51% attackers don't need an ASIC - just buy 50 of your 110GH/s rigs (costing only $5M), and then bitcoin is dead. After that, sell the second-hand chips and get 30% of the money back.

That exactly is the point.... But when I started talking about ASICs - developers who offered guarantees gave me quotes for 90nm in the range of 500 MHz .. 1 GHz ... guarantees that they would re-do the design and re-order wafers in case of failure. And when I talked about pushing the limits to what Intel does with their chips - many said - sorry - we do not have the right experience to do that. So the same thing stopped me from going with an ASIC. And also, at the current size of Bitcoin, it seems almost nobody is interested in seriously throwing money into such a risky venture.

Then - exactly - it costs only $5M to get a 51% majority. For a system with a $40M market cap that is okay, and way better than banks. But if someone would like to invest, say, $1M to build a big project using Bitcoin, he faces the question - okay, my project will grow, and then the Bitcoin cap will grow, and then making a 51% attack becomes more feasible. Imagine if a product, say a better, more functional system like Skype, emerged within Bitcoin ... and its market cap went to $2 billion... would Microsoft pay that $5M to disrupt it?

Bitcoin is actually an enabler for a very interesting AND NEW p2p technology that the internet has lacked from its beginning - a solution to the problem currently solved by online advertising - that can be built to blow away many very big projects, making them obsolete. But once Bitcoin starts doing that, it will face a real battle..... As this would basically mean making the whole business model of the TOP-10 internet companies obsolete... That is actually much worse than a single Bank of America... So that BF-110 was also an estimation of how well Bitcoin is prepared for that battle... Unfortunately not very well; a lot of work lies ahead. The greatest pity is that the owners of $40M worth of Bitcoins don't seem to really get how important this thing is, and why things like Scrypt-modified versions etc. would not help much here, as Bitcoin's blockchain protection should lie tightly on the Moore's law curve.
sr. member
Activity: 266
Merit: 251
So the real contest is time; 28nm vs ASICs.

no, i mean:

130nm ASICs will fuck 28 nm FPGAs to shit.  Cheesy

Haha, yes I know they will, but if the 28nms come out before the ASICs they will at least have a chance of entering the market Grin

no, i think 28nm FPGAs will never have chance.

too many things will happen in 2013 and 2014.

These things are what I'm excited to see. I need to get some sleep now!

About 28 nm FPGAs chances... I've counted approximately translation of FPGA into ASIC. for example if my design translated - it would get approximately 8.7 million transistors count. And it is comparable to Pentium II design, so what we have with Spartan-6 0.045 um (45 nm) is what you could get if you squeeze hard into 0.35 um ASIC.

But squeezing a design that hard into an ASIC would be difficult, as many errors would happen along the way. It is LIKELY that ASIC builders would squeeze in a more or less simple VHDL design, which would give roughly 3x worse performance, and then gradually improve the technology with 3-4 month iterations per attempt. I think ngzhang understands well what design mistakes mean once they reach ASIC production - when it works in simulators but not in hardware.

So back to the ASIC vs FPGA question - I suppose a 45nm FPGA of the Spartan-6 class behaves like a 0.18um ASIC.
Then a 28nm Artix-7 FPGA could be like a 0.112um ASIC (by straight scaling, but I suppose it is closer to 90nm, because it has a CARRY CHAIN IN EVERY SLICE, AND I HAVE A ROUND DESIGN THAT USES THIS FACT AND IS 20% SMALLER - Spartan-6 is really bad with its SLICEX stuff).
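Applying the same FPGA-to-ASIC equivalence ratio derived from the Spartan-6 comparison (45nm FPGA ~ 0.18um ASIC) to the 28nm Artix-7 reproduces the 0.112um figure:

```python
# The post's Spartan-6 equivalence: 45nm FPGA ~ 0.18um (180nm) ASIC,
# i.e. a 4x linear overhead. Applying the same factor to a 28nm Artix-7:
overhead = 180 / 45                 # 4x, from the Spartan-6 comparison
artix_equiv_nm = 28 * overhead

print(f"28nm Artix-7 ~ {artix_equiv_nm:.0f}nm ASIC equivalent")
# The post argues that carry chains in every slice push the effective
# figure closer to 90nm than this straight-line 112nm estimate.
```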

Then there is the interesting point about FPGA prices: they will fall if volumes become bulk. This is why I am insisting on making FPGA-based products better, with better pricing - to make them at least competitive against ASICs.

Also, chip production cost for a vendor like Xilinx or Altera is not much more than the silicon cost... So producing a Spartan-6 or a Virtex-7 does not differ much in raw material and labor cost. If they wanted, they could sell, say, a 6.8-billion-transistor chip for $60-70 instead of $1k-$2k for specific markets, and still earn a profit. This is a huge risk for ASIC builders: such a chip would indeed be very powerful and would definitely blow away a low-end 90nm ASIC solution. And this is what could happen - Xilinx or Altera simply lowers prices for some specific application of their chip to take a share of this market. Of course, that will only happen if there is a significant sales volume, say if we all together reach levels of 10k chips per month.

So there is no "cheap and secure" entry into the ASIC world. Those who go with 90nm will still compete with FPGAs. It comes down to how FPGA sales and production are organized - whether FPGA device vendors' expenses are so high that they could not resist such an ASIC.

The killer solution, however, would be a 28nm chip with a SIMPLE design from the first run. It is doable, I believe, for about $4-6 million. But I doubt anyone would invest that today. Someday it will happen, of course. I got quotes from multiple companies about ASICs back when I did the FPGA-based design, and typically 90nm with an investment of about $500k could blow away Spartans, but it would be hard to compete against 28nm.

So, please comment: is this just a hobby for you, or are you willing to stand head-to-head with the upcoming ASICs?

Why do you think 28nm would not compete?

You have probably done the most work on bitstream design to date. I am interested to hear your point of view - where did I make a mistake?




legendary
Activity: 1064
Merit: 1000
So the real contest is time; 28nm vs ASICs.

no, i mean:

130nm ASICs will fuck 28 nm FPGAs to shit.  Cheesy

Haha, yes I know they will, but if the 28nms come out before the ASICs they will at least have a chance of entering the market Grin

no, i think 28nm FPGAs will never have chance.

too many things will happen in 2013 and 2014.

These things are what I'm excited to see. I need to get some sleep now!

Have a good nap, pal. But please just do not sleep till 2013!  haha Cheesy
legendary
Activity: 938
Merit: 1000
What's a GPU?
So the real contest is time; 28nm vs ASICs.

no, i mean:

130nm ASICs will fuck 28 nm FPGAs to shit.  Cheesy

Haha, yes I know they will, but if the 28nms come out before the ASICs they will at least have a chance of entering the market Grin

no, i think 28nm FPGAs will never have chance.

too many things will happen in 2013 and 2014.

These things are what I'm excited to see. I need to get some sleep now!
legendary
Activity: 4592
Merit: 1851
Linux since 1997 RedHat 4

There are some differences between the two designs, so at first we will announce an updated bitstream for Icarus, but it will be slower than Lancelot's. This bitstream will also be used for Lancelot testing.
The reason is that Lancelot has some special designs for power and heat dissipation, plus some extra parts for more functions.

Note that BitFury Design charges a high license fee for their bitstream, but the Icarus bitstream update will be free.

Lancelot will also have free bitstream updates forever.

 Grin
Well, my latest version of the Icarus code, waiting to go into cgminer, should either just work with this or need only minor changes.

The minor changes would be needed if it must be mined differently from the way the Icarus currently works.

The latest version allows specifying the hash speed (standard Icarus Rev3 is 2.6316ns per hash) and the timeout/abort time (11.2s is optimal).
It will also calculate these accurately for you in either of two optional 'timing' modes if you don't know them (that's how I got the 2.6316ns).
It should have no problem with a dual FPGA up to 840MH/s.

... though I've still yet to hear from anyone trying it on an Icarus with a non-standard bitstream or a non-Rev3 Icarus ...
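The timing figures above follow from the 32-bit nonce space: at 2.6316ns per hash, sweeping all 2^32 nonces takes a little over 11 seconds, which is where the ~11.2s abort time comes from. A quick sanity check:

```python
# Sanity check of the Icarus Rev3 timing figures quoted in the post:
# 2.6316 ns per hash, and an abort time near the full 32-bit nonce sweep.
ns_per_hash = 2.6316
hashrate_mhs = 1000 / ns_per_hash           # MH/s implied by ns/hash
full_sweep_s = (2**32) * ns_per_hash * 1e-9 # time to exhaust all nonces

print(f"Implied hash rate: {hashrate_mhs:.0f} MH/s")
print(f"Full nonce sweep:  {full_sweep_s:.2f} s")
# ~11.30 s, close to the 11.2 s the post quotes as the optimal timeout.
```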
hero member
Activity: 592
Merit: 501
We will stand and fight.
So the real contest is time; 28nm vs ASICs.

no, i mean:

130nm ASICs will fuck 28 nm FPGAs to shit.  Cheesy

Haha, yes I know they will, but if the 28nms come out before the ASICs they will at least have a chance of entering the market Grin

no, i think 28nm FPGAs will never have chance.

too many things will happen in 2013 and 2014.
legendary
Activity: 938
Merit: 1000
What's a GPU?
We don't expect ASICs for a couple of years though, as I understand it. Or am I incorrect?

It all depends on how much money someone will put toward ASIC development Smiley
legendary
Activity: 1778
Merit: 1008
We don't expect ASICs for a couple of years though, as I understand it. Or am I incorrect?