Topic: FPGA mining looped vs unrolled (Read 1249 times)

Ari
member
Activity: 75
Merit: 10
January 03, 2013, 10:31:28 AM
#12
Well, you have two different approaches.

You can build a small processor which does the basic operations of SHA256, duplicate it many times, and then run the same instruction sequence (but different data) on each.  This is similar to how a GPU works.  The problem with this is that some operations are faster than others, so parts of the chip sit idle during some clock cycles.

Or, you build a large unrolled SHA256, so that every part does a calculation every clock cycle, and small fast operations can be grouped together and done in one clock cycle.  This is more efficient because there is less idle time.  Of course this takes more space and might not fit in a small FPGA.

It'd be interesting to see some actual numbers for each design: how many gates it takes, how fast it can be clocked, and what the throughput is in hashes/sec/area.
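
As a back-of-the-envelope illustration of that metric (every figure below is an invented placeholder, not a synthesis result), the comparison would look something like this:

Code:
# Toy hashes/sec/area comparison of the two approaches.
# Every number is an invented placeholder, not a real synthesis result.

def mhs_per_area(clock_hz, cycles_per_hash, area_units):
    """Throughput per unit of chip area, in MH/s per area unit."""
    return (clock_hz / cycles_per_hash) / 1e6 / area_units

# Approach 1: small looped engine, one round per clock, ~64+ clocks per hash
looped   = mhs_per_area(clock_hz=200e6, cycles_per_hash=68, area_units=1)

# Approach 2: fully unrolled + pipelined engine, one hash per clock once the
# pipeline is full, but roughly 64x the area of the looped engine
unrolled = mhs_per_area(clock_hz=200e6, cycles_per_hash=1, area_units=64)

print(looped, unrolled)  # which wins depends entirely on the real numbers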
donator
Activity: 1218
Merit: 1079
Gerald Davis
January 03, 2013, 09:21:48 AM
#11
so it's not worth considering the option that ASICs are just hot air and will not arrive any time soon?

"Anytime soon" is all relative.   Also there are at least five companies working on ASICs.  Even if some are scams, and some have overpromised eventually someone will deliver a product and it will almost overnight render everything else obsolete.  Like I said even if you were production ready (as in 10,000 FPGA boards sitting in a warehouse 200% more efficient than anything on the market, ready to ship out the door today) nobody would buy them.  

The break-even point is likely well beyond even the most optimistic scenario before ASICs push difficulty so high as to make your FPGA as obsolete as mining on a Pentium 4 CPU.  This is because ASICs are the "end of life" for everything else.  So someone buying an FPGA today would need to be confident they can mine enough (after electrical costs) to cover the cost of the FPGA plus enough of a profit to make the risk worthwhile.  With current mining rewards and FPGA raw chip costs that is something like 6-12 months.  Nobody is going to bet that nobody will release an ASIC chip in the next year.  It is a foolish bet, one with a huge downside (your investment becomes worthless in 20 days) and a very, very limited upside (you make marginally more coins until someone does release an ASIC).
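
To put rough numbers on that break-even point (every figure below is a placeholder, not a quote for any real board):

Code:
# Rough FPGA break-even estimate.  All inputs are placeholders; plug in
# real board cost, hash rate, difficulty and power figures.

board_cost_usd  = 500.0   # up-front price of the FPGA board
daily_btc       = 0.18    # coins mined per day at current difficulty
btc_price_usd   = 13.0    # assumed constant exchange rate
daily_power_usd = 0.30    # electricity for the board

daily_profit   = daily_btc * btc_price_usd - daily_power_usd
breakeven_days = board_cost_usd / daily_profit

print(round(breakeven_days))  # ~245 days with these made-up inputs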
newbie
Activity: 14
Merit: 0
January 03, 2013, 08:38:31 AM
#10
so it's not worth considering the option that ASICs are just hot air and will not arrive any time soon?
donator
Activity: 1218
Merit: 1079
Gerald Davis
January 02, 2013, 04:34:39 PM
#9
when you say it's an old debate, could you link me to anything saying that unrolled is better, as I have only seen people aiming to unroll, not why it's been picked as better than massively parallel looped.

There is overhead to a loop (the loop logic itself).  In an FPGA that also includes the traces from the output back to the input for the next round of the loop.  Unrolling eliminates that.  Usually eliminating that overhead is worth more than the cost of unrolling.  There are exceptions, but they are few and far between.  The software used for planning the FPGA layout also tends to do better with unrolled constructs than rolled ones, so that adds another source of improvement.  If you keep it rolled, you will likely need some tedious hand placement.
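
A rough software picture of what unrolling removes (Python rather than HDL, and round_fn is only a stand-in for a real SHA256 round):

Code:
# Software analogy only: in the FPGA the "loop" is a round counter, a mux
# choosing between fresh input and fed-back state, and the routing from the
# output back to the input.  Unrolling replaces all of that with a straight
# chain of round blocks that the tools can place and pipeline freely.

ROUNDS = 4  # SHA256 really has 64; 4 keeps the unrolled version short

def round_fn(state, i):
    """Stand-in for one SHA256 round (the real round mixes in w[i], k[i])."""
    return (state * 31 + i) & 0xFFFFFFFF

def rolled(state):
    # one round instance reused ROUNDS times: counter + feedback path
    for i in range(ROUNDS):
        state = round_fn(state, i)
    return state

def unrolled(state):
    # ROUNDS separate instances chained back to back: no counter, no
    # feedback routing, and each stage can take its own pipeline register
    state = round_fn(state, 0)
    state = round_fn(state, 1)
    state = round_fn(state, 2)
    state = round_fn(state, 3)
    return state

assert rolled(1) == unrolled(1)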

Still, it doesn't matter: ASICs will be here soon, and even if they are delayed 6 months that is still shorter than the break-even time on a new FPGA.
newbie
Activity: 14
Merit: 0
January 02, 2013, 04:28:40 PM
#8
when you say it's an old debate, could you link me to anything saying that unrolled is better, as I have only seen people aiming to unroll, not why it's been picked as better than massively parallel looped.
newbie
Activity: 1
Merit: 0
January 02, 2013, 02:41:32 AM
#7
I think that the performance of this really depends on the speed grade of the FPGA, the cooling solution you are using, and how far you can push the FPGA clocks. I'm not an expert on FPGAs, but I think that the type of logic blocks used also matters here (which also differs between manufacturers). Optimizing for space savings, you might be able to reuse chunks of logic between the SHA stages, reducing the number of blocks required.
legendary
Activity: 4466
Merit: 3391
January 01, 2013, 04:56:48 PM
#6
I agree that ASICs could change everything, but until even one appears, they are pure speculation, and most are probably pre-order scams / will collapse before any are sold.

so until then FPGA is the only real bleeding edge.

anyway, still no one willing to try and debate why it's better to unroll than to go massively parallel. any takers?

This is an old debate, at least in the CPU arena. The answer so far is, "it depends".
newbie
Activity: 14
Merit: 0
January 01, 2013, 12:19:17 PM
#5
I agree that ASICs could change everything, but until even one appears, they are pure speculation, and most are probably pre-order scams / will collapse before any are sold.

so until then FPGA is the only real bleeding edge.

anyway, still no one willing to try and debate why it's better to unroll than to go massively parallel. any takers?
legendary
Activity: 1792
Merit: 1008
/dev/null
December 31, 2012, 10:23:59 PM
#4
I don't agree with the linked thread; sure, if you want an easy way to make money, then GPU mine, but FPGA is the new front of development.

I would like to know if there is something I am missing in my way of looking at it, but instead of unrolling, try to use as few logic blocks as possible and then repeat that many times INSTEAD, as there is no reason why any single hash has to be done quickly; many hashes at the same time is just as good.
no, ASIC is, sorry. still, FPGAs are way more fun since they aren't hardware programming.
newbie
Activity: 14
Merit: 0
December 31, 2012, 08:43:53 PM
#3
I don't agree with the linked thread; sure, if you want an easy way to make money, then GPU mine, but FPGA is the new front of development.

I would like to know if there is something I am missing in my way of looking at it, but instead of unrolling, try to use as few logic blocks as possible and then repeat that many times INSTEAD, as there is no reason why any single hash has to be done quickly; many hashes at the same time is just as good.
full member
Activity: 140
Merit: 100
1221iZanNi5igK7oAA7AWmYjpsyjsRbLLZ
December 31, 2012, 05:09:26 PM
#2
That's a clever way to optimize the FPGA's block usage. Subscribed.

Have you read this forum post yet? It's the reply to a guy who wanted to buy some FPGA hardware.

I only say this because we're newbies here. If you're fine with not making any money, go for it!
newbie
Activity: 14
Merit: 0
December 31, 2012, 03:14:08 PM
#1
I have been considering making an array of custom FPGA miners, and from my research it seems the focus is on doing the operation (2x SHA2) in as few clock cycles as possible. The way I see it, it's a trade-off between the number of logic blocks used and the clock cycles to do one full operation; added to that is the price for a chip that has enough logic blocks to hold a fully unrolled (and pipelined) design.
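
(For reference, the operation I mean by 2x SHA2 is just SHA256 applied twice to the 80-byte block header; a quick Python sketch:)

Code:
import hashlib

def double_sha256(header: bytes) -> bytes:
    """Bitcoin's proof-of-work hash: SHA256 applied twice."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()

# 80-byte block header placeholder; the miner varies the nonce inside it
print(double_sha256(bytes(80)).hex())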

I would like anyone to explain why it's not just as useful to NOT unroll and NOT pipeline it, and instead make the smallest possible version (a looped version with only one SHA2 core) and repeat it many, many times in the FPGA, so it's massively parallel and doesn't need the more expensive chips, but could instead be done on many cheaper ones. It's not an issue of how many clock cycles one hash takes, more how many operations can be done in a certain time; so for example, if the unrolled version takes 1/100th of the time but is 100 times bigger, then having 100 of the small ones would do just as many hashes.

You might think I'm stupid, but I do understand.

consider this simple example:
say you make one unrolled SHA2 core; for the double hash it would need to run twice. Chaining two of them to do the double hash in one go is twice the size, so one chain of two unrolled SHA2 cores doing it in one pass is equal to two single unrolled SHA2 cores each doing the double hash looped.

basically I am considering making a very small unit (in logic blocks used) that can be repeated lots of times, so that many FPGAs can be made into a massive array, processing many hashes at once, just more slowly per hash.
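
For concreteness, the kind of arithmetic I'm doing is this (numbers invented just to show the trade I have in mind):

Code:
# My (possibly flawed) reasoning, with invented numbers: an unrolled and
# pipelined core is ~100x bigger but finishes a hash every clock, while a
# tiny looped core needs ~100 clocks per hash, so 100 looped cores in the
# same total area should give the same overall hash rate.

clock_hz = 100e6

big_core_hps    = clock_hz * 1          # 1 hash/clock once the pipe is full
small_cores_hps = 100 * clock_hz / 100  # 100 cores, 100 clocks per hash each

print(big_core_hps == small_cores_hps)  # True in this idealised picture
# The question is what this ignores: per-core loop/control overhead,
# routing, and whether both designs actually reach the same clock speed.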

an explanation of what I'm messing up in my maths/understanding would be good :)