FPGA mining for fun and profit - page 9.

greenlander

newbie

Activity: 28

Merit: 0

I might be interested in helping out with this project with either time or money. I'm a system software engineer with an EE degree and some experience writing VHDL.

I did my own back-of-the-envelope calculation similar to others on this forum. I figured you could probably fit a single unrolled pipeline in an LX75 Spartan 6. I agree with cypherf0x that it's not completely clear whether it's possible or not. For the sake of argument, let's say it is. Then if you could run it at 100 MHz, you would get 100 MH/s, since a fully unrolled pipeline finishes one hash per clock.

The problem is that it doesn't seem much cheaper than GPU computing. The Spartan 6 LX75s are about $100 each at Avnet.com (one of Xilinx's distributors). By the time you figure in the fabrication and assembly costs, you might only break even in terms of "hashes per capital expense".

However, it would certainly be more power-efficient... and a lot more dense. If you were clever you could put arrays of them on DIMM-like modules like this design: http://www.sciengines.com/copacobana/ . You could potentially fit 100 or more FPGAs in a case the size of a single desktop computer.

caston

hero member

Activity: 756

Merit: 500

Quote from: allinvain on May 17, 2011, 11:37:18 PM

How much does a Cyclone4-150 board cost?

Also do you get a discount if you buy the Spartan boards in bulk?

Kind of a shame you can't get more hashes out of the Spartan boards, because compared to, say, a cheap 5850 they get killed. If these boards cost like $50 or something, that may be worth it for 80 Mhash/s. I'd definitely buy 4 to begin with, maybe more.

I think you'll get the best results if you can share the workload between the GPU and the FPGA. I don't have the technical know-how to get this going, but it might be possible to do it with something like the OGD1 in your mining rig's PCI slot.

http://www.linuxfund.org/projects/ogd1/

allinvain

legendary

Activity: 3080

Merit: 1083

How much does a Cyclone4-150 board cost?

Also do you get a discount if you buy the Spartan boards in bulk?

Kind of a shame you can't get more hashes out of the Spartan boards, because compared to, say, a cheap 5850 they get killed. If these boards cost like $50 or something, that may be worth it for 80 Mhash/s. I'd definitely buy 4 to begin with, maybe more.

cypherf0x

newbie

Activity: 28

Merit: 1

Quote from: mooreaa on May 17, 2011, 09:25:26 PM

Hey cypherf0x,

I just got into bitcoin and ran across your post here. I have my own startup and we run a small design/assembly service as part of our business. We have the capability to assemble FPBGA/LGA parts on PCBs and I would be really interested in working with you on a low-cost Spartan-6 FPGA board. I know I would be willing to put up some of my own cash to fund some initial board revisions, and with a little help from the community we might be able to produce a batch of these at a really compelling price.

Interested?

Aaron

Yeah, send me a PM with your email.

mooreaa

newbie

Activity: 5

Merit: 0

Hey cypherf0x,

I just got into bitcoin and ran across your post here. I have my own startup and we run a small design/assembly service as part of our business. We have the capability to assemble FPBGA/LGA parts on PCBs and I would be really interested in working with you on a low-cost Spartan-6 FPGA board. I know I would be willing to put up some of my own cash to fund some initial board revisions, and with a little help from the community we might be able to produce a batch of these at a really compelling price.

Interested?

Aaron

fpgaminer

hero member

Activity: 560

Merit: 517

Quote

Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?

You can actually fit 1.5 engines on a chip, assuming an engine is a full 128 rounds of SHA-256 (that's 128 because you need to do it twice to get the final hash that Bitcoin expects). One full engine, at 128 rounds, and a second half-engine, at 64 rounds, with a mux in front to switch between processing new data and finishing old data.

I've considered doing that for my C120, which would fit one full engine in 80K LEs, and the half-engine in 40K (if I'm lucky). Or just get my hands on a C150 and try desperately to cram two full engines on it :P
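For anyone following along in software, here is a rough sketch of where the 128 rounds come from. It uses only Python's standard hashlib. The function names are made up for illustration, the all-zero header is dummy data, and the assumption that the nonce occupies the last 4 bytes of an 80-byte header (so the first 64-byte block can be hashed once and reused) is stated in the comments rather than taken from this thread.

```python
import hashlib

ROUNDS_PER_BLOCK = 64  # SHA-256 compresses each 64-byte block in 64 rounds


def bitcoin_hash(header: bytes) -> bytes:
    """Double SHA-256: hash the header, then hash the 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def padded_blocks(msg_len: int) -> int:
    """64-byte blocks after SHA-256 padding (one 0x80 byte plus 8 length bytes)."""
    return (msg_len + 1 + 8 + 63) // 64


def rounds_per_nonce(header_len: int = 80) -> int:
    """Rounds that have to be redone for every nonce tried."""
    # Assumption: the nonce is the last 4 bytes of the header, so every
    # block that ends before the nonce is hashed once and reused.
    nonce_offset = header_len - 4
    reused_blocks = nonce_offset // 64
    first_pass = padded_blocks(header_len) - reused_blocks
    second_pass = padded_blocks(32)  # the second hash runs over a 32-byte digest
    return (first_pass + second_pass) * ROUNDS_PER_BLOCK


if __name__ == "__main__":
    dummy_header = bytes(80)  # placeholder, not a real block header
    print(bitcoin_hash(dummy_header).hex())
    print(rounds_per_nonce())
```

Under those assumptions a "full engine" has to cover two 64-round blocks per nonce, which is the 128 figure above, and a "half-engine" is one 64-round block.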

cypherf0x

newbie

Activity: 28

Merit: 1

Quote from: ArtForz on May 17, 2011, 08:40:35 PM

Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?

I never said I fit 1.5 engines on a chip. I apologize if some of the numbers implied that, since they were ballpark estimates based on short runs.

You're free to doubt, it's your time spent.

ArtForz

sr. member

Activity: 406

Merit: 257

Okay, so you fit "around" 1.5 engines on a chip. is it me or doesn't that make any sense at all?
Edit:
Yes, I make assumptions about sha256. it's sha256. the round function including W update needs at least 8 32 bit adders. no amount of "optimizing" changes that.
And those "highly optimized" commercial cores? barely 120MHz on a S6, 65+ clocks/block, and you can maybe fit 70 on a LX150. 65Mh/s wooo...
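The commercial-core estimate above can be checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the post (120 MHz, 65 clocks per block, 70 cores, two SHA-256 blocks per bitcoin hash); the function name is hypothetical.

```python
def iterative_core_mhs(clock_mhz: float, clocks_per_block: int,
                       cores: int, blocks_per_hash: int = 2) -> float:
    """Aggregate MH/s for a bank of iterative SHA-256 cores."""
    blocks_per_sec_per_core = clock_mhz / clocks_per_block  # in millions
    return blocks_per_sec_per_core * cores / blocks_per_hash


estimate = iterative_core_mhs(clock_mhz=120, clocks_per_block=65, cores=70)

if __name__ == "__main__":
    print(f"{estimate:.1f} MH/s")
```

With those inputs the result lands just under 65 MH/s, which matches the figure in the post.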

cypherf0x

newbie

Activity: 28

Merit: 1

Quote from: ArtForz on May 17, 2011, 07:18:13 PM

Quote

A single pipeline is now doing about 133MH/s with the chip around 210MH/s total

Trying to make any sense of this.
a) You have a 120+ stage unrolled pipelined engine at 133MHz. You fit 1.58 of em? what the hell is 0.58 of a engine?
b) You have a single registered round running at 133MHz. one bitcoinhash = double-sha256 takes 128 or so clocks. you fit 200 of those - ~ 208Mh/s.
let's assume B
you need to store at least a..h and W 0..15, that's 24*32 = 768 FFs per engine.
times 200 engines. thats 153600 FFs
a S3-5000 has 66560 FFs... nope
a S6 LX100 has 126576 FFs... still nope
a S6 LX150 has 184304 FFs... 83% utilization just for the storage FFs. far edge of plausible

For adder utilization it gets hilarious, you need at least 8 32-bit adders per round.
Times 200 single-round engines thats 1600 32bit adders...
half of a S6s slices have carry logic, each of those can do 4 bits of a adder, that's a max of 988 32 bit adders on a S6 LX100, 1439 on a LX150... we need 1600... ?!?

I have the sneaking suspicion someone didn't realize one bitcoinhash = 2 sha256 blocks...

I don't know where you came up with 133MHz out of MH/s. There is the 'about' and 'around' meaning values are not absolute. The speed average was a bit high initially. You're also making design assumptions. There are highly optimized commercial hashing cores available for FPGAs too.

ArtForz

sr. member

Activity: 406

Merit: 257

Okay, so now you're fitting 2 pipelined engines on a LX150.
need 120 rounds, thanks to cheating with W updates etc you can get it down to ~6 32 bit adders per round avg, times 120 ... 720 or so 32 bit adders per engine, 1440 adders.
So *only* a bit over 100% slice utilization of a LX150, just for the adders. Yeah, sure.
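Spelled out as arithmetic, using only figures from this thread: 2 engines, 120 rounds each, about 6 adders per round, against the 1439 32-bit adders quoted earlier as the LX150 maximum. The variable names are just for illustration, and this ignores everything except adders.

```python
ENGINES = 2
ROUNDS_PER_ENGINE = 120        # unrolled rounds after the W-update shortcuts
ADDERS_PER_ROUND = 6           # average 32-bit adders per round (from the post)
LX150_MAX_ADDERS = 1439        # carry-chain limit quoted earlier in the thread

adders_per_engine = ROUNDS_PER_ENGINE * ADDERS_PER_ROUND
adders_needed = ENGINES * adders_per_engine
utilization = adders_needed / LX150_MAX_ADDERS

if __name__ == "__main__":
    print(adders_per_engine, adders_needed, f"{utilization:.2%}")
```

That gives 720 adders per engine and 1440 in total, one more than the 1439 available, before any routing or control logic is counted.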

cypherf0x

newbie

Activity: 28

Merit: 1

Quote from: fpgaminer on May 17, 2011, 06:58:40 PM

Without having an actual Spartan-6 LX150 board on hand, I ran my design through ISE quickly. This showed that the LUT consumption is indeed similar to Altera's, so there does not appear to be any area improvements by using a Xilinx device over Altera.

What I do not know, however, is how fast Spartan-6 LUTs operate compared to Altera's, for apples-to-apples speed grades. If they run faster, it would indeed be possible to get more bang for your LUT. I get 80MHz in my design, resulting in 80MHash/s burning 80K LUTs. The Cyclone4-150 or Spartan6 LX150 may fit two full hashing pipelines (128 SHA-256 rounds per full hashing pipeline), which would double their performance: the Cyclone4-150 would reach 160MHash/s, and if the Spartan6 is faster, it could possibly achieve >200MHash/s as you've reported.

You could get faster speed grades, but those are typically a bit more expensive. I haven't calculated whether a fast speed grade would balance out the cost for its improved hashing speeds.

It's actually about 90MH/s per pipeline over time. The speed average jumps around a bit at first but settles over a longer run.

ArtForz

sr. member

Activity: 406

Merit: 257

Quote

A single pipeline is now doing about 133MH/s with the chip around 210MH/s total

Trying to make any sense of this.
a) You have a 120+ stage unrolled pipelined engine at 133MHz. You fit 1.58 of em? what the hell is 0.58 of a engine?
b) You have a single registered round running at 133MHz. one bitcoinhash = double-sha256 takes 128 or so clocks. you fit 200 of those - ~ 208Mh/s.
let's assume B
you need to store at least a..h and W 0..15, that's 24*32 = 768 FFs per engine.
times 200 engines. thats 153600 FFs
a S3-5000 has 66560 FFs... nope
a S6 LX100 has 126576 FFs... still nope
a S6 LX150 has 184304 FFs... 83% utilization just for the storage FFs. far edge of plausible

For adder utilization it gets hilarious, you need at least 8 32-bit adders per round.
Times 200 single-round engines thats 1600 32bit adders...
half of a S6s slices have carry logic, each of those can do 4 bits of a adder, that's a max of 988 32 bit adders on a S6 LX100, 1439 on a LX150... we need 1600... ?!?

I have the sneaking suspicion someone didn't realize one bitcoinhash = 2 sha256 blocks...
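The case B budget can be reproduced with a short script. This is a sketch, not a synthesis result: the flip-flop counts, 200 engines, 768 FFs per engine and 8 adders per engine come from the post above, while the 8 flip-flops per Spartan-6 slice used to back out the slice count is an assumption added here (it happens to reproduce the 988 and 1439 adder limits).

```python
DEVICE_FFS = {"S3-5000": 66560, "S6 LX100": 126576, "S6 LX150": 184304}

ENGINES = 200
FFS_PER_ENGINE = 24 * 32       # a..h plus W0..15, 32 bits each
ADDERS_PER_ENGINE = 8          # minimum 32-bit adders per round

ffs_needed = ENGINES * FFS_PER_ENGINE
adders_needed = ENGINES * ADDERS_PER_ENGINE


def max_adders_s6(device_ffs: int, ffs_per_slice: int = 8) -> int:
    """Upper bound on 32-bit adders for a Spartan-6 part.

    Assumes 8 FFs per slice, carry logic in half of the slices,
    and 4 adder bits per carry-capable slice.
    """
    slices = device_ffs // ffs_per_slice
    carry_slices = slices // 2
    return carry_slices * 4 // 32


def throughput_mhs(clock_mhz: float, clocks_per_hash: int, engines: int) -> float:
    """MH/s for a bank of single-round engines."""
    return clock_mhz / clocks_per_hash * engines


if __name__ == "__main__":
    print(f"throughput: {throughput_mhs(133, 128, ENGINES):.1f} MH/s")
    for name, ffs in DEVICE_FFS.items():
        print(f"{name}: FF utilization {ffs_needed / ffs:.0%}")
    for name in ("S6 LX100", "S6 LX150"):
        limit = max_adders_s6(DEVICE_FFS[name])
        print(f"{name}: need {adders_needed} adders, max {limit}")
```

Flip-flop storage alone fits only on the LX150 (about 83%), and the adder count exceeds the carry-chain limit on both Spartan-6 parts.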

fpgaminer

hero member

Activity: 560

Merit: 517

Without having an actual Spartan-6 LX150 board on hand, I ran my design through ISE quickly. This showed that the LUT consumption is indeed similar to Altera's, so there does not appear to be any area improvements by using a Xilinx device over Altera.

What I do not know, however, is how fast Spartan-6 LUTs operate compared to Altera's, for apples-to-apples speed grades. If they run faster, it would indeed be possible to get more bang for your LUT. I get 80MHz in my design, resulting in 80MHash/s burning 80K LUTs. The Cyclone4-150 or Spartan6 LX150 may fit two full hashing pipelines (128 SHA-256 rounds per full hashing pipeline), which would double their performance: the Cyclone4-150 would reach 160MHash/s, and if the Spartan6 is faster, it could possibly achieve >200MHash/s as you've reported.

You could get faster speed grades, but those are typically a bit more expensive. I haven't calculated whether a fast speed grade would balance out the cost for its improved hashing speeds.
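As a quick sanity check on those figures, here is a small sketch of the relationship between clock, pipeline count and hash rate for fully unrolled pipelines (one hash per clock per pipeline). The helper names are hypothetical, and whether two pipelines actually fit on either device is exactly the open question, so the pipeline count is just an input here.

```python
def pipelined_mhs(clock_mhz: float, pipelines: int) -> float:
    """A fully unrolled pipeline finishes one hash per clock."""
    return clock_mhz * pipelines


def luts_per_mhs(luts: int, mhs: float) -> float:
    """LUT cost of each MH/s of throughput."""
    return luts / mhs


def required_clock_mhz(target_mhs: float, pipelines: int) -> float:
    """Clock needed to hit a target rate with a given pipeline count."""
    return target_mhs / pipelines


if __name__ == "__main__":
    print(pipelined_mhs(80, 1))                    # one pipeline at 80 MHz
    print(luts_per_mhs(80_000, pipelined_mhs(80, 1)))
    print(pipelined_mhs(80, 2))                    # two pipelines, same clock
    print(required_clock_mhz(200, 2))              # clock needed for 200 MH/s
```

So two pipelines at 80MHz give 160MHash/s, and beating 200MHash/s with two pipelines would need the design to close timing above 100MHz.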

cypherf0x

newbie

Activity: 28

Merit: 1

If anyone is looking for an inexpensive FPGA to experiment with, try the SPARTAN-6 LX9 MICROBOARD. I've gotten a lot of messages asking about it. These boards connect over USB and cost less than $100.

keybaud

full member

Activity: 120

Merit: 100

In your haste to create faster miners, be careful that you don't destroy that which you seek.

Quote from: Grinder on May 17, 2011, 02:00:19 PM

I just realised that Bitcoin's future depends on using an algorithm that is not possible to put in hardware like this. If it is, there will probably only be one mining company left after a while because of the economy of scale.

My understanding is that there is a bigger problem, in that if one person/organisation controls over 50% of the Bitcoin network, then it is effectively compromised and bitcoins will no longer be a viable e-currency.

See this thread: http://forum.bitcoin.org/index.php?topic=8653.0

https://en.bitcoin.it/wiki/Weaknesses#Attacker_has_a_lot_of_computing_power

Attacker has a lot of computing power
An attacker that controls more than 50% of the network's computing power can, for the time that he is in control, exclude and modify the ordering of transactions. This allows him to:
Reverse transactions that he sends while he's in control
Prevent some or all transactions from gaining any confirmations
Prevent some or all other generators from getting any generations
The attacker can't:
Reverse other people's transactions
Prevent transactions from being sent at all (they'll show as 0/unconfirmed)
Change the number of coins generated per block
Create coins out of thin air
Send coins that never belonged to him
It's much more difficult to change historical blocks, and it becomes exponentially more difficult the further back you go. As above, changing historical blocks only allows you to exclude and change the ordering of transactions. It's impossible to change blocks created before the last checkpoint.
Since this attack doesn't permit all that much power over the network, it is expected that no one will attempt it. A profit-seeking person will always gain more by just following the rules, and even someone trying to destroy the system will probably find other attacks more attractive. However, if this attack is successfully executed, it will be difficult or impossible to "untangle" the mess created -- any changes the attacker makes might become permanent.

cypherf0x

newbie

Activity: 28

Merit: 1

Quote from: kebumaha on May 17, 2011, 04:28:06 PM

I got only one question. Where the heck can you even buy these things like "PICO EX-300?" All I find are specs and specs. I guess you need to study computer engineering for 10 years just to see one of those?

They're expensive enough you have to call the sales office to order them.

cypherf0x

newbie

Activity: 28

Merit: 1

For anyone skeptical about FPGA abilities the video below is for MD5 hashing but it's the same principle.

http://www.youtube.com/watch?v=zEwWvVP_RU0

kebumaha

newbie

Activity: 14

Merit: 0

I got only one question. Where the heck can you even buy these things like "PICO EX-300?" All I find are specs and specs. I guess you need to study computer engineering for 10 years just to see one of those?

fpgaminer

hero member

Activity: 560

Merit: 517

Quote

The chips on the boards have about 100k LUTs 23k slices with 4 LUTs/slice

For which platform? The PICO EX-300 platform? You're probably talking about a platform with Spartan-6's on it, because those do indeed carry four 4-LUTs per slice, with the LX150 totaling ~150k 4-LUTs.

Quote

Try implementing parallel hashing pipelines if your FPGA has the gates for it.

I develop on a C120, and I have different designs, some of which are indeed pipelined. The pipelined designs get 1 MHash/s per 1K LUTs.

Perhaps the Xilinx devices can pack far more bang-per-LUT than Altera's for SHA-256 designs? I shall certainly investigate. Again, thank you for sharing your numbers.

Chris Acheson

sr. member

Activity: 266

Merit: 251

Quote from: cypherf0x on May 17, 2011, 03:31:57 PM

So someone should release the code and maybe get a bounty? You can play with maybe all day. In the end I already have a working prototype and now someone else with more FPGA experience than myself to polish the code. He develops chips for a living, I develop hardware boards and embedded software so that seems like a pretty reasonable combo for getting something done.

If you're serious about this, you should arrange to have the contributions put in escrow until you actually release something. Just putting up a black-hole donation address means that no one knows how much has been contributed, and if the total only gets halfway there the whole thing's a loss.

Anyway, the bounty isn't aimed at you specifically, so I'm going to split it off into its own thread.

Topic: FPGA mining for fun and profit - page 9. (Read 67224 times)