
Topic: Looking for an FPGA with cache for BTC and Litecoin Mining - any ideas? (Read 9536 times)

newbie
Activity: 55
Merit: 0
I wonder if the up/down ports on the Cairnsmore1 FPGA board could be used to interface with some sort of memory.
legendary
Activity: 1484
Merit: 1005
LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

Do you have a link? I could not find much information

Proprietary design, they're not talking about it a lot.  You can try them at #litecoin-dev on freenode if you like.
legendary
Activity: 1270
Merit: 1000
LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

Do you have a link? I could not find much information
.m.
sr. member
Activity: 280
Merit: 260
Hi, what do you think about this one?
Virtex-7 2000T: Designed with ASIC Prototyping and Emulation in Mind - FPGA enabled by Stacked Silicon Interconnect (SSI) technology delivers 2 million logic cells, 6.8 billion transistors in 28 nm design.
Around 20 W @ 100 MHz (3600 8-bit processing elements consumed 85% of chip capacity, providing 180 000 MIPS)

http://www.xilinx.com/applications/asic-prototyping/index.htm

Oops, do they really cost 5000 USD each?
legendary
Activity: 1484
Merit: 1005
The slow speeds are on real chips. The simulations are what runs fast.

They're working on a lot of optimizations for the N=1024, p=1, r=1 scenario that is the current implementation.  I think it's more of a technical challenge for laSeek as an FPGA engineer than anything else. It'll be interesting to see if he gets it off the ground.
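For reference, the N=1024, p=1, r=1 parameters fix how much scratchpad memory each hash needs. A minimal sketch of that arithmetic, assuming the standard scrypt sizing of N blocks of 128 * r bytes each (the function name is only for illustration):

```python
def scrypt_scratchpad_bytes(n: int, r: int) -> int:
    """Scratchpad size for one scrypt instance: N blocks of 128 * r bytes."""
    return 128 * r * n


N, R, P = 1024, 1, 1
size = scrypt_scratchpad_bytes(N, R)

# Each entry is 128 * r bytes wide; with r = 1 that is a 1024-bit word.
print(f"{size} bytes = {size // 1024} KiB per hash, {P} lane(s)")
print(f"entry width: {128 * R * 8} bits, entries: {N}")
```

So every hash in flight needs its own 128 KiB of fast memory, which is the resource an FPGA design has to find somewhere.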
legendary
Activity: 1708
Merit: 1000
Reality is stranger than fiction
LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.

Is there any chance that the slow speed comes from the simulation itself? What if we tried it on a real board? Would the results be similar or very different (better)?
legendary
Activity: 1484
Merit: 1005
LaSeek has been running simulations like crazy, they come out fast, but after synthesis they run very slow so far.
legendary
Activity: 965
Merit: 1000
That's exactly my problem. I write software and programmed PALs etc. many years ago, but I have never programmed an FPGA. I got a link to a SHA-256 implementation in VHDL (I know, that's not what LTC requires), and I compared it to the C sources just to get an idea of how similar they look. At first glance you can port the C sources almost 1:1. But I guess the devil is in the details, so I won't claim that a scrypt port is no problem. I wondered whether it's feasible to simulate the whole hardware before any money is spent on prototype boards? But maybe the dev software alone will cost quite some money... I don't know...
legendary
Activity: 1708
Merit: 1000
Reality is stranger than fiction
You can add up to 4 GB of RAM. I thought that might be sufficient for an LTC lookup table.

That's good news. I do not think the RAM will be expensive, so who should we ask to get better info? Do you have any insights on this: how many hashes it will produce, and what is needed to program the board so it can mine with scrypt? I'm a software engineer, but I've never programmed a board.
legendary
Activity: 1484
Merit: 1005
Any news on this? Have you designed the chip?

These are theoretical numbers... laSeek has been busting his ass to try to get kilohash/second rates into the double digits with inexpensive FPGAs. The trials on Altera FPGAs were a trainwreck.

Even with a large number of slices, you will run into two problems:
1) Memory bandwidth in FPGA devices is poor compared to a GPU. For on-slice cache it is 10-20x lower than a GPU's, and for off-chip memory it is about 20-40x lower.
2) The clock rate of FPGA devices is generally lower than that of GPUs.

You can resolve 1) by chaining memory interfaces in a multichip configuration, but that's a lot of hardware customization.
legendary
Activity: 965
Merit: 1000
You can add up to 4 GB of RAM. I thought that might be sufficient for an LTC lookup table.
legendary
Activity: 1708
Merit: 1000
Reality is stranger than fiction

What exactly can we do with this board? It says it has advanced memory interfacing. Can we use it for mining LTCs?
hero member
Activity: 1596
Merit: 502
Remember, litecoin doesn't use the memory bandwidth, it uses the L1 (L2?) cache bandwidth, which is much higher.
legendary
Activity: 1610
Merit: 1000
I was wondering whether it is possible for miner software + a bitstream to use an external RAM resource, something like PC RAM? We could put in as much as we want.
Spartan-6 memory controller blocks are designed to control single memory chips, not multi-chip memory modules. There are 4 memory controller blocks in each Spartan-6, but depending on the package not all are connected to pins.

While it isn't impossible to build a memory-module controller from the regular Spartan-6 logic blocks, such a controller will be inefficient and slow. In the Xilinx product line only Virtex FPGAs can directly interface with multi-chip memory modules.

BTW, bitfury is very busy with his 55nm Bitcoin ASIC project:

https://bitcointalksearch.org/topic/m.1641318
Thanks.

By the way, I have been watching bitfury closely for a long time :)
legendary
Activity: 2128
Merit: 1073
I was wondering whether it is possible for miner software + a bitstream to use an external RAM resource, something like PC RAM? We could put in as much as we want.
Spartan-6 memory controller blocks are designed to control single memory chips, not multi-chip memory modules. There are 4 memory controller blocks in each Spartan-6, but depending on the package not all are connected to pins.

While it isn't impossible to build a memory-module controller from the regular Spartan-6 logic blocks, such a controller will be inefficient and slow. In the Xilinx product line only Virtex FPGAs can directly interface with multi-chip memory modules.

BTW, bitfury is very busy with his 55nm Bitcoin ASIC project:

https://bitcointalksearch.org/topic/m.1641318
legendary
Activity: 1610
Merit: 1000
I was wondering whether it is possible for miner software + a bitstream to use an external RAM resource, something like PC RAM? We could put in as much as we want.
Having it as a commission-based bitstream is OK with me. There are a lot of Spartans out there, and if this is possible at all, whoever makes it will be rewarded for sure.
Any comments?
legendary
Activity: 1708
Merit: 1000
Reality is stranger than fiction
Any news on this? Have you designed the chip?
sr. member
Activity: 266
Merit: 251
Well, scrypt's scratchpad is a 1024 x 1024 matrix (1024 entries of 1024 bits each). There are two loops, which cause the major slowdown:

1st loop - for i from 0 to 1023 - filling the scratchpad:
   scratchpad[i][1023..0] <= X[1023..0];
   X[511..0] <= xor_salsa(X[511..0], X[1023..512]);
   X[1023..512] <= xor_salsa(X[1023..512], X[511..0]);

2nd loop - for i from 0 to 1023 - using the scratchpad:
   X[1023..0] <= X[1023..0] xor scratchpad[X[521..512]][1023..0];
   X[511..0] <= xor_salsa(X[511..0], X[1023..512]);
   X[1023..512] <= xor_salsa(X[1023..512], X[511..0]);
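
For anyone who wants to check a hardware design against software, the two loops can be sketched in plain Python. This is only a reference sketch under a few assumptions: X is held as 32 little-endian 32-bit words, xor_salsa(a, b) is taken to mean Salsa20/8 applied to a xor b, and the helper names (quarter, mix, romix, scrypt_r1p1) are made up for this example. It is written for readability, not speed.

```python
import hashlib
import struct

M32 = 0xFFFFFFFF


def rotl(v, n):
    return ((v << n) & M32) | (v >> (32 - n))


def quarter(x, a, b, c, d):
    # One Salsa20 quarter-round, updating the word list x in place.
    x[b] ^= rotl((x[a] + x[d]) & M32, 7)
    x[c] ^= rotl((x[b] + x[a]) & M32, 9)
    x[d] ^= rotl((x[c] + x[b]) & M32, 13)
    x[a] ^= rotl((x[d] + x[c]) & M32, 18)


COLUMNS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))


def xor_salsa(a, b):
    """Return Salsa20/8(a xor b) for two 16-word (512-bit) halves."""
    start = [p ^ q for p, q in zip(a, b)]
    x = list(start)
    for _ in range(4):  # 4 double rounds = 8 rounds
        for idx in COLUMNS + ROWS:
            quarter(x, *idx)
    return [(p + q) & M32 for p, q in zip(x, start)]


def mix(x):
    """The two xor_salsa steps applied to the 1024-bit state X."""
    lo = xor_salsa(x[:16], x[16:])
    hi = xor_salsa(x[16:], lo)  # uses the already updated low half
    return lo + hi


def romix(x, n=1024):
    scratchpad = []
    for _ in range(n):       # 1st loop: sequential fill
        scratchpad.append(x)
        x = mix(x)
    for _ in range(n):       # 2nd loop: data-dependent reads
        j = x[16] & (n - 1)  # bits 521..512 of X when n = 1024
        x = mix([p ^ q for p, q in zip(x, scratchpad[j])])
    return x


def scrypt_r1p1(password, salt, n=1024, dklen=32):
    """scrypt with r = 1, p = 1: PBKDF2, then the two loops, then PBKDF2."""
    b = hashlib.pbkdf2_hmac("sha256", password, salt, 1, 128)
    x = romix(list(struct.unpack("<32I", b)), n)
    return hashlib.pbkdf2_hmac("sha256", password, struct.pack("<32I", *x), 1, dklen)
```

Litecoin's proof of work passes the 80-byte block header as both password and salt, so a call would look like scrypt_r1p1(header, header). Note that only the second loop reads the scratchpad at an address that depends on the data.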

While xor_salsa can be perfectly pipelined, only 8 scratchpads fit in a Spartan-6 XC6SLX150.
If the BRAMs are not used for Bitcoin computations, it is possible to implement LTC mining on the XC6SLX150 at about 50 - 100 kh/s per chip with about 80% of the slices free.
So a single chip can mine both LTC and BTC using different internal resources: BRAMs for LTC and logic for BTC.

What is interesting to note is that scratchpad access can be perfectly pipelined as well, and is 1024 bits wide. That means a hypothetical FPGA would need only 6 wires to send out the address (6 bits + clock) and 1024 input wires for the scratchpad data.

This means that multiple smaller DRAM chips working in parallel will do the best job, allowing about 500 mega-transfers per second on a low-cost or mid-cost FPGA. That is about 500 gigabits per second, or 60 gigabytes per second. The overall cost of the DRAM will be about 150 EUR, and of the FPGA to handle it about 300 EUR. If it works in a fully pipelined manner, it would give about 500 kh/s of mining performance for Litecoin.
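
The bandwidth and hash-rate figures can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes memory transfers are the only bottleneck; the bus width and transfer rate come from the post, while the two ways of counting accesses per hash are my own assumption.

```python
BUS_WIDTH_BITS = 1024      # one scratchpad entry per transfer (from the post)
TRANSFERS_PER_SEC = 500e6  # "about 500 mega-transfers" (from the post)
ENTRIES = 1024             # scrypt N for Litecoin

bits_per_sec = BUS_WIDTH_BITS * TRANSFERS_PER_SEC
gbytes_per_sec = bits_per_sec / 8 / 1e9


def hashes_per_sec(accesses_per_hash):
    """Upper bound on the hash rate if memory transfers are the only limit."""
    return TRANSFERS_PER_SEC / accesses_per_hash


reads_only = hashes_per_sec(ENTRIES)         # count only the 1024 random reads
fill_and_read = hashes_per_sec(2 * ENTRIES)  # also count the 1024 fill writes

print(f"{bits_per_sec / 1e9:.0f} Gbit/s = {gbytes_per_sec:.0f} GB/s")
print(f"reads only:   {reads_only / 1e3:.0f} kh/s")
print(f"fill + reads: {fill_and_read / 1e3:.0f} kh/s")
```

The roughly 500 kh/s estimate matches counting about 1024 transfers per hash. If the 1024 writes of the fill loop share the same bus, the bound would be about half of that.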

Generally, an FPGA solution would achieve about the same Litecoin performance as a decent GPU board, but its power consumption would be radically lower than, for example, SHA256 Bitcoin mining. Power dissipation would be very low. That is the only advantage; the cost to build the solution would be higher.

What is more interesting is that there will be no cheap route to an ASIC for LTC, as most of the chip area would be RAM, and there would be no significant edge in producing RAM for pipelining on, say, a 250-nm or 90-nm process. But building cheap 250-nm chips for the computations and to drive DRAM arrays would give a significant cost reduction compared to installing FPGAs. Still, DRAM prices will not go anywhere, and the best DRAM-based solution would not outperform GPUs or CPUs by orders of magnitude.

Say, for scrypt it is best to pipeline about 32-36 calculations deep, not 1024. That would make the xor_salsa calculation and DRAM access performance comparable.

The best on-die solution should contain a DRAM block 1024 bits wide (the bus) and 32768 bits tall; that will be the biggest thing on the chip. Take 90 nm as an example: 90 nm is the smallest feature size, while a single-transistor holding cell with its routing area would be about 0.5 um^2. So the overall chip area would be ~16 mm^2 (!) without self-healing features, and the computation unit would be below 0.2 mm^2 :-)
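
The area estimate is easy to check. A small sketch using only the figures from the post (block dimensions and the 0.5 um^2 cell size); the constant names are mine:

```python
WIDTH_BITS = 1024                # bus width of the DRAM block
HEIGHT_ROWS = 32768              # number of 1024-bit rows
CELL_AREA_UM2 = 0.5              # one holding cell plus routing at 90 nm
SCRATCHPAD_BITS = 1024 * 1024    # one scrypt scratchpad (N = 1024, r = 1)

total_bits = WIDTH_BITS * HEIGHT_ROWS
area_mm2 = total_bits * CELL_AREA_UM2 / 1e6  # 1 mm^2 = 1e6 um^2
scratchpads = total_bits // SCRATCHPAD_BITS

print(f"{total_bits} bits = {total_bits // 8 // 1024 // 1024} MiB")
print(f"memory area: {area_mm2:.1f} mm^2")
print(f"room for {scratchpads} scratchpads")
```

The block works out to 4 MiB, or 32 full scratchpads, which lines up with pipelining about 32 calculations deep.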

For 90 nm that still requires a $500k initial investment to build the masks, plus investment in design, etc. And you'll get a price of about $1 per die for a chip that could compute 124 kh/s.
Power consumption will be negligible, about 0.1 - 0.5 W.

What is more interesting is that 180 nm would require a ~$150k-$200k initial investment and would lead to a $4 / die price (the die will be bigger) and about 60 kh/s performance.

And 250 nm would require _much_ less, about $50k-$80k of initial investment, and would lead to an $8 / die price (the die will be very big! 128 mm^2!) with about 40 kh/s performance.

So I would consider 180 nm to 250 nm for an LTC ASIC. A big die may not be that bad, as such a die can easily be mounted without packaging (11 mm x 11 mm is really big!).
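
Which process comes out cheapest depends on how much total hash rate is being built. A rough sketch using the preliminary figures above; the midpoints chosen for the 180 nm and 250 nm mask costs and the three target hash rates are assumptions for illustration, and design, packaging and board costs are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    nre_usd: float      # one-off mask investment
    die_usd: float      # price per die
    khs_per_die: float  # hash rate per die

    def total_cost(self, target_khs: float) -> float:
        dies = math.ceil(target_khs / self.khs_per_die)
        return self.nre_usd + dies * self.die_usd


NODES = [
    Node("90 nm", 500_000, 1.0, 124),
    Node("180 nm", 175_000, 4.0, 60),  # midpoint of $150k-$200k (assumption)
    Node("250 nm", 65_000, 8.0, 40),   # midpoint of $50k-$80k (assumption)
]


def cheapest(target_khs: float) -> Node:
    return min(NODES, key=lambda n: n.total_cost(target_khs))


for target in (100_000, 1_000_000, 10_000_000):  # 0.1, 1 and 10 GH/s (hypothetical)
    best = cheapest(target)
    print(f"{target / 1e6:>5.1f} GH/s: {best.name}, ${best.total_cost(target):,.0f}")
```

With these inputs the cheap masks win for small runs, and the 90 nm die only pays off once the total hash rate is large enough to amortize its mask cost.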

Well, these numbers are very preliminary... I am currently learning ASIC design, and I think I would design a scrypt hasher chip as well. 250 nm requires a really small amount of money to start with, and maybe something could be invented for a speedup as well: scrypt accesses memory randomly only in the second part, when using the scratchpad; when generating it, it accesses memory sequentially.
Hmm... maybe a Litecoin chip would even be done before a Bitcoin chip, as it seems to be much simpler and uses well-known techniques. Less competition :-)

hero member
Activity: 1596
Merit: 502
I think I would round it up to 256 kB, so you only need address lines for the offset within the 256 kB plus separate address lines for the thread number. That way you don't have to calculate where to read by doing thread number * 128.5 kB.
But that would still give the possibility of 512 parallel scrypt threads with that amount of memory.
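
A small sketch of the difference between the two addressing schemes. The 128.5 kB and 256 kB figures and the 512 threads are taken from the post; the function names are made up, and the total is simply what 512 slots of 256 kB add up to.

```python
SCRATCHPAD_BYTES = 131584                 # 128.5 kB, figure taken from the post
SLOT_BYTES = 256 * 1024                   # rounded up to a power of two
SLOT_BITS = SLOT_BYTES.bit_length() - 1   # address lines inside one slot
THREADS = 512


def packed_address(thread: int, offset: int) -> int:
    """Tightly packed layout: needs a multiplier in the address path."""
    return thread * SCRATCHPAD_BYTES + offset


def padded_address(thread: int, offset: int) -> int:
    """Padded layout: thread bits simply sit above the offset bits."""
    return (thread << SLOT_BITS) | offset


total_bytes = THREADS * SLOT_BYTES
print(f"{SLOT_BITS} offset lines + {(THREADS - 1).bit_length()} thread lines")
print(f"total memory: {total_bytes // (1024 * 1024)} MiB")
```

In hardware the padded form is pure wiring, since no adder or multiplier is needed to form the address; the price is the unused space at the end of each slot.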