
Topic: DIY FPGA Mining rig for any algorithm with fast ROI - page 72. (Read 99472 times)

hero member
Activity: 609
Merit: 500
DMD,XZC
It's all nonsense. There is no evidence to prove the hashrate and power consumption.
full member
Activity: 1179
Merit: 131
TLDR: Sounds Neat.  Do it if you can afford it.  Be aware of cost and risk factor of future support.

I have been mining off-and-on since 2014 and I remember quite well the FPGA interest back in 2014.  I'm sure there have been other FPGA efforts before 2014 but my point is to simply share my own opinion.

First, I'll just say this is probably a great opportunity for people who can afford this.  Not everyone will be on board to spend $5K to get a system running.  That's a rough estimate.. but $4K for the FPGA and then whatever else for ancillary equipment like mobo/ram/psu/etc.  

So people saying that this will cause the demise of GPU mining are being short sighted.  

Aside from being expensive, it is technically daunting.  We would be relying on a programmer for future firmware updates and from what I've seen the support is just not to the same scale as the existing support for other mining options.

So I'll just summarize by saying this sounds like an awesome opportunity for diversification of a mining portfolio if you have the money for that.  I imagine youtubers like VoskCoin would be jumping all over this.

TLDR: Sounds Neat.  Do it if you can afford it.  Be aware of cost and risk factor of future support.
 

This is probably one of the best and most grounded comments I've read on this forum in a long time.  The fact is that 95% of miners out there have a very rudimentary understanding of computers, algorithms, and programming.  The barrier to entry is minuscule: buy a few GPUs and there are numerous programs available that are designed to mine on them.  I really can't envision a scenario where FPGAs will become fully mainstream.  As evidenced by all of the posts on here, they are difficult to buy, let alone program.  The real profits are always going to go to the people who spend the time and effort to find an edge in the mining game that goes beyond Nvidia, AMD, and Bitmain.
full member
Activity: 846
Merit: 115
Everybody wake the f*'ck up. GPU mining is dead. You're competing with the big ASIC boys, and if you're competing with 14-year-olds with gamer GPUs who get free electricity from their parents, then your GPU farm is doomed to fail. Expect half of the cost to be electric waste.

The decentralized dream is bullshit. Only serious players can profit and the rest will get roasted. It's a zero-sum game. FPGAs and ASICs are the only game left to compete at industrial or small-business scale. Every f*'ck tard gamer will mine and not report gains to the IRS to cover their GPU cost and wack off to their $1 daily GPU profits with a 2-year ROI.
member
Activity: 154
Merit: 37
Are there many individual components of the Xnn series of algos which won't fit on an Arria or even a Cyclone?

Yes. Some of the components hardly fit on a 9P. The cubehash example I gave a few posts ago.


Are you saying the algorithm for an individual cubehash pipeline hardly fits on a 9P???... While I haven’t studied cubehash specifically, it’s claimed to take 200 cycles on a basic CPU, and I can implement a lot of basic CPU cores on a 9P...

A pipeline, sure, even lots of pipelines. But if you want to unroll it fully and obtain real (1 GH/s+) performance, it would take the entire 9P, and it's not clear that a fully unrolled version of it would fit at all.



Ahh ok - I completely understand what you’re saying now. I misread as the individual algorithm took the chip.

The way I’d attack Lyra2Rev2 in hardware is a literal pipeline of chips, sized according to parallelized throughput. It looks like the whole chain is 256-bit hashes, so a 400 Gbps interconnect could handle your 1 GH/s+. For chip-to-chip interconnect on the same board, 3 quads of 32 Gbps should be sufficient. The Blake/Keccak/Skein stages are probably all on the same chip or a much smaller chip.

Your 9P is probably $3000; you could buy 2-3x the LUTs for pipelines on smaller chips for that...

All that said it looks like the 1GH is worth about $1000/mo right now, so still quite a long payout if you’re using $12k in hardware.
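
A quick back-of-the-envelope check of the numbers in this post, as a sketch only. It assumes each stage forwards exactly one 256-bit hash per nonce with no framing or encoding overhead, that "3 quads" means 3 x 4 lanes at 32 Gbps each, and that revenue stays flat at $1000/mo. The function names are made up for illustration.

```python
def link_gbps_needed(hashrate_hs: float, bits_per_hash: int = 256) -> float:
    """Raw chip-to-chip bandwidth (Gbps) if every hash is forwarded once."""
    return hashrate_hs * bits_per_hash / 1e9


def quad_capacity_gbps(quads: int, lane_gbps: float, lanes_per_quad: int = 4) -> float:
    """Aggregate one-direction capacity of transceiver quads (assumed 4 lanes each)."""
    return quads * lanes_per_quad * lane_gbps


def payback_months(hardware_usd: float, revenue_usd_per_month: float) -> float:
    """Simple payback time, ignoring power cost and difficulty changes."""
    return hardware_usd / revenue_usd_per_month


needed = link_gbps_needed(1e9)           # 1 GH/s of 256-bit hashes
available = quad_capacity_gbps(3, 32.0)  # 3 quads of 32 Gbps lanes

print(f"needed: {needed:.0f} Gbps, available: {available:.0f} Gbps")
print(f"payback: {payback_months(12_000, 1_000):.0f} months")
```

Under those assumptions the three quads leave some headroom over what a 1 GH/s stream of 256-bit hashes needs, and $12k of hardware at $1000/mo is about a year to break even before electricity.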
hero member
Activity: 1118
Merit: 541
Are there many individual components of the Xnn series of algos which won't fit on an Arria or even a Cyclone?

Yes. Some of the components hardly fit on a 9P. The cubehash example I gave a few posts ago.


Are you saying the algorithm for an individual cubehash pipeline hardly fits on a 9P???... While I haven’t studied cubehash specifically, it’s claimed to take 200 cycles on a basic CPU, and I can implement a lot of basic CPU cores on a 9P...

A pipeline, sure, even lots of pipelines. But if you want to unroll it fully and obtain real (1 GH/s+) performance, it would take the entire 9P, and it's not clear that a fully unrolled version of it would fit at all.

legendary
Activity: 1316
Merit: 1014
ex uno plures
That’s exactly what I’ve been working on, for reasonable definitions of small and fast.

I think it's a promising avenue for research. It's the one I would choose too. Sinking big bucks into an investment in UltraScale+ FPGA boards and being dependent on one or two VHDL coders who know the subject matter seems risky, or at least adventurous.

Revisiting sha256 ASIC development history, I can think of at least two companies (CoinTerra, Spondoolies) who failed because they tried to design large die area high hash rate chips and were late to market and at least one company (Bitmain) who succeeded by designing smaller and simpler chips, using lots of them in a miner and being first to market.
member
Activity: 154
Merit: 37
Are there many individual components of the Xnn series of algos which won't fit on an Arria or even a Cyclone?

Yes. Some of the components hardly fit on a 9P. The cubehash example I gave a few posts ago.





Are you saying the algorithm for an individual cubehash pipeline hardly fits on a 9P???... While I haven’t studied cubehash specifically, it’s claimed to take 200 cycles on a basic CPU, and I can implement a lot of basic CPU cores on a 9P...
hero member
Activity: 1118
Merit: 541
Are there many individual components of the Xnn series of algos which won't fit on an Arria or even a Cyclone?

Yes. Some of the components hardly fit on a 9P. The cubehash example I gave a few posts ago.



member
Activity: 154
Merit: 37
You can’t fit a Stratix 10 on an NVMe stick... I’ve tried. Kintex or Arria is about as big as you can get. Damn 22x80 form factor.

I know.

I question the need for ultra large and expensive FPGAs instead of clusters of smaller FPGAs with high speed interconnects.
Perhaps a custom PCI-E format board with 4-6 last generation FPGA devices with some fast memory and a cross point switch. Are there many individual components of the Xnn series of algos which won't fit on an Arria or even a Cyclone?



That’s exactly what I’ve been working on, for reasonable definitions of small and fast.

I have two active projects in the first spin batch phase. One is NVMe with basically the biggest thing you can fit on there, and it augments GPUs more than works standalone.

The second is 4 chips on one PCIe card, with a switch, but the most reasonable chip that can be used in that configuration is still not what you would call cheap. Frankly, even the NVMe chip costs as much as some graphics cards just to get 4 PCIe 3.0 lanes.

The one advantage is that the 4-chip board uses modules, so you could buy one with 1 module populated in the 3-figure range once it is ready; mass production is likely August at this point.
legendary
Activity: 2296
Merit: 1031
TLDR: Sounds Neat.  Do it if you can afford it.  Be aware of cost and risk factor of future support.

I have been mining off-and-on since 2014 and I remember quite well the FPGA interest back in 2014.  I'm sure there have been other FPGA efforts before 2014 but my point is to simply share my own opinion.

First, I'll just say this is probably a great opportunity for people who can afford this.  Not everyone will be on board to spend $5K to get a system running.  That's a rough estimate.. but $4K for the FPGA and then whatever else for ancillary equipment like mobo/ram/psu/etc.  

So people saying that this will cause the demise of GPU mining are being short sighted.  

Aside from being expensive, it is technically daunting.  We would be relying on a programmer for future firmware updates and from what I've seen the support is just not to the same scale as the existing support for other mining options.

So I'll just summarize by saying this sounds like an awesome opportunity for diversification of a mining portfolio if you have the money for that.  I imagine youtubers like VoskCoin would be jumping all over this.

TLDR: Sounds Neat.  Do it if you can afford it.  Be aware of cost and risk factor of future support.
 
legendary
Activity: 1316
Merit: 1014
ex uno plures
You can’t fit a Stratix 10 on an NVMe stick... I’ve tried. Kintex or Arria is about as big as you can get. Damn 22x80 form factor.

I know.

I question the need for ultra large and expensive FPGAs instead of clusters of smaller FPGAs with high speed interconnects.
Perhaps a custom PCI-E format board with 4-6 last generation FPGA devices with some fast memory and a cross point switch. Are there many individual components of the Xnn series of algos which won't fit on an Arria or even a Cyclone?

hero member
Activity: 1118
Merit: 541
I discovered Bitcore yesterday. It seems to be a pure GPU coin, hence my question.

Is it possible to mine Bitcore (BTX) coins with this FPGA miner? The algo is Timetravel10.

Yes, it's basically lyra2rev2 with only a single round of cubehash and no memory. I'd guess maybe 900 MH/s to 1.2 GH/s.

Edit: No, sorry, it's nist5 + bmw, luffa and cube, with a randomized order to the hashes. Ya, maybe 900 MH/s with some intelligent buffering. It also depends on how long the chain is, and I'm not quite sure I understand that.
member
Activity: 144
Merit: 10
@senseless

@2112

@GPUHoarder

Thank you for your explanations and I understood what was said, much obliged.

No need to explain what was said below; I have the patience to wait and see how throughput can be doubled or quadrupled when the cards are daisy chained as stated below. I must not be as good at math as I thought.

- It is possible by using data from initial algorithms to project the hashrate within +/-10% for future algorithms, and in that light the expected rates (per card) are about 300MH/s for X17 & X16R, 25MH/s for Neoscrypt, 600MH/s for Lyra2v2; 150MH/s for Xevan (Bittware card only for Xevan!).  For Equihash it is much harder to calculate the projected hash rate.  I don't think Ethash would be profitable enough to be worth it.  Those numbers are just projections though, and their profits are in the same range as the initial algorithms being released, with X16R and Xevan looking the best at around $75/day
- X17 and X16R require two FPGA cards daisy chained together with 2 x 100G ethernet cables, one FPGA does half the function, the other FPGA does the other half
- Xevan requires FOUR FPGA cards daisy chained together with 6 x 100G ethernet cables; this is only possible with the Bittware card

Clarifying the projected hash rates
X17: 2 cards daisy chained get 600MH/s total
X16R: 2 cards daisy chained get 600MH/s total
Xevan: 4 Bittware cards daisy chained get 600MH/s total
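
The per-card and per-chain figures above line up if the "per card" rate is simply the chain total divided by the number of cards in the chain. A minimal sketch of that bookkeeping, assuming exactly that linear accounting (the dictionary layout and function name are just for illustration; the numbers are the projections quoted above, not measurements):

```python
# Projected per-card rates (MH/s) and cards per daisy chain, as quoted above.
PROJECTIONS = {
    "X17":   {"per_card_mhs": 300, "cards": 2},
    "X16R":  {"per_card_mhs": 300, "cards": 2},
    "Xevan": {"per_card_mhs": 150, "cards": 4},
}


def chain_total_mhs(per_card_mhs: float, cards: int) -> float:
    """Total rate of one daisy chain if 'per card' means total / cards."""
    return per_card_mhs * cards


for algo, p in PROJECTIONS.items():
    total = chain_total_mhs(**p)
    print(f"{algo}: {p['cards']} cards -> {total} MH/s total")
```

Read this way, chaining does not multiply any single card's output; each card runs part of the hash chain and the per-card figure is the chain's output averaged over its cards.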
member
Activity: 154
Merit: 37
In the next couple of years we'll be buying Stratix 10 PCI-E boards at Walmart for $600 a pop.


And gamers will be complaining that miners have bought up all the NVMe Stratix 10 FPGA sticks

You can’t fit a Stratix 10 on an NVMe stick... I’ve tried. Kintex or Arria is about as big as you can get. Damn 22x80 form factor.
jr. member
Activity: 322
Merit: 1
I discovered Bitcore yesterday. It seems to be a pure GPU coin, hence my question.

Is it possible to mine Bitcore (BTX) coins with this FPGA miner? The algo is Timetravel10.
legendary
Activity: 1316
Merit: 1014
ex uno plures
In the next couple of years we'll be buying Stratix 10 PCI-E boards at Walmart for $600 a pop.


And gamers will be complaining that miners have bought up all the NVMe Stratix 10 FPGA sticks
hero member
Activity: 1118
Merit: 541
I'm wanting to try fpga mining on an AWS EC2 instance.  It seems anyone that has done/is doing that is keeping the 'how to' close to their chest.  BFGMiner seems to be the way to go, but where does one get the bitstream (or in the case of AWS the AFI containing the bitstream)?

After seeing posts saying someone fried an AWS F1 board with 300A, and that now there is a 150W limit but it is only a warning, I did a little searching and found that it appears AWS F1 limits your core power (Vccint) to 85W, which would be 100A at 0.85V.  They say they may/will shut you down (gate your clocks) if you exceed this, see https://github.com/aws/aws-fpga/blob/master/hdk/docs/afi_power.md

That's not what I said at all. Power limitations were not introduced until AWS shell v1.3.5 IIRC (maybe as early as 1.3.0? I don't remember offhand), sometime around Sept/Oct 2017. And yes, when I compile firmware for my 80A 0.85V Vccint VCU118, I can ignore the power warning if I wish, continue to compile the firmware, load the bitstream and fry my $7500 board, if I wanted.

Really? What happens if you try to draw 300 amps on a board that only has a 160A Vccint supply? Did you know Vivado only tosses an ignorable warning? That you can still compile and complete the firmware? I know people who have fried their own FPGA boards by drawing more current than the board can supply. I have destroyed Amazon boards by drawing too much current (unintentionally). This is just one of many ways you can physically destroy an FPGA with a bad firmware / design problem.

The only fud about what I said is that it could possibly happen.
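
For anyone following the amps vs. watts figures in the posts above, the conversion is just P = V x I on the core rail. A small sketch, assuming a 0.85 V Vccint throughout (the posts only state that voltage for the AWS limit and the VCU118, so applying it to the 300 A / 160 A example is an assumption); the helper names are hypothetical:

```python
VCCINT_V = 0.85  # core voltage quoted for AWS F1 and the VCU118


def current_amps(power_w: float, voltage_v: float = VCCINT_V) -> float:
    """Current drawn on the rail for a given power."""
    return power_w / voltage_v


def power_watts(current_a: float, voltage_v: float = VCCINT_V) -> float:
    """Power dissipated on the rail for a given current."""
    return current_a * voltage_v


f1_limit_a = current_amps(85)      # AWS F1 85 W Vccint limit, in amps
vcu118_w = power_watts(80)         # 80 A rail on the VCU118, in watts
overdraw = 300 / 160               # 300 A design on a 160 A supply

print(f"F1 limit: about {f1_limit_a:.0f} A")
print(f"VCU118 rail: about {vcu118_w:.0f} W")
print(f"300 A on a 160 A supply is {overdraw:.3f}x the rated current")
```

The last line is the point of the warning being ignorable: nothing in the toolchain stops a design that asks for almost twice what the regulator can deliver.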
member
Activity: 154
Merit: 37
As it relates to Ravencoin mining with FPGAs, OP will need to store over 300 million bitstreams to account for every possible combination. Better get back to the drawing board because this design will never work.
Partial reconfiguration - you don’t need every combination, just every building block.
Yeah, for X16r coins that's 16^2=256, for X16s coins that's 16!/14!=240. Certainly doable.
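
The counts above can be reproduced in a few lines. This is a sketch under the assumption that a "building block" is an ordered pair of adjacent hash stages, that X16R may repeat an algorithm within a chain, and that X16S uses each of the 16 algorithms exactly once; the variable names are illustrative only.

```python
import math

ALGOS = 16  # number of hash functions in the X16 family

# Ordered pairs of adjacent stages (the partial-reconfiguration building blocks).
pairs_x16r = ALGOS ** 2            # repeats allowed
pairs_x16s = math.perm(ALGOS, 2)   # no repeats: 16!/14! = 16 * 15

# Full 16-stage orderings, for comparison with the pair counts.
full_x16r = ALGOS ** ALGOS
full_x16s = math.factorial(ALGOS)

print(pairs_x16r, pairs_x16s)
print(f"{full_x16r:.3e} full X16R orderings, {full_x16s:.3e} full X16S orderings")
```

The gap between a few hundred pairs and the number of full orderings is why storing building blocks is feasible while storing one bitstream per ordering is not.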

Hmmm, two accelerator cards will be daisy chained with pipelining and their performance will magically double. I will believe it when I see it like all other claims made by the OP.

This isn’t hard. I do it all the time for a few algorithms. Here’s a contrived example - fill the scratchpad for CN7 on one FPGA dedicated to that, spitting out 2MB scratch pads all day long, and taking them back in and compressing / finalizing them. Total bandwidth for (example) 22kH is 343 Gbps. That’s achievable on lots of current hardware.
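
The 343 Gbps figure can be reproduced with simple arithmetic, as a rough sketch: it assumes a 2 MiB scratchpad per hash, counts the scratchpad stream in one direction only, and uses binary (2^30) giga-units, which appears to be how the number was derived. With decimal units the same stream comes out somewhat higher.

```python
SCRATCHPAD_BYTES = 2 * 1024 * 1024  # 2 MiB scratchpad per hash (assumed size)


def scratchpad_stream_bits_per_s(hashrate_hs: int) -> int:
    """Bits per second needed to move one scratchpad per hash, one direction."""
    return hashrate_hs * SCRATCHPAD_BYTES * 8


bits = scratchpad_stream_bits_per_s(22_000)  # the 22 kH/s example

binary_gbps = bits / 2 ** 30   # binary giga-units
decimal_gbps = bits / 1e9      # decimal Gbps

print(f"{binary_gbps:.2f} (binary) vs {decimal_gbps:.2f} (decimal) Gbps")
```

If the scratchpads also have to travel back for finalization, the return path needs the same again, so the real link budget depends on whether the link is full duplex.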

This makes it a lot easier to build two sets of pipelines on two FPGAs for two related but very different sets of operations. Doing all the things on two separate FPGAs couldn’t achieve the same performance.

This was a back-of-the-envelope example, but the Xxx algorithms that just chain more onto the process definitely lend themselves to this kind of operation. This (and memory bandwidth, and easier cooling) is why my accelerator cards have 4x 75W UltraScale+ FPGAs and not one big Virtex. Interconnect on those is 256 Gbps.


Edit: Let me try to phrase this in a few words. Don’t waste the extremely high bandwidth interconnect and resources inside the FPGA for something you can use the slow external interfaces to accomplish.
newbie
Activity: 31
Merit: 0
Please PM me if you're looking for someone to guinea pig/collaborate to help progress your goal.  I have funds I'm willing to use for POC.
legendary
Activity: 2128
Merit: 1073
Hmmm, two accelerator cards will be daisy chained with pipelining and their performance will magically double. I will believe it when I see it like all other claims made by the OP.
Why wouldn't it pipeline efficiently? There are 8 transceivers available over QSFP28 that work at a raw speed of 32.75 Gbps, for a total of 262 Gbps from one board to the other in one direction. No back-channel is required, and we don't care about protocols or error detection & correction over our link; it is for lottery purposes only.

If that is not enough there are 16 of the same transceivers connected to the PCIe edge connector. This may be a little more tricky, to run them at full blast without obeying PCIe protocols we would need to either do some simple trace cuts on the backplane or find a way to busy-out and/or disable the PCIe bridge chip.

I seriously don't see the inter-board bandwidth as an important limitation. I haven't done the above with the Ultrascale+ technology, but I've successfully found ways to abuse the older connectivity standards (for an application not related to cryptocoins, but also tolerating occasional noise.)
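
To put the transceiver numbers in hash-rate terms, here is a rough sketch. It assumes the raw line rate is fully usable (no encoding or protocol overhead, as described above) and that each item handed to the next board is a 256-bit intermediate hash, which is an assumption borrowed from the Lyra2Rev2 discussion earlier in the thread. Function names are hypothetical.

```python
def aggregate_gbps(lanes: int, lane_gbps: float) -> float:
    """One-direction raw capacity of a group of transceivers."""
    return lanes * lane_gbps


def max_items_per_s(link_gbps: float, bits_per_item: int = 256) -> float:
    """Upper bound on intermediate hashes per second the link can carry."""
    return link_gbps * 1e9 / bits_per_item


qsfp28 = aggregate_gbps(8, 32.75)      # the 8 QSFP28 transceivers
pcie_edge = aggregate_gbps(16, 32.75)  # the 16 transceivers on the edge connector

print(f"QSFP28: {qsfp28} Gbps, about {max_items_per_s(qsfp28) / 1e9:.2f} GH/s of 256-bit hashes")
print(f"PCIe edge: {pcie_edge} Gbps")
```

On those assumptions the QSFP28 links alone carry roughly one 256-bit hash per nanosecond, which is in the same range as the hash rates being discussed.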