
Topic: DIY FPGA Mining rig for any algorithm with fast ROI - page 66.

full member
Activity: 348
Merit: 119
Interesting post, I read too much for today, will finish later.

Quote
FPGA can change to do the new algo in hours.

It can't be that fast; the dev has to be honest on this point. It represents some work, and the effort varies depending on the algo.

@whitefire990, where are you from? I mean US, Europe, or Asia?
Also, this needs a better multi-fan plate, and it should be more silent Smiley

Is it compatible with Windows? From what I see, the install is done via a Linux shell.
newbie
Activity: 5
Merit: 0


If you already have a VCU1525 (a real one, not an AWS instance), then please message me ASAP to receive your pre-release software.

I own one VCU1525; I sent the OP a PM.


I'll help confirm the feasibility of this.
newbie
Activity: 4
Merit: 0
Very interested in the Nexys Video.

That's all my budget allows atm but I am reinvesting to grow my farm.
member
Activity: 125
Merit: 35
Really not meaning to offend anyone, this has been a very interesting and entertaining thread, even inspiring all around in a way that leads to substantially more decentralization.  However, as an energy industry professional I must say that a kw/h and a kWh is substantially exactly the same thing.  Not sure what you guys are onto here... a kW is a unit of energy, it is a 1,000 Watts.  Watts are convertible to Joules or Therms or any other unit of energy.  And a kW/h is the number of kiloWatts consumed in an hour, as is a kWh, the number of kiloWatts consumed in an hour.  a kW is a measurement of power, and a kWh is a volumetric measurement of energy.  you can use 100kW in one hour is 100kWh or you can use 50kW for 30 minutes and 150kW for 30 minutes and it will also be 100kWh.

You can convert Joules to Wh; both are indeed measurements of an amount of energy.
Watts cannot be converted to Joules directly, just as velocity cannot be converted to distance without other parameters to complete the equation.
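
For anyone who wants the arithmetic spelled out, here is a minimal sketch in C (the 0.9 kW figure is just an example rig draw, not a measurement from anyone's hardware):

Code:
#include <stdio.h>

int main(void) {
    double power_kw = 0.9;      /* power: a rate of energy use (0.9 kW = 900 J/s) */
    double hours    = 24.0;

    double energy_kwh = power_kw * hours;    /* energy = power * time */
    double energy_j   = energy_kwh * 3.6e6;  /* 1 kWh = 3,600,000 J */

    printf("%.1f kW for %.0f h = %.1f kWh = %.3e J\n",
           power_kw, hours, energy_kwh, energy_j);
    /* "kW per hour" is not a unit of consumption; kW is already energy per time. */
    return 0;
}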

Just saying... of course seeing is believing, but not really any reason not too here.  If Bittware or anyone else is trying to just unload a bunch of hardware, do they really need to do it in a bitcoin talk forum? I think there is more going on out in the world than what people are making out here.  Healthy skepticism sure... but look at the size of this thread!  People realize this is a really important topic, there is a reason for it.  Centralization of hashing power and ASIC's in general are beginning to threaten the security of crypto software... the very thing it was meant to solve.  Not good...

Let's see how it turns out; I do think, though, that they see the profits NVidia and AMD have made and want a piece of the pie. A healthy dose of skepticism is never a bad idea.
Possible profits remain to be shown, but I think the claims and the motivation behind the project are not unreasonable; time will tell.

I own two VCU1525s; I sent the OP a PM.

I'll help confirm the feasibility of this.

Curious to see first results!
newbie
Activity: 14
Merit: 0
I own two VCU1525s; I sent the OP a PM.


I'll help confirm the feasibility of this.
newbie
Activity: 7
Merit: 0
What's wrong with 900 W to 1 kW per hour exactly? Other than being pedantic, I think that saying consumption is 0.9-1 kWh is understood.
Your teacher should have explained to you the difference between kW/h and kWh.

But this is a marketing site dedicated to miners, who, to quote an earlier post in this thread:
The fact is that 95% of miners out there have a very rudimentary understanding of computers, algorithms, and programming.
I would add that they also have a rudimentary understanding of literacy and numeracy.

This is what makes reading mining forums such great fun. Are people really that stupid, or are they just pretending? How are they going to bamboozle people with bullshit calculations involving non-existent units of measure like the kelvin-watt-henry?

On this occasion I'd like to repost some good advice that reeses gave about six years ago:
I'd recommend reading "The Big Con" for some of the history, and watching Confidence and The Sting as examples of the "classic" con games.
I read that book, and although it was written between the world wars, it is very pertinent to all cryptocurrencies. Here's a short excerpt:

  • Locating and investigating a well-to-do victim. (Putting the mark up.)
  • Gaining the victim’s confidence. (Playing the con for him.)
  • Steering him to meet the insideman. (Roping the mark.)
  • Permitting the insideman to show him how he can make a large amount of money dishonestly. (Telling him the tale.)
  • Allowing the victim to make a substantial profit. (Giving him the convincer.)
  • Determining exactly how much he will invest. (Giving him the breakdown.)
  • Sending him home for this amount of money. (Putting him on the send.)
  • Playing him against a big store and fleecing him. (Taking off the touch.)
  • Getting him out of the way as quietly as possible. (Blowing him off.)
  • Forestalling action by the law. (Putting in the fix.)


Really not meaning to offend anyone, this has been a very interesting and entertaining thread, even inspiring all around in a way that leads to substantially more decentralization.  However, as an energy industry professional I must say that a kw/h and a kWh is substantially exactly the same thing.  Not sure what you guys are onto here... a kW is a unit of energy, it is a 1,000 Watts.  Watts are convertible to Joules or Therms or any other unit of energy.  And a kW/h is the number of kiloWatts consumed in an hour, as is a kWh, the number of kiloWatts consumed in an hour.  a kW is a measurement of power, and a kWh is a volumetric measurement of energy.  you can use 100kW in one hour is 100kWh or you can use 50kW for 30 minutes and 150kW for 30 minutes and it will also be 100kWh.

How is the original post confusing or misleading again?  No, the OP hasn't offered "evidence" of his experimentation other than several photos and videos, but a lot of people who seem to know the art well are discussing the possibilities in a meaningful way, which makes the claims, relatively speaking, plausible.  And considering how FPGAs have been used for ages to mine similar algorithms, and considering how the FPGAs currently available off the shelf are dramatically larger and more powerful than the original silicon used to mine BTC... it all makes perfect sense.

Why not just wait until May 30th or whatever and let him release his work to the various people that are willing to try it out and have hardware?  And if none of it materializes, the difference between a kW/h and a kWh is totally irrelevant.  And if it does, that is really neat too.

Just saying... of course seeing is believing, but not really any reason not too here.  If Bittware or anyone else is trying to just unload a bunch of hardware, do they really need to do it in a bitcoin talk forum? I think there is more going on out in the world than what people are making out here.  Healthy skepticism sure... but look at the size of this thread!  People realize this is a really important topic, there is a reason for it.  Centralization of hashing power and ASIC's in general are beginning to threaten the security of crypto software... the very thing it was meant to solve.  Not good...
jr. member
Activity: 252
Merit: 8
I don't see where the idea comes from that increased mining hashrate increases a coin's price.
A higher gold price causes increased interest in gold mining, not vice versa.
Yeah, theoretically it should be exactly the opposite: a sudden increase in supply = a lower price.
e97
jr. member
Activity: 58
Merit: 1
I don't see where the idea comes from that increased mining hashrate increases a coin's price.
A higher gold price causes increased interest in gold mining, not vice versa.

I believe the thinking is:

faster hash rate = more coins = more profitability -> drives more miners -> more interest -> more speculation on the coin

There are some iffy transitions in there, but that seems to be the 'pump and dump' / penny-crypto way.
member
Activity: 125
Merit: 35

For XYZ==GPU, start with the GPU's strengths. I haven't studied the recent GPU universal shader architecture, but the main idea was to optimize the particular floating-point computation used in 3D graphics with homogeneous coordinates, like AX=Y, where A is a 4*4 matrix and X is a 4*1 vector with w==1. So include lots of those in your hash function. In particular, GPUs are especially fast when using FP16, half-precision floating point.
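
For readers who want to see what "include lots of those" might look like, here is a rough sketch in plain C (float here; a GPU kernel would run the same arithmetic in FP16, and the function name and layout are made up for illustration):

Code:
/* Hypothetical mixing step: treat 4 words of hash state as a homogeneous
   vector X (w == 1) and multiply it by a 4x4 matrix A derived elsewhere
   from the state. A real PoW would also have to pin down rounding so that
   CPU, GPU and FPGA results verify bit-exactly. */
static void mix_axy(const float A[4][4], const float X[4], float Y[4]) {
    for (int r = 0; r < 4; r++) {
        float acc = 0.0f;
        for (int c = 0; c < 4; c++)
            acc += A[r][c] * X[c];   /* the FMA-heavy pattern GPUs are built for */
        Y[r] = acc;
    }
}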


NVidia GPUs perform abysmally in half and double precision workloads. For half precision (FP16) you can expect roughly the same FLOPS as full precision (FP32), and around 3% of the full-precision FLOPS on double precision (FP64). You would normally expect double the FP32 rate on FP16 and half of it on FP64.
For AMD it's a similar story, except that Vega 56 & 64 have double-rate FP16 but are sadly still crippled on FP64.

Only the Quadro cards & the recent Titan V are not sterilised like that: they do double the FLOPS on half precision and 50% of the FP32 rate on FP64.
Some older AMD cards are much less cut down as well, with an R9 280X performing 3x better than a 1080 Ti in FP64.

sources:
https://medium.com/@u39kun/titan-v-vs-1080-ti-head-to-head-battle-of-the-best-desktop-gpus-on-cnns-d55a19866b7c
http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/
https://www.anandtech.com/show/11717/the-amd-radeon-rx-vega-64-and-56-review/4

Edit: added some extra clarification for other readers who might be interested.
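
If you want to sanity-check ratios like these yourself, peak throughput is conventionally estimated as shader count x clock x 2 (an FMA counts as two FLOPs), scaled by the advertised precision ratio. A small sketch with placeholder numbers, not actual card specs:

Code:
#include <stdio.h>

/* Rough peak-FLOPS estimator: cores * clock * 2 (FMA = 2 FLOPs),
   scaled by the vendor's precision ratio. All numbers are placeholders. */
static double peak_tflops(double cores, double clock_ghz, double ratio) {
    return cores * clock_ghz * 2.0 * ratio / 1000.0;   /* GFLOPS -> TFLOPS */
}

int main(void) {
    printf("FP32: %.2f TFLOPS\n", peak_tflops(3584, 1.5, 1.0));
    printf("FP64: %.2f TFLOPS\n", peak_tflops(3584, 1.5, 1.0 / 32.0));
    printf("FP16: %.2f TFLOPS\n", peak_tflops(3584, 1.5, 2.0));
    return 0;
}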

Well, Denarius (DNR) created the Tribus algo. Seems smart to do Tribus first, seeing as DNR is soon to be the fastest, most secure crypto that no one knows about yet. Mining DNR with FPGA miners should rocket DNR's price.

Can't wait

I don't see where the idea comes from that increased mining hashrate increases a coin's price.
A higher gold price causes increased interest in gold mining, not vice versa.
member
Activity: 118
Merit: 10
Well, Denarius (DNR) created the Tribus algo. Seems smart to do Tribus first, seeing as DNR is soon to be the fastest, most secure crypto that no one knows about yet. Mining DNR with FPGA miners should rocket DNR's price.

Can't wait
sr. member
Activity: 1021
Merit: 324
I suppose the release of Keccak first is purely for showing proof of concept? Because I don't see it making more than $12 per card.

This is correct.  The Keccak launch is primarily to iron out power and thermal issues, determine unit-to-unit variance in overclocking capacity, auto-detect FPGAs attached to the PC, and prove out various other pieces needed for scaling up operations.  The Tribus launch on June 15 is the first bitstream that generates significant profit.

How come you chose Tribus to start with?
member
Activity: 154
Merit: 37
I’ll see if I can dig up recent ones. A lot of people pull up the old CUDA vs FPGA academic papers that are focused on very old architectures.
Thanks in advance.

I'll put the blame squarely on the vendor's lap. Intel, which has now acquired Altera, still lists "An Independent Analysis of Altera’s FPGA Floating-point DSP Design Flow" from 2011 as the only source mentioning "accuracy". I've found several other, newer papers; but they all repeat the old bullshit methodology: only using single precision and only estimating the errors. At most they'll show fused multiply-add, as if double precision or https://en.wikipedia.org/wiki/Kahan_summation_algorithm never existed or didn't apply.

As to GPU floating point performance, you don’t need a benchmark. The figures are right in the ISA documents. Single precision TFLOPs are usually given in terms of FMA unit operations though, which is a bit misleading.

The FPGAs are a bit harder to get TFLOPs numbers for given the flexibility, but since most of the performance actually comes from the DSP blocks you can calculate those. If you've never read them, Xilinx gives extremely detailed performance metrics for every chip for most IP blocks, as well as frequency numbers for the hard blocks in the AC/DC switching characteristic docs. Agner Fog publishes a very detailed set of specifications for the performance of those units on most every CPU/APU available as well.
The funny thing is that the closest to an honest comparison of Xilinx's FP that I've found is on Altera's site:

https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01222-understanding-peak-floating-point-performance-claims.pdf

The main resource CPUs and GPUs have is instruction flexibility. Until a PoW hash truly requires most of the full instruction set to be supported, it will be hard to keep out ASIC/FPGA.
I think this claim is true, but somewhat pessimistic. I think it would be fairly easy once a wider range of cryptocurrency programmers starts to appreciate floating point and https://en.wikipedia.org/wiki/Chaos_theory as useful building blocks for proof-of-work algorithms.

I've only skimmed the currently available literature on the subject, but it is next to trivial to demolish all the current claims of FPGA superiority that I was able to find today:

1) use double precision
2) use division or reciprocal (either accurate or approximate)
3) use square-root or reciprocal square-root (either accurate or approximate)

and I haven't even gotten into transcendental functions (on CPUs) or using later, pixel-oriented hardware in the shaders (on GPUs).

You did, however, motivate me to reconsider Altera/Quartus for certain future projects. They are now shipping limited, but fully hardware-implemented, single-precision floating point in their DSP blocks, and their toolchain has improved in terms of supported OSes/device drivers.

I deal with a lot of complex, large FFTs on CPUs, GPUs, and FPGAs. The "only using single precision" is unfortunately true of every vendor - GPU and FPGA. Marketing wants to use the big number - and frankly so do most real-world users now. Modern GPUs are horrible at double precision. It is a sad fate. Your comparison also pits a modern Stratix 10 (10 TFLOPs) against the previous-generation Ultrascale (not Ultrascale+), with slower fabric and significantly fewer DSP blocks than the VCU1525 (XCVU9P-L2FSGD2104E) everyone has been talking about here.

Compared to even modern weak-DP GPUs, any normally priced CPU is horrible at double precision. A modern GPU runs circles around a CPU on complex FFTs using double precision. Both quickly become memory bound. The FPGA performance is usually on par or slightly better for the double precision, but the benefits in the rest of the calculation are much better. I think you'll be hard pressed to build a hashing algorithm that is entirely floating point like a synthetic benchmark.

The only place FPGAs really fall down is upfront cost.

I'm still a bit confused by why you think sqrt/reciprocal and the transcendentals are so difficult for an FPGA, or that they are magically free on GPUs/CPUs. On at least AMD GPUs these are macro-ops that take hundreds of clock cycles (EDIT: searching for my reference on this, I see these ops are quarter rate; I may have been thinking of division). On the FPGA you can devote a lot of logic to lowering the latency of these functions, or you can pipeline them nice and long with very high throughput to match what you need for the algorithm in question. You have none of that flexibility on the GPU. What you do have is a tremendous amount of power and overhead in instruction fetching, scheduling, branching, caching, etc. to feed a limited set of ports implementing the opcodes for each GCN/CUDA core.
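
To make the latency-vs-throughput point concrete, here is a toy model in C (every figure below is an illustrative placeholder, not a measurement): a fully pipelined FPGA unit retires one result per cycle once its pipeline is full, whatever its latency, so sustained throughput is just units x clock, while a quarter-rate GPU op retires one result per four cycles per lane.

Code:
#include <stdio.h>

/* Toy throughput model only; unit counts and clocks are made-up placeholders. */
int main(void) {
    double fpga_units = 64, fpga_clock_ghz = 0.5;   /* pipelined sqrt/div cores  */
    double gpu_lanes  = 3584, gpu_clock_ghz = 1.5;  /* quarter-rate special op   */

    /* Pipeline depth (latency) only affects fill time, not steady-state rate.   */
    double fpga_gresults = fpga_units * fpga_clock_ghz;
    double gpu_gresults  = gpu_lanes * gpu_clock_ghz * 0.25;

    printf("FPGA pipelines:   %.1f Gresults/s\n", fpga_gresults);
    printf("GPU quarter-rate: %.1f Gresults/s\n", gpu_gresults);
    return 0;
}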

copper member
Activity: 166
Merit: 84
I suppose the release of Keccak first is purely for showing proof of concept? Because I don't see it making more than $12 per card.

This is correct.  The Keccak launch is primarily to iron out power and thermal issues, determine unit-to-unit variance in overclocking capacity, auto-detect FPGAs attached to the PC, and prove out various other pieces needed for scaling up operations.  The Tribus launch on June 15 is the first bitstream that generates significant profit.



member
Activity: 160
Merit: 10
I’ll see if I can dig up recent ones. A lot of people pull up the old CUDA vs FPGA academic papers that are focused on very old architectures.
Thanks in advance.

I'll put the blame squarely on the vendor's lap. Intel, which has now acquired Altera, still lists "An Independent Analysis of Altera’s FPGA Floating-point DSP Design Flow" from 2011 as the only source mentioning "accuracy". I've found several other, newer papers; but they all repeat the old bullshit methodology: only using single precision and only estimating the errors. At most they'll show fused multiply-add, as if double precision or https://en.wikipedia.org/wiki/Kahan_summation_algorithm never existed or didn't apply.

As to GPU floating point performance, you don’t need a benchmark. The figures are right in the ISA documents. Single precision TFLOPs are usually given in terms of FMA unit operations though, which is a bit misleading.

The FPGAs are a bit harder to get TFLOPs numbers for given the flexibility, but since most of the performance actually comes from the DSP blocks you can calculate those. If you've never read them, Xilinx gives extremely detailed performance metrics for every chip for most IP blocks, as well as frequency numbers for the hard blocks in the AC/DC switching characteristic docs. Agner Fog publishes a very detailed set of specifications for the performance of those units on most every CPU/APU available as well.
The funny thing is that the closest to an honest comparison of Xilinx's FP that I've found is on Altera's site:

https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01222-understanding-peak-floating-point-performance-claims.pdf

The main resource CPUs and GPUs have is instruction flexibility. Until a PoW hash truly requires most of the full instruction set to be supported, it will be hard to keep out ASIC/FPGA.
I think this claim is true, but somewhat pessimistic. I think it would be fairly easy once a wider range of cryptocurrency programmers starts to appreciate floating point and https://en.wikipedia.org/wiki/Chaos_theory as useful building blocks for proof-of-work algorithms.

I've only skimmed the currently available literature on the subject, but it is next to trivial to demolish all the current claims of FPGA superiority that I was able to find today:

1) use double precision
2) use division or reciprocal (either accurate or approximate)
3) use square-root or reciprocal square-root (either accurate or approximate)

and I haven't even gotten into transcendental functions (on CPUs) or using later, pixel-oriented hardware in the shaders (on GPUs).

You did, however, motivate me to reconsider Altera/Quartus for certain future projects. They are now shipping limited, but fully hardware-implemented, single-precision floating point in their DSP blocks, and their toolchain has improved in terms of supported OSes/device drivers.



Just wondering why you don't develop a new algo... you seem to have a handle on what is needed... it's people like you that are needed to move this forward.
legendary
Activity: 2128
Merit: 1073
I’ll see if I can dig up recent ones. A lot of people pull up the old CUDA vs FPGA academic papers that are focused on very old architectures.
Thanks in advance.

I'll put the blame squarely on the vendor's lap. Intel, which has now acquired Altera, still lists "An Independent Analysis of Altera’s FPGA Floating-point DSP Design Flow" from 2011 as the only source mentioning "accuracy". I've found several other, newer papers; but they all repeat the old bullshit methodology: only using single precision and only estimating the errors. At most they'll show fused multiply-add, as if double precision or https://en.wikipedia.org/wiki/Kahan_summation_algorithm never existed or didn't apply.

As to GPU floating point performance, you don’t need a benchmark. The figures are right in the ISA documents. Single precision TFLOPs are usually given in terms of FMA unit operations though, which is a bit misleading.

The FPGAs are a bit harder to get TFLOPs numbers for given the flexibility, but since most of the performance actually comes from the DSP blocks you can calculate those. If you've never read them, Xilinx gives extremely detailed performance metrics for every chip for most IP blocks, as well as frequency numbers for the hard blocks in the AC/DC switching characteristic docs. Agner Fog publishes a very detailed set of specifications for the performance of those units on most every CPU/APU available as well.
The funny thing is that the closest to an honest comparison of Xilinx's FP that I've found is on Altera's site:

https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/wp/wp-01222-understanding-peak-floating-point-performance-claims.pdf

The main resource CPUs and GPUs have is instruction flexibility. Until a PoW hash truly requires most of the full instruction set to be supported, it will be hard to keep out ASIC/FPGA.
I think this claim is true, but somewhat pessimistic. I think it would be fairly easy once a wider range of cryptocurrency programmers starts to appreciate floating point and https://en.wikipedia.org/wiki/Chaos_theory as useful building blocks for proof-of-work algorithms.

I've only skimmed the currently available literature on the subject, but it is next to trivial to demolish all the current claims of FPGA superiority that I was able to find today:

1) use double precision
2) use division or reciprocal (either accurate or approximate)
3) use square-root or reciprocal square-root (either accurate or approximate)

and I haven't even gotten into transcendental functions (on CPUs) or using later, pixel-oriented hardware in the shaders (on GPUs).
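
As a sketch of how little it takes (illustrative only, not a proposed algorithm): an inner loop that forces IEEE-754 double-precision divides and square roots through a chaotic map is a few lines of C, and an FPGA port then has to reproduce every bit of it.

Code:
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Toy mixing round: logistic-map chaos in double precision plus a divide
   and a square root, folded back into integer state. Illustrative only;
   a real PoW would need a full rounding/exception specification. */
static uint64_t fp_mix(uint64_t state) {
    double x = (double)(state >> 11) * (1.0 / 9007199254740992.0); /* -> [0,1) */
    for (int i = 0; i < 8; i++) {
        x = 3.999999 * x * (1.0 - x);   /* chaotic logistic map */
        x = sqrt(x) / (1.0 + x);        /* IEEE-754 sqrt and division */
    }
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);     /* reinterpret the double's bits */
    return state ^ bits;
}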

You did, however, motivate me to reconsider Altera/Quartus for certain future projects. They are now shipping limited, but fully hardware-implemented, single-precision floating point in their DSP blocks, and their toolchain has improved in terms of supported OSes/device drivers.
member
Activity: 531
Merit: 29
I suppose the release of Keccak first is purely for showing proof of concept? Because I don't see it making more than $12 per card.
member
Activity: 154
Merit: 37
Are you so sure about that? The floating-point performance per watt of modern FPGAs is much better than that of GPUs.  Even in the 28nm Virtex 7 days TFLOPs were roughly on par; it's neck and neck now, and the next-gen FPGAs are pulling ahead on the AI / half-precision stuff. That floating-point performance gap was true several years ago but has rapidly closed since.

The types of instructions you're listing also take many, many clock cycles on GPUs and CPUs, and can almost always be implemented faster in FPGAs.
I've never seen an honest comparison involving actual verification of accuracy, not even bit-accuracy. I've seen some very skewed benchmarks made with very ugly code that conflated/convolved FPU performance with memory bandwidth/latency limitations. https://en.wikipedia.org/wiki/False_sharing seems to be in fashion nowadays for obfuscation purposes.

Frequently the comparisons don't even use real floating point but some extended-precision fixed point in the inner loops, because the original CPU/GPU implementation was just generic library code versus carefully optimized special-purpose code for the FPGA. It does make business sense, especially with regard to time-to-market; but I wouldn't call that science, even if published in an ostensibly scientific journal.

Do you recall where you've seen those comparisons?

I’ll see if I can dig up recent ones. A lot of people pull up the old CUDA vs FPGA academic papers that are focused on very old architectures.

As to GPU floating point performance, you don’t need a benchmark. The figures are right in the ISA documents. Single precision TFLOPs are usually given in terms of FMA unit operations though, which is a bit misleading.

The FPGAs are a bit harder to get TFLOPs numbers for given the flexibility, but since most of the performance actually comes from the DSP blocks you can calculate those. If you've never read them, Xilinx gives extremely detailed performance metrics for every chip for most IP blocks, as well as frequency numbers for the hard blocks in the AC/DC switching characteristic docs. Agner Fog publishes a very detailed set of specifications for the performance of those units on most every CPU/APU available as well.

The main resource CPUs and GPUs have is instruction flexibility. Until a PoW hash truly requires most of the full instruction set to be supported, it will be hard to keep out ASIC/FPGA.

legendary
Activity: 2128
Merit: 1073
Are you so sure about that? The floating-point performance per watt of modern FPGAs is much better than that of GPUs.  Even in the 28nm Virtex 7 days TFLOPs were roughly on par; it's neck and neck now, and the next-gen FPGAs are pulling ahead on the AI / half-precision stuff. That floating-point performance gap was true several years ago but has rapidly closed since.

The types of instructions you're listing also take many, many clock cycles on GPUs and CPUs, and can almost always be implemented faster in FPGAs.
I've never seen an honest comparison involving actual verification of accuracy, not even bit-accuracy. I've seen some very skewed benchmarks made with very ugly code that conflated/convolved FPU performance with memory bandwidth/latency limitations. https://en.wikipedia.org/wiki/False_sharing seems to be in fashion nowadays for obfuscation purposes.

Frequently the comparisons don't even use real floating point but some extended-precision fixed point in the inner loops, because the original CPU/GPU implementation was just generic library code versus carefully optimized special-purpose code for the FPGA. It does make business sense, especially with regard to time-to-market; but I wouldn't call that science, even if published in an ostensibly scientific journal.

Do you recall where you've seen those comparisons?
member
Activity: 154
Merit: 37
FPGA can change to do the new algo in hours.
Until the day one of those altcoin programmers discovers that his CPU has some interesting instructions like FILD, FIST, FDIV, FSINCOS, etc.

Then the hours become months, and even after that the FPGA will have a hard time beating even a cheap Atom CPU.

I tried to research exactly emulating the 80x87 a couple of years ago. There was exactly nothing available open source, with the exception of an exact re-implementation of FDIV in Mathematica. The available closed-source code was not only expensive but had mandatory royalties and defense-grade NDA requirements.

Are you so sure about that? The floating-point performance per watt of modern FPGAs is much better than that of GPUs.  Even in the 28nm Virtex 7 days TFLOPs were roughly on par; it's neck and neck now, and the next-gen FPGAs are pulling ahead on the AI / half-precision stuff. That floating-point performance gap was true several years ago but has rapidly closed since.

The types of instructions you're listing also take many, many clock cycles on GPUs and CPUs, and can almost always be implemented faster in FPGAs.
legendary
Activity: 2128
Merit: 1073
FPGA can change to do the new algo in hours.
Until the day one of those altcoin programmers discovers that his CPU has some interesting instructions like FILD, FIST, FDIV, FSINCOS, etc.

Then the hours become months, and even after that the FPGA will have a hard time beating even a cheap Atom CPU.

I tried to research exactly emulating the 80x87 a couple of years ago. There was exactly nothing available open source, with the exception of an exact re-implementation of FDIV in Mathematica. The available closed-source code was not only expensive but had mandatory royalties and defense-grade NDA requirements.
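
For the curious, this is roughly what touching those instructions looks like from C (GCC/Clang inline asm, x86 only; a minimal sketch, and exactly these 80-bit extended-precision results are what an FPGA emulation would have to reproduce bit for bit):

Code:
#include <stdio.h>

/* x86 + GCC/Clang only: issue FSINCOS directly and read both 80-bit
   extended-precision results off the x87 stack. */
static void x87_sincos(long double x, long double *s, long double *c) {
    long double sine, cosine;
    __asm__ ("fsincos" : "=t" (cosine), "=u" (sine) : "0" (x));
    *s = sine;
    *c = cosine;
}

int main(void) {
    long double s, c;
    x87_sincos(0.5L, &s, &c);
    printf("sin = %.20Lf\ncos = %.20Lf\n", s, c);
    return 0;
}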