Topic: Modular FPGA Miner Hardware Design Development - page 26. (Read 119276 times)

hero member
Activity: 686
Merit: 564
I just realised something when I read makomk's previous post (thanks for the info, by the way!): the unit MHash/s is not self-explanatory! To check once whether a nonce is golden, you need two calculations of a SHA-256 hash. When I gave my synthesis results previously, I interpreted 1 Hash as one calculation of sha(sha(.)). Does everyone do the same, or does your unit equate 2 Hash to sha(sha(.))?
It seems to be standard to list the number of MHash as the number of total sha256(sha256(data)) operations just like you did - certainly that's what I've been doing. I get the impression this dates back to the early days of bitcoin. It's possible others haven't been doing it this way of course.
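To make the unit concrete, here is a minimal Python sketch of one "hash" in this sense: a single sha256(sha256(data)) evaluation per nonce. The function names, the 76-byte header prefix, and the rule that "golden" means the last `zero_bytes` bytes of the digest are zero are assumptions for illustration, not something taken from any of the designs discussed here.

```python
import hashlib
import struct
from typing import Optional


def double_sha256(data: bytes) -> bytes:
    """One 'hash' in the MHash/s sense: sha256(sha256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def is_golden(header76: bytes, nonce: int, zero_bytes: int = 4) -> bool:
    """Assumed convention: a nonce is golden when the digest ends in
    zero_bytes zero bytes (zero_bytes must be at least 1)."""
    header = header76 + struct.pack("<I", nonce)  # 76 bytes + 32-bit nonce
    return double_sha256(header)[-zero_bytes:] == b"\x00" * zero_bytes


def find_nonce(header76: bytes, zero_bytes: int, limit: int) -> Optional[int]:
    """Try nonces 0..limit-1; each attempt costs exactly one double hash."""
    for nonce in range(limit):
        if is_golden(header76, nonce, zero_bytes):
            return nonce
    return None
```

Under this convention, a device rated at 100 MHash/s tests 100 million nonces per second, each test involving two SHA-256 calculations.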

Edit: Also, after a slightly tedious 4-hour build process, Fmax=109.29MHz and 97% resource usage for the fully-unrolled DE2_115_makomk_mod on the EP4CE75F29C7. Might be able to get it up to 110MHz with the right options, but I wouldn't bet on it.
member
Activity: 70
Merit: 10
I just realised something when I read makomk's previous post (thanks for the info, by the way!): the unit MHash/s is not self-explanatory! To check once whether a nonce is golden, you need two calculations of a SHA-256 hash. When I gave my synthesis results previously, I interpreted 1 Hash as one calculation of sha(sha(.)). Does everyone do the same, or does your unit equate 2 Hash to sha(sha(.))?
hero member
Activity: 686
Merit: 564
But seriously: isn't there someone who can give us some info on chip performance to wrap up this discussion? What I gave here is using different code and does not contain all chips of interest. Especially missing are the Altera EP4CE75F23C7 and the Xilinx XC6SLX75-3CSG484C through XC6SLX150-3CSG484C. The Altera and largest Xilinx are roughly comparable in price and the smaller Xilinx is the best that can be compiled with their free software.
I think I managed to compile a 100 MHash/s design for the EP4CE75F23C7, though no-one has one to test it on and the device was almost totally full so I'm not sure if you'd be able to fit any extra control logic in. Bear in mind that the last digit is the speed grade (lower is faster for Cyclone IV). You can try it for yourself - fpgaminer committed my modified version in projects/DE2_115_makomk_mod, just change which device it targets and the clock speed. If you're careful and do the design right you could probably build a PCB that supported both the 75 and 115.

Edit: Also,
OK, well, if you need some direction, let me say that unless you are choosing the largest Cyclone IV or Spartan 6 device you are probably wasting your time. You will need to put up with the issue that a license is required to perform compiles. This can be overcome by people with licenses volunteering to perform compiles.
The free tools support all Cyclone I, II, III and IV devices, it's just the other ranges of FPGAs that are limited.
member
Activity: 70
Merit: 10
Cheaper just to choose one.  Looks like LX150 is probably the best bet, but would be nice to have some compilation results.
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Would you consider it feasible to use both the Spartan 6 LX150 (~130€) and an Altera FPGA, e.g. the Cyclone IV E 75k (~175) or the Cyclone IV GX 110K (~214€), in the first prototype stage?
It may require a different voltage supply for each of them, but I will look into that.

I think this would be a good chance to balance and test the different FPGAs, so we could develop an individually optimized software solution for each chip. After that we can decide on a final one for the series.


I hope that on Monday I will get the chance to negotiate with Xilinx about the software problem.
member
Activity: 70
Merit: 10
OK, well, if you need some direction, let me say that unless you are choosing the largest Cyclone IV or Spartan 6 device you are probably wasting your time. You will need to put up with the issue that a license is required to perform compiles. This can be overcome by people with licenses volunteering to perform compiles.
legendary
Activity: 1270
Merit: 1000
For the numbers, I would take the unused resources into account.
I have an EP3C25 and an EP2C35, both running an 8-stage pipeline that requires 8 clock cycles. If I could extend this scheme to a 12-stage pipeline, it would require only 6 cycles, which is 25% fewer cycles (about 33% more MHash/s) if there is no impact on the clock cycle length.
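As a rough check of that arithmetic: going from 8 to 6 cycles cuts the cycle count by 25%, which at an unchanged clock corresponds to a throughput gain of about 33%. A minimal sketch, assuming throughput is simply clock frequency divided by cycles per result (the function names and the 100 MHz clock in the usage note are hypothetical, not measured values):

```python
def hash_rate_mhs(clock_mhz: float, cycles_per_hash: int) -> float:
    """Throughput in MHash/s if one result is produced every cycles_per_hash clocks."""
    return clock_mhz / cycles_per_hash


def cycle_reduction(old_cycles: int, new_cycles: int) -> float:
    """Fraction of clock cycles saved per result."""
    return 1.0 - new_cycles / old_cycles


def throughput_gain(old_cycles: int, new_cycles: int) -> float:
    """Relative increase in MHash/s at an unchanged clock frequency."""
    return old_cycles / new_cycles - 1.0
```

For example, at a hypothetical 100 MHz clock, `hash_rate_mhs(100, 8)` gives 12.5 MHash/s and `hash_rate_mhs(100, 6)` gives about 16.7 MHash/s.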
member
Activity: 70
Merit: 10
We still need a decision on which FPGA to use. As there has been no new data, I thought of at least copying info from elsewhere ;). I took the performance data from the bitcoin wiki and changed the price to what just the chip costs (not the dev board price as stated in the linked table). The price may not be valid for a single unit; each FPGA has been substituted with the cheapest comparable alternative in the smallest package.

Chip                       Rate [MHash/s]  Power [W]  Price [EUR]  Rate/Price [MHash/s/EUR]  Rate/Power [MHash/J]
Altera EP4CE115F23C7N      80              4.4        303.69       0.263                     18.2
Altera EP4CE115F23C7N      109             -          303.69       0.359                     -
Xilinx XC5VLX110-1FFG676C  120             -          1126.51      0.107                     -
Xilinx XC3S500E-5CPG132C   3.125           0.78       20.38        0.153                     4
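The two derived columns are plain ratios of the first three, so they are easy to recompute when new chips or prices are added. A small sketch using the figures from the table above (the `CHIPS` list and function names are illustrative; `None` marks a power figure that was not given):

```python
# (chip, rate in MHash/s, power in W or None if unknown, price in EUR)
CHIPS = [
    ("Altera EP4CE115F23C7N", 80.0, 4.4, 303.69),
    ("Altera EP4CE115F23C7N", 109.0, None, 303.69),
    ("Xilinx XC5VLX110-1FFG676C", 120.0, None, 1126.51),
    ("Xilinx XC3S500E-5CPG132C", 3.125, 0.78, 20.38),
]


def rate_per_price(rate_mhs, price_eur):
    """MHash/s per EUR of chip cost."""
    return rate_mhs / price_eur


def rate_per_power(rate_mhs, power_w):
    """MHash per joule (MHash/s divided by W); None if power is unknown."""
    return None if power_w is None else rate_mhs / power_w


def summarize(chips):
    """Return (chip, MHash/s/EUR, MHash/J) rows for each entry."""
    return [
        (name, rate_per_price(rate, price), rate_per_power(rate, power))
        for name, rate, power, price in chips
    ]
```

Sorting the result of `summarize(CHIPS)` by the second field ranks the chips by MHash/s per EUR.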

But seriously: isn't there someone who can give us some info on chip performance to wrap up this discussion? What I gave here is using different code and does not contain all chips of interest. Especially missing are the Altera EP4CE75F23C7 and the Xilinx XC6SLX75-3CSG484C through XC6SLX150-3CSG484C. The Altera and largest Xilinx are roughly comparable in price and the smaller Xilinx is the best that can be compiled with their free software.
newbie
Activity: 25
Merit: 0
Please stop the off-topic talk about brute-forcing bitstreams with GPUs (that means: stop talking about it here). This thread is about the hardware part.

I'm sure that we'll find someone who could get us those bitstreams, if special software is needed. Please focus on the hardware part.
hero member
Activity: 686
Merit: 564
I don't suppose the rules for generating a bitstream are documented?

I don't think it is exactly rocket science. It would be of comparable difficulty to writing a compiler. Obviously from the CPU time used, these tools brute force many possibilities.
Harder than rocket science, I think. Not only is the bitstream format totally undocumented, but the algorithms required to map a design to an FPGA effectively are apparently really hairy, which is why the tools are slow and often temperamental. I hear simulated annealing is quite popular for the actual place-and-route stage...
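For anyone unfamiliar with the idea, here is a toy sketch of simulated annealing applied to placement: cells are swapped between grid sites, worse placements are sometimes accepted while the "temperature" is high, and the total wire length is the cost. Everything here (the grid, the two-pin nets, the cooling schedule, all names) is a simplified assumption for illustration; real vendor place-and-route tools are far more elaborate and their internals are not documented in this thread.

```python
import math
import random


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for a, b in nets:
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += abs(xa - xb) + abs(ya - yb)
    return total


def anneal(placement, nets, steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Swap-based simulated annealing; returns the best placement seen."""
    rng = random.Random(seed)
    current = dict(placement)
    cost = wirelength(current, nets)
    best, best_cost = dict(current), cost
    cells = list(current)
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(cells, 2)
        current[a], current[b] = current[b], current[a]
        new_cost = wirelength(current, nets)
        delta = new_cost - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[a], current[b] = current[b], current[a]  # undo the swap
    return best, best_cost


def demo_problem(size=4, seed=1):
    """Random placement of size*size cells connected as a simple chain."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(size) for y in range(size)]
    rng.shuffle(sites)
    placement = {cell: site for cell, site in enumerate(sites)}
    nets = [(i, i + 1) for i in range(len(sites) - 1)]
    return placement, nets
```

Calling `anneal(*demo_problem())` returns a placement whose wire length is no worse than the random starting point. Each step re-evaluates the whole cost, which hints at why runs on large designs take hours.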

Edit: Oh, and of course if you generate an incorrect bitstream you'll probably blow up your expensive FPGA.
legendary
Activity: 1008
Merit: 1001
Let the chips fall where they may.
It is probably an NP-hard problem like the traveling salesman problem. The software has to decide which traces to place where to get the shortest routing (allowing higher clock speeds). If lots of branching is involved, CPUs may be better at it (I don't know).

I don't think I have heard of GPU-accelerated compilers yet.
legendary
Activity: 2940
Merit: 1090
Brute force? What kind of brute force? GPU-amenable brute force?

-MarkM-
legendary
Activity: 1008
Merit: 1001
Let the chips fall where they may.
I don't suppose the rules for generating a bitstream are documented?

I don't think it is exactly rocket science. It would be of comparable difficulty to writing a compiler. Obviously from the CPU time used, these tools brute force many possibilities.
legendary
Activity: 2940
Merit: 1090
Then don't offer such a service.

Keep it as an in-house capability usable only by employees and maybe shareholders of the GLBSE-listed concern that owns the license. ;)

-MarkM- (P.S. no maybe about it, shareholders are owners so it is theirs to use...)

P.P.S. No, wait, maybe current owner doesn't want to donate/sell it to the concern. Can employees use their copy at work?
hero member
Activity: 686
Merit: 564
I'm sure that someone in the community would love to set up such a service, where you upload your design, get back the bitstream a couple of hours later if it succeeded, and pay per processing time. This kind of FPGA synthesis pool would make sense anyway, as you probably won't want to run SmartXplorer on a single machine of your own and wait for days until you have a reasonably optimized design. Parallelizing this and pooling it to increase usage/efficiency certainly makes sense.
There's already a company called Plunify that's in the process of setting up exactly this service. Last I heard, Xilinx were being annoying about licensing though - they won't even let them offer the free WebPack functionality, let alone anything more powerful.
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
This software is only needed for generating the bitstream. In theory it would be sufficient to have somebody generate the bitstream for you, but I don't know how much such a service would cost.

If you are a student you could also try the CS or EE laboratory :)


I'm sure that someone in the community would love to set up such a service, where you upload your design, get back the bitstream a couple of hours later if it succeeded, and pay per processing time. This kind of FPGA synthesis pool would make sense anyway, as you probably won't want to run SmartXplorer on a single machine of your own and wait for days until you have a reasonably optimized design. Parallelizing this and pooling it to increase usage/efficiency certainly makes sense.
legendary
Activity: 1270
Merit: 1000
This software is only needed for generating the bitstream. In theory it would be sufficient to have somebody generate the bitstream for you, but I don't know how much such a service would cost.

If you are a student you could also try the CS or EE laboratory :)
sr. member
Activity: 410
Merit: 252
Watercooling the world of mining
Doubly-unrolled should be more efficient, as the price difference between LX75 and LX150 is way less than a factor of 2.

I would love to use the LX150 if it is possible. But I found a quote saying that the full-size Xilinx software needed for it would be around $2000 a year.
If (expensive) paid software is needed for programming and running the FPGA, it would certainly kill the idea of an open-source miner platform open to everyone.

Maybe someone could try to verify the price of the software needed for using the Spartan 6 LX150. A rate of less than 100 € or similar might be bearable.
hero member
Activity: 686
Merit: 564
Can you send me that design? I'd like to validate it. I even failed with a singly-unrolled one on an LX100.
Nevertheless, even if it ran at 100MHz, that would probably not be worth it. Doubly-unrolled should be more efficient, as the price difference between LX75 and LX150 is way less than a factor of 2.
Hopefully this archive should have all the bits you need. It won't actually run like that, because there are no pin assignments and the PLL speed setting is all wrong, but with a bit of luck you should be able to coax it into doing something. Unfortunately it's rather dependent on the right build settings, and I'm not even sure if they've copied over from SmartXplorer properly, let alone to that archive.

Edit: Oh, and don't try changing LOOP_LOG2 from 0 with that code; it won't work correctly.
hero member
Activity: 504
Merit: 500
FPGA Mining LLC
Can you send me that design? I'd like to validate it. I even failed with a singly-unrolled one on an LX100.
Nevertheless, even if it ran at 100MHz, that would probably not be worth it. Doubly-unrolled should be more efficient, as the price difference between LX75 and LX150 is way less than a factor of 2.