
Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 19. (Read 432891 times)

full member
Activity: 196
Merit: 100
It's taken much longer than I thought it would, but I can run the testbench in the simulator and it successfully finds the (assumed) correct nonce.

I've only just learnt that 'midstate' is deprecated, which this code requires.  Perhaps I can work towards adding that feature (later, when I work out what this code is doing!).

The test hash I used was already in the code;
Code:
uut.midstate_buf = 256'h228ea4732a3c9ba860c009cda7252b9161a5e75ec8c582a5f106abb3af41f790;
uut.data_buf = 512'h000002800000000000000000000000000000000000000000000000000000000000000000000000000000000080000000000000002194261a9395e64dbed17115;
uut.nonce = 32'h0e33337a - 256; // Minus a little so we can exercise the code a bit


I didn't completely understand your example hashes. Am I correct in thinking you've provided data:nonce pairs, where one is correct?  Or are they all correct?  Obviously I'll need to calculate the midstate myself if this is the case.

Thanks for your help! I finally feel like I'm getting somewhere.


They are ALL correct solutions for a single hash. Having several just makes the code easier to test than having only one hit point, in particular because you can test your communications algorithm to see how it handles multiple nonce hits.

With a single-nonce solution, having to restart a simulation just to hit the nonce again can destroy valuable results.
newbie
Activity: 11
Merit: 0
It's taken much longer than I thought it would, but I can run the testbench in the simulator and it successfully finds the (assumed) correct nonce.

I've only just learnt that 'midstate' is deprecated, which this code requires.  Perhaps I can work towards adding that feature (later, when I work out what this code is doing!).

The test hash I used was already in the code;
Code:
uut.midstate_buf = 256'h228ea4732a3c9ba860c009cda7252b9161a5e75ec8c582a5f106abb3af41f790;
uut.data_buf = 512'h000002800000000000000000000000000000000000000000000000000000000000000000000000000000000080000000000000002194261a9395e64dbed17115;
uut.nonce = 32'h0e33337a - 256; // Minus a little so we can exercise the code a bit


I didn't completely understand your example hashes. Am I correct in thinking you've provided data:nonce pairs, where one is correct?  Or are they all correct?  Obviously I'll need to calculate the midstate myself if this is the case.

Thanks for your help! I finally feel like I'm getting somewhere.
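On calculating the midstate yourself: the midstate is the SHA-256 state after compressing the first 64 bytes of the 80-byte block header, so it cannot be read out of a normal hash library. Here is a minimal Python sketch of it. The function names are made up for illustration, the header is assumed to be in the byte order that is actually hashed, and the byte/word order the HDL expects in `midstate_buf` is not confirmed anywhere in this thread, so check that against the testbench values before trusting it.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def _first_primes(n):
    """First n primes by trial division (n is at most 64 here)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def _icbrt(x):
    """Floor of the cube root of a non-negative integer, by binary search."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


_PRIMES = _first_primes(64)
# SHA-256 constants: fractional bits of the square/cube roots of the primes.
IV = [math.isqrt(p << 64) & MASK for p in _PRIMES[:8]]
K = [_icbrt(p << 96) & MASK for p in _PRIMES]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """One SHA-256 compression: 8-word state + 64-byte block -> new state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = _rotr(w[i - 15], 7) ^ _rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = _rotr(w[i - 2], 17) ^ _rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & MASK & g)
        t1 = (h + big_s1 + ch + K[i] + w[i]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def midstate(header: bytes):
    """State after compressing the first 64 bytes of an 80-byte header."""
    if len(header) != 80:
        raise ValueError("expected an 80-byte block header")
    return compress(IV, header[:64])


def hash_from_midstate(mid, tail: bytes) -> bytes:
    """Finish the double SHA-256 from a midstate and the last 16 header bytes."""
    # Second block: 16 data bytes, 0x80 pad byte, zeros, 64-bit length (640 bits).
    block2 = tail + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)
    first = struct.pack(">8I", *compress(mid, block2))
    return hashlib.sha256(first).digest()
```

The 640-bit length is 0x280, which looks like the `00000280` at the top of `data_buf` in the testbench, so that buffer appears to hold this padded second block (in whatever word order the core wants).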
full member
Activity: 196
Merit: 100
I had anticipated a few warnings/info messages, but 14,000 synthesis messages about trimming FF/Latches made me think I'd done something wrong.  Is this as high as you'd expect?

I've been using the LX150_makomk_Test project as a start point.  If these warnings are to be expected, I'll replace the ChipScope stuff with something else for comms (one of the other projects uses serial, I'll probably copy that) and go from there.

My comment about simulation was more directed at the use of the testbench code to smoke test the code before I program it. For the time being, I'm hoping there won't be much debugging required for the main algorithm!


Thanks for your help.

Don't sweat it. The "trimming messages" are just the compiler doing its job.
As regards running, I hope you realize that, again, hardware programming is NOT like C programming: you DO NOT debug on hardware. You debug in the simulator, and when it works you move to the hardware.

A simulation can compile in just under 15 seconds. An "un-partitioned", "un-blackboxed" SHA256(SHA256(x)) compiled for hardware is going to take several hours, or on a portable maybe a day, to compile.

The next thing you need is a known solution to use in the testbench code.
Here is a set of 4 solutions:
Code:
00000001f1fc17f04446c70a6946bdfc0c50addb0157839b8226c2af000001a9000000005e8f39668534c41c8a3322239d6c0cfaf41c222ca8cb863929b440d5c48f38e0509057ad1a0513c5
00000001f1fc17f04446c70a6946bdfc0c50addb0157839b8226c2af000001a9000000005e8f39668534c41c8a3322239d6c0cfaf41c222ca8cb863929b440d5c48f38e0509057ad1a0513c5:b733aa28
00000001f1fc17f04446c70a6946bdfc0c50addb0157839b8226c2af000001a9000000005e8f39668534c41c8a3322239d6c0cfaf41c222ca8cb863929b440d5c48f38e0509057ad1a0513c5:1526ba33
00000001f1fc17f04446c70a6946bdfc0c50addb0157839b8226c2af000001a9000000005e8f39668534c41c8a3322239d6c0cfaf41c222ca8cb863929b440d5c48f38e0509057ad1a0513c5:c885e546
00000001f1fc17f04446c70a6946bdfc0c50addb0157839b8226c2af000001a9000000005e8f39668534c41c8a3322239d6c0cfaf41c222ca8cb863929b440d5c48f38e0509057ad1a0513c5:5b9c067f

Basically you load the hash into the test bench and set the nonce just below one of the solution values, then run the simulation.
You really do not want to scan the nonce from 0x00000000 to 0xffffffff, because the simulation will take forever to run.
Also watch the endianness of the values above.
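To make the endianness warning concrete, here is a rough Python sketch for checking a `data:nonce` line on the host before loading it into the testbench. It is only a sketch: the function names are invented, and since the thread does not say which byte order the posted values use, the word swap is left as a flag so both interpretations can be tried.

```python
import hashlib


def swap32(data: bytes) -> bytes:
    """Reverse the byte order inside every 4-byte word."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return b"".join(data[i:i + 4][::-1] for i in range(0, len(data), 4))


def parse_line(line: str):
    """Split a 'data:nonce' line into its two hex fields."""
    data_hex, _, nonce_hex = line.strip().partition(":")
    return data_hex, nonce_hex


def block_hash(data_hex: str, nonce_hex: str, swap_words: bool) -> bytes:
    """Double SHA-256 of 76 header bytes plus a 4-byte nonce.

    swap_words=True treats the input as byte-swapped 32-bit words.
    Which setting matches the posted values is an assumption to verify.
    """
    header = bytes.fromhex(data_hex) + bytes.fromhex(nonce_hex)
    if len(header) != 80:
        raise ValueError("expected 76 data bytes + 4 nonce bytes")
    if swap_words:
        header = swap32(header)
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def is_share(digest: bytes) -> bool:
    """Difficulty-1 style check: the final 32-bit word of the digest is zero."""
    return digest[28:] == b"\x00\x00\x00\x00"


def try_both_orders(line: str) -> dict:
    """Report whether the line is a hit under each byte-order assumption."""
    data_hex, nonce_hex = parse_line(line)
    return {swap: is_share(block_hash(data_hex, nonce_hex, swap))
            for swap in (False, True)}
```

Feed each of the four `data:nonce` lines to `try_both_orders`; whichever setting reports a hit for all of them tells you how the values need to be arranged for the testbench.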
newbie
Activity: 11
Merit: 0
I had anticipated a few warnings/info messages, but 14,000 synthesis messages about trimming FF/Latches made me think I'd done something wrong.  Is this as high as you'd expect?

I've been using the LX150_makomk_Test project as a start point.  If these warnings are to be expected, I'll replace the ChipScope stuff with something else for comms (one of the other projects uses serial, I'll probably copy that) and go from there.

My comment about simulation was more directed at the use of the testbench code to smoke test the code before I program it. For the time being, I'm hoping there won't be much debugging required for the main algorithm!


Thanks for your help.
legendary
Activity: 2128
Merit: 1068
So, the next question is: will any of the projects run for me out of the box, or at least compile with minimal tweaking?  I don't mind it being in Verilog, but I don't like giving up, so want to see one of these boards mining!

What project file would you suggest I use as a starting point? I tried the "LX150 makomk....." projects, but they resulted in thousands of warnings (just opened the project and clicked the build button).
I never had any of the SLX150 boards that were used in those projects. I built some of them with minor changes on VLX240; but I don't recall which ones. This design is very flexible: you can roll the hashers by increasing CONFIG_LOOP_LOG2 parameter until it fits in your Spartan. You shouldn't worry about warnings; there is no way to completely shut them down in ISE even for a perfect design.

This project isn't friendly for simulation. The I/O protocol would need to be changed to give beginning and end of the nonce range to search. Otherwise simulating the search through the 2^32 nonce values just takes too much real time.
newbie
Activity: 11
Merit: 0
I've grabbed the works laptop for the long weekend to see if I can get something to build.  They're unlikely to lend me any of their dev/debugging hardware (and no spare licenses for their decent simulation software), but if I can go in armed with a bit file to program in next week it would be a good start.  There are a few test headers I can rig up for comms.

So, the next question is: will any of the projects run for me out of the box, or at least compile with minimal tweaking?  I don't mind it being in Verilog, but I don't like giving up, so want to see one of these boards mining!

What project file would you suggest I use as a starting point? I tried the "LX150 makomk....." projects, but they resulted in thousands of warnings (just opened the project and clicked the build button).


Quote
You can also restore the competitive element for yourself and make a first Litecoin FPGA hasher.
Once I manage to get someone else's bitcoin code running then maybe you're right. Annoyingly, there is a decent amount of RAM on this board, but it's connected via a relatively slow ARM processor.


Thanks!
full member
Activity: 196
Merit: 100
Ah, ok. Thanks for the info. I'll abandon that challenge for the time being then. I think I will still try to get the Verilog working so I can run it at weekends.

Are you trying to learn VHDL knowing Verilog or from scratch?
VHDL from scratch.  I write embedded software (ASM, C, C++) for a living, so the language syntax is easy enough.  The bit I'm struggling with at the moment is exactly what makomk has said - tailoring the HDL to the FPGA.  I still think in terms of high level code that has the correct behaviour in the simulator, not how best to utilise the available slices, flip flops, block rams etc.  With any luck this will come with time.  It seems that getting the VHDL working would be an interesting challenge when I start understanding things.

The issue is not the "syntax" but rather the complete change in mindset that hardware programming requires.
If you approach hardware programming the same way you approach C programming, it will fail, and you will spend your life in a mental hospital wondering WHY it did not work.

Certainly when you code something like

noverheat <= NOT overheat;
reset <= reset_count(reset_count'high);
fpga_0_LEDs_8Bit_GPIO_IO_pin(0) <= NOT overheat;

from a C perspective you would say WTF, it should be:

fpga_0_LEDs_8Bit_GPIO_IO_pin(0) <= noverheat;

so that I can save logic.

But then you discover that in VHDL in this situation:
1. You CANNOT guarantee WHICH order the statements are executed in, even though they are written top-down.

2. That
fpga_0_LEDs_8Bit_GPIO_IO_pin(0) <= NOT overheat;
may actually get executed before, or at the same time as:
noverheat <= NOT overheat;

It all depends on the FPGA chip being used and the compiler options, so if you have flaky code that just about functions on one chip, changing the chip can make it non-functional even when it simulates correctly.

Then you have the completely GASH Xilinx tools, written in all their memory-leaking Java glory.
legendary
Activity: 2128
Merit: 1068
VHDL from scratch.  I write embedded software (ASM, C, C++) for a living, so the language syntax is easy enough.  The bit I'm struggling with at the moment is exactly what makomk has said - tailoring the HDL to the FPGA.  I still think in terms of high level code that has the correct behaviour in the simulator, not how best to utilise the available slices, flip flops, block rams etc.  With any luck this will come with time.  It seems that getting the VHDL working would be an interesting challenge when I start understanding things.
I sincerely wish you good luck, but this project isn't a good starting assignment for a beginner. The competitive motivation element (bounties, etc.) is already almost gone. What you have now is just a combination of workarounds for the deficiencies in the Xilinx toolchain; e.g. the use of 512-bit vectors where a 16-element array of 32-bit vectors would produce much cleaner code. This skill has a value now, but Xilinx will eventually fix it in some future release and the skill would start to look cargo-cult-ey.

Also, in your past experience, how often have you faced a problem where you could drop half of the valid results and the project would still appear to work and be valuable?

I'm not trying to discourage you at all from working on a miner, just set yourself appropriate goals; e.g. use a comm protocol with CRC so you'll know the actual BER (bit error rate) of your transport.

You can also restore the competitive element for yourself and make a first Litecoin FPGA hasher.

Again: good luck.
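The CRC suggestion above can be tried out on the host side with very little code. The following is a hedged Python sketch, not any project's actual protocol: the frame layout (payload followed by a big-endian CRC-32) and the names `frame`, `unframe` and `LinkStats` are assumptions made for illustration. Note that it counts bad frames, which gives a frame error rate rather than a true per-bit error rate.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Append a big-endian CRC-32 of the payload."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def unframe(data: bytes):
    """Return the payload if the trailing CRC-32 matches, otherwise None."""
    if len(data) < 4:
        return None
    payload = data[:-4]
    (expected,) = struct.unpack(">I", data[-4:])
    return payload if zlib.crc32(payload) == expected else None


class LinkStats:
    """Counts received frames and how many of them failed the CRC check."""

    def __init__(self):
        self.frames = 0
        self.bad = 0

    def receive(self, data: bytes):
        self.frames += 1
        payload = unframe(data)
        if payload is None:
            self.bad += 1
        return payload

    def error_rate(self) -> float:
        return self.bad / self.frames if self.frames else 0.0
```

Without a check like this, a corrupted nonce coming back from the FPGA simply looks like a rejected share, and you cannot tell a flaky link from a broken hasher.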
newbie
Activity: 11
Merit: 0
Ah, ok. Thanks for the info. I'll abandon that challenge for the time being then. I think I will still try to get the Verilog working so I can run it at weekends.

Are you trying to learn VHDL knowing Verilog or from scratch?
VHDL from scratch.  I write embedded software (ASM, C, C++) for a living, so the language syntax is easy enough.  The bit I'm struggling with at the moment is exactly what makomk has said - tailoring the HDL to the FPGA.  I still think in terms of high level code that has the correct behaviour in the simulator, not how best to utilise the available slices, flip flops, block rams etc.  With any luck this will come with time.  It seems that getting the VHDL working would be an interesting challenge when I start understanding things.
hero member
Activity: 686
Merit: 564
Hello,

I've managed to get access to a Spartan6 150 FPGA board and Xilinx toolset at work at lunch/evenings to play with.  The idea is to learn VHDL, which is going OK.  I'd really like to have a play with Bitcoin too.

I hope there is someone here that can help me with regards to TheSeven's Xilinx VHDL implementation.  If not, do you know where can I get help?

I'm afraid getting miners running on Spartan-6 is kind of hairy and really requires HDL tailored specifically to that chip. TheSeven's Xilinx VHDL implementation isn't going to be, and ztex's Spartan-6 mining core which everyone uses is in Verilog rather than VHDL.
legendary
Activity: 2128
Merit: 1068
I've managed to get access to a Spartan6 150 FPGA board and Xilinx toolset at work at lunch/evenings to play with.  The idea is to learn VHDL, which is going OK.  I'd really like to have a play with Bitcoin too.
I think most of the people worked on the Verilog version, not the VHDL one. I wouldn't be surprised if the VHDL version never implemented correctly on Spartan chips, but maybe on Virtex-es only.

Start your play with the versions that have a top-level ISE project file: *.xise.

Are you trying to learn VHDL knowing Verilog or from scratch?
newbie
Activity: 11
Merit: 0
Hello,

I've managed to get access to a Spartan6 150 FPGA board and Xilinx toolset at work at lunch/evenings to play with.  The idea is to learn VHDL, which is going OK.  I'd really like to have a play with Bitcoin too.

I hope there is someone here that can help me with regards to TheSeven's Xilinx VHDL implementation.  If not, do you know where can I get help?

I've looked back through this topic and have seen something similar, but the response didn't help me.  When I synthesise the project, I get the following message for every stage (i.e. If DEPTH = 0, I get it once.  If DEPTH=6, I get it 64 times).

Quote
Xst:3031 - HDL ADVISOR - The RAM will be implemented on LUTs either because you have described an asynchronous read or because of currently unsupported block RAM features. If you have described an asynchronous read, making it synchronous would allow you to take advantage of available block RAM resources, for optimized device usage and improved timings. Please refer to your documentation for coding guidelines.


I think the problem is caused by the following code in sha256_pipeline.vhd;
Quote
rounds: for i in 0 to 2 ** DEPTH - 1 generate
   signal round_k : std_logic_vector(31 downto 0);
   signal round_w : std_logic_vector(511 downto 0);
   signal round_s : std_logic_vector(255 downto 0);

begin
   round_k <= K(i * 2 ** (6 - DEPTH) + conv_integer(step));
My understanding is that round_k is not set on a clock pulse, and therefore gets treated as asynchronous.  The result of this is that a fully unrolled (DEPTH=6) implementation does not fit on the 150 device (I've read it should).

What am I doing differently to everyone else?  I haven't seen mention of any errors/warnings/info by others that have used the code.


In addition, are there any warnings that I should expect to see? (I also get warnings that txdata/txwidth will be optimised away - although I haven't looked at that code in detail yet).


The only difference between the download and what I'm running is that I've had to change the DCM (I created a new one using CoreGen as the old one generated errors).


Thank you very much for any help.
sr. member
Activity: 384
Merit: 250
The issue is that a $2 regulator "protects" itself but trashes the FPGA in the process.

The FPGA dumps the extra heat because the regulator's voltage drifts, and even a 0.1 V increase in the FPGA core voltage can cause a significant increase in the wattage dumped by the FPGA.

The only way to make money with this is to find a market that dumps scrap telecom boards. Sometimes they are loaded with FPGAs.

Reasons to continue? Purely for the educational value: trying things that don't work gives an excellent insight into the hardware.
It's the same reason why I won't get "rich" mining off a XUPV5, but I continue to improve the code.

Ah, thanks for that. I can see what you mean now. I had assumed that the regulators would simply enter thermal shutdown if they overheat, but you're concerned about the stage before that where the regulators continue to operate, but their output drifts out of spec. Obviously in a professionally designed system this would be taken into account and the heatsinking sized accordingly, but I can see that in "overdriving" the DE0-Nano this can certainly be an issue (the manual claims the PSU is good for 1.5 amps, but they're just quoting the regulator specs there, and from the temperature they get to at the 800mA or so I'm using, I can see why it would not be wise to push it much further). Anyway I've got a fan blowing air over the board so this will help somewhat, and I'm not too concerned about frying a £70 ($110) board. One of the LX150 mining boards would be another matter entirely, but I'm not going there as I just can't see any way that these can make any money (the payback time is way too long, and with ASICs perhaps, maybe, sometime, coming to the table, it's just not worth the risk).

I'm with you on the scrap approach as I had considered this myself. The only problem is reworking the BGA packages as this is not really possible without expensive SMD rework gear (and quite a bit of skill). I was able to reflow a 144 pin TQFP package (only a cheap 10k part, so I was not bothered if I trashed it) quite successfully using a DIY approach with an IR lamp (actually a 150W floodlamp) as a heat source, but that is not going to work with BGAs. Anyway you'd really want professional multilayer PCBs to get the power distribution done properly, rather than the sort of things I can knock up in my "workshop". Not to mention sourcing the scrap parts in the first place!

So we're back to the educational value, and I've certainly got my money's worth there. Not sure where I'm going with it, but sometimes the journey itself is the destination, and at my stage of life there is no need to justify what I spend my time on!

TTFN
Mark
full member
Activity: 196
Merit: 100
The issue is that a $2 regulator "protects" itself but trashes the FPGA in the process.

The FPGA dumps the extra heat because the regulator's voltage drifts, and even a 0.1 V increase in the FPGA core voltage can cause a significant increase in the wattage dumped by the FPGA.

The only way to make money with this is to find a market that dumps scrap telecom boards. Sometimes they are loaded with FPGAs.

Reasons to continue? Purely for the educational value: trying things that don't work gives an excellent insight into the hardware.
It's the same reason why I won't get "rich" mining off a XUPV5, but I continue to improve the code.
sr. member
Activity: 384
Merit: 250
Hi HC

No contest there, the regulator is just protecting itself (to some degree anyway, as higher junction temperatures will still degrade the device faster), not its client.

Not so sure about your second comment about the FPGA dumping wattage for no reason. I assume you mean the system as a whole, rather than the device itself. So yes, an efficient switch mode/buck converter is certainly to be preferred over a linear regulator, though in the case of the DE0-Nano, it's meant as an educational tool, so I guess efficiency is not its main purpose.

I'm actually pretty amazed at the cost effectiveness of the little beast. In small quantities, at least, the entire board is hardly more expensive than the bare FPGA itself, given any reasonable cost of building something with it. Unfortunately it's not much use as a bitcoin miner (I've been running one full time for about a month now, for educational purposes of course, and I'm only up to about 0.1BTC, not much more than the cost of powering it).

I had a look at the more commercial offerings (mostly LX150 based), and to be honest, I've concluded that there really is no money to be made here. So I'm not entirely sure why I'm pottering along with this project, but it's something to keep me busy I suppose. Not really made that much progress beyond my last posting. The Raspberry Pi is doing a sterling job as a host to one (sometimes a pair) of DE0-Nanos running at 12.5 MH/s apiece (the most I can reliably get out of them at the moment). The freezer experiments are still pending (I took your cautions to heart, so I won't risk a Nano on that), but I've built myself a test board with a TQFP which I'm more inclined to push the limits on. Fun was had finding a way to get it programmed: I wasn't going to shell out on the overpriced official programmer, so a Nano was co-opted to that purpose, not without much grief getting it debugged, but these are versatile devices and it's now the ward of said Nano.

I'm rather running out of steam on this now, so I'll probably look to some other uses for these rather fascinating devices. Many thanks for your help (and Makomk too).
Mark.
full member
Activity: 196
Merit: 100


Re the regulators, I checked the datasheet and they are thermally protected (just about everything is these days), so no harm letting them run hot.


This is NOT correct. Just because the regulator is thermally protected DOES NOT mean the voltage it is supplying to the FPGA is in range!

Not to mention it causes the FPGA to dump extra wattage for NO reason, thereby severely aging the device.

newbie
Activity: 7
Merit: 0
I've got this running on a regular university educational DE2 board using the DE2-70 project. All I had to do was set the right FPGA as the device, CONFIG_LOOP_LOG2 to 3, and the PLL clock multiplier to 2 to give 100MHz, and it compiles just fine and cranks out 12.5 MH/s. It overheats after a while if the multiplier is set to 3, but with passive cooling it should work fine.
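Those figures fit a simple rule of thumb: with the hasher rolled up by CONFIG_LOOP_LOG2, a core produces one hash every 2^CONFIG_LOOP_LOG2 clocks. That relation is an assumption here (it is not spelled out in the thread), but it reproduces the 12.5 MH/s at 100MHz quoted above. A quick Python sketch, with made-up function names:

```python
def hash_rate(clock_hz: float, loop_log2: int, cores: int = 1) -> float:
    """Hashes per second, assuming one hash per 2**loop_log2 clocks per core."""
    return cores * clock_hz / (1 << loop_log2)


def full_scan_seconds(rate: float) -> float:
    """Time to sweep the whole 32-bit nonce space at the given hash rate."""
    return 2 ** 32 / rate


rate = hash_rate(100e6, 3)          # 100 MHz clock, CONFIG_LOOP_LOG2 = 3
print(rate / 1e6, "MH/s")
print(round(full_scan_seconds(rate), 1), "s per full nonce sweep")
```

At that rate the real hardware needs several minutes to sweep all 2^32 nonces, which is why nobody wants to do the same sweep in a simulator.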
sr. member
Activity: 384
Merit: 250
Another quick update ...

I've got a DE0-Nano running using a Raspberry Pi as a host (no USB, just a serial connection via the GPIO pins and an opto-coupler for peace of mind). Using the LX150_makomk_Test branch merged with code from DE2_115_makomk_serial, I'm getting 15MH/s at 120MHz clock, drawing almost exactly 1000mA from a 3.3V power brick, which makes it 3.3W (excluding PSU losses, though I am using a switcher). So if someone wants to update the mining hardware comparison page on bitcoin.it, I make that 4.5 MH/s/Joule, though with a dedicated 1.2V switch mode supply for the core rail that should improve to around 12.5MH/s/Joule.
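For anyone checking the sums, the efficiency figures work out as below. This small Python sketch just redoes the arithmetic from the numbers in the post; the 12.5 MH/s/Joule case assumes the same 1000mA is drawn at 1.2V with a lossless converter, which is an idealisation.

```python
def mh_per_joule(hash_rate_hz: float, volts: float, amps: float) -> float:
    """Efficiency in MH/s per watt, i.e. megahashes per joule."""
    return hash_rate_hz / 1e6 / (volts * amps)


print(round(mh_per_joule(15e6, 3.3, 1.0), 2))   # linear regulators fed from 3.3V
print(round(mh_per_joule(15e6, 1.2, 1.0), 2))   # ideal 1.2V core supply
```

The first case rounds to the 4.5 quoted above, and the second gives 12.5.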

It seems to be quite stable (generated 14 shares in the last hour or so), though the serial interface needs some serious work as it's not fault-tolerant. I intend to modify it to allow a (small) farm of FPGAs to connect via a single serial port, so that the host polls the FPGAs for results, which should be possible using a very simple interface (just OR, or rather AND since it's active low, all of the FPGA TxD lines together and rely on them not talking until they've been spoken to). The onboard dip switches will make themselves useful for the device addresses.
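As a sanity check on the share count: a difficulty-1 share turns up on average once every 2^32 hashes, so the expected number per hour follows directly from the hash rate. A small Python sketch (the function name is made up, and it assumes the pool hands out difficulty-1 work):

```python
import math


def expected_shares(hash_rate_hz: float, seconds: float) -> float:
    """Mean number of difficulty-1 shares found in the given time."""
    return hash_rate_hz * seconds / 2 ** 32


mean = expected_shares(15e6, 3600)
print(round(mean, 2), "shares/hour expected, give or take about",
      round(math.sqrt(mean), 1))
```

Share finding is random, so the hourly count scatters around the mean by roughly its square root, and 14 in an hour is well within that spread for 15MH/s.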

It still sucks as a money maker though (I reckon about 3 years just to pay for itself at *current* rates, so basically never), so I wouldn't advise anyone to build a farm with them!

Mark
sr. member
Activity: 384
Merit: 250
I've now started looking at the code in the DE2_115_makomk_mod branch, but I've hit a problem. The code compiles fine at CONFIG_LOOP_LOG2=2, 3 and 4 but it's producing the wrong hashes (I'm just running at 40MHz for testing, not full blast) ... the mine.tcl script submits hashes to the pool, but they are all rejected!
Yeah, that branch doesn't work with CONFIG_LOOP_LOG2!=1. You probably want http://www.makomk.com/gitweb/?p=Open-Source-FPGA-Bitcoin-Miner.git;a=summary de0-nano-hax branch, projects/DE2_115_Unoptimized_Pipelined project. The voltage regulators are also indeed horribly inefficient on the DE0-nano.

Many thanks for the info, I will take a look at it when I can make time, things got awfully busy since my last post.

If you've read my other posts, you'll note that I'm having good success with the Xilinx branch LX150_makomk_Test which works great on the DE0 (after a bit of tweaking to reinsert the JTAG probes, and a bugfix on GOLDEN_NONCE_OFSET), it just needs extending a bit (multicore) to maximise the device utilization (hardcore-fs has been of great help here), but I can see all the hard work has already been done by yourself in the other branches, so I just need to get acquainted with the code.

Re the regulators, I checked the datasheet and they are thermally protected (just about everything is these days), so no harm letting them run hot. I've been monitoring the core rail, and it's rock-steady at 1.21 volts no matter what I throw at it, which is a shame as I spent most of today building an auxiliary power supply for the core rail ... unfortunately I only had a LM317 to hand, which is piss-poor at this voltage (it droops from 1.25V quiescent to 1.0V at 1 amp), so I'm just going to have to get hold of something a bit more sensible (actually the DE0 LP385005D regulator is not bad, it just needs a more sensible input voltage, I'm currently looking at 3.3V switch mode PSUs, which are quite cheap).

The same cannot be said of the 3.3V rail, which sags below 3V when driving the EP4CE22F17C8N hard. I'm getting random USB/JTAG dropouts which I think I can firmly blame on this. No matter, I'll be switching to the serial interface as I intend to use a Raspberry Pi as a host (no point running an 80W laptop just to babysit the FPGAs, the Pi only draws a couple of watts and should do the job nicely).

Anyway I've got to get on, I picked up an SSD to see if it'll speed things up a bit in Quartus, so I'm going to be spending the next day or so reinstalling, and upgrading from Vista to Windows 8 (at £25 it's a no-brainer, even though the "interface formerly known as metro" looks shite! Good thing you can (mostly) turn it off).

Again, thanks for the reply.
Mark
hero member
Activity: 686
Merit: 564
I've now started looking at the code in the DE2_115_makomk_mod branch, but I've hit a problem. The code compiles fine at CONFIG_LOOP_LOG2=2, 3 and 4 but it's producing the wrong hashes (I'm just running at 40MHz for testing, not full blast) ... the mine.tcl script submits hashes to the pool, but they are all rejected!
Yeah, that branch doesn't work with CONFIG_LOOP_LOG2!=1. You probably want http://www.makomk.com/gitweb/?p=Open-Source-FPGA-Bitcoin-Miner.git;a=summary de0-nano-hax branch, projects/DE2_115_Unoptimized_Pipelined project. The voltage regulators are also indeed horribly inefficient on the DE0-nano.