Pages:
Author

Topic: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013) - page 23. (Read 432966 times)

legendary
Activity: 2128
Merit: 1073
If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code.  I didn't try to optimize the code in any other way.
Thank you very much for your valuable input. If you have a moment, could you please post a snippet of HDL code that shows how you convinced ISE DS to use DPS48s for adders? Does ISE have some flag to make it infer DSP48s from additions? Or did you have to explicitly instantiate them?

Since my last post in this thread I learned a lot about ISE software. The license is node-locked to the Ethernet MAC address using standard FlexLM technology. So it allows for designing on one system and running the design on another system. I was afraid of a node-locking technology that would require connecting the ML605 board to the system that runs ISE to allow it to check the license.

Also, would you dare to speculate what will be the initial pricing on the Kintex-7 KC705 evaluation kit? I hesitate to buy ML605 right now because I could not really start working on it immediately due to the need to reorganize and remodel my physical workspace. On the other had I'm completely fascinated with contemporary FPGA design after a long break from doing any hardware-oriented design.
newbie
Activity: 35
Merit: 0
I have a quick question for those familiar with Xilinx Virtex-6 chips:

I have a different application that requires double SHA256 of a 256-bit string, thus the typical mining optimizations don't apply. Would the fully unrolled all-combinatorial-logic hasher fit into the XC6VLX240T-1FFG1156 that is included in the ML605 evaluation board? Please disregard any speed issues. At this moment I'm only concerned with the correctness of the implementation and being able to use my old VHDL files. The goal is to reproduce the defects in some faulty silicon of historical value.

With quite a difficulty I installed evaluation ISE_DS on my Ubuntu 10.04.3 and even managed to start the Xilinx FPGA Editor that uses old Motif libraries. But attempting to do any implementation on Virtex-6 device on my 4GB RAM laptop is hopeless; it goes deeply into swapping storm.

I tried to understand the modifications that somebody made to get this miner run on ML605. It apparently had 3 hashing cores, but I'm unclear if the DSP48E1 use was a requirement or choice. I'm also unclear if the 3 hashing cores were 3*single-SHA256 or 3*double-SHA256.

Thanks for any pointers you may have.


Hi,

I am the one who put 3 copies of the fully unrolled cores in the LX240T on the ML605.  I had to use the DSP48Es to get it to fit.  If I didn't use the DSP48s, I could only fit 2 copies of the unrolled code.  I didn't try to optimize the code in any other way.  So, the answer to your question is yes, there should be no problem with a single instance of the double SHA256 core fitting into the ML605.

I also found that more than 4 GB memory was used when building with 3 copies of the fully unrolled code.  I'm using some older version of red hat for my development enviroment.  If you're running a 64 bit version of the application, upgrading to 8 GB of memory should get you going just fine.

As far as your ISE license question, I think the ISE might only be licensed to produce bitstreams for the specific device on the ML605.
legendary
Activity: 2128
Merit: 1073
Modifications are small so, if one fully unrolled core (double SHA256) have been packed to Spatan6 LX150, there shouldn't be ANY problem to fit into V6, even without using DSP slices.
Thanks for the encouragement. I managed to do an implementation on VLX240T with LOOP_LOG2=1 on my 4GB laptop. It has about 33% utilization of SLICEs, design strategy was "Runtime reduction with multithreading" and the swapping wasn't that bad.

So the next question I have is: would I be able to keep comfortably doing my trial VLX240T designs if I upgrade my laptop to 8GB of RAM? Or would the experienced people rather suggest that I dedicate a desktop machine with 8-16GB RAM and a PCIe slot for my ML605 experiments? Xilinx says that the ISE_DS will be nodelocked to the actual Virtex-6 chip, I presume through the USB cable, right?
 
legendary
Activity: 2128
Merit: 1073
I haven't looked in FPGA Editor because the Linux version is a pain in the ass to actually run on my distro. It uses some grotty ancient Motif-based wrapper library that emulates some prehistoric version of the Windows API and requires deep magic to get it to launch.)
For the future reference: running ISE_DS' fpga_editor on Ubuntu Lucid requires the following magic incantations:

1) install old versions of libstdc++ from Ubuntu Hardy:
    libstdc++5_3.3.6-15ubuntu4_i386.deb libstdc++5_3.3.6-15ubuntu4_amd64.deb

2) install libmotif3 and libmotif-dev for your main architecture. If your Ubuntu is 64-bit then
    you may also want to install the 32-bit versions, as the Motif libraries aren't built with multilib:
    libmotif3_2.2.3-4_i386.deb libmotif-dev_2.2.3-4_i386.deb

3) change the DISPLAY environment variable to use the non-multiscreen format
    DISPLAY=:0

Instead of fighting with whatever user-firendly package management tools you are using you have an option of instaling the few releveant *.so and *.a by hand:

a) mkdir temp; cd temp
b) ar xv ../whatever.deb
c) tar xzvf data.tar.gz
d) sudo mv usr/lib/*.{so*,a} /usr/lib (when installing libraries matching the system)
e) sudo mv usr/lib/*.{so*,a} /usr/lib32 (when installing 32-bit libraries on the 64-bit system)
f) cd ..; rm -rf temp

Seems like 64-bit FPGA editor isn't starting cleanly on 64-bit Ubuntu and gives the following dynamic linking warnings:

.../bin/lin64/_fpga_editor: Symbol `_XtperDisplayList' causes overflow in R_X86_64_PC32 relocation
.../bin/lin64/_fpga_editor: Symbol `_XtGetPerDisplayInput' causes overflow in R_X86_64_PC32 relocation

but it appeared to operate correctly during my short inspection. I nonetheless installed the required 32-bit libraries on my 64-bit system and the 32-bit FPGA editor starts without any complains.

Oh, and the last thing: ISE seems to have hardcoded the Acrobat as a PDF reader. The quick workaround is:

cd /usr/bin; sudo ln -s evince acroread

No need to restart the Project Navigator.
legendary
Activity: 1029
Merit: 1000
Modifications are small so, if one fully unrolled core (double SHA256) have been packed to Spatan6 LX150, there shouldn't be ANY problem to fit into V6, even without using DSP slices.
legendary
Activity: 2128
Merit: 1073
I have a quick question for those familiar with Xilinx Virtex-6 chips:

I have a different application that requires double SHA256 of a 256-bit string, thus the typical mining optimizations don't apply. Would the fully unrolled all-combinatorial-logic hasher fit into the XC6VLX240T-1FFG1156 that is included in the ML605 evaluation board? Please disregard any speed issues. At this moment I'm only concerned with the correctness of the implementation and being able to use my old VHDL files. The goal is to reproduce the defects in some faulty silicon of historical value.

With quite a difficulty I installed evaluation ISE_DS on my Ubuntu 10.04.3 and even managed to start the Xilinx FPGA Editor that uses old Motif libraries. But attempting to do any implementation on Virtex-6 device on my 4GB RAM laptop is hopeless; it goes deeply into swapping storm.

I tried to understand the modifications that somebody made to get this miner run on ML605. It apparently had 3 hashing cores, but I'm unclear if the DSP48E1 use was a requirement or choice. I'm also unclear if the 3 hashing cores were 3*single-SHA256 or 3*double-SHA256.

Thanks for any pointers you may have.
member
Activity: 89
Merit: 10

Sorry, Imma cross posting biatch  Grin


This is a 5 input carry save adder I made in an attempt to fit two full chains in an LX150.
I don't know if this helps you guys out but I don't have time to test it myself atm  Embarrassed

download it here:  http://www.omegav.ntnu.no/~kamben/adder5x.vhd

or copy paste this:

-- This block uses 94 LUTs with only 29 Carry chain LUTs. (sliceM/L) (implemented purely combinatorial, no regs )
-- XST synth of 4 or 5 input adder uses 64 LUTs with 64 carry chain LUTs. (sliceM/L)   (implemented purely combinatorial, no regs )

LIBRARY IEEE;
USE ieee.std_logic_1164.ALL;
USE ieee.std_logic_unsigned."+";

--Library UNISIM;
--use UNISIM.vcomponents.all;

ENTITY adder5x IS PORT (
 reset               : IN  std_logic;
 clk                 : IN  std_logic;

  ina                 : IN  std_logic_vector(31 downto 0);   
  inb                 : IN  std_logic_vector(31 downto 0);   
  inc                 : IN  std_logic_vector(31 downto 0);   
  ind                 : IN  std_logic_vector(31 downto 0);   
  ine                 : IN  std_logic_vector(31 downto 0);   
 
  qout                : OUT std_logic_vector(31 downto 0)); 
END  adder5x;


ARCHITECTURE rtl OF adder5x IS
 

SIGNAL  a:  std_logic_vector(31 downto 0);
SIGNAL  b:  std_logic_vector(31 downto 0);
SIGNAL  c:  std_logic_vector(31 downto 0);
SIGNAL  d:  std_logic_vector(31 downto 0); 
SIGNAL  e:  std_logic_vector(31 downto 0);   
SIGNAL  qr: std_logic_vector(31 downto 0);
--

SIGNAL SA,SAr     :std_logic_vector(31 downto 0);
SIGNAL SB,SBr     :std_logic_vector(31 downto 2);
SIGNAL S1,S2,S3   :std_logic_vector(31 downto 0);

--SIGNAL fasit : std_logic_vector(31 downto 0);

BEGIN
 
 
--  input_reg: PROCESS (reset, clk)
--BEGIN
--   IF (clk'event AND clk='1') THEN     
      a<=ina;
      b<=inb;                         
      c<=inc; 
      d<=ind; 
      e<=ine;   
--   END IF; 
--END PROCESS;   


--  pipe_reg: PROCESS (reset, clk)
--BEGIN
--   IF (clk'event AND clk='1') THEN          
      SAr<=SA;                           -- if your whole "chain" only has 1 pipeline register
      SBr<=SB;                           -- this might be a good place to put it
--   END IF; 
--END PROCESS;
   
   
--  output_reg: PROCESS (reset, clk)
--BEGIN 
--   IF (clk'event AND clk='1') THEN     
      qr<=SAr+(SBr & "00");             -- Regular carry chain adder for the last stage       
--   END IF; 
--END PROCESS; 


qout<=qr;


--fasit<=a+b+c+d+e;

------------
--calc


-- first LUT column of adder
-- 5 single bit inputs -> 3 bit sum output
LUT_stage1:FOR i IN 0 TO 31 GENERATE   
 
---------
S1(i)<=a(i) XOR b(i) XOR c(i) XOR d(i) XOR e(i);

-----
-- forced LUT alternative. slightly faster, uses more overall LUTs
-- could save 1 sliceM/L for every 2 adder blocks. Might make routing easier.

--LUT5_inst1a : LUT5
--generic map (
--INIT => x"96696996")
--port map (
--O =>  S1(i),
--I0 => a(i),
--I1 => b(i),
--I2 => c(i),
--I3 => d(i),
--I4 => e(i));   
-----
---------

LUT_inst1bc : LUT6_2
generic map (
INIT => x"E8808000177E7EE8")       
port map (
O6 =>  S3(i),
O5 =>  S2(i),
I0 => a(i),       
I1 => b(i),     
I2 => c(i),   
I3 => d(i),   
I4 => e(i),
I5 => '1');     

END GENERATE;


-- 2x3bit LUT sums -> 2+2bit output sum
-- max sum =  5+(2*5)=15, range 0-15 -> exact 4 bit
LUT_stage2A:FOR i IN 0 TO 15 GENERATE   

SA((i*2))<=S1((i*2));
SA((i*2)+1)<=S2((i*2)) XOR S1((i*2)+1); 

END GENERATE;   


--SB(0)<='0';
--SB(1)<='0'; 

LUT_stage2B:FOR i IN 0 TO 14 GENERATE   

LUT_inst2cd : LUT6_2
generic map (
INIT => x"0077640000641364")   
port map (
O6 =>  SB((i*2)+3),
O5 =>  SB((i*2)+2),
I0 => S2((i*2)),      -- B1
I1 => S3((i*2)),      -- C1
I2 => S1((i*2)+1),    -- A2
I3 => S2((i*2)+1),    -- B2
I4 => S3((i*2)+1),    -- C2
I5 => '1');   --   

END GENERATE;   


END rtl;
legendary
Activity: 1029
Merit: 1000
Yes, I'm thinking about real dev kit too. But prices are sometimes too high (here in Poland that 69$ dev kit costs 130$). So buying mining rig makes more sense. And I will have excuse to my wife why I need to spent so much money:) Maybe there will be some room for few modifications. I'm wating for some specs of that new mining boards...
sr. member
Activity: 520
Merit: 253
555
I want to learn FPGA's (I think about it for some 3 years) thats why I need to use ISE.

If you really want to learn about FPGAs, and not just mine Bitcoins, you will be much happier with a traditional dev kit. There are so many other fun things do with an FPGA, and for those you will need some I/O connectors, and a few onboard buttons/switches/LEDs will come in handy. Even some cheap kits will run a miner, for example the one mentioned here:

https://bitcointalksearch.org/topic/m.451101
hero member
Activity: 686
Merit: 564
Due to the lovely Bitcoin community I'm discontinuing development on my forks of this code. Also, one of the moderators is trying to get me banned which would kind of make it impractical anyway.

Edit: Really, I should've done this a long time ago - the Bitcoin community is freaking toxic - but it's so interesting...
legendary
Activity: 1029
Merit: 1000
Just as I suspected.
I figure it out that you will use FT2232 in your design to load bitstream and data to FPGA Wink Correct? I use them to in few projects...
I want to learn FPGA's (I think about it for some 3 years) thats why I need to use ISE.
So 30-day trial or LX75. It would be possible to order X6500 with one LX150 and one LX75?
hero member
Activity: 560
Merit: 517
Quote
Maybe someone have an idea what would happen if file generated for LX75 I will try to load to LX150?
Configuration data for an LX75 is not compatible with an LX150, so no matter what you do it won't work.

If you attempt to load a normally generated bitstream from an LX75 onto an LX150, configuration will immediately fail. Technically you can hack the bitstream and force it to load, but A) Xilinx says it may "damage your hardware if you do", and B) as states above the configuration data isn't compatible so you accomplish nothing.

I should note that boards like the X6x00 do not require ISE to load firmware into the FPGA. You'd only need ISE if you wish to compile your own firmware and bitstreams. It's unfortunate, but that's how Xilinx's licensing works :/ They offer a 30-day trial which I think lets you compile for any device.
legendary
Activity: 1029
Merit: 1000
Hello again.
Maybe someone have an idea what would happen if file generated for LX75 I will try to load to LX150?
I have only web edition of ISE and that supports only to LX75 max. Maybe I will buy X6500 board from our FPGA eagles ( Wink )and there will be LX 150... I have some knowlage, so I want to try what its worth Wink
I suspect that it won't work, but confirmation will make me happier Wink
TIA.
hero member
Activity: 686
Merit: 564
Speaking of cheaper FPGAs, in theory the faster speed grades of EP3C25 should be able to reach 30 MHash/s (though EP4C22 only appears to be capable of 25 and I haven't even been able to achieve this in practice due to DE0-nano issues). See the de0-nano-hax branch in my github repo. Uses up pretty much all the FPGA resources though, so I've no idea how interfaces like teknohog's would fare and it's probably mostly a curiosity.
hero member
Activity: 560
Merit: 517
Quote
I've been discussing with jonand about building a cluster of cheap FPGAs, and I've got the basic idea working:
Very, very cool!
sr. member
Activity: 520
Merit: 253
555
I've been discussing with jonand about building a cluster of cheap FPGAs, and I've got the basic idea working:

https://github.com/teknohog/Open-Source-FPGA-Bitcoin-Miner/tree/master/projects/DE2_115_cluster

It is a "cluster" of two miners on a single FPGA, but the links are asynchronous serial ports, so it could just as well be distributed. There is a hub that distributes work to miners and consolidates results, so a single serial port could drive a big number of FPGAs.

This will take some work to make everything smooth, but I think the basic idea is valid. For example, there is an if-else construct that should be rewritten (with generate?) for any number of miners.

Edit: the code is verified to work with an unholy alliance of one Xilinx and one Altera chip.
hero member
Activity: 686
Merit: 564
Only problem with the UrJTAG Python code is that pexpect is UNIX specific. There's a Windows port called wexpect I have been playing with. Its project on Google Code is inaccessible at the moment, for reasons unknown. The random wexpect.py file I found lying around some dusty corner of the internet works, but has a few bugs I had to work around.
Apparently there's something called winpexpect; I have no idea how well it works though.

It's a real shame. UrJTAG is a nice program. Perhaps it would be worthwhile to write a SWIG based wrapper around all of its (apparently undocumented) C API.
It looks like I might want to do that anyway; UrJTAG is using excessive CPU for me and I tracked it down to its readline support, which cannot be disabled at runtime.
inh
full member
Activity: 155
Merit: 100
Uhh, why?Huh

I guess there is some appeal to not needing the FPGA software, but JTAG is ungodly slow.  It's not an appropriate means of communication between a mining host and the FPGA at all.  I don't think this idea is wise nor useful (except to save the step of FPGA programming I guess).
Because it's what the existing code that fpgaminer wrote was using and it saves trying to hook up an extra cable of some kind for communication, basically. There are already methods of running miners without using the FPGA software if you don't want to use JTAG.

I'm no expert but JTAG is plenty fast enough to deal with a getwork request every 21 seconds or so Smiley There isn't a lot of data.

fpgaminer thank you for the explanation!
hero member
Activity: 686
Merit: 564
Uhh, why?Huh

I guess there is some appeal to not needing the FPGA software, but JTAG is ungodly slow.  It's not an appropriate means of communication between a mining host and the FPGA at all.  I don't think this idea is wise nor useful (except to save the step of FPGA programming I guess).
Because it's what the existing code that fpgaminer wrote was using and it saves trying to hook up an extra cable of some kind for communication, basically. There are already methods of running miners without using the FPGA software if you don't want to use JTAG.
hero member
Activity: 560
Merit: 517
A getwork request returns a Target (256-bits), Hash1 (256-bits), Midstate (256-bits), and Data (1024-bits).

A SHA-256 hash needs two pieces of information, a 256-bit starting state, and 512-bits of data. It returns the resulting 256-bit hash.

Here are the steps for a Bitcoin Hash:
Code:
data = highest 512-bits of Data
for nonce in range(0, 2**32):
     4th 32-bit word of data = nonce
     hash = sha256(Midstate, data)
     hash = sha256(Sha256InitialState, Hash1 << 256 | hash)

     if hash <= Target:
          Send a result back to the pool server or bitcoind

Note that Sha256InitialState is a 256-bit number defined by the SHA-256 standard.

For FPGA mining, we assume that Hash1 is always the same (which it is), and that Target is always 0x00000000_FFFFFFFF_..._FFFFFFFF (which it is, for pool mining). As you can see in the first line of the pseudo-code above, only the last 512-bits of Data are used for hashing in this algorithm. Also, everything after the 4th 32-bit word of data is always the same.

So, after all is said and done, the FPGA only needs the 256-bit Midstate, and 96-bits of Data. It then returns any 32-bit nonces that give us a hash <= 0x00000000_FFFFFFFF_..._FFFFFFFF.

Quote
I've seen the wiki that says what goes in to a block header hash
Than I should note that the 1024-bit Data value returned by getwork is actually just the block header, padded to 1024-bits. It's padded using SHA-256's padding algorithm. Check the SHA-256 wiki page if you are interested.
Pages:
Jump to: