Topic: Project Evil Genius – Custom SHA2-256 Circuits on a FPGA - page 2. (Read 12359 times)

sr. member
Activity: 384
Merit: 250
Cool, thanks for the info. I definitely will have to learn more about scrypt. I downloaded the paper, but just skimmed it. One day, I will have to sit down and go through how it hashes the data.

Yeah, I would definitely pipeline the data. Having a long combinational path will make it really slow. This is where the fun stuff happens. You have to figure out how many stages are best. Then adding in memory makes it more of a challenge. I wonder if you can run the RAM 2x as fast and have 0.5 Mbit instead of 1 Mbit. This would save some space, plus you would learn how to deal with different clock gearing. But I don’t know whether that is possible, or whether you really need the full 1 Mbit.

No worries on the code. I am glad to see it is written in Verilog (more personal preference than anything). We all have to start somewhere.

My live code for my DE0-Nano board actually uses a 0.5 Mbit scratchpad (the EP4CE22 chip only has 600 kbit of RAM), so I have to interpolate the missing half of the scratchpad (basically one extra pass through the salsa-mix for half the addresses). I can't really see how to parallelise this, as each RAM read address depends on the result of the prior salsa-mix (scrypt was explicitly designed to be awkward to parallelise). From some reading I've seen that a larger scratchpad, e.g. 8 Mbit, can speed up scrypt, which I don't quite understand yet, so once I've read up some more on it perhaps some tricks will become apparent. (Yeah, I just wanted to get something running quickly, so I coded a direct analog of the cgminer CPU scrypt.c code rather than doing my research first.)
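
A minimal Python sketch of the half-scratchpad idea described above, for readers who want to see it in software. It is not the poster's Verilog: it assumes the standard scrypt ROMix with r = 1 (128-byte blocks), and the function names `romix_full` and `romix_half` are made up for illustration. The half version stores only the even scratchpad entries and rebuilds an odd entry with one extra mix of its even neighbour.

```python
import struct

MASK = 0xFFFFFFFF
# Index patterns for the Salsa20 column and row quarter-rounds.
COLS = ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11))
ROWS = ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14))

def rotl(x, n):
    return ((x << n) & MASK) | (x >> (32 - n))

def salsa20_8(block):
    """Salsa20/8 core: 64 bytes in, 64 bytes out."""
    x = list(struct.unpack("<16I", block))
    orig = x[:]
    for _ in range(4):  # 4 double rounds = 8 rounds
        for a, b, c, d in COLS + ROWS:
            x[b] ^= rotl((x[a] + x[d]) & MASK, 7)
            x[c] ^= rotl((x[b] + x[a]) & MASK, 9)
            x[d] ^= rotl((x[c] + x[b]) & MASK, 13)
            x[a] ^= rotl((x[d] + x[c]) & MASK, 18)
    return struct.pack("<16I", *[(p + q) & MASK for p, q in zip(x, orig)])

def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def blockmix(b):
    """scrypt BlockMix for r = 1: 128-byte block in, 128-byte block out."""
    y0 = salsa20_8(xor(b[64:], b[:64]))
    y1 = salsa20_8(xor(y0, b[64:]))
    return y0 + y1

def integerify(x, n):
    """Next scratchpad address: low 32 bits of the last 64-byte half, mod n."""
    return struct.unpack_from("<I", x, 64)[0] % n

def romix_full(b, n=1024):
    """Reference ROMix keeping all n scratchpad entries."""
    v, x = [], b
    for _ in range(n):
        v.append(x)
        x = blockmix(x)
    for _ in range(n):
        x = blockmix(xor(x, v[integerify(x, n)]))
    return x

def romix_half(b, n=1024):
    """Same result with only the even entries stored (half the memory)."""
    v, x = [], b
    for i in range(n):
        if i % 2 == 0:
            v.append(x)          # keep V[0], V[2], V[4], ...
        x = blockmix(x)
    for _ in range(n):
        j = integerify(x, n)
        vj = v[j // 2]
        if j % 2 == 1:
            vj = blockmix(vj)    # recompute the missing odd entry from V[j-1]
        x = blockmix(xor(x, vj))
    return x

block = bytes(range(128))        # arbitrary test input, not real mining data
print(romix_full(block) == romix_half(block))
```

The address `j` is only known once the previous mix has finished, which is the serial dependency mentioned in the post; the extra `blockmix` for odd addresses is the price of halving the memory.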

The pipelining of the salsa-mix is definitely an issue, but it's tricky due to the dependency of the scratchpad reads on addresses generated from the prior salsa. Adding extra register stages allows a faster clock but needs extra clock cycles to complete, completely cancelling out any gain (and there is no gain from the pipelining itself, due to the address dependency). Once I understand the algorithm better, perhaps I can come up with a solution (I'll need to take a look at the CUDA code to see what's done on the GPUs).
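
The trade-off in the paragraph above can be put into rough numbers. This is only a back-of-envelope sketch: the 40 ns combinational delay is an assumption taken from one 25 MHz clock period, the 1 ns per-stage register overhead is hypothetical, and `pipelined_step` is an illustrative name.

```python
def pipelined_step(comb_delay_ns, stages, reg_overhead_ns):
    """Return (clock_mhz, latency_ns) for one dependent salsa-mix step
    when the combinational logic is split evenly into `stages` stages."""
    period_ns = comb_delay_ns / stages + reg_overhead_ns
    return 1000.0 / period_ns, stages * period_ns

COMB_DELAY_NS = 40.0    # assumption: one clock period at 25 MHz
REG_OVERHEAD_NS = 1.0   # hypothetical setup/clock-to-out cost per stage

for stages in (1, 2, 4, 8):
    mhz, latency = pipelined_step(COMB_DELAY_NS, stages, REG_OVERHEAD_NS)
    print(f"{stages} stage(s): clock {mhz:6.1f} MHz, one step takes {latency:5.1f} ns")
```

Under these assumptions the clock rate rises with every added stage, but the time to finish one step (and so to learn the next read address) stays at 40 ns plus the register overhead, so a single hash gets no faster.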

Thanks for the kind words, and good luck.
sr. member
Activity: 378
Merit: 250
Hi,

Which FPGA will you use as a basis?

It might be interesting to make an ASIC from your FPGA VHDL if it gives better hashing power, so we would be able to get a more powerful ASIC.

Keep us posted.

legendary
Activity: 1066
Merit: 1098
Guys you do realise he is trying to improve FPGA efficiency, and patent the results? This is a hell of a long way from open source.

OMG he's been outed as a Capitalist! Someone kill him before he gets away!

Improving on stuff and making a profit from his efforts!  This cannot be endured...

 Roll Eyes
legendary
Activity: 1666
Merit: 1185
dogiecoin.com
Guys you do realise he is trying to improve FPGA efficiency, and patent the results? This is a hell of a long way from open source.
member
Activity: 102
Merit: 10
If only you had been here a year ago; now this is a bit late, if not soon to be obsolete. I'm hoping for the best and I'm very interested to see how well you manage to do with this project.

Yeah, looking back on this, I should have skipped the Crypto Extractor/Dominatrix Engine and gone straight for the FPGA. But the good thing is there is still a huge demand out there. The ASIC companies cannot meet demand, but how long will that last? But that is playing the “what if” game, and I have learnt never to play that game.

But on the flip side, there are a ton of FPGA miners out there. If I can give them a large boost in performance, it might make them last a little longer. Also there is scrypt (Litecoin) after this. Since scrypt uses a SHA-256 circuit, I can re-use it. I believe there is still time for Litecoin.

But there is one more big plus. I might be able to work with an ASIC company. They would love to have a design that can drastically increase performance, which can give them an edge over their competitors. And if I can get it working on an FPGA, it proves the design works. Plus, I was an Electrical Engineer with a proven track record of papers and patents before becoming disabled. So there is still a large benefit out there. But this is very much putting the cart before the horse. I still have a lot of work to do and, being disabled, my progress comes and goes. But it looks very, very promising right now, and I have nothing to lose and everything to gain.



Your work is appreciated. I'm going to run mine until it costs more in power than it's worth, just to contribute to the network.
sr. member
Activity: 384
Merit: 250
I haven’t looked into scrypt too much, but will work on it after this project. I can re-use the SHA-256 circuit for the scrypt project.

You mentioned you needed 1 Mbit of RAM per hasher core. What is the gear ratio between the hasher and the RAM? Is it 1:1, or some other ratio? Basically, if the hasher runs at 100 MHz, what speed is the RAM running at?

Thanks,

Doom

I'm running the core at 25 MHz (it didn't seem worth pipelining the salsa mix, given that scrypt is essentially a serial algorithm, so I've implemented it as one huge combinatorial tree), and clocking the on-chip RAM at the same speed (just to keep it simple). Using the on-chip RAM makes the RAM interface very simple, as it's just a 1024-bit-wide data path. External RAM (as would be needed to get any sort of performance from the FPGA) is going to be more complicated and would need to run faster, which might make pipelining the salsa worthwhile. Bear in mind that I'm a complete amateur with FPGA/logic design, so don't expect anything sophisticated.
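
The 1 Mbit per core and the 1024-bit data path both follow from the scrypt parameters. A small sketch, assuming Litecoin's usual settings of N = 1024 and r = 1 (the thread does not state them) and taking the 600 kbit on-chip figure from the post above:

```python
N = 1024   # assumption: Litecoin scrypt cost parameter
R = 1      # assumption: Litecoin scrypt block-size parameter

def block_bits(r=R):
    """Width of one scratchpad entry: 128 * r bytes."""
    return 128 * r * 8

def scratchpad_bits(n=N, r=R):
    """Total scratchpad size: n entries of one block each."""
    return n * block_bits(r)

ON_CHIP_BITS = 600_000  # "600 kbit" as quoted for the EP4CE22 in this thread

print("entry width :", block_bits(), "bits")
print("full pad    :", scratchpad_bits(), "bits")
print("half pad    :", scratchpad_bits() // 2, "bits")
print("full fits   :", scratchpad_bits() <= ON_CHIP_BITS)
print("half fits   :", scratchpad_bits() // 2 <= ON_CHIP_BITS)
```

With these values one entry is 1024 bits wide and the full scratchpad is 1,048,576 bits, which is why only the half-size scratchpad fits on the DE0-Nano.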

Anyway, you can take a look at my code here: https://github.com/kramble/FPGA-Litecoin-Miner. Any suggestions would be welcome. I created a thread to discuss it earlier today in the altcoins section: https://bitcointalksearch.org/topic/ann-open-source-fpga-litecoin-miner-260598
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
Just a quick update. I did finally get the OpenCL kernel done and compiling. But I had problems running it, and one of my power supplies went bad. I have two power supplies for my computer, since I have two 5970 cards. I don’t know if the two were related or not. I believe the power supply is under warranty, so I will have to send it in. I do have a spare, plus I can always run one power supply with one card plugged in.

But I am going to skip the OpenCL. I have gotten my second pre-design done, and it has better space savings than the first. I will not report the space savings, since I don’t want people trying to figure out what I am doing. However, it is another good jump in space savings, though not as good as the last one. Also, I can continue to pipeline the data too. The design is very performance driven. It should be easy to meet timing with this design. I don’t think I am going to get any better than this design, so I am going to start writing the Verilog code tomorrow, while my wife and daughter are out. I drew out the first few stages, so I have a guide to follow.


Smiley
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
Why don't you try developing scrypt mining for LTC for the open-source community? There's a very high demand for it, IMO.

Jasinlee has a project running at http://ltcfpga.com/ which seems fairly advanced (but NOT opensource)

I'm currently working on an open-source implementation (just for the LOLs), using the on-chip FPGA RAM (it needs 1 Mbit per hasher core). I've got the simulation running fine using a register array for RAM. Unfortunately I'm only estimating around 1 khash/sec performance per hasher core. The next step is to port it to my DE0-Nano board (it can only fit half the scratchpad, so it's going to interpolate, which is even slower). I'll post it on my GitHub once it's in a presentable state.

That's awesome, kudos to you. Would it be possible to load it on GitHub so others can also try to use your scrypt miner?
sr. member
Activity: 384
Merit: 250
Why don't you try developing scrypt mining for LTC for the open-source community? There's a very high demand for it, IMO.

Jasinlee has a project running at http://ltcfpga.com/ which seems fairly advanced (but NOT opensource)

I'm currently working on an open-source implementation (just for the LOLs), using the on-chip FPGA RAM (it needs 1 Mbit per hasher core). I've got the simulation running fine using a register array for RAM. Unfortunately I'm only estimating around 1 khash/sec performance per hasher core. The next step is to port it to my DE0-Nano board (it can only fit half the scratchpad, so it's going to interpolate, which is even slower). I'll post it on my GitHub once it's in a presentable state.
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
Okay, got it. Sometimes the synthesis software can also do logic optimization, which could reduce the size of the hardware, though it very rarely achieves a substantial reduction on hand-optimized designs.
legendary
Activity: 1946
Merit: 1006
Bitcoin / Crypto mining Hardware.
I had a great night last night for the Evil Genius project. I got my first pre-design partially working in software. The size of the SHA-256 circuit was reduced by 0.52 to 0.37 (not giving out the exact size reduction). The next step is to build a full software implementation in an OpenCL kernel. I don’t believe the performance of the kernel will be better than anything out there, because I cannot change the GPU's hardware and GPUs are not specifically built for SHA-256. It is more of a proof-of-concept thing. Also, I will still be able to pipeline the work with the smaller design, so you will get data every clock cycle. After the full implementation, I will see if I can patent the design method. The circuit has 64 stages for one hash, so 128 stages for the double hash.
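
For reference, the double hash mentioned above is simply SHA-256 applied twice. A minimal software sketch using Python's standard `hashlib`; the all-zero 80-byte header is a placeholder, not real block data, and the stage count assumes one pipeline stage per SHA-256 round, which is how the post counts them.

```python
import hashlib

ROUNDS_PER_COMPRESSION = 64  # SHA-256 runs 64 rounds per 64-byte block

def double_sha256(data: bytes) -> bytes:
    """SHA-256 of the SHA-256 digest, as used for Bitcoin block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

header = bytes(80)  # hypothetical placeholder for an 80-byte block header
digest = double_sha256(header)
print(len(digest), "byte digest:", digest.hex())

# One fully unrolled compression per hash gives the stage count in the post.
print("stages for the double hash:", 2 * ROUNDS_PER_COMPRESSION)
```

An 80-byte header actually spans two 64-byte blocks in the first hash; miners commonly precompute the first block once per work unit, which leaves two compressions (128 rounds) per nonce.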

This is just the first pre-design. I still have some other things I want to try out. But this is very promising. I was very excited last night. I cannot wait to start trying some other things out to see how far I can push the SHA-256 circuit.  

Edit: It should be reduced ‘to’ 0.52-0.37, instead of ‘by’. I am cutting myself short. So now the circuit is basically half the size or smaller, so the performance increase will be 2x or more.
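
A quick check of what a circuit at 0.52 to 0.37 of the original size could mean for throughput. This is a sketch under simplifying assumptions that are not from the post: hash rate scales with the number of cores that fit, the clock speed is unchanged, and routing and I/O overhead are ignored. The name `ideal_speedup` is illustrative.

```python
def ideal_speedup(area_fraction):
    """Cores per unit area relative to the original design."""
    return 1.0 / area_fraction

for fraction in (0.52, 0.37):
    print(f"circuit at {fraction:.2f} of original size -> "
          f"up to {ideal_speedup(fraction):.2f}x the cores")
```

Under these assumptions the range works out to roughly 1.9x at the 0.52 end and 2.7x at the 0.37 end.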

May I ask, did you obtain the size reduction on an HDL module, or was it something else?
sr. member
Activity: 1316
Merit: 254
Watching, good luck with this.  Grin
sr. member
Activity: 280
Merit: 250
If only you had been here a year ago; now this is a bit late, if not soon to be obsolete. I'm hoping for the best and I'm very interested to see how well you manage to do with this project.
hero member
Activity: 642
Merit: 500
I coded up a quick SL3 cracker about a year ago.  It either ran on my Spartan 6 devkit, or the X6500, I can't recall.  I could probably dump the code to github if people are interested.  I didn't optimize it particularly well, just got it working.
I'd also be very interested in a git, and I'd certainly throw a tip your direction.
member
Activity: 88
Merit: 10
If this does work, you should be able to get twice the engines into one FPGA. There is money in this, as there are plenty of FPGAs out there and I'm sure people would love to double their hash rate!
full member
Activity: 202
Merit: 100
Great, keep going!
sr. member
Activity: 462
Merit: 250
Firing it up
I will use this thread to update my progress. I will post every once in a while to let people know how the project is going and how well the circuits are performing.

I will be creating custom digital circuits for SHA2-256 double hashers for Bitcoin mining. I have already started on the first stage. The project will be written in Verilog. I chose Verilog because it has better controls at the gate level than VHDL. I will not be using any C code for the hashing circuits. However, I might use C code for the registers and connections to the computer, i.e. the USB connection and set-up data.

Looking at the Open Source FPGA code, I believe I can make a really good improvement over it, as it is converted C. I have vast experience in digital design and I have worked on many ASIC projects. However, I have not worked on an FPGA before, but I have worked alongside FPGA programmers enough to know the major problems that affect FPGAs.

To get more info on my background, go here:
http://www.cryptoextractor.com/crypto/author.html

Depending on the results, three things will happen. If I get really great results, then I would probably make some boards and sell them. If I get good results, I will probably buy old FPGA boards and reprogram them, and mine with them. I might sell some of the re-programmed boards. The last would be if I got OK results. Then I would just release the code as Open Source.


Why don't you try developing scrypt mining for LTC for the open-source community? There's a very high demand for it, IMO.

Memory is a problem. He will
sr. member
Activity: 322
Merit: 250
Supersonic
Also, have you looked at bitfury's code?  He has the most performant code for Spartan-6 LX150 chips, and I would be shocked if anyone beat his record (in MH/s) on that chip.  It's optimized down at the slice level and manually placed.  https://bitcointalk.org/index.php?topic=228677.msg2417706#msg2417706

Is there a compiled bitstream compatible with ztex out there?
full member
Activity: 202
Merit: 100


Quote
For sure it is possible to implement on FPGA.
I coded up a quick SL3 cracker about a year ago.  It either ran on my Spartan 6 devkit, or the X6500, I can't recall.  I could probably dump the code to github if people are interested.  I didn't optimize it particularly well, just got it working.

FPGAMINER

Getting SL3 unlocking onto FPGA miners would give them new life. I do not have many of them, but they would be more versatile if someone kept them running SL3.

Also, I am not rich, but I am willing to support the work with some BTC.
hero member
Activity: 560
Merit: 517
Quote
I still believe I can make a big performance jump over the code. I will try to get down to the gate level as much as possible and use all the logic there. I have even been looking through the specs and schematics to see how the slices work on the Spartan-6.
It's a lot of fun down there!  It's a shame the Spartan 6 architecture is so limited.  I suggest you take a look at the 7-series FPGAs, like the Kintex or Artix.  The architecture is nicer, and performance is much higher.  For example, I was able to implement a miner using the DSP48E1s on a Kintex.

Also, have you looked at bitfury's code?  He has the most performant code for Spartan-6 LX150 chips, and I would be shocked if anyone beat his record (in MH/s) on that chip.  It's optimized down at the slice level and manually placed.  https://bitcointalk.org/index.php?topic=228677.msg2417706#msg2417706

Unfortunately, or fortunately (depending on how you look at it), FPGAs will never beat ASICs in terms of performance per dollar, or performance per watt. So FPGA mining is a curiosity and plan-B sort of thing now.

Quote
For sure it is possible to implement on FPGA.
I coded up a quick SL3 cracker about a year ago.  It either ran on my Spartan 6 devkit, or the X6500, I can't recall.  I could probably dump the code to github if people are interested.  I didn't optimize it particularly well, just got it working.