Nanominer - Modular FPGA Mining Platform - page 2.

lame.duck

legendary

Activity: 1270

Merit: 1000

Quote from: Wandering Albatross on February 15, 2012, 12:58:01 PM

It seems that the pain will only propagate using all these proprietary tools. Are there any open, linux SDKs for this hardware? Reinventing the wheel is not cost efficient.

For example for spartan6 there is:
www.petalogix.com/about/supported-fpga-and-cpu-families

This are not the tools required to generate FPGA bitstreams. This ist a (as far i see) an IDE to develop Software for Soft and hard-cores.

Wandering Albatross

member

Activity: 70

Merit: 10

It seems that the pain will only propagate using all these proprietary tools. Are there any open, linux SDKs for this hardware? Reinventing the wheel is not cost efficient.

For example for spartan6 there is:
www.petalogix.com/about/supported-fpga-and-cpu-families

makomk

hero member

Activity: 686

Merit: 564

Quote from: pieppiep on February 13, 2012, 10:29:05 AM

I don't understand.
If the h value of the midstate is 0x5be0cd19, then the next h value that is added to it must be exactly 0xa41f32e7 to get the 0x00000000 value to make a valid share at difficulty 1.
Isn't that a way bigger advantage? Now you know for sure you got a value you want and not some strange percentage.
I assume no pools use shares less than difficulty 1 and for mining in pools with difficulty greater than 1 it should be easy for the miner software on the host computer to check if the share with difficulty 1 is also valid for the pool with difficulty x.

Not only that, but some code out there already takes advantage of this particular optimization. For example, I know that my variant of fpgaminer's code does so for fully-unrolled miners - in fact it totally omits the last three SHA-256 rounds - and it looks like ztex's code (which is what most people are using these days) does too in all its variants.

Edit:

Quote from: wondermine on February 15, 2012, 01:24:50 AM

I'm squeezing 26.5MH/s out of a Nano with the latest design; that's verified accepted shares. I need to run the compile again with subscription edition, because this fitting is really poor, I'll have better numbers for your tomorrow. I've also got some rewriting to do that'll increase that speed. This Quartus web edition is really crippled.

Ooh, very impressive

- will be interesting to see what you manage to get up to in the end.

Dexter770221

legendary

Activity: 1029

Merit: 1000

So, what happend with 75+ MH/s from 22k part? 60 from twice a price is NOT competitive. 22k parts are avaible in TQFP packages, Easy to solder, design of 2layer PCB is also easy. Only that makes sense.

Inspector 2211

sr. member

Activity: 448

Merit: 250

There's also a guy going by the name of eldentyrell, who has managed to fit 3 SHA-256es (3 half-miners) into a Spartan6-150, but as of yet, he has not disclosed the clock rate yet, and it is unclear whether he is willing to put the bitstream into the public domain, something that Stefan of ZTEX has done. Something that the designer of the Icarus board has also done.

If you want to be a hero here, follow in eldentyrell's footsteps, squeeze 3 SHA-256 into a Spartan6-150 and put design and bitstream into the public domain.
And/or improve the Icarus implementation.
And/or improve the ZTEX implementation.

Inspector 2211

sr. member

Activity: 448

Merit: 250

>consider pursuing?

Frankly, no.
One could simply copy the ZTEX design, which is available under the GPL license, if memory serves, spend a tad over $200 per board (not for 2 or 10, but, say, for 50 or 100) and achieve a tad more than 200 MH/s. $1 per MH/s. Thus, I'd advise against pursuing a $2 per MH/s design.

wondermine

newbie

Activity: 59

Merit: 0

Quote from: lame.duck on February 14, 2012, 07:45:35 PM

There is already a mining script using a serial connection to FPGA ...

https://bitcointalksearch.org/topic/modular-python-bitcoin-miner-official-thread-62823

Let me restate. Would someone like to work with me to get that interfacing with the FPGA. I'm not good with python, and I'm not sure exactly how that wants its data. It's not just a matter of slapping serial on there and linking it up with the script, though I'm sure it's a great script.

----------------------------------
I'm squeezing 26.5MH/s out of a Nano with the latest design; that's verified accepted shares. I need to run the compile again with subscription edition, because this fitting is really poor, I'll have better numbers for your tomorrow. I've also got some rewriting to do that'll increase that speed. This Quartus web edition is really crippled.

In looking around a little, I've noticed prices on FPGA standalone chips, Cyclone IV E series, but larger than the Nano's chip by a little more than 2x. The chips are about $85 a pop, and would require another $30-45 in circuitry/board/etc, so about $130 but it would mean some design and manufacture on our part. That and the shipping would be cheaper than anyone else's, since I'd do the mass orders and ship cheaply for everyone.

Using the aforementioned chip and a custom board you could get 53MH/s (likely more, but I know you guys like verified numbers) for $130 at 0.5W max. Is that something that I should consider pursuing? I have a friend who's excellent at circuit design and expressed some interest, but I'd need to know there was interest. I know the number isn't dazzling, though it will likely be at least 60, but that power figure and the fact that the board would be a couple inches by a couple inches is very attractive when compared to PCs...

Let me know. Either way hopefully these improving numbers are good indicators to you all; I haven't exhausted all the optimizations I've been working on, so more better numbers to come.

lame.duck

legendary

Activity: 1270

Merit: 1000

There is already a mining script using a serial connection to FPGA ...

https://bitcointalksearch.org/topic/modular-python-bitcoin-miner-official-thread-62823

wondermine

newbie

Activity: 59

Merit: 0

DE4-230 Stratix IV GX: 800 MH/s

Who wants to write a script to handle UART interfacing rather than this JTAG business? PM/email me if you're interested.

++More (non-mathematical) design improvements to be implemented in the near future.

rjk

sr. member

Activity: 448

Merit: 250

1ngldh

Quote from: wondermine on February 13, 2012, 05:23:46 PM

I'm not an expert as far as GPU mining, but unless underclocking RAM actually invalidates the RAM functionally, it will not be a problem. As far as increasing power consumption, I don't believe it should be a detectable amount, this again from an educated guess; we can see. The reason I say so is that "not using" the RAM doesn't mean "not powering" the RAM, it means not accessing it. Now, we're just accessing it, the amount of electricity we're sending to the memory hasn't changed (the electricity related to logic signaling is absurdly small). RAM doesn't lie electrically dormant, just logically.

No, underclocking just makes the RAM run slower. The speed of the RAM is independent of the core in a GPU. The reason I ask is because with traditional miners, there is a little bit of a curve in efficiency through the range of 150-400 Mhz in RAM speed. Some speeds might be unstable and slow mining, others could be a little bit faster mining, but never very much difference.

Even if the RAM usage isn't much, it still can affect the speed if it is needed in a critical part of the hash algo, since the clocks are independent, is basically all I was trying to say. The reason for messing about with the RAM speed in the first place is to save several watts of power.

wondermine

newbie

Activity: 59

Merit: 0

Quote from: pieppiep on February 13, 2012, 10:29:05 AM

Quote from: wondermine on February 12, 2012, 10:49:40 PM

From my forum at http://www.nonverba.org/forum:
Now, the final addition we do is to add the round constant, something that will always be the same.
In this specific situation it's 0x5be0cd19. Now, before we commit our 128th clock cycle to adding that to our previous 32-bit "h" value, we can add very resource friendly code to determine whether our "h" value is too high to yield a winning number. We can say that if "h" is greater than 0xa41f32e6, it cannot yield a winning digest and we should not waste that clock cycle. Great, that was easy, and isn't nearly the magnitude of problem I'm working on, but it's here to get the idea across. This one is a completely predictable example, the true optimization will come from probability-derived solutions earlier. Anyways...

I don't understand.
If the h value of the midstate is 0x5be0cd19, then the next h value that is added to it must be exactly 0xa41f32e7 to get the 0x00000000 value to make a valid share at difficulty 1.
Isn't that a way bigger advantage? Now you know for sure you got a value you want and not some strange percentage.
I assume no pools use shares less than difficulty 1 and for mining in pools with difficulty greater than 1 it should be easy for the miner software on the host computer to check if the share with difficulty 1 is also valid for the pool with difficulty x.

I'll answer this last one here.

You're right, there's a certain value that will work 100% of the time for difficulty one on that last addition. This answer was given for the sake of simplicity, but it was also given in non-absolute terms to illustrate how probabilistic advantages will work. Real advantage can be taken earlier in the algorithm where we're dealing with probability thresholds, not absolute certainty. SHA-2 at 64 rounds is random (succeeds at mathematical randomness tests). However given certain values at certain stages, that unravels. The ultimate example is adding a round constant which is a value we know 100% for certain. The goal we're going for is to find them a little earlier, say round 126-7 where the probability might be in the fractions of a percentage, and the relationship to check may not be greater than, less than, or equal to; it's going to be something completely different that specially relates to a SHA-2 round.

Further answers directly related to the mathematics of it will be at http://www.nonverba.org/forum.

Quote from: rjk on February 12, 2012, 11:26:03 PM

I was wondering what method you were going to use to determine which partially finished hashes to throw away prematurely, but I suppose I understand a little better now. I suppose the question is, how does this affect (for instance) miners that underclock RAM (since you are now going to be using some)? And is the added complexity going to raise power requirements by activating parts of the chips that might otherwise remain dormant? Thinking specifically in GPU terms here.

I'm not an expert as far as GPU mining, but unless underclocking RAM actually invalidates the RAM functionally, it will not be a problem. As far as increasing power consumption, I don't believe it should be a detectable amount, this again from an educated guess; we can see. The reason I say so is that "not using" the RAM doesn't mean "not powering" the RAM, it means not accessing it. Now, we're just accessing it, the amount of electricity we're sending to the memory hasn't changed (the electricity related to logic signaling is absurdly small). RAM doesn't lie electrically dormant, just logically.

pieppiep

hero member

Activity: 1596

Merit: 502

Quote from: wondermine on February 12, 2012, 10:49:40 PM

From my forum at http://www.nonverba.org/forum:
Now, the final addition we do is to add the round constant, something that will always be the same.
In this specific situation it's 0x5be0cd19. Now, before we commit our 128th clock cycle to adding that to our previous 32-bit "h" value, we can add very resource friendly code to determine whether our "h" value is too high to yield a winning number. We can say that if "h" is greater than 0xa41f32e6, it cannot yield a winning digest and we should not waste that clock cycle. Great, that was easy, and isn't nearly the magnitude of problem I'm working on, but it's here to get the idea across. This one is a completely predictable example, the true optimization will come from probability-derived solutions earlier. Anyways...

I don't understand.
If the h value of the midstate is 0x5be0cd19, then the next h value that is added to it must be exactly 0xa41f32e7 to get the 0x00000000 value to make a valid share at difficulty 1.
Isn't that a way bigger advantage? Now you know for sure you got a value you want and not some strange percentage.
I assume no pools use shares less than difficulty 1 and for mining in pools with difficulty greater than 1 it should be easy for the miner software on the host computer to check if the share with difficulty 1 is also valid for the pool with difficulty x.

ummas

sr. member

Activity: 274

Merit: 250

@wondermine
i understand the need to have own forum, where you can speare information better, but dont forget to post here any news about nanominer. This place is the best place i know to get more ppl intrested in nanominer. ideas are gr8. just pull out somme hashes first please Cheesy

rjk

sr. member

Activity: 448

Merit: 250

1ngldh

I was wondering what method you were going to use to determine which partially finished hashes to throw away prematurely, but I suppose I understand a little better now. I suppose the question is, how does this affect (for instance) miners that underclock RAM (since you are now going to be using some)? And is the added complexity going to raise power requirements by activating parts of the chips that might otherwise remain dormant? Thinking specifically in GPU terms here.

wondermine

newbie

Activity: 59

Merit: 0

From my forum at http://www.nonverba.org/forum:

So I've been talking to a number of people about the possibility of using side-channel vulnerabilities in the bitcoin mining process to speed up mining. It's all been vague until now, so I'm going to give a very tangible example.

Assume we roll up the algorithm: 2x SHA-2 = 128 clocks/mining operation.
Now, the final addition we do is to add the round constant, something that will always be the same.
In this specific situation it's 0x5be0cd19. Now, before we commit our 128th clock cycle to adding that to our previous 32-bit "h" value, we can add very resource friendly code to determine whether our "h" value is too high to yield a winning number. We can say that if "h" is greater than 0xa41f32e6, it cannot yield a winning digest and we should not waste that clock cycle. Great, that was easy, and isn't nearly the magnitude of problem I'm working on, but it's here to get the idea across. This one is a completely predictable example, the true optimization will come from probability-derived solutions earlier. Anyways...

By eating a couple of logic elements and little block ram, we can implement this check.
Now, how does that all translate into performance gain? Well, 64.11% of hashes fall into this range. That means that 64.11% of the time, we can save one clock cycle on a core, which is 1/128 or a 0.78125% speed increase.
0.6411 * 0.0078125 = 0.005% increase in performance. Pulling, say, 500MH/s, that's 2.50MH/s performance enhancement over time. (I'm pretty sure this math is accurate, but if it's not perfect, it's at least close, these are the ranges we're talking about.)

It doesn't maybe sound like a lot, but think about it, we just pulled 2.5MH/s out of thin air. Or rather, out of unused block ram. It didn't cost us time, or power, or logic. The earlier these constants can be checked out, the less their probability of success, but the higher their effect on performance.

In case this was tl;dr material: We just pulled 2.5MH/s out of thin air and can do it repeatedly.

There are a lot of these optimizations that could fit on even a Cyclone series FPGA. The only thing standing between us and them is time and mathematical analysis. Hopefully this gives you a clearer picture of what I'm talking about, and lets you know that it's actually feasible and just requires time and effort, rather than being voodoo.

Oh, and this process isn't just for FPGAs, it works on any mining platform. These are the optimizations you can get your hands on before anyone else with donations. Seeing the incentive yet?

wondermine

newbie

Activity: 59

Merit: 0

The start of a repository of resources, information, and eventually tutorials has been started here: http://nonverba.org/forum/viewtopic.php?f=8&t=6

Check it out, sign up, post, get some discussion going. There's been a fair number of people mentioning they'd like to get into FPGA programming, so let's do it.

finway

hero member

Activity: 714

Merit: 500

Interesting!

Inspector 2211

sr. member

Activity: 448

Merit: 250

Quote from: wondermine on February 12, 2012, 03:38:42 AM

Are you familiar with the term "SerDes"?

Yes, as in "device that iS not suitablE for sending 64 bits to an addeR anD rEceiving 32 bitS back, all within 1 or 2 ns total" - SERDES.

You'd be shooting for 100 Gbit/s SerDes units - possible in ASICs now (if expensive), not yet available in FPGAs.

Furthermore, FPGAs typically have only about half a dozen or a dozen or maybe two dozens of SerDes units - not 875.

nelisky

legendary

Activity: 1540

Merit: 1002

I'm a single forum type of person, but I will try to keep that one in sight too.

Also, I can get you a VPS as a donation, send me an email (my address is in my profile) with what you need and we'll see what I can manage.

wondermine

newbie

Activity: 59

Merit: 0

Alright everyone, I've put up a forum on my website dedicated to Nanominer. There's a whole bunch of new information as well as better ways to keep track of the project there, so head on over to
http://nonverba.org/forum/ and check it out!

Topic: Nanominer - Modular FPGA Mining Platform - page 2. (Read 18959 times)