A custom designed FPGA miner for LTC? - page 2.

Nova!

full member

Activity: 140

Merit: 101

Now we're looking at the code.
This wasn't the exact file I had. Maybe something has changed, I don't know, but it's close enough in the places that matter anyway.

Anyway, for the crux of my argument, take lines 124 through 142, which make up the bulk of the random number generator.
These are currently implemented as defines.

Defines are a sort of macro: they get expanded into the final output as the code they represent.

Ask yourself what happens if you isolate that section of code as its own separate core,
then modify the code to call into that core rather than repeating that section over and over again.

Now we're on the same page. It would be a start. There are some other things I see as well, but I'll just keep that under my hat until I get a chance to make sure I'm right.
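To make the point about defines concrete, here is a minimal Python sketch of what a preprocessor pass does with a simple macro: every use is replaced by the body, so the body shows up in the output once per use. The macro name `MIX_STEP`, its body and the `expand` helper are made up for illustration and are not taken from the cgminer kernel.

```python
import re

def expand(source: str, macros: dict) -> str:
    """Replace each macro name with its body, roughly like a preprocessor pass.

    Hypothetical helper: only handles simple macros without arguments.
    """
    for name, body in macros.items():
        # \b keeps us from touching identifiers that merely contain the name.
        source = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, source)
    return source

# Hypothetical macro and kernel body, not the real scrypt kernel source.
macros = {"MIX_STEP": "x ^= rotl(a + b, 7);"}
kernel = "MIX_STEP MIX_STEP MIX_STEP MIX_STEP"

expanded = expand(kernel, macros)
uses = len(re.findall(r"\bMIX_STEP\b", kernel))
copies = expanded.count("rotl(")
print(f"{uses} macro uses -> {copies} copies of the body in the output")
```

Whether replacing those copies with calls into one shared unit helps is a separate question: repeated copies can in principle run side by side in hardware, while a single shared core has to be time-shared between its callers.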

Nova!

Quote from: WindMaster on May 25, 2013, 08:31:46 PM

Quote from: Nova! on May 25, 2013, 08:11:07 PM

mtrlt, please enlighten us.

Random tidbit that may be of interest if anyone is unsure whether mtrlt knows what he's talking about. Nova, you mention above that you've looked at the OpenCL source for scrypt mining. Assuming that you're talking about the OpenCL kernel from Reaper or cgminer, hop back into the source, scroll up to the top and examine the copyrights at the top of the file.

https://github.com/ckolivas/cgminer/blob/master/scrypt130511.cl#L2

At no point did I imply he didn't know what he was talking about.
However, posting the entirety of the code, pointing at it and saying "look" really wasn't an option.
I was asking him to explain what's going on in it in just a couple of sentences or paragraphs.

Nova!

Quote from: WindMaster on May 25, 2013, 04:22:59 PM

You can calculate scrypt+salsa20/8(1024,1,1) as used in Litecoin with a fixed 128kB buffer + a bit of extra scratchpad memory, all day long, without calculating any sort of dynamic list that determines how much memory will be involved in calculating the hash. And the memory access pattern will be exactly the same every time you calculate the hash. In fact, my own FPGA implementation of scrypt with external DDR3 leveraged this fact by shifting every scrypt core 1 clock cycle from the previous one, such that a burst read or write to/from DDR3 would fetch all the data needed (or written) by each core precisely when that core was going to use (or generate) it. This was possible because the memory access pattern and amount of memory needed is exactly the same every time.

The OP doesn't have a shortcut at all. Even if scrypt worked the way he described, the OP's suggestion of a 2 core approach as a "shortcut" would be a retarded design for an FPGA implementation.

I'm very interested in learning more about this approach.
You say that you shift every scrypt core 1 clock cycle from the previous one.
You then say that a 2 core approach would be retarded.

You either did or did not implement multiple cores, but it sounds to me like you implemented all of scrypt and loaded it onto separate cores. Do you mean completely separate chips or literal cores on the same chip?

You say it's retarded, but you make it sound like it works flawlessly for you. Primary difference being you're calling all the way out to RAM and I'm saying let's keep that on-die.
Also I believe you're talking about all of scrypt in a single functional unit and I'm saying let's break it out into a generation core and a usage core. You may have actually found a better solution. I'd like to know your hashrate and platform though, if you don't mind disclosing it.

WindMaster

sr. member

Activity: 347

Merit: 250

Quote from: Nova! on May 25, 2013, 08:11:07 PM

mtrlt, please enlighten us.

Random tidbit that may be of interest if anyone is unsure whether mtrlt knows what he's talking about. Nova, you mention above that you've looked at the OpenCL source for scrypt mining. Assuming that you're talking about the OpenCL kernel from Reaper or cgminer, hop back into the source, scroll up to the top and examine the copyrights at the top of the file.

https://github.com/ckolivas/cgminer/blob/master/scrypt130511.cl#L2

Quote from: Nova! on May 25, 2013, 08:11:07 PM

While doing so, don't try to point at the flaws in my explanation and say "it's not x but y & z". Instead start from scratch. Try to remember your audience here.
Frankly I'm genuinely interested in this. I'm also impressed with all these FPGA experts chiming in with their knowledge, looks like lots of people have fully working LTC FPGAs.

This part is true. However, most (all?) of the people who have actually done it have found that the cost/performance ratio is significantly worse than GPUs. This was actually the case for BTC too: FPGAs never had the edge in cost/performance ratio, only an edge in power consumption. On the scrypt side of things, it's my position from first-hand experience that you'd have to have insanely expensive power, or be willing to wait multiple years for ROI on power cost savings, for FPGAs to be worthwhile for LTC mining. We know what we're doing on the FPGA and ASIC development side of things, and yet we have a data center full of 5850 and 6950 based rigs mining LTC.
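The ROI argument above is simple arithmetic: the extra hardware cost divided by the daily power-cost saving gives the payback time. A minimal Python sketch follows. The function name and every number in the example call are hypothetical placeholders, not measurements from anyone in this thread.

```python
def payback_days(extra_hw_cost_usd: float,
                 gpu_watts: float,
                 fpga_watts: float,
                 usd_per_kwh: float) -> float:
    """Days until an FPGA's power savings cover its extra purchase cost.

    Assumes both rigs produce the same hashrate, so only the power bill differs.
    """
    saved_kw = (gpu_watts - fpga_watts) / 1000.0
    if saved_kw <= 0 or usd_per_kwh <= 0:
        return float("inf")  # no saving, so the extra cost is never recovered
    daily_saving_usd = saved_kw * 24 * usd_per_kwh
    return extra_hw_cost_usd / daily_saving_usd

# Hypothetical inputs: $500 extra hardware, 200 W vs 20 W, $0.10 per kWh.
days = payback_days(500.0, 200.0, 20.0, 0.10)
print(f"payback: {days:.0f} days (~{days / 365:.1f} years)")
```

With cheap power the denominator shrinks and the payback time stretches out, which is the "multiple years for ROI" case described above.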

Nova!

Quote from: mtrlt on May 25, 2013, 07:18:21 PM

Nova: You have blatantly misunderstood how hash functions work, and specifically how scrypt works. I agree with WindMaster, there is no way you have made, or will make a scrypt FPGA. I advise everyone to not send Nova money.

OK, elaborate. Please explain in layman's terms how a cryptographic hash function works in general first. Then also explain in layman's terms how scrypt differs from the SHA-256 of Bitcoin.

If I have completely misunderstood hashing over a lifetime of programming, then I really have some long hard thinking to do.

My guess is that you're focusing on an oversimplified explanation, something I can present to people who may or may not have any experience with programming or hardware dev, and assuming that the reductions and omissions are an oversight or misunderstanding. In fact they are not relevant to what I'm attempting to explain and were intentionally omitted.

mtrlt, please enlighten us.

While doing so, don't try to point at the flaws in my explanation and say "it's not x but y & z". Instead start from scratch. Try to remember your audience here.
Frankly I'm genuinely interested in this. I'm also impressed with all these FPGA experts chiming in with their knowledge, looks like lots of people have fully working LTC FPGAs.

As for everyone else, the offer is still open here.
If anyone has sent me money and decided that they no longer feel comfortable with the idea of what I've stated before, they can request a refund.
Also, mtrlt is half right: I have not made a fully functional FPGA for LTC yet. It's sort of why I'm asking for help to raise funds to buy the bits I need.

Some good news is that people have been suggesting alternatives to build this on which could end up being much cheaper. I'm currently swimming in whitepapers.

Anyway, I'm with mtrlt & WindMaster on this. I expressly advise against anyone sending me money unless they understand that what they are getting is the collective result of what I learn in this process. While the goal is to produce an FPGA with a version of scrypt with a slower section optimized away, this effort may not produce a result other than "oh ok, so it didn't work because I was wrong about x". If I had all the information I needed or I had the money to purchase the equipment to check it out, I wouldn't be going this route.

I am learning a lot though. There do appear to be candidate boards that may work. I'm studying them carefully now.

mtrlt

member

Activity: 104

Merit: 10

Nova: You have blatantly misunderstood how hash functions work, and specifically how scrypt works. I agree with WindMaster, there is no way you have made, or will make a scrypt FPGA. I advise everyone to not send Nova money.

Nova!

Quote from: Viceroy on May 25, 2013, 06:16:20 PM

Can you write an efficient scrypt miner that we can put on the Amazon cluster? I have $100 in credit plus an offer to beat up a bunch of Teslas for 24 hours.

I'm actually not sure about that. For instance, I know that GPU mining on the Tesla is horribly inefficient for Bitcoin. I can only guess that it would be even worse for Litecoin.
However, I do wonder how the CPUMiner would fare on one of those 64 ECU units.

From a cost perspective, spinning up a ton of micro instances as spot instances and just pointing them at a pool may in fact be your best bet there. I did that a month ago and it caused the pool operator to shut down, thinking they were under DDoS attack. I trimmed it back from 200 units to 20, let it mine for a week, and got nothing for my trouble but a big Amazon bill.

I'm going to take a look at the way scrypt is implemented in the CPU miner and see if there is any way to optimize it a bit though, because now you've got my brain working in that direction. Thanks!

Viceroy

hero member

Activity: 924

Merit: 501

Can you write an efficient scrypt miner that we can put on the Amazon cluster? I have $100 in credit plus an offer to beat up a bunch of Teslas for 24 hours.

Nova!

Quote from: WindMaster on May 25, 2013, 04:30:04 PM

Just preserving a copy so the OP can't change his original posts in this thread or the other one to remove all the incorrect claims that are red flags:

Quote from: Nova! on May 23, 2013, 02:26:37 AM

Scrypt is resistant because it is memory hard.
The amount of memory required is controlled specifically by a psuedo randomly generated list that is changed at every hashing cycle. This means that the setup and take down of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.

The solution to this problem is to have 2 cores and a metric crapton of on die ram.
In this design, 1 core runs the prng algo, the other does the hashing.
They need to coordinate a little with one another, but it is most definitely doable.
It's much, much harder than SHA256 and the hash rate will never be equal to an ASIC running SHA-256, but relative speed ups should theoretically remain the same as we saw in the progression of BitCoin mining.

Nevertheless, LTC can in fact be mined by even single core FPGAs at a much higher rate than with GPUs I estimate 10x to 100x speed up depending on a number of factors including bus speed, on-die ram, internal clock speed etc.

Also ASICs can be built from some FPGAs and those ASICs can be still faster.

The trick is to find an FPGA with as much on die memory as possible and then ensure that your implementation takes full advantage of it.

For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.

The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time). It quickly became a paperweight.
A $10,000 paperweight.

Nevertheless I did a lot of work on it in my spare time and I am considering a kickstarter project to fund continued development.
I almost started one until I found out the true cost of the FPGA I managed to toast and realized it was probably out of reach of pretty much everyone including myself.
It would have to come down by an order of magnitude in cost before it would be a financially viable option for most folks.

Quote from: Nova! on May 24, 2013, 06:18:42 PM

I have found what I believe is a shortcut in scrypt that if implemented correctly in hardware could dramatically speed up the hashrate.
I believe it should work and I know how I would implement it if I had the resources to acquire the FPGA and tools I need.

To show good faith I will elaborate on the algo and how the shortcut would work.
This is really over simplified, but you are free to take this idea and roll with it.

scrypt the algo used by LTC and in fact all hashing algos, are comprised of 2 predominant steps.
#1 Generate a random list
#2 Hash across it.

To generate consistent results the random algo is actually deterministic pseudo-random and the setup for it is determined by a seed.
We will call this the prng.

The other step is hashing which is pretty well understood, you take a value from list a and replace it with a value from list b.
When you are done iterating you now have a hash.

scrypt differs mostly because it uses an entirely new list so frequently.
The setup and tear down of this list requires quite a bit of CPU time and a lot of time is wasted on the memory bus performing storage & retrieval operations.
It cannot be done concurrently because the list itself changes frequently.

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

The secondary core is the hashing core. It would tell the prng core to setup a new list.
Then it would retrieve position x off the list from the shared memory space.
Other than that it would also perform all the normal hashing functions in a dedicated memory space.

I believe the total I need to make this work is about $12k USD, the FPGA I'm targeting right now is $10k and a license for the dev tools will be about $2k.
If I can find a less expensive option then I will go for that, but there aren't that many FPGAs that meet requirements right now.
The particular target FPGA also has a direct path to ASIC from the mfr.

If you're willing to donate to the effort, I will keep you in the loop with full disclosure including build instructions and a copy of the sources and the firmware.
I haven't decided on a license for this if it works, but you will at least have a right to personal use.
Perhaps if enough people are interested in production level manufacturing we could go a different route. I'm not particularly interested in making this something I do for the rest of my life, but the contrarian in me is very excited by the potential here.

The LTC donation address is below.
LKfKkRMvMf2stQMNzQdKCvaf2YueAv1QSa

You can also donate BTC to the key in my sig.
There is no maximum but if you do decide to donate please send at least 0.5 LTC or the equivalent in BTC.
Then post just the address you donated from and I'll PM you here with a bitmessage key to join the group.

Thanks in advance!

Here let's preserve a copy together. I agree, if I change anything in the original description after reading your "red flags" then I must be a scammer. It could never be that I learned new information or tried to clarify or found out I was wrong about something.

So far that hasn't been the case.
By the way you have some interesting points. They are a bit outside the realm of what I was trying to get at here, but if you see further optimizations please feel free to pitch in. You are actually giving us all a valuable learning experience in the right way to implement an FPGA for LTC. BTW what chipset and board are you using? What devtools? Thanks!

Nova!

Quote from: WindMaster on May 25, 2013, 04:22:59 PM

This is a public service announcement for anyone that feels inclined to start sending BTC or LTC to nearly anyone that drops the words "Litecoin" and "FPGA" in the same post, even when it's apparent to everyone with in-depth knowledge on the subject that the OP likely doesn't know what he's talking about.

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

Quote from: Nova! on May 24, 2013, 06:18:42 PM

scrypt differs mostly because it uses an entirely new list so frequently.

I think the big problem is that you can't unroll salsa mixing because of its recursive form. Thus you can't parallelize calculations as you can do with sha256. The only thing you can do is to have multiple instance of your 'cores' run in parallel. But I don't think Stratix have enough on-die ram (52 Mbit max) to overwhelm a pool as you said.

The OP's claim is way worse than that. If you look at what he posted over in the 'scrypt is "memory intensive" therefore no ASICs, but how?' thread, he elaborates a bit more on how he thinks scrypt works, and what he actually means when he talks about setting up and tearing down an entirely new list:

Quote from: Nova! on May 23, 2013, 02:26:37 AM

Scrypt is resistant because it is memory hard.
The amount of memory required is controlled specifically by a psuedo randomly generated list that is changed at every hashing cycle. This means that the setup and take down of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.

The OP failed the basic scrypt knowledge test, I'm afraid. I saw a flawed explanation posted somewhere that looked like the OP's description, but can't remember where I saw it. This is far enough "out there" that I bet the OP had to have read it in the same place.

You can calculate scrypt+salsa20/8(1024,1,1) as used in Litecoin with a fixed 128kB buffer + a bit of extra scratchpad memory, all day long, without calculating any sort of dynamic list that determines how much memory will be involved in calculating the hash. And the memory access pattern will be exactly the same every time you calculate the hash. In fact, my own FPGA implementation of scrypt with external DDR3 leveraged this fact by shifting every scrypt core 1 clock cycle from the previous one, such that a burst read or write to/from DDR3 would fetch all the data needed (or written) by each core precisely when that core was going to use (or generate) it. This was possible because the memory access pattern and amount of memory needed is exactly the same every time.

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

Quote

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

I don't see the shortcut here. Are you thinking of a two-stage pipeline with dual-port RAM in the middle?

The OP doesn't have a shortcut at all. Even if scrypt worked the way he described, the OP's suggestion of a 2 core approach as a "shortcut" would be a retarded design for an FPGA implementation.

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

To conclude, I don't understand why you need funding for your idea, because you can test everything with simulation. Altera provides a free web edition of their dev tools that doesn't allow you to target Stratix, but you can target Cyclone V. You should be able to validate your idea with 12 Mb of on-die ram. Then you'll have tangible results to get funds for a dev board, which are really expensive.

+1

In fact, if we look at his post in the other thread, he claims he already implemented it, and destroyed the FPGA on his dev board while it "sounded like a jet landing the whole time":

Quote from: Nova! on May 23, 2013, 02:26:37 AM

For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.

The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time). It quickly became a paperweight.
A $10,000 paperweight.

Does not compute, for anyone with technical knowledge on the subject. In the highly unlikely case that this did actually occur, it would mean the OP already has the dev tools as well and would have no need to replace the whole dev board (as he states earlier in this thread that the dev board costs much more than the FPGA IC). It would be more cost-effective to desolder the FPGA IC, clean up the BGA pads and reflow a new FPGA onto the board.

Quote from: Nova! on May 23, 2013, 02:26:37 AM

Also ASICs can be built from some FPGAs and those ASICs can be still faster.

Altera's Hardcopy program is really just a mask programmed FPGA that Altera has pre-qualified for your particular netlist to run at a little higher speed. I wouldn't call it a true ASIC, it doesn't achieve anywhere near the speed-up that you'd normally experience going from an FPGA to an actual real ASIC implementation built from your original Verilog source, and only achieves a few % cost reduction over Altera's equivalent FPGA's. The only reason it costs less than the equivalent FPGA is that Altera doesn't have to qualify and test the FPGA for every possible design someone could load on it, they only have to test and qualify it for your specific netlist that was mask programmed on the die. Best not point at Hardcopy as a valid route to an ASIC implementation for LTC.

Hopefully this gives people a little better idea what the odds are the OP is trying to scam people. And I see people in this thread have already been sending LTC and BTC to him! Wow..

OP: Take some time to learn how scrypt works. Read Percival's original Tarsnap scrypt whitepaper. Check out the source code for a few scrypt implementations. That way you can have the correct details on the next BS / scam attempt. Suggesting that scrypt's memory requirements are dynamic and determined by an expensively computed list calculated on each iteration was your biggest mistake here.

So I take it you have a functioning FPGA for LTC? I would actually love to know even more about your design. You do come off sounding like you have an impressive amount of knowledge on the subject. As for me, I'm probably still standing at the top of Mt Stupid especially when I try to relate the concepts to others.
Nevertheless I am a programmer. I've spent half my life finding optimizations in code, I'm pretty sure I see this one.

Ok so going back to where we started from.
#1 I am not raising money to develop a super secret top of the line nuclear rocket powered FPGA. I have said from the beginning that I think I see a way to optimize a section of scrypt away. I would like the opportunity to explore that option. That is why the thread is entitled A custom designed FPGA miner for LTC. The implication is that there would be a difference in the way it works, but the output should be the same, only faster.

#2 I'm pretty certain it's a well known fact by now that I had a board from a previous employer. I was working on porting the OpenCL miner to it; it ran fast, it ran hot, and it fried the board. This may have had something to do with the fact that I was using a version of the dev environment which I probably should not have been. Hence the need for legit dev tools, which by the way means more than just the IDE. Had I known I was working with a $10k board at the time, I would have taken a bit more caution. But when I left my previous employer I asked if I could buy it at cost and they sold it to me for $850. I wasn't privy to the fact that it probably cost them a bit more, and nowadays I'm wondering if they even knew. I'm pretty sure I did mention those facts in the referenced thread. If not, I'm sure I've mentioned them enough in other threads that it should be considered disclosed by now. But yes everyone, you should be aware. My original plan was just to compile the OpenCL miner and run it directly on the hardware. I did this with an expired/not legit version of the dev tools, but it seemed to work. The side effect of doing it that way was that it ran great for 3 hours, got hot enough to toast marshmallows, then would shut down for an hour. After a few days of working like this, it fried itself in the end. Hence I need to rule out the dev tools themselves. Also, it made me look closely at the source, which is where I think I see my answer.

#3 Of course I'm going to post the simplest layman's explanation of what is going on under the hood. I declared this explicitly in the beginning when I said "This is really oversimplified but here it goes..." At that point I explain hashing in general and that scrypt is hard to put on an FPGA because it requires access to a lot of RAM. If that RAM is on the die, it really does help to speed things along quite a bit, even in a default scrypt implementation. Run the OpenCL miner on the chip I'm talking about and see. I then go on to explain that the innovation is creating what are effectively 2 cores and letting them communicate. One core has just the gates for the PRNG; the other core has the gates for the remainder of the hashing. They share a common memory space for the lookup table. I believe that by decoupling them and taking advantage of certain reductions in complexity we would see a speed-up. I don't know that this is in fact the case, but it looks right to me and I'm willing to do what it takes to prove or disprove my theory.

#4 I was unaware that the Altera ASIC was not a true ASIC. This is important information and is game changing in my eyes, since part of the point is to provide a direct path to ASIC for anyone who wanted to participate. Because of this I am willing to fully refund anyone who has contributed. All they have to do is ask. Thank you for bringing that to my attention. I will keep it under advisement as I try to find a new path.

BChydro

hero member

Activity: 1426

Merit: 506

I personally would rather leave scrypt mining FPGA- and ASIC-free. That's the main appeal of it to me. But if an FPGA device were created, I'm sure there would be ample interest in it. Then the question is: where will all the GPU miners go?

WindMaster

Just preserving a copy so the OP can't change his original posts in this thread or the other one to remove all the incorrect claims that are red flags:

Quote from: Nova! on May 23, 2013, 02:26:37 AM

Scrypt is resistant because it is memory hard.
The amount of memory required is controlled specifically by a psuedo randomly generated list that is changed at every hashing cycle. This means that the setup and take down of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.

The solution to this problem is to have 2 cores and a metric crapton of on die ram.
In this design, 1 core runs the prng algo, the other does the hashing.
They need to coordinate a little with one another, but it is most definitely doable.
It's much, much harder than SHA256 and the hash rate will never be equal to an ASIC running SHA-256, but relative speed ups should theoretically remain the same as we saw in the progression of BitCoin mining.

Nevertheless, LTC can in fact be mined by even single core FPGAs at a much higher rate than with GPUs I estimate 10x to 100x speed up depending on a number of factors including bus speed, on-die ram, internal clock speed etc.

Also ASICs can be built from some FPGAs and those ASICs can be still faster.

The trick is to find an FPGA with as much on die memory as possible and then ensure that your implementation takes full advantage of it.

For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.

The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time). It quickly became a paperweight.
A $10,000 paperweight.

Nevertheless I did a lot of work on it in my spare time and I am considering a kickstarter project to fund continued development.
I almost started one until I found out the true cost of the FPGA I managed to toast and realized it was probably out of reach of pretty much everyone including myself.
It would have to come down by an order of magnitude in cost before it would be a financially viable option for most folks.

Quote from: Nova! on May 24, 2013, 06:18:42 PM

I have found what I believe is a shortcut in scrypt that if implemented correctly in hardware could dramatically speed up the hashrate.
I believe it should work and I know how I would implement it if I had the resources to acquire the FPGA and tools I need.

To show good faith I will elaborate on the algo and how the shortcut would work.
This is really over simplified, but you are free to take this idea and roll with it.

scrypt the algo used by LTC and in fact all hashing algos, are comprised of 2 predominant steps.
#1 Generate a random list
#2 Hash across it.

To generate consistent results the random algo is actually deterministic pseudo-random and the setup for it is determined by a seed.
We will call this the prng.

The other step is hashing which is pretty well understood, you take a value from list a and replace it with a value from list b.
When you are done iterating you now have a hash.

scrypt differs mostly because it uses an entirely new list so frequently.
The setup and tear down of this list requires quite a bit of CPU time and a lot of time is wasted on the memory bus performing storage & retrieval operations.
It cannot be done concurrently because the list itself changes frequently.

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

The secondary core is the hashing core. It would tell the prng core to setup a new list.
Then it would retrieve position x off the list from the shared memory space.
Other than that it would also perform all the normal hashing functions in a dedicated memory space.

I believe the total I need to make this work is about $12k USD, the FPGA I'm targeting right now is $10k and a license for the dev tools will be about $2k.
If I can find a less expensive option then I will go for that, but there aren't that many FPGAs that meet requirements right now.
The particular target FPGA also has a direct path to ASIC from the mfr.

If you're willing to donate to the effort, I will keep you in the loop with full disclosure including build instructions and a copy of the sources and the firmware.
I haven't decided on a license for this if it works, but you will at least have a right to personal use.
Perhaps if enough people are interested in production level manufacturing we could go a different route. I'm not particularly interested in making this something I do for the rest of my life, but the contrarian in me is very excited by the potential here.

The LTC donation address is below.
LKfKkRMvMf2stQMNzQdKCvaf2YueAv1QSa

You can also donate BTC to the key in my sig.
There is no maximum but if you do decide to donate please send at least 0.5 LTC or the equivalent in BTC.
Then post just the address you donated from and I'll PM you here with a bitmessage key to join the group.

Thanks in advance!

WindMaster

This is a public service announcement for anyone that feels inclined to start sending BTC or LTC to nearly anyone that drops the words "Litecoin" and "FPGA" in the same post, even when it's apparent to everyone with in-depth knowledge on the subject that the OP likely doesn't know what he's talking about.

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

Quote from: Nova! on May 24, 2013, 06:18:42 PM

scrypt differs mostly because it uses an entirely new list so frequently.

I think the big problem is that you can't unroll salsa mixing because of its recursive form. Thus you can't parallelize calculations as you can with sha256. The only thing you can do is to have multiple instances of your 'cores' running in parallel. But I don't think Stratix has enough on-die RAM (52 Mbit max) to overwhelm a pool as you said.

The OP's claim is way worse than that. If you look at what he posted over in the 'scrypt is "memory intensive" therefore no ASICs, but how?' thread, he elaborates a bit more on how he thinks scrypt works, and on what he actually means when he talks about setting up and tearing down an entirely new list:

Quote from: Nova! on May 23, 2013, 02:26:37 AM

Scrypt is resistant because it is memory hard.
The amount of memory required is controlled specifically by a pseudo-randomly generated list that is changed at every hashing cycle. This means that the setup and takedown of the list is expensive and it has to be done with each iteration. The alternative is to only generate a small subset of it, but the generation algorithm is itself CPU intensive.

The OP failed the basic scrypt knowledge test, I'm afraid. I saw a flawed explanation posted somewhere that looked like the OP's description, but can't remember where I saw it. This is far enough "out there" that I bet the OP had to have read it in the same place.

You can calculate scrypt+salsa20/8(1024,1,1) as used in Litecoin with a fixed 128kB buffer + a bit of extra scratchpad memory, all day long, without calculating any sort of dynamic list that determines how much memory will be involved in calculating the hash. And the memory access pattern will be exactly the same every time you calculate the hash. In fact, my own FPGA implementation of scrypt with external DDR3 leveraged this fact by shifting every scrypt core 1 clock cycle from the previous one, such that a burst read or write to/from DDR3 would fetch all the data needed (or written) by each core precisely when that core was going to use (or generate) it. This was possible because the memory access pattern and amount of memory needed is exactly the same every time.
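To make the fixed 128kB buffer concrete, here is a minimal Python sketch of scrypt with N=1024, r=1, p=1, written from the published scrypt definition and not taken from any miner in this thread. The function names are made up for illustration, and the all-zero 80-byte header is a placeholder input. The sketch assumes the usual Litecoin convention of passing the block header as both password and salt. The point is only that the scratchpad `v` always holds exactly 1024 blocks of 128 bytes, whatever the input is.

```python
import hashlib
import struct

N = 1024                      # scrypt cost parameter in scrypt(1024,1,1)
SCRATCHPAD_BYTES = N * 128    # r = 1 -> 128-byte blocks -> 131072 bytes (128 KiB)
MASK = 0xFFFFFFFF

# Word indices for the four column and four row quarter-rounds of Salsa20.
QUARTERS = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),
            (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def rotl(v, n):
    """Rotate a value left by n bits within 32 bits."""
    v &= MASK
    return ((v << n) | (v >> (32 - n))) & MASK


def salsa20_8(words):
    """Salsa20/8 core on 16 little-endian 32-bit words."""
    x = list(words)
    for _ in range(4):  # 4 double rounds = 8 rounds
        for a, b, c, d in QUARTERS:
            x[b] ^= rotl(x[a] + x[d], 7)
            x[c] ^= rotl(x[b] + x[a], 9)
            x[d] ^= rotl(x[c] + x[b], 13)
            x[a] ^= rotl(x[d] + x[c], 18)
    return [(x[i] + words[i]) & MASK for i in range(16)]


def block_mix(b):
    """BlockMix for r = 1: the block is two 64-byte halves (16 words each)."""
    y0 = salsa20_8([b[16 + i] ^ b[i] for i in range(16)])
    y1 = salsa20_8([y0[i] ^ b[16 + i] for i in range(16)])
    return y0 + y1


def romix(x):
    """ROMix: fill a scratchpad of N blocks, then do N mixing reads from it."""
    v = []
    for _ in range(N):          # each entry depends on the previous one
        v.append(x)
        x = block_mix(x)
    for _ in range(N):
        j = x[16] % N           # index taken from the current state
        x = block_mix([p ^ q for p, q in zip(x, v[j])])
    return x


def scrypt_1024_1_1(password, salt, dklen=32):
    """scrypt with N=1024, r=1, p=1 (hypothetical helper name)."""
    b = hashlib.pbkdf2_hmac("sha256", password, salt, 1, 128)
    mixed = struct.pack("<32I", *romix(list(struct.unpack("<32I", b))))
    return hashlib.pbkdf2_hmac("sha256", password, mixed, 1, dklen)


header = bytes(80)  # placeholder 80-byte block header, all zeros
print(SCRATCHPAD_BYTES, scrypt_1024_1_1(header, header).hex())
```

The first loop in `romix` is also why the mixing cannot simply be unrolled the way SHA-256 rounds can: every scratchpad entry is computed from the one before it.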

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

Quote

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

I don't see the shortcut here. Are you thinking of a two-stage pipeline with dual-port RAM in the middle?

The OP doesn't have a shortcut at all. Even if scrypt worked the way he described, the OP's suggestion of a 2 core approach as a "shortcut" would be a badly flawed design for an FPGA implementation.

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

To conclude, I don't understand why you need funding for your idea because you can test everything with simulation. Altera provides a free web edition of their dev tools that doesn't allow you to target Stratix, but you can target Cyclone V. You should be able to validate your idea with 12 Mb of on-die RAM. Then you'll have tangible results to get funds for a dev board, which is really expensive.

+1

In fact, if we look at his post in the other thread, he claims he already implemented it, and destroyed the FPGA on his dev board while it "sounded like a jet landing the whole time":

Quote from: Nova! on May 23, 2013, 02:26:37 AM

For the record I built an FPGA scrypt miner a few weeks ago.
That particular FPGA has a direct path to ASIC from the mfr because it's designed specifically for prototyping ASICs.

The value of LTC is not high enough at this time to justify the cost of the FPGA and in my case at least, an error in the code that I was using caused it to overheat (I couldn't get temp data out of it, the miner was reading 0 the whole time, it never slowed down and sounded like a jet landing the whole time). It quickly became a paperweight.
A $10,000 paperweight.

Does not compute, for anyone with technical knowledge on the subject. In the highly unlikely case that this did actually occur, it would mean the OP already has the dev tools as well and would have no need to replace the whole dev board (as he states earlier in this thread that the dev board costs much more than the FPGA IC). It would be more cost-effective to desolder the FPGA IC, clean up the BGA pads and reflow a new FPGA onto the board.

Quote from: Nova! on May 23, 2013, 02:26:37 AM

Also ASICs can be built from some FPGAs and those ASICs can be still faster.

Altera's Hardcopy program is really just a mask programmed FPGA that Altera has pre-qualified for your particular netlist to run at a little higher speed. I wouldn't call it a true ASIC: it doesn't achieve anywhere near the speed-up that you'd normally experience going from an FPGA to an actual real ASIC implementation built from your original Verilog source, and only achieves a few % cost reduction over Altera's equivalent FPGAs. The only reason it costs less than the equivalent FPGA is that Altera doesn't have to qualify and test the FPGA for every possible design someone could load on it; they only have to test and qualify it for your specific netlist that was mask programmed on the die. Best not point at Hardcopy as a valid route to an ASIC implementation for LTC.

Hopefully this gives people a little better idea of the odds that the OP is trying to scam people. And I see people in this thread have already been sending LTC and BTC to him! Wow..

OP: Take some time to learn how scrypt works. Read Percival's original Tarsnap scrypt whitepaper. Check out the source code for a few scrypt implementations. That way you can have the correct details on the next BS / scam attempt. Suggesting that scrypt's memory requirements are dynamic and determined by an expensively computed list calculated on each iteration was your biggest mistake here.

Nova!

full member

Activity: 140

Merit: 101

Quote from: ElGabo on May 25, 2013, 01:38:41 PM

Sent 1 LTC from LXj15uZkCFMecbKcLVMsswTtnWvRqqhuUm

Could you send me some info by PM?

Yes, I'm also finding out some exciting information in this thread and am trying to incorporate the new information in the writeup.

Puycheval

member

Activity: 95

Merit: 10

Quote from: Nova! on May 25, 2013, 01:35:55 PM

That's a good idea. I was unaware there was a free tool that wouldn't let me target the chip I'm planning to target.
Is there a compelling reason to choose Cyclone V over Stratix V? Other than cost I mean?

Cyclones are a smaller, slower version of the Stratix lineup. The biggest Cyclone V has approximately a third of the LE count of the biggest Stratix V.
I suppose it should be enough to validate the shortcut.

ElGabo

hero member

Activity: 635

Merit: 500

Sent 1 LTC from LXj15uZkCFMecbKcLVMsswTtnWvRqqhuUm

Could you send me some info by PM?

Nova!

full member

Activity: 140

Merit: 101

Quote from: Puycheval on May 25, 2013, 01:27:35 PM

I'm new to crypto currencies and I'm looking for an FPGA implementation too.

Quote from: Nova! on May 24, 2013, 06:18:42 PM

scrypt differs mostly because it uses an entirely new list so frequently.

I think the big problem is that you can't unroll salsa mixing because of its recursive form. Thus you can't parallelize calculations as you can with sha256. The only thing you can do is to have multiple instances of your 'cores' running in parallel. But I don't think Stratix has enough on-die RAM (52 Mbit max) to overwhelm a pool as you said.

Quote

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

I don't see the shortcut here. Are you thinking of a two-stage pipeline with dual-port RAM in the middle?

To conclude, I don't understand why you need funding for your idea because you can test everything with simulation. Altera provides a free web edition of their dev tools that doesn't allow you to target Stratix, but you can target Cyclone V. You should be able to validate your idea with 12 Mb of on-die RAM. Then you'll have tangible results to get funds for a dev board, which is really expensive.

That's a good idea. I was unaware there was a free tool that wouldn't let me target the chip I'm planning to target.
Is there a compelling reason to choose Cyclone V over Stratix V? Other than cost I mean?

Puycheval

member

Activity: 95

Merit: 10

I'm new to crypto currencies and I'm looking for an FPGA implementation too.

Quote from: Nova! on May 24, 2013, 06:18:42 PM

scrypt differs mostly because it uses an entirely new list so frequently.

I think the big problem is that you can't unroll salsa mixing because of its recursive form. Thus you can't parallelize calculations as you can with sha256. The only thing you can do is to have multiple instances of your 'cores' running in parallel. But I don't think Stratix has enough on-die RAM (52 Mbit max) to overwhelm a pool as you said.

Quote

The shortcut is to have a multicore setup and a ton of on-die ram.
A dedicated prng core which does the setup and teardown for the second core.

I don't see the shortcut here. Are you thinking of a two-stage pipeline with dual-port RAM in the middle?

To conclude, I don't understand why you need funding for your idea because you can test everything with simulation. Altera provides a free web edition of their dev tools that doesn't allow you to target Stratix, but you can target Cyclone V. You should be able to validate your idea with 12 Mb of on-die RAM. Then you'll have tangible results to get funds for a dev board, which is really expensive.
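As a rough check on the on-die RAM figures in this post, the sketch below counts how many full scrypt scratchpads would fit in 52 Mbit and 12 Mbit. It rests on several assumptions that are not in the post: each core keeps the whole 128 kB scrypt(1024,1,1) buffer on-die (the figure cited elsewhere in this thread), all of the on-die RAM is usable for scratchpads, and 1 Mbit means 1,000,000 bits. The helper name `max_cores` is made up for illustration.

```python
# One scrypt(1024,1,1) scratchpad: 1024 blocks of 128 bytes, expressed in bits.
SCRATCHPAD_BITS = 1024 * 128 * 8  # 1,048,576 bits


def max_cores(on_die_mbit, bits_per_mbit=1_000_000):
    """Upper bound on parallel cores if each one needs a full scratchpad on-die.

    Assumes every bit of on-die RAM can be used for scratchpads, which a real
    design would not achieve.
    """
    return (on_die_mbit * bits_per_mbit) // SCRATCHPAD_BITS


for name, mbit in [("52 Mbit (Stratix figure from the post)", 52),
                   ("12 Mbit (Cyclone V figure from the post)", 12)]:
    print(name, "->", max_cores(mbit), "cores at most")
```

Under those assumptions the bound is a few dozen cores on the larger part and about a dozen on the smaller one, which is the scale behind the doubt that on-die RAM alone could overwhelm a pool.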

Nova!

full member

Activity: 140

Merit: 101

Quote from: anderl on May 25, 2013, 10:57:56 AM

Quote from: Nova! on May 25, 2013, 09:22:18 AM

Quote from: anderl on May 25, 2013, 05:07:38 AM

Quote from: BladeRunner on May 25, 2013, 04:50:36 AM

Perhaps Nova can tell us exactly what FPGA IC costs 10,000 dollars. I have looked at the manufacturers and I don't see any chip that costs 10K.

You haven't looked hard enough. Some of the latest gen FPGA Stratix V development boards by Altera are running about 9k to 12k each.

I've worked with their older boards. There is another theoretical route he can try to take which uses some of the recent innovations with SRAM. I have my own theories and I've mocked up some code but I don't have plans to buy a board to implement them.

For Scrypt it is not worth the time and money. The total circulation and average daily transactions of LTC do not make it a good investment. Anyone trying to implement an FPGA for Scrypt while LTC is under $5 is running a scam (not directed at OP), or has not done the math.

Or is interested in it for reasons of academic curiosity and not an attempt to make a commercial product or recoup investment later.

I'm interested in hearing your SRAM idea as well as your other theories.

When LTC gets over $5. Right now I'm fine using ASICs for BTC and GPUs for LTC and scrypts.

But look into what is coming to market right now in FPGAs.

I can't say as I disagree with you there at all. My only response to that would be it's pure research.
Xerox was researching GUI's back at a time when almost no one had a home computer.
It didn't make financial sense either; it was more or less a "hey cool look what we can do" sort of thing.
That's really all this is intended to be.

soniq

sr. member

Activity: 462

Merit: 250

Quote from: Nova! on May 25, 2013, 09:06:17 AM

Quote from: soniq on May 25, 2013, 02:53:25 AM

Sent some LTC your way, definitely interesting.

Thanks, what address did you send from?

I sent you a PM
