A RAM based fpga LTC miner - page 3.

tacotime

legendary

Activity: 1484

Merit: 1005

Quote from: DeathAndTaxes on August 31, 2013, 05:27:53 PM

TL/DR
Whatever the relative performance of this FPGA is to a CPU miner it would be WORSE if the p value was higher. LTC decision to use a low p value makes what otherwise would be a nearly impossible task into one which is merely challenging.

I doubt it. You should have a look at Solar Designer's TMTO data with N=2^14, r=2^3, p=1.

As you can see, as memory exponentially decreases integer ops exponentially increase. He was easily able to get the memory usage into the kilobytes and still crank out hashes. I'd guess that exactly the same is true with N=2^10, r=1, p=1 too. It's the same balancing act you run into no matter what value you use for N or r; at higher N you may increase the difficulty by a smaller constant factor, but overall I doubt increasing N or r will make scrypt much more FPGA/ASIC unfriendly when they finally iron out the FPGA implementation.
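For anyone who wants to put rough numbers on that balancing act, here is a minimal sketch of a cost model. It is not Solar Designer's data. It assumes the simplest possible tradeoff, where only every `gap`-th scratchpad entry is kept and missing entries are recomputed from the nearest stored one. The function name `tmto_cost` and the choice of gaps are made up for illustration.

```python
from typing import Tuple


def tmto_cost(n: int, r: int, gap: int) -> Tuple[int, float]:
    """Return (scratchpad_bytes, expected_blockmix_calls) for one ROMix pass.

    Assumption: only every `gap`-th entry of the N-entry table is stored,
    and random lookups are uniform, so a lookup needs (gap - 1) / 2 extra
    BlockMix calls on average (exact when gap divides n).
    """
    block_bytes = 128 * r                 # one scratchpad entry
    stored_entries = -(-n // gap)         # ceil(n / gap)
    fill_calls = n                        # first loop always walks all N steps
    lookup_calls = n * (1 + (gap - 1) / 2)
    return stored_entries * block_bytes, fill_calls + lookup_calls


if __name__ == "__main__":
    # LTC-style parameters: N = 2^10, r = 1
    for gap in (1, 2, 4, 8, 16, 32):
        mem, calls = tmto_cost(1024, 1, gap)
        print(f"gap={gap:2d}  memory={mem // 1024:4d} KB  BlockMix calls~{calls:.0f}")
```

Under this model each halving of memory adds more recomputation, and for large gaps the work roughly doubles per halving. Real hardware would also have to account for bandwidth and latency, which the model ignores.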

01BTC10

vip

Activity: 756

Merit: 503

Waiting anxiously to read the specs on this :D

hasle2

full member

Activity: 122

Merit: 100

I wish I had the time to learn how to design these things. Looks like so much fun.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

You used a lot of doublespeak. First, I am aware of the space-time tradeoff; however, rather than explain it in every single post, it is useful to look at the max scratchpad size. A 128KB scratchpad is going to require less memory and less bandwidth than a 16MB scratchpad regardless of what space-time tradeoff is employed. A device only has a finite amount of computing power, and while you can trade time for space, needing less space to start with always helps.

As for higher parameter values having no effect on the relative performance of CPUs, GPUs, and FPGAs/ASICs, that is just false. Scrypt was designed to be GPU and specialized device resistant. This is important in password hashing, as most servers use CPUs and an attacker will likely choose the most effective component for brute forcing. Making CPU performance superior prevents attackers from gaining an advantage. You can test this yourself. Modify the cgminer OpenCL kernel to use a higher p value. Around 2^14, the GPU's relative performance advantage is essentially gone; it is comparable to CPU throughput. At 2^16 the GPU's relative performance is falling far behind. At 2^20 the GPU never completes.

You say on one hand that the memory requirement doesn't matter and on the other hand that FPGAs are hard because they need lots of memory and wide buses. Well, guess what: the higher the p value, the MORE memory and the wider the buses that are needed. At 2^14, roughly 128x the max scratchpad size is going to mean 128x as much bandwidth is necessary. So the lower the p value, the EASIER the job is for FPGA and ASIC builders. They can use less memory and narrower buses, which means less cost, less complexity, and higher ROI%. Sure, one isn't required to use the max scratchpad size because one can compute on the fly, but once again the whole point of the space-time tradeoff is that the advantage of doing so is reduced.

Lastly, yes, the 128KB is per core, but so is the 16MB using the default parameters. If 128KB per core increases memory, bandwidth, and/or die size per core, then a 16MB requirement would make it even harder. So yes, the parameters chosen by LTC make it 128x less memory hard than the default. You use circular logic to say the max scratchpad size is irrelevant because one can optimize the size of the scratchpad to available resources. This doesn't change the fact that, due to the space-time tradeoff, you aren't gaining relative performance. Using a higher max scratchpad requires either more memory and bandwidth OR more computation. The throughput on the FPGA, GPU, and CPU is going to be reduced. Now if they were all reduced equally it wouldn't matter; all that matters is relative, not nominal, performance. However, the LTC parameters chosen are horrible for CPU usage. CPUs have a limited ability for parallel execution, usually 4 or 8 independent cores. 128KB per core * 8 = 1MB. That's right: today, with systems that can install multiple GB at very low cost, the Scrypt parameters chosen bottleneck performance on a CPU. GPUs, on the other hand, are highly parallel execution engines, but they have limited memory, and that memory comes at a higher cost than what CPUs have access to.

TL/DR
Whatever the relative performance of this FPGA is to a CPU miner it would be WORSE if the p value was higher. LTC decision to use a low p value makes what otherwise would be a nearly impossible task into one which is merely challenging.
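The scratchpad figures in this post can be checked with a few lines. This is a minimal sketch using the standard scrypt sizing of 128 * N * r bytes for the lookup table; the function name `scratchpad_bytes` and the 8-core figure are just the illustration from the post, not anything taken from a miner's source.

```python
def scratchpad_bytes(n: int, r: int) -> int:
    """Size in bytes of the full scrypt lookup table: N entries of 128 * r bytes."""
    return 128 * n * r


if __name__ == "__main__":
    ltc = scratchpad_bytes(2**10, 1)       # LTC parameters (2^10, 1, 1)
    default = scratchpad_bytes(2**14, 8)   # default parameters (2^14, 8, 1)

    print(f"LTC:     {ltc // 1024} KB per core")
    print(f"default: {default // (1024 * 1024)} MB per core")
    print(f"ratio:   {default // ltc}x")

    # Eight independent CPU cores, each with its own LTC-sized scratchpad
    print(f"8 cores: {8 * ltc // (1024 * 1024)} MB total")
```

Any lookup-gap style tradeoff shrinks these per-core numbers at the cost of extra computation, so they are the maximum, not a hard requirement.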

ssvb

newbie

Activity: 39

Merit: 0

Quote from: DeathAndTaxes on August 30, 2013, 10:03:53 PM

LTC uses the parameters (2^10, 1, 1) which results in a token 128KB max scratchpad size. That isn't a typo it is kilobytes.

You are just forgetting to multiply this scratchpad size by the number of "cores", "threads" or some other entities (what you call them depends on the underlying technology) in the miner device. All these "cores" are simultaneously doing hash calculations, each with its own scratchpad. The reason why FPGAs and ASICs work so well for SHA-256 is that the number of gates needed for a single SHA-256 "core" is really small, so one can fit an enormous number of such cores on a single chip. But each scrypt "core" needs a scratchpad for storing intermediate data, and if the scratchpad is implemented as SRAM, then the number of gates per scrypt "core" just skyrockets. You can fit significantly fewer scrypt "cores" on a single chip than SHA-256 "cores". There are some tricks for reducing the scratchpad size (LOOKUP_GAP is the right keyword, you can search for it in the forum), but this reduction is not free and results in more computations. That's why you can see some people mentioning the space–time tradeoff. The optimal lookup-gap setup depends on the balance between the memory size/performance and the computational power available for arithmetic operations. It is also possible to use external memory instead of on-chip SRAM, but the external memory must naturally have wide buses and a lot of bandwidth (memory latency is not critical for scrypt though). The scrypt GPU miners rely on GDDR5 speed, with a popular scratchpad size configuration being 64KB (lookup-gap=2), which indicates that the memory speed is the bottleneck and the excess computational power is already traded off in order to reduce the burden on the memory.

I also suggest checking https://github.com/ckolivas/cgminer/blob/master/SCRYPT-README to find a lot of information, which is intended to be user-comprehensible:
"--lookup-gap
This tunes a compromise between ram usage and performance. Performance peaks at a gap of 2, but increasing the gap can save you some GPU ram, but almost always at the cost of significant loss of hashrate. Setting lookup gap overrides the default of 2, but cgminer will use the --shaders value to choose a thread-concurrency if you haven't chosen one.
SUMMARY: Don't touch this"
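To make the lookup-gap idea concrete, here is a toy sketch of the scrypt-style fill-then-random-read loop with a configurable gap. It is not cgminer's kernel. As a loud assumption, `mix` uses SHA-512 as a stand-in for scrypt's real BlockMix/Salsa20/8, and all names (`romix`, `mix`, `integerify`) are illustrative. The point is only to show that a gap of 2 halves the table, gives the same result, and costs extra mixing calls.

```python
import hashlib
from typing import Tuple


def mix(block: bytes) -> bytes:
    """Stand-in for BlockMix (ASSUMPTION: SHA-512 instead of Salsa20/8)."""
    return hashlib.sha512(block).digest()  # always 64 bytes


def integerify(block: bytes, n: int) -> int:
    """Derive a pseudo-random table index from the current state."""
    return int.from_bytes(block[:8], "little") % n


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def romix(seed: bytes, n: int, gap: int = 1) -> Tuple[bytes, int, int]:
    """Return (result, stored_entries, mix_calls) for a toy ROMix with lookup gap."""
    x = mix(seed)
    calls = 0
    table = []
    # Fill phase: walk all n states but keep only every gap-th one.
    for i in range(n):
        if i % gap == 0:
            table.append(x)
        x = mix(x)
        calls += 1
    # Read phase: data-dependent lookups; rebuild skipped entries on demand.
    for _ in range(n):
        j = integerify(x, n)
        v = table[j // gap]
        for _step in range(j % gap):
            v = mix(v)
            calls += 1
        x = mix(xor(x, v))
        calls += 1
    return x, len(table), calls


if __name__ == "__main__":
    for gap in (1, 2, 4):
        result, stored, calls = romix(b"block header", 1024, gap)
        print(f"gap={gap} stored={stored} mix_calls={calls} result={result.hex()[:16]}")
```

The printed result prefix should be identical for every gap, while the stored entry count drops and the call count rises. Whether the extra calls are "free" depends on how much spare arithmetic the device has relative to its memory bandwidth, which is the balance described above.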

Quote

The default Scrypt parameters (2^14, 8, 1) result in a 16MB max scratchpad size roughly 128x as "memory hard".

The LTC scrypt parameters are sufficient for making sure that GPUs are required to have a lot of high bandwidth memory for decent hashing speed. It's all that matters. Increasing the size of the scratchpad is not going to bring any improvements (if by improvements you mean making CPU mining more competitive). Actually some scrypt based cryptocurrencies tried to make it more "memory hard" and failed to really fend off the GPUs. Also the 128x claim is just silly because you are forgetting that bigger scratchpads also inevitably mean more arithmetic operations involved in a single hash calculation. As I mentioned earlier, it is the balance between the memory speed and the arithmetic calculations speed that is important. And the LTC scrypt somehow managed to get it right, even if this actually happened unintentionally.
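The point about arithmetic growing along with the scratchpad can be sketched by counting Salsa20/8 invocations per hash. This assumes the standard scrypt structure (BlockMix makes 2r Salsa20/8 calls, and ROMix runs BlockMix 2N times) and ignores the PBKDF2 steps at either end. The function names are made up for illustration.

```python
def salsa_calls(n: int, r: int) -> int:
    """Salsa20/8 core invocations in one ROMix pass: 2N BlockMix calls x 2r each."""
    return (2 * n) * (2 * r)


def scratchpad_bytes(n: int, r: int) -> int:
    """Full lookup table size: N entries of 128 * r bytes."""
    return 128 * n * r


if __name__ == "__main__":
    ltc = (2**10, 1)
    default = (2**14, 8)
    mem_ratio = scratchpad_bytes(*default) // scratchpad_bytes(*ltc)
    ops_ratio = salsa_calls(*default) // salsa_calls(*ltc)
    print(f"memory ratio: {mem_ratio}x, Salsa20/8 call ratio: {ops_ratio}x")
```

With these two parameter sets the memory and the arithmetic both grow by the same factor, so the bytes touched per Salsa20/8 call stay the same. That is one way to read the remark that the balance between memory speed and arithmetic speed is what matters.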

Regarding the FPGA device in the picture at the start of this topic: it looks like it is going to have external memory bandwidth roughly similar to what is available on triple-channel DDR3 systems. This is still less than the memory bandwidth of a mid-range GDDR5-equipped GPU. I doubt that this FPGA device is capable of demonstrating any mind-blowing hashing speed. Still, if it manages to scale well with the lookup-gap increase and has low power consumption and/or low device cost, then it might possibly be competitive.

BTW, the appearance of competitive FPGA devices might make people more motivated to try better optimizing scrypt for AMD GPUs (squeeze out every last bit of performance and/or reduce power consumption). Bring it on, this stuff may become fun again ;)

Wolf0

member

Activity: 81

Merit: 1002

It was only the wind.

Quote from: divan0w on August 31, 2013, 05:55:29 AM

So what now, VGA mining is finally over?

Even IF this FPGA is real, it's not going to kill GPU mining. It's just going to enable people in places where electricity costs a ton.

hope2907

sr. member

Activity: 432

Merit: 250

yes it is over

divan0w

newbie

Activity: 43

Merit: 0

So what now, VGA mining is finally over?

digitalindustry

hero member

Activity: 798

Merit: 1000

‘Try to be nice’

Ha ha, the downward price pressure could be our friends trying to diversify, ha ha.

Then, after the Wired write-up, Nova has never looked so good to me.

It could turn out that Nova is one of the most honest currencies around, after Nybble of course. I don't expect many to understand, of course, but those that do, do.

digitalindustry

hero member

Activity: 798

Merit: 1000

‘Try to be nice’

But how does one account for the fact that ASIC companies are out there in the game and have invested to try to fill a market requirement? Most of the investment was the time up until now, so when you see the situation from this point of view, it's easy to see that they may move in that direction. I have no doubt that it won't be ABC123.

The more one thinks about LTC, the more one tends to start getting that little paranoid conspiratorial feeling.

Then one looks back on experience and one realizes that it's human nature to do this sort of thing.

Then one realizes that only through these events do dullards, in their own designs, help the whole.

minerapia

full member

Activity: 168

Merit: 100

Quote

All Bitcoin ASIC companies had to derive their works from existing SHA-2 ASICs; starting from scratch would have cost far more than the Bitcoin economy could have supplied, and far more than ASIC companies could afford to spend at their current price points. It's like if current ASIC companies decided to start using 14nm chips. It would be unimaginably expensive to create a technology that doesn't exist yet; it's far cheaper to modify existing designs.

Starting from scratch is equally expensive. FYI, your analogy is totally wrong. They didn't create or modify any 'technology' for their ASICs; it's a matter of coding and tools.
Next time try using Google first before "I'd say it's a pretty strong gut feeling."

Even reading a simple wiki article helps:
http://en.wikipedia.org/wiki/Application-specific_integrated_circuit

hendo420

sr. member

Activity: 420

Merit: 250

Quote from: DeathAndTaxes on August 31, 2013, 12:13:09 AM

There was an ASIC in your hand calculator from the 1970s. It didn't cost a billion dollars to design either.

Calculating for inflation it may have.

hendo420

sr. member

Activity: 420

Merit: 250

Quote from: SaltySpitoon on August 30, 2013, 11:52:06 PM

I'm at the point where I have to read over my posts 30 times to make sure I spelled everything rigt, and should probably get some sleep.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: SaltySpitoon on August 30, 2013, 11:52:06 PM

All Bitcoin ASIC companies had to derive their works from existing SHA-2 ASICs; starting from scratch would have cost far more than the Bitcoin economy could have supplied, and far more than ASIC companies could afford to spend at their current price points.

That is not correct. Bitcoin ASICs are essentially glorified SHA-2 calculators. Input: binary blob & target. Output: any nonces which result in SHA-2(SHA-2(blob+nonce)) < target. Not to take anything away from what the Bitcoin ASIC companies did, but the chips are just performing the "math" of the SHA-2 algorithm. Most of the "smarts" is not in the custom ASICs but in the cheap microprocessor (Raspberry Pi or embedded computer). Far more complex chips are made in universities every year as academic projects. There was an ASIC in your hand calculator from the 1970s. It didn't cost a billion dollars to design either, and the tools were a lot more primitive back then.
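The "glorified SHA-2 calculator" description can be sketched in software in a few lines. This is a toy illustration, not any vendor's firmware. It assumes a 76-byte blob followed by a 4-byte little-endian nonce and compares the double SHA-256 digest, read as a little-endian integer, against the target. The function names and the easy demo target are made up.

```python
import hashlib
import struct
from typing import List


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as in SHA-2(SHA-2(blob+nonce))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def find_nonces(blob: bytes, target: int, max_nonce: int) -> List[int]:
    """Return every nonce in [0, max_nonce) whose double hash is below target."""
    hits = []
    for nonce in range(max_nonce):
        candidate = blob + struct.pack("<I", nonce)  # append 32-bit nonce
        value = int.from_bytes(double_sha256(candidate), "little")
        if value < target:
            hits.append(nonce)
    return hits


if __name__ == "__main__":
    blob = bytes(76)        # placeholder blob, NOT a real block header
    easy_target = 1 << 248  # roughly 1 in 256 hashes qualifies
    print(find_nonces(blob, easy_target, 5000))
```

A mining chip does the same loop in parallel hardware, while the host controller supplies the blob and target and collects the nonces.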

SaltySpitoon

legendary

Activity: 2590

Merit: 2156

Welcome to the SaltySpitoon, how Tough are ya?

Quote from: DeathAndTaxes on August 30, 2013, 10:03:53 PM

Quote from: SaltySpitoon on August 30, 2013, 09:45:50 PM

I'm aware that this is a FPGA which is doable with Scrypt, however I'd like to go off in a minor tangent. People seem to underestimate how difficult it will be to create a Scrypt ASIC. SHA256 Asics have been used for many many years. They were not new technology, meaning the billions of dollars of research that others had done getting SHA256 ASICs working is not there already for proposed Scrypt ASICs. All the BTC mining ASIC companies needed to do, was make a product that would work for BTC specific hashing, rather than what they were and still are used for, encrypting and decrypting files. The company that decides to start making LTC Asics will need a whole lot more than a few hundred thousand BTC to get their products out the door.

Back on topic, LTC FPGAs actually aren't that difficult to make in theory. LTC's Scrypt hashing requires actually a much lower amount of memory than other scrypt implementations (I believe its 196mb/cycle although I may be off) at that point, or whatever it actually is, I remember the math behind it, but not the actual numbers, you can provide additional hashing power at 1/2 the memory required, and you can still end up with a higher hashrate over current GPUs, while still using fairly inexpensive FPGA technology. So rather than needing to create a new FPGA board that can handle uneconomical amounts of memory, you can just work on designing a chip that will hash fast, and lose performance based on how much memory you can actually supply.

I'll look back over my research tomorrow, and get all of the numbers and such down. I'm tired so I may have said something dumb, I'll correct it later.

LTC uses the parameters (2^10, 1, 1) which results in a token 128KB max scratchpad size. That isn't a typo it is kilobytes. The default Scrypt parameters (2^14, 8, 1) result in a 16MB max scratchpad size roughly 128x as "memory hard".

To my knowledge no Bitcoin ASIC company used existing SHA-2 IP and modified it.

You are correct, I was thinking it was 196kb for some reason (as mentioned, tired). All Bitcoin ASIC companies had to derive their works from existing SHA-2 ASICs; starting from scratch would have cost far more than the Bitcoin economy could have supplied, and far more than ASIC companies could afford to spend at their current price points. It's like if current ASIC companies decided to start using 14nm chips. It would be unimaginably expensive to create a technology that doesn't exist yet; it's far cheaper to modify existing designs. I haven't actually sat down and talked to the ASIC manufacturers, but I'd say it's a pretty strong gut feeling.

I've got some super secret projects that would be neat if I could run by you tomorrow (ok not that super secret). I'm at the point where I have to read over my posts 30 times to make sure I spelled everything rigt, and should probably get some sleep.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: SaltySpitoon on August 30, 2013, 09:45:50 PM

I'm aware that this is a FPGA which is doable with Scrypt, however I'd like to go off in a minor tangent. People seem to underestimate how difficult it will be to create a Scrypt ASIC. SHA256 Asics have been used for many many years. They were not new technology, meaning the billions of dollars of research that others had done getting SHA256 ASICs working is not there already for proposed Scrypt ASICs. All the BTC mining ASIC companies needed to do, was make a product that would work for BTC specific hashing, rather than what they were and still are used for, encrypting and decrypting files. The company that decides to start making LTC Asics will need a whole lot more than a few hundred thousand BTC to get their products out the door.

Back on topic, LTC FPGAs actually aren't that difficult to make in theory. LTC's Scrypt hashing requires actually a much lower amount of memory than other scrypt implementations (I believe its 196mb/cycle although I may be off) at that point, or whatever it actually is, I remember the math behind it, but not the actual numbers, you can provide additional hashing power at 1/2 the memory required, and you can still end up with a higher hashrate over current GPUs, while still using fairly inexpensive FPGA technology. So rather than needing to create a new FPGA board that can handle uneconomical amounts of memory, you can just work on designing a chip that will hash fast, and lose performance based on how much memory you can actually supply.

I'll look back over my research tomorrow, and get all of the numbers and such down. I'm tired so I may have said something dumb, I'll correct it later.

LTC uses the parameters (2^10, 1, 1) which results in a token 128KB max scratchpad size. That isn't a typo it is kilobytes. The default Scrypt parameters (2^14, 8, 1) result in a 16MB max scratchpad size roughly 128x as "memory hard".

To my knowledge no Bitcoin ASIC company used existing SHA-2 IP and modified it.

SaltySpitoon

legendary

Activity: 2590

Merit: 2156

Welcome to the SaltySpitoon, how Tough are ya?

I'm aware that this is a FPGA which is doable with Scrypt, however I'd like to go off in a minor tangent. People seem to underestimate how difficult it will be to create a Scrypt ASIC. SHA256 Asics have been used for many many years. They were not new technology, meaning the billions of dollars of research that others had done getting SHA256 ASICs working is not there already for proposed Scrypt ASICs. All the BTC mining ASIC companies needed to do, was make a product that would work for BTC specific hashing, rather than what they were and still are used for, encrypting and decrypting files. The company that decides to start making LTC Asics will need a whole lot more than a few hundred thousand BTC to get their products out the door.

Back on topic, LTC FPGAs actually aren't that difficult to make in theory. LTC's Scrypt hashing requires actually a much lower amount of memory than other scrypt implementations (I believe its 196mb/cycle although I may be off) at that point, or whatever it actually is, I remember the math behind it, but not the actual numbers, you can provide additional hashing power at 1/2 the memory required, and you can still end up with a higher hashrate over current GPUs, while still using fairly inexpensive FPGA technology. So rather than needing to create a new FPGA board that can handle uneconomical amounts of memory, you can just work on designing a chip that will hash fast, and lose performance based on how much memory you can actually supply.

I'll look back over my research tomorrow, and get all of the numbers and such down. I'm tired so I may have said something dumb, I'll correct it later.

yochdog

legendary

Activity: 2044

Merit: 1000

The next great unicorn....

marnem

hero member

Activity: 728

Merit: 500

Boss of WallstreetCafe

watching

Pt0x

sr. member

Activity: 266

Merit: 250

I hope to see a good hash rate coming out from this device!

Topic: A RAM based fpga LTC miner - page 3. (Read 13874 times)