A RAM based fpga LTC miner - page 2.

tadakaluri

hero member

Activity: 616

Merit: 500

Looks promising.....

nightengale

hero member

Activity: 574

Merit: 500

Doesn't really matter what the specs are if he's not going to share...?

Lauda

legendary

Activity: 2674

Merit: 3000

Terminated.

I don't believe this... yet. ;)

Zubilica

hero member

Activity: 837

Merit: 1000

Quote from: JLM on September 02, 2013, 07:54:03 AM

Spec?

??
Price?

??
Available on...?

Watching.

Dunno if it will ever see the light of day. He has also developed a BTC FPGA miner. It was never sold or marketed, only for his private use.

JLM

full member

Activity: 164

Merit: 100

Spec?

??
Price?

??
Available on...?

Watching.

YipYip

hero member

Activity: 574

Merit: 500

Quote from: theDF on September 02, 2013, 01:22:31 AM

Quote from: CoinBuzz on September 02, 2013, 12:54:34 AM

Quote from: theDF on September 01, 2013, 10:40:15 PM

Is there any connection with this? - https://bitcointalksearch.org/topic/asic-testing-on-scrypt-285656

I can hardly believe that. So, no.

Yeah, mystery solved

Quote from: Tomatocage on September 02, 2013, 12:52:37 AM

Quote from: samfisher on September 01, 2013, 08:47:46 PM

1) It IS an ex BTC miner.

Correct, and it is an ASIC for Scrypt. And he still mines BTC too.

There is NO ASIC for Scrypt!!! ...there is NO fucking FPGA for scrypt @ this point (proven)

So far we have a picture of an FPGA... WOW, awesome!! It may be able to mine scrypt, but at what rate? 1 hash for a 1k investment?

theDF

newbie

Activity: 56

Merit: 0

Quote from: CoinBuzz on September 02, 2013, 12:54:34 AM

Quote from: theDF on September 01, 2013, 10:40:15 PM

Is there any connection with this? - https://bitcointalksearch.org/topic/asic-testing-on-scrypt-285656

I can hardly believe that. So, no.

Yeah, mystery solved

Quote from: Tomatocage on September 02, 2013, 12:52:37 AM

Quote from: samfisher on September 01, 2013, 08:47:46 PM

1) It IS an ex BTC miner.

Correct, and it is an ASIC for Scrypt. And he still mines BTC too.

CoinBuzz

sr. member

Activity: 490

Merit: 250

Quote from: theDF on September 01, 2013, 10:40:15 PM

Is there any connection with this? - https://bitcointalksearch.org/topic/asic-testing-on-scrypt-285656

I can hardly believe that. So, no.

theDF

newbie

Activity: 56

Merit: 0

Is there any connection with this? - https://bitcointalksearch.org/topic/asic-testing-on-scrypt-285656

b!z

legendary

Activity: 1582

Merit: 1010

Looks like a cool gadget. Do you have instructions for building these?

digitalindustry

hero member

Activity: 798

Merit: 1000

‘Try to be nice’

Quote from: Taxidermista on September 01, 2013, 05:14:55 AM

It's fucking amazing the patience you all have. Fuck-ing-a-ma-zing. Must be something in the air...

I think it's just that people are not quite as retarded as they were, or the ones that are still retarded all have BFL orders pending.

And if an effective FPGA has been developed, do you think it is going to be sold to you?

Then there is the next bit, which suggests that FPGAs are not going to be the super device you think they are.

ASICs are in the works, and some have likely been released, but those are not going to give the super performance you might expect for the cost either.

So where is the sense of urgency coming from?

Lauda

legendary

Activity: 2674

Merit: 3000

Terminated.

Where are the specs!?!

Taxidermista

legendary

Activity: 1148

Merit: 1001

It's fucking amazing the patience you all have. Fuck-ing-a-ma-zing. Must be something in the air...

YipYip

hero member

Activity: 574

Merit: 500

Where's the XPM FPGA?

...lolz

digitalindustry

hero member

Activity: 798

Merit: 1000

‘Try to be nice’

So, cutting through all the "I'm smarter and understand":

It is basically as I stated before: an ASIC will just be a more efficient version of a GPU system, as opposed to, say, a reorganization of the fundamentals.

I.e. an ASIC may provide from 4x to maybe 10x the efficiency, with less power and heat.

So scrypt may be the domain of ASICs in the future.

And if there were to be a next iteration, it would be out quicker.

BFL CEO

2 to 4 weeks?

antimattercrusader

sr. member

Activity: 308

Merit: 250

Quote from: ?? on ??

@BFL Josh

Can we pre-order these through BFL???

~BCX~

lmao. STFU and take my ~~BTC~~ ~~LTC~~ YAC!!!!

Why can't we pre-order a 10GH Scrypt-Jane unit through BFL at this time?

You see that 600gh/s card? I think I'll pass... Still have not received my Jalapeno, but I bought a bunch of Block Erupters and a blade from http://www.wtcr.ca and got them the next day in the US.

ssvb

newbie

Activity: 39

Merit: 0

Quote from: DeathAndTaxes on August 31, 2013, 05:27:53 PM

You used a lot of double speak.

Nah, it's just you still having some trouble understanding

Quote

First I am aware of the space time tradeoff however rather than explain it in every single post it is useful to look at the max scratchpad size. 128KB scratchpad is going to require less memory and less bandwidth than a 16MB scratchpad if everything else is the same.

Let's have a look at the definition of what is "memory hard" in the scrypt paper:

"A memory-hard algorithm is thus an algorithm which asymptotically uses almost as many memory locations as it uses operations; it can also be thought of as an algorithm which comes close to using the most memory possible for a given number of operations, since by treating memory addresses as keys to a hash table it is trivial to limit a Random Access Machine to an address space proportional to its running time"

"Theorem 2. The function SMix_r(B, N) can be computed in 4 * N * r applications of the Salsa20/8 core using 1024 * N * r + O(r) bits of storage"

You can see that scrypt is equally memory hard for all scratchpad sizes. The ratio between the number of scratchpad access operations and the number of salsa20/8 calculations remains the same.
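To make the "same ratio" point concrete, here is a minimal Python sketch that plugs two parameter sets into the Theorem 2 figures quoted above. The function names are made up for this illustration, and the O(r) term is ignored as an assumption.

```python
# Sketch based on Theorem 2 as quoted above: SMix_r(B, N) takes 4*N*r
# Salsa20/8 core applications and about 1024*N*r bits of storage.
# Function names are hypothetical; the O(r) term is ignored.

def scratchpad_bytes(n: int, r: int) -> int:
    """Scratchpad size in bytes (1024*N*r bits converted to bytes)."""
    return 1024 * n * r // 8


def salsa_calls(n: int, r: int) -> int:
    """Number of Salsa20/8 core applications for one SMix pass."""
    return 4 * n * r


def bytes_per_salsa_call(n: int, r: int) -> float:
    """Scratchpad bytes per Salsa20/8 call; independent of N and r."""
    return scratchpad_bytes(n, r) / salsa_calls(n, r)


if __name__ == "__main__":
    for label, n, r in [("N=2^10, r=1", 2**10, 1),
                        ("N=2^14, r=8", 2**14, 8)]:
        print(f"{label}: {scratchpad_bytes(n, r) // 1024} KiB scratchpad, "
              f"{salsa_calls(n, r)} Salsa20/8 calls, "
              f"{bytes_per_salsa_call(n, r):.0f} bytes per call")
```

Under these assumptions the 128 KiB and 16 MiB configurations both come out at 32 bytes of scratchpad per Salsa20/8 call, which is the sense in which the ratio stays constant.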

Quote

As for higher parameter values having no effect on the relative performance of CPU, GPU, and FPGA/ASICs, that is just false.

What I said was "Increasing the size of the scratchpad is not going to bring any improvements (if by improvements you mean making CPU mining more competitive)". How did it turn into "no effect on the relative performance of CPU, GPU, and FPGA/ASICs"? CPU miners are at a serious disadvantage right now, so the effect on the relative performance would have to be really significant in favour of the CPU in order to turn the tables.

In practice, increasing the size of the scratchpad will make it harder to fit in the CPU caches. To mitigate the unwanted latency of random accesses, scrypt uses the parameter 'r'. Basically, if r=1 (the default for LTC), then the scratchpad is accessed in 128-byte chunks at random locations. If r=8, then the memory accesses are done in 1024-byte chunks at random locations. In the former case, the cache miss penalty is hit once per 128 bytes. In the latter case, it is hit once per 1024 bytes (the sequential accesses after the first cache miss are automatically prefetched, at least in theory). A high 'r' value reduces the effect of the memory access latency penalty for the CPU. And latency is not an issue for the GPU in the first place.

Additionally, if the CPU has to access the memory, then the memory controller must have enough bandwidth. For example, my Core i7 860 processor currently gets ~29 kHash/s in cpuminer, and the STREAM benchmark (built as multithreaded with OpenMP support) shows ~10GB/s of practically available memory bandwidth. These ~10GB/s would translate to a theoretical hard limit of ~38 kHash/s if the CPU caches were not helping. There is not much headroom as far as I can see, and my processor does not even have AVX2.
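The ~38 kHash/s figure can be reproduced with a back-of-the-envelope Python sketch. It assumes (this is not spelled out in the post) that each hash writes the 128 KiB scratchpad once and reads it back once, that the caches absorb nothing, and that 10GB/s means 10 * 10^9 bytes per second. The function name is hypothetical.

```python
# Rough bandwidth-bound hash rate estimate.
# Assumptions (not stated in the post): one full write plus one full read
# of the scratchpad per hash, no help from CPU caches, GB = 10**9 bytes.

def bandwidth_bound_hashrate(bandwidth_bytes_per_s: float,
                             n: int = 1024, r: int = 1) -> float:
    """Upper bound on hashes per second if memory bandwidth is the only limit."""
    scratchpad = 128 * n * r          # bytes, 128 KiB for N=1024, r=1
    traffic_per_hash = 2 * scratchpad  # fill pass + random-read pass
    return bandwidth_bytes_per_s / traffic_per_hash


if __name__ == "__main__":
    limit = bandwidth_bound_hashrate(10e9)
    print(f"~{limit / 1000:.1f} kHash/s at 10 GB/s")
```

With those assumptions the bound is roughly 38 kHash/s, which sits close to the ~29 kHash/s measured in cpuminer.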

Quote

Scrypt was designed to be GPU and specialized device resistant. This is important in password hashing, as most servers use CPUs and an attacker will likely choose the most effective component for brute forcing. Making CPU performance superior prevents attackers from gaining an advantage. You can test this yourself. Modify the cgminer OpenCL kernel to use a higher p value. Around 2^14, the GPU's relative performance is essentially gone; it is comparable to CPU throughput. At 2^16, the GPU's relative performance is falling far behind.

This is confusing, did you actually mean the 'N' value? Please just provide the patches for your changes to cgminer and cpuminer that you used for this comparison.

But in general, GPU tuning is not easy because there are many parameters to tweak. A poorly selected configuration can result in poor hashing performance even for the LTC scrypt. You can find many requests for help with the configuration in the forum. So your poor performance report does not mean anything.

Quote

At 2^20 the GPU never completes.

And surely you can raise the memory requirements so high that mining becomes problematic on the current generation of video cards, purely thanks to an insufficient amount of GDDR5 memory. But guess what? In a year or so, the next generation of video cards will have more memory, and suddenly GPU mining will again become seriously better than CPU mining. Designing the algorithm around some magic limits which may become ineffective at any time is not the best idea. The current "small" scratchpad size for scrypt focuses on memory bandwidth instead of relying on artificial limits such as memory size (which can be easily increased, especially in custom built devices).

Quote

You say on one hand that the memory requirement doesn't matter and on the other hand that FPGAs are hard because they need lots of memory and wide buses. Well, guess what: the higher the p value, the MORE memory and the wider the buses that are needed. At 2^14, roughly 128x the max scratchpad size is going to mean 128x as much bandwidth is necessary.

Yes, but only if also backed by roughly 128x more computational power. And likewise, the enormous computational power of FPGA/ASIC must be backed by a lot of memory bandwidth, otherwise it will be wasted.

Quote

So the lower the p value the EASIER the job is for FPGA and ASIC builders. They can use less memory and narrower busses that means less cost, less complexity, higher ROI%. Sure one isn't required to use max scratchpad size because one can compute on the fly but once again the whole point to the space-time tradeoff is that the advantage to doing so is reduced.

They can't use slower external memory, because it already needs to be damn fast.

Quote

Lastly, yes, the 128KB is per core, but so is the 16MB using the default parameters. If 128KB per core increases memory, bandwidth, and/or die size per core, then a 16MB requirement would make it even harder.

Yes, the absolute hashing speed would just drop significantly with the 16MB scratchpad. But it would drop on CPU, GPU, FPGA or any other kind of mining device.

Quote

So yes, the parameters chosen by LTC make it 128x less memory hard than the default.

Sigh. Please just read the definition of "memory hard" in the scrypt paper.

Quote

You use circular logic to say the max scratchpad size is irrelevant because one can optimize the size of the scratchpad to available resources. This doesn't change the fact that due to the space-time tradeoff you aren't gaining relative performance. Using a higher max scratchpad requires either more memory and bandwidth OR more computation. The throughput on the FPGA, GPU, and CPU is going to be reduced. Now, if they were all reduced equally it wouldn't matter; all that matters is relative, not nominal, performance.

Wait a second, where does this "reduced equally" come from? The space-time tradeoff just means that if you have a system with excessive computational power but slow memory, then you can still tweak lookup-gap to trade one for the other. That is instead of being at a huge disadvantage compared to a more optimally balanced system. This kinda "equalizes" systems with vastly different specs, which is the total opposite of "reduces equally".
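The lookup-gap tradeoff can be illustrated with a simplified Python cost model. This is a sketch under assumptions that are not taken from the post: only every gap-th scratchpad entry is stored, a missing entry is recomputed from the nearest stored entry below it, the random indices are uniform, and each BlockMix step costs 2*r Salsa20/8 calls. The function name is hypothetical and real miner kernels may differ.

```python
# Simplified cost model for the lookup-gap (space-time) tradeoff.
# Assumptions (illustrative only): keep every `gap`-th entry V[0], V[gap], ...
# and recompute V[j] from V[j - j % gap] using j % gap extra BlockMix steps.

def lookup_gap_cost(n: int, r: int, gap: int) -> tuple[int, float]:
    """Return (scratchpad bytes kept, expected Salsa20/8 calls per hash)."""
    stored_entries = -(-n // gap)                 # ceil(n / gap)
    memory_bytes = 128 * r * stored_entries       # each entry is 128*r bytes
    # Average number of recompute steps for a uniformly random index j.
    avg_recompute = sum(j % gap for j in range(n)) / n
    # n steps to fill, then n lookups that each cost 1 + recompute steps.
    blockmix_calls = n + n * (1 + avg_recompute)
    return memory_bytes, 2 * r * blockmix_calls


if __name__ == "__main__":
    for gap in (1, 2, 4, 8):
        mem, salsa = lookup_gap_cost(1024, 1, gap)
        print(f"gap={gap}: {mem // 1024} KiB, ~{salsa:.0f} Salsa20/8 calls")
```

In this model a device with spare arithmetic and slow memory can pick a larger gap, while a device with fast memory stays at gap=1, which is the "equalizing" effect described above.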

Quote

However, the LTC parameters chosen are horrible for CPU usage. CPUs have a limited ability for parallel execution, usually 4 or 8 independent cores. 128KB per core * 8 = 1MB.

This just means that you don't know much about CPU mining. The point is that modern superscalar processors can execute more than one instruction per cycle; this is called instruction level parallelism. There are also instruction latencies to take care of. In order to fully utilize the CPU pipeline, each thread has to calculate multiple independent hashes in parallel. Right now cpuminer calculates 3 hashes at once per thread (or even 6 with AVX2). Now do the math.
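Doing that math as a small Python sketch: the thread count of 8 is an assumption taken from the "4 or 8 independent cores" in the quote, and the function name is made up for this illustration.

```python
# Total scratchpad working set when each thread interleaves several hashes.
# Assumptions: 128 KiB scratchpad per hash (N=1024, r=1) and 8 threads,
# the latter borrowed from the quoted "4 or 8 independent cores".

SCRATCHPAD_BYTES = 128 * 1024


def working_set_bytes(threads: int, hashes_per_thread: int) -> int:
    """Bytes of scratchpad memory in flight across all threads."""
    return threads * hashes_per_thread * SCRATCHPAD_BYTES


if __name__ == "__main__":
    for ways in (1, 3, 6):
        mib = working_set_bytes(8, ways) / (1024 * 1024)
        print(f"8 threads x {ways} hashes per thread: {mib:.0f} MiB")
```

Under these assumptions the working set is 3 MiB with 3 hashes per thread and 6 MiB with 6, rather than the 1MB in the quoted estimate.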

Quote

That's right: today, with systems that can install multiple GB for a very cheap cost, the Scrypt parameters chosen bottleneck performance on a CPU. GPUs, on the other hand, are highly parallel execution engines, but they have limited memory, and that memory comes at a higher cost than the memory CPUs have access to.

The memory must also be fast, not just large.

TL;DR

For the external memory, I'm assuming that sufficient size is available to be used for as many cores as is practically useful (in this case only the memory bandwidth is an important factor). For the on-chip SRAM memory, the bandwidth should not be a problem, as the memory can be tightly coupled with each scrypt core, but the size does matter and can't be made large enough (the CPU caches are really small compared with DDR memory modules for a reason). The current best performing scrypt mining devices (AMD video cards) rely on external memory bandwidth. This FPGA design seems to be essentially a GPU clone.

DeathAndTaxes

donator

Activity: 1218

Merit: 1080

Gerald Davis

Quote from: tacotime on August 31, 2013, 09:12:38 PM

As you can see, as memory exponentially decreases integer ops exponentially increase. He was easily able to get the memory usage into the kilobytes and still crank out hashes. I'd guess that exactly the same is true with N=2^10, r=1, p=1 too. It's the same balancing act you run into no matter what value you use for N or r; at higher N you may increase the difficulty by a smaller constant factor, but overall I doubt increasing N or r will make scrypt much more FPGA/ASIC unfriendly when they finally iron out the FPGA implementation.

Yes, that is the space-time tradeoff, and they used it to reduce the memory requirements to roughly what LTC Scrypt requires, EXCEPT that doing so requires a 100x increase in integer performance. If anything, you just showed how weak LTC Scrypt is. Another way to look at it: say you had an FPGA card with an output of X kh/s using the full scratchpad size of 128KB. Now, trying to run N = 2^14, you don't have sufficient memory or bandwidth, but as the chart shows, you could use the space-time tradeoff to reduce the memory requirement to 128KB. Great, the memory requirement is similar to LTC Scrypt... EXCEPT you now need either an FPGA with 100x the integer performance (how much do you think that is going to increase the cost?) OR you are going to have 1/100th the hashrate.

vnhyp0

member

Activity: 106

Merit: 10

This looks like a potentially interesting development. Capitalism really drives innovation to extremes in some cases.

I look forward to more information about this project, beekeeper.

Topic: A RAM based fpga LTC miner - page 2. (Read 13874 times)