
Topic: Why not utilize RAM in video cards? (Read 2400 times)

rjk
sr. member
Activity: 448
Merit: 250
1ngldh
March 19, 2012, 11:36:17 PM
#14
Quote
Exactly the reason why I opened this thread is that I've not written a SHA-256 algorithm from scratch based on the documentation (nor studied it), though I've written some other algorithms that are about 10^(10^100) times simpler.

But my original intent was not to attack SHA-256, but rather to utilize more of the underlying hardware for even some benefit, however minuscule that benefit may be. But yeah, I guess it boils down to that.

Bitcoin mining is essentially attacking SHA-256 at full force, and making it "faster" by reducing the number of operations needed to complete a round would essentially mean breaking the hash function, which nobody has managed to do (as of yet).
sr. member
Activity: 402
Merit: 250
March 19, 2012, 11:29:46 PM
#13
Exactly the reason why I opened this thread is that I've not written a SHA-256 algorithm from scratch based on the documentation (nor studied it), though I've written some other algorithms that are about 10^(10^100) times simpler.

But my original intent was not to attack SHA-256, but rather to utilize more of the underlying hardware for even some benefit, however minuscule that benefit may be. But yeah, I guess it boils down to that.
donator
Activity: 1218
Merit: 1079
Gerald Davis
March 19, 2012, 11:15:54 PM
#12
I agree with rjk.  If you have a specific attack you believe has merit, post some code or pseudocode.  On the other hand, if you haven't thoroughly looked over the SHA-256 function to the point that you understand all of the operations, why they exist, and can manually hash a block of data from start to finish, then I recommend you start there.  Much of what you describe sounds like (it may not be true, but it sounds like) someone who hasn't taken a very detailed look at how SHA-256 (or, more generically, a cryptographic hash) works.

What you describe wasn't even possible with SHA-1, but a key point of SHA-2 was improving the inter-element dependencies to make meet-in-the-middle attacks (which are theoretically possible, though nobody has pulled one off on SHA-1) infeasible.  While you are looking at SHA-256, take a look at SHA-1 and see how much the inter-element computations have improved from SHA-1 to SHA-2.
donator
Activity: 1218
Merit: 1079
Gerald Davis
March 19, 2012, 11:08:04 PM
#11
Quote
What I meant is that there are cases where, in a certain situation, a certain outcome is 100% certain.
Expanding THOSE lookups, where we know that something will result in something else 100% of the time, is my intention.

No such scenarios exist beyond the current ones.

Quote
Let's say the beginning of the hash is e5da3, and for some statistically miraculous reason, if char[0] == 'e', char[3] == 'd' and char[5] == '3', then char[6] == 'a', when we know that what we are hashing is N characters long.
Or maybe, if the beginning of the hash is 6be3c118da8acd56, we know the last character is 5 and thus it cannot be our match.
Or maybe, if the first round begins with 6be3c118da8acd56, we know that the next round's first 3 chars will be dcc, and therefore it is not a match.
Pending statistical analysis, naturally.

I hate to break it to you, but no modern cipher can be broken that easily. Hell, no cipher of the 1960s could be broken that easily.  There is no prefix, suffix, middle character, or other recognizable element of a block header which predictably produces too large a block hash.  If there were, then SHA-256 would be broken.

Quote
I don't understand why that kind of statistical analysis is so difficult to fathom.
Similar things ARE being done with FPGAs, and probably in the GPU code.

No, they aren't.  That would be a round-reduction attack, and no known general-purpose round-reduction attack exists for SHA-256.  The entire cryptographic community has been looking at round reductions for the better part of a decade.  If you think you will find one via some statistical analysis of a trillion or so hashes, well, start looking into it, but before you waste your time, look at the pseudocode of the SHA-256 round computation.

Specifically, look at the Ch function.  Notice also how the working variables A through H rotate after each round.
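For reference, here is a minimal Python sketch of a single SHA-256 compression round (the message-schedule word w and round constant k are assumed to be given; this is not a full implementation). It only exists to show how Ch mixes e, f and g, and how the eight working variables rotate every round, so that after a handful of rounds every output bit depends on every input bit.

Code:
def rotr(x, n):
    # rotate a 32-bit word right by n bits
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def sha256_round(state, k, w):
    # one of the 64 rounds of the compression function
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)              # Ch: the bits of e choose between f and g
    t1 = (h + s1 + ch + k + w) & 0xFFFFFFFF
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)    # Maj: bitwise majority of a, b, c
    t2 = (s0 + maj) & 0xFFFFFFFF
    # the rotation: every working variable slides down one slot each round
    return ((t1 + t2) & 0xFFFFFFFF, a, b, c, (d + t1) & 0xFFFFFFFF, e, f, g)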



Cryptographically strong hashes are designed to defeat exactly what you are looking to do.  If you could do what you think you can do, forget Bitcoin; the implications go far beyond it.  You would be a rock star in cryptographic circles, and could likely write yourself a job opening in any of the three-letter agencies you wanted.  Then again, the NSA spent nearly 4 years vetting algorithms and hosting open challenges to look for attacks, and SHA-256 has been THE target to hit among cryptographers, so I think the odds of you accidentally stumbling onto something that rips it wide open are slim.

Quote
Also, we know the format the original data is in, which might result in some odd coincidences in the resulting double hash.

SHA-256 has no known preimage attacks against the full-strength function.


Still, don't let this dissuade you.  What you are describing is an attack: not an attack on Bitcoin specifically, but a credible attack against SHA-256 itself.
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
March 19, 2012, 10:35:53 PM
#10
Quote
Ok, maybe I am not articulating myself clearly enough.

DaT: Those figures were examples, pulled completely out of a hat, meant to make a point. Plus you misunderstood what I was saying :D
rjk: Such a lookup table of finished hashes, or something like that, was not at all my intention.


What I meant is that there are cases where, in a certain situation, a certain outcome is 100% certain.
Expanding THOSE lookups, where we know that something will result in something else 100% of the time, is my intention.

Let's say the beginning of the hash is e5da3, and for some statistically miraculous reason, if char[0] == 'e', char[3] == 'd' and char[5] == '3', then char[6] == 'a', when we know that what we are hashing is N characters long.
Or maybe, if the beginning of the hash is 6be3c118da8acd56, we know the last character is 5 and thus it cannot be our match.
Or maybe, if the first round begins with 6be3c118da8acd56, we know that the next round's first 3 chars will be dcc, and therefore it is not a match.
Pending statistical analysis, naturally.

I don't understand why that kind of statistical analysis is so difficult to fathom.
Similar things ARE being done with FPGAs, and probably in the GPU code.

Also, we know the format the original data is in, which might result in some odd coincidences in the resulting double hash.
Simple answer: no, it won't work. Have a look at the SHA-256 algorithm to find out why.

Slightly longer answer: actually, some optimization has been done that requires fewer than 64 full rounds to complete, but the savings were very small and only come at the tail end of the computation.

The statistics are not going to be able to do what you are asking of them, because the final parts of the algorithm make the output completely random-looking. You cannot statistically analyze random numbers in order to predict them, or to predict parts of them, because then they would not be random. If you have an understanding of how SHA-256 works and have had an epiphany about how to create some kind of side-channel attack (which is what your proposal amounts to), then show us some code so we can have a look at it and pick it apart.
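As a concrete example of the kind of tail-end saving mentioned above: miners can get away with a cheap first-pass check on just the most significant 32 bits of the double hash, leaving the full target comparison for later. A rough Python sketch (the function name and the 80-byte header argument are just for illustration):

Code:
import hashlib

def quick_check(header80: bytes) -> bool:
    # cheap first-pass test: only look at the most significant 32 bits
    # of the double SHA-256; the full target comparison happens elsewhere
    digest = hashlib.sha256(hashlib.sha256(header80).digest()).digest()
    # Bitcoin treats the digest as a little-endian 256-bit integer,
    # so the last four bytes are the most significant word
    return digest[28:32] == b"\x00\x00\x00\x00"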
sr. member
Activity: 402
Merit: 250
March 19, 2012, 10:29:59 PM
#9
Ok, maybe i am not articulating myself clearly enough.

DaT: Those figures were examples, completely from hat, meant to make a point. Plus you misunderstood what i were saying Cheesy
rjk: Such a lookup of table of finished hashes or something, was not at all my intention.


What i meant, there is cases where at certain situation a certain outcome is certain to 100%.
Expanding THOSE lookups where we know that something will result into something 100% of the time is my intention.

Let's say beginning of the hash is e5da3, and for some statistical miraculous reason if char[0] == e, char[3]==d, char[5] == 3 then char[6] == a when we know that what we are hashing is N characters long.
Or maybe if the beginning of hash is 6be3c118da8acd56 we know the last character is 5, thus cannot be our match.
Or maybe if the first round beginning is 6be3c118da8acd56 we know that next round first 3 chars will be dcc, and therefore not a match.
Pending statistical analysis naturally.

I don't understand why that kind of statistical analysis is so difficult to fathom.
Similar things ARE being done with FPGAs, and probably on the GPU code.

Also, we know the format the original data is in, which might result in some odd coincidences in the resulting double hash.
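For what it's worth, that kind of statistical analysis is easy to try on a small scale. A throwaway Python sketch (the 200,000-sample size and the character positions are arbitrary choices, not anything from the mining code):

Code:
import hashlib
from collections import Counter

# bucket double-SHA-256 outputs by their first hex character, then look at how
# the seventh character is distributed inside each bucket; any exploitable
# "if char[0] is X then char[6] is Y" rule would show up as a skewed bucket
buckets = {c: Counter() for c in "0123456789abcdef"}
for nonce in range(200_000):
    data = nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()
    buckets[digest[0]][digest[6]] += 1

for prefix, counts in sorted(buckets.items()):
    spread = max(counts.values()) - min(counts.values())
    print(prefix, "spread:", spread, "out of roughly", sum(counts.values()) // 16, "per value")

The spreads stay within ordinary sampling noise, which is exactly the point the replies above keep making.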
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
March 19, 2012, 09:10:25 PM
#8
Having such a lookup table would require you to know the end result of the solution that we are brute-forcing, which is impossible. It sounds like you are saying to proceed through about half of the SHA-256 calculation and then create a lookup; this won't work, because you can't predict the output from only running half the algorithm.
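To attach a number to that: a lookup keyed on a partial (mid-calculation) state would have to cover a 256-bit state space. A back-of-the-envelope sketch, assuming a card with roughly 3 GB of memory:

Code:
midstates = 2 ** 256          # distinct 256-bit intermediate states
bytes_per_entry = 32          # even if each entry stored only a 32-byte result
vram_bytes = 3 * 2 ** 30      # assumed ~3 GB card of the era

table_bytes = midstates * bytes_per_entry
print(f"table size needed: {table_bytes:.3e} bytes")
print(f"that is about {table_bytes / vram_bytes:.1e} times the card's memory")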
donator
Activity: 1218
Merit: 1079
Gerald Davis
March 19, 2012, 08:58:41 PM
#7
500 million hashes is 0.000000000000000000000000000000000000000000000000000000000000000000001% (not exact) of all potential hashes.

So 99.99999999999999999999999999999999999999999999999999999999999% (not exact) of the time you will find nothing to look up.
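The same figure, worked out (a sketch assuming the full 256-bit output space):

Code:
space = 2 ** 256                 # possible SHA-256 outputs
precomputed = 500_000_000        # the 500 million hashes in the example
fraction = precomputed / space
print(f"coverage: {fraction:.3e}")                 # about 4.3e-69
print(f"as a percentage: {100 * fraction:.3e} %")  # about 4.3e-67 %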

sr. member
Activity: 402
Merit: 250
March 19, 2012, 08:47:22 PM
#6
Yeah, underclocking memory is one important step to lower temps (and cut electricity usage).

And I meant it more in a rainbow-table fashion, for vast amounts of data. Like you said, big bandwidth, high latency, but there is also a lot of RAM.

None of you were actually talking about what I was talking about. Thinking outside the box here.

I was thinking of utilizing that vast amount of RAM AND the vast bandwidth.
Like, say, after round N, compare against a hash table to see if we already have pointers to the end result, or to a state N rounds deeper.
And do it for vast numbers of hashes at once.

Say we calculate the first 1/4 of the double SHA for 500 million hashes, then compare against tables of known-bad first-quarter prefixes which will not lead to what we want, with 99.999%+ confidence.
Only the remainder progresses further.

There are ALREADY similar optimizations (looking at the FPGA threads), where at some step we know that something will not lead to what we want, or always gives a certain result, so part of the work can be skipped.
Here I'm talking about doing it on a more massive scale. Think of it as "Big Data".
Of course, that will require vast amounts of data to be analyzed to spot the cases where this can be done.

If an extra latency of even 0.5 s on 10 s worth of work results in 10.6 s worth of work, there is a net gain. The drawback is that your result submission might be 0.5 s later than someone else's, but since you complete a little more work every 10.5 s, over the long haul you will bring in more results.
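The amortisation arithmetic in the previous paragraph, spelled out with the same hypothetical numbers:

Code:
plain_work, plain_time = 10.0, 10.0        # 10 s of work in 10 s, no lookup stage
batched_work, batched_time = 10.6, 10.5    # +0.5 s of latency, but 0.6 s more work done

print(plain_work / plain_time)      # 1.0000 units of work per second
print(batched_work / batched_time)  # about 1.0095, i.e. roughly 1% more throughput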

Also, another angle to look at: could different parts of the GPU somehow be utilized to achieve the same result, even if the algorithm becomes totally "bonkers" for SHA but always returns the correct answer, i.e. turning part of the algo into floating-point operations (which afaik are executed on different units, right?)? It doesn't matter if the FP version is 10x more expensive; if it saves 1 cycle of integer work every 160 cycles, that is already a 0.625% speed increase for that operation, provided there is sufficient floating-point capacity to spare.

I haven't worked on this kind of code, and I don't know the exact inner workings of a GPU and the code that goes with it, but as a coder it strikes me as odd that I don't see any discussion of attempts to utilize the unused parts. Optimization is, after all, mostly about squeezing the last little bit of work from the underlying hardware: sometimes cutting a step, sometimes adding thousands of steps to utilize an additional portion of the hardware for a net gain. (Think memcached and other key-value stores working in conjunction with MySQL, or distributing data using BitTorrent.)
donator
Activity: 1218
Merit: 1079
Gerald Davis
March 19, 2012, 09:35:46 AM
#5
GPU memory has high bandwidth but horrible latency.  It is useless for anything where latency matters.  

Since it is so hard to earn coins, people sometimes forget how pathetically easy a single hash is to complete.

An average GPU takes less than a hundred-millionth of a second to complete one, and that is actually a double hash involving roughly 160 rounds. The time to complete a round is on the order of a billionth of a second. GPU main memory is orders of magnitude slower, so any design which used it would look something like this:

wait 10,000 ns
process sha round #1 - 1 ns
wait 10,000 ns
process sha round #2 - 1 ns

:)

More importantly, there are more than enough registers available to keep everything inside the SIMD units without ever going to main memory. Given that main memory is slower, why use it? Main memory is only used to load the inputs and to hold nonces which meet the target requirements.
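The timeline above, as arithmetic (the 1 ns and 10,000 ns figures are the illustrative ones from this post, not measurements):

Code:
rounds = 160                # the double hash's round count used in this post
round_ns = 1                # roughly 1 ns per round when the state stays in registers
mem_latency_ns = 10_000     # the illustrative global-memory round trip

in_registers = rounds * round_ns
via_memory = rounds * (round_ns + mem_latency_ns)
print(f"registers only: {in_registers} ns per double hash")
print(f"one memory lookup per round: {via_memory:,} ns ({via_memory // in_registers:,}x slower)")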
sr. member
Activity: 256
Merit: 250
March 19, 2012, 09:28:42 AM
#4
Because off-chip video memory is incredibly slow compared to registers and on-chip memory.
rjk
sr. member
Activity: 448
Merit: 250
1ngldh
March 19, 2012, 08:56:15 AM
#3
SHA-256 has been beaten to death and pulled apart until it is almost impossible to optimize it any further for video cards, barring unforeseen side-channel attacks. You aren't going to get any extra performance by using the video card's memory for anything; in fact, many people underclock the memory dramatically in order to save on power usage while minimally impacting hash rate. Some memory is used, but not much at all.
legendary
Activity: 1526
Merit: 1001
March 19, 2012, 01:49:50 AM
#2
Interesting. Hope some tech-savvy people will take a look at it.
sr. member
Activity: 402
Merit: 250
March 18, 2012, 04:11:25 PM
#1
I know you guys are far more skilled and knowledgeable at this kind of thing, but I'm posting on the off chance that I simply have a different perspective.

Video cards usually sport a lot of VERY fast RAM as well, so couldn't this be used for portions of the algorithm in a 'rainbow table' fashion?
So for some particularly heavy operation in the SHA-256 algorithm, we would try to fetch the answer from video card RAM, or maybe even from system RAM as a fallback.

This would have to be a key=value store covering only a very small portion of the computation, pre-generated at launch; a costly step, but one with a very high ratio of processing cost saved versus memory needed.

As there is a latency cost, the code would likely need to be split into two stages at the "look at the tables" point, with results being cached for the second stage (after the lookup) so that all SPs stay fully utilized all the time despite the increased latency.

This would help utilize more of the (already paid for) hardware for a performance increase.
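Purely to make the proposal concrete, here is a hypothetical sketch of the kind of pre-generated key=value store described above. Everything in it (heavy_step, TABLE_BITS, the two-stage split) is invented for illustration; nothing like this exists in the real miners, and the replies above explain why it cannot pay off for SHA-256.

Code:
def heavy_step(x: int) -> int:
    # stand-in for "some particularly heavy operation"; assumed (unrealistically)
    # to depend on only TABLE_BITS of input, which is the only thing that
    # makes a precomputed table feasible at all
    return (x * 0x9E3779B1) & 0xFFFFFFFF

TABLE_BITS = 20                              # 2**20 entries, a few MiB if packed as 32-bit words
MASK = (1 << TABLE_BITS) - 1
LOOKUP = [heavy_step(x) for x in range(1 << TABLE_BITS)]   # pre-generated once at launch

def stage_two(intermediate: int) -> int:
    # stage one would run on the SPs; the answer for this step is fetched
    # from the table instead of being recomputed (valid only under the
    # unrealistic narrow-input assumption above)
    return LOOKUP[intermediate & MASK]

The catch, as the replies point out, is that no such narrow, cacheable step exists inside SHA-256: every round depends on the full 256-bit working state.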