Author

Topic: Procedural Hash-Based Encryption & Compression (Read 1853 times)

legendary
Activity: 1442
Merit: 1005
November 20, 2013, 11:42:31 AM
#6
Was that what you were trying to say?
That could be entirely the exact or part of the things I wanted to say. The problem was lack of sleep, my apologies Smiley
legendary
Activity: 2646
Merit: 1138
All paid signature campaigns should be banned.
Fun fact: Bitcoin uses ECDSA. The single curve that was selected can generate all 2^160 addresses possible. And each private key can generate immense numbers of addresses, the internal space of the function is huge and different for each of these addresses. If displayed as a bitmap each 256 bit key would generate a different image for each address, impossible to compute from the address backwards. It is actually a procedural algorithm.

Where to begin...  The curve used by Bitcoin will generate slightly less than 2256 possible key pairs (private key + public key).  The slightly less than 2256 possible public keys are hashed to produce 2160 possible public key addresses.  There are many more public keys than public key addresses so a single public key address corresponds to about 2256 - 2160 = 2(256-160) = 296 different public keys.

Was that what you were trying to say?  If so you seem to have things confused.
legendary
Activity: 2618
Merit: 1007
However you can generate a set of instructions which describe how to grow the seed and where to find the hex code that you need within the seemingly random hashes, to reconstruct the original file from those instructions. After writing some code and experimenting with this a bit I found that it is possible, however it takes a huge amount of computing power just to compress even a small file with a reasonable degree of compression.
It might very likely take more space to store the reconstruction instructions than the actual file itself.

One could "mine" for suitable seeds + offsets by generating a large number of tables and choosing the smallest set of instructions that leads to the desired result... What's nice about this is that it might be possible to re-use these tables for other data AND that it would probably work quite well on already traditionally compressed data.

It seems kinda similar to "Pi compression" - since the number Pi is infinite and every possible sequence has to come up at some point, you could compress data by just saying "take the 12345th digit of pi until the 50123th and divide by two - that's your data!" you can also make this as easy or complex as you want, in the end all you need to share is the "recipe" of how to manipulate what to get back your original data. (edit: check out http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula by the way!)

If you want to be really impressed what's possible with compression, check out the .kkrieger game - less than 100 000 bytes (your average iPhone farting app is multiple times larger!) and a fully featured 3d shooter.

Edit:
[...]and that is why I'm unable to provide the source code for this algorithm until I can be sure about the laws regarding encryption algorithms, because I know they are tough.
I doubt that there are any limitations to use SHA256 and newly invented algorithms are anyways not restricted afaik. The US gave up on restrictions regarding crypto int he end of the 90s thankfully, they probably rather break into servers than waste electricity and time on breaking the actual crypto.
legendary
Activity: 1442
Merit: 1005
There is this concept called Pigeonhole Principle that prevents reverse-engineering the seed from a given real arbitrary data sample. So no, you can't magically compress real data to smaller hashes all the time. This is because the hashes will collide in some cases (which require further additional data to identify the collided fragments) or can not result in some values (some hashes are never generated with the given function). Many others ahead of you spent countless sleepless nights trying this and failed. I also wasted nights before testing concepts, algorithms and samples to come to the same conclusion. The pigeonhole principle is solid. This covers the compression part, you can't compress using hashes, you actually add more data by 5-10% depending on model.

About the obfuscation part. On paper, ECDSA, scrypt, even AES, RSA have huge internal key spaces, which is why they can be used for signing messages. What you are asking is actually achieved by standard encryption tools such as PGP or Truecrypt. The concept of Trapdoor function refers to the "difficulty" by which your encrypted message can be cheaply encrypted and decrypted knowing the key, but is very expensive to encrypt or decrypt without knowing the exact key.

Fun fact: Bitcoin uses ECDSA. The single curve that was selected can generate all 2^256 private and 2^160 public addresses possible. And each private key can generate immense numbers of addresses, the internal space of the function is huge and different for each of these addresses. If displayed as a bitmap each 256 bit key would generate a different image for each address, impossible to compute from the address backwards. It is actually a procedural algorithm.



So what was actually your question?
legendary
Activity: 1536
Merit: 1000
electronic [r]evolution
Here is an example of what a file encrypted using this method looks like: link

The decrypted version of the file contains the text "decrypt me if you can sucker!".

Now if anyone can actually decrypt it to that text without having the seed I'll be amazed.

One thing to keep in mind if you attempt it: ignore the first nibble in the file, it's just for padding.
legendary
Activity: 1536
Merit: 1000
electronic [r]evolution
Wasn't exactly sure where to post this thread, move if necessary. This is an idea I got a few days ago when I was watching a video about a game called FUEL. It has the largest game world of any console game ever made, and it fits on a regular DVD disk. The way it achieves this is by procedurally generating the world on the fly, meaning a relatively small algorithm is capable of generating a complex landscape.

After thinking about this problem for a short while I realized that hashing algorithms provide a good way of generating a large complex data set from a small seed. A large hex string can be generated from a small seed by hashing it over and over again and joining all the hashes together. The trick is finding a seed which generates a hex string which corresponds to the hex string you are trying to compress, which of course is not feasible.

However you can generate a set of instructions which describe how to grow the seed and where to find the hex code that you need within the seemingly random hashes, to reconstruct the original file from those instructions. After writing some code and experimenting with this a bit I found that it is possible, however it takes a huge amount of computing power just to compress even a small file with a reasonable degree of compression.

It is fairly quick to decompress the file but the time required for reasonable compression is far too long. It could be useful for someone who needs as much compression as possible and they have the computing power to spare. Like a company that wants to release a popular digital file which they know will cost a lot of bandwidth, but other than that it seems fairly useless as a compression technique.

However it does seem to make an extremely robust encryption mechanism because without the seed, the instructions mean absolutely nothing. The resulting encrypted file doesn't even contain an encrypted version of the original data, it only contains instructions for how to rebuild the file from random hex generated by the seed, but without the seed those instructions are meaningless.

The only downside of this encryption technique is that the resulting file will be larger than the original. This is because in order to make the encryption process fast enough, we must spend less time making our instructions as compact as possible (meaning we must spend less time hashing stuff), so the instructions used to rebuild the file turn out to be larger than the original file.

At first the file size was doubling in order to get a reasonable level of speed, but I spent some time tweaking it and got the output size down to about 1.5x the original size instead of 2x the original size, while maintaining the same level of speed. It's still fairly slow compared to other encryption algorithms, but the decryption process is still much quicker than the encryption process, as it was with the compression algorithm.

I used sha256 in my implementation and I know that is a highly robust hashing algorithm and it has a huge key space, so I'm fairly certain that without the seed, an attacker would have absolutely no hope of decrypting a file encrypted with this method, and that is why I'm unable to provide the source code for this algorithm until I can be sure about the laws regarding encryption algorithms, because I know they are tough.

I posted this thread just to get some feedback and thoughts on the idea and maybe learn if something similar to this already exists. It reminds me a bit of the way bitcoin gets work done through hashing. A crypto-coin where people can submit files for compression and miners get paid to solve these types of difficult compression problems would be interesting, and if the whole network is working on such problems it becomes feasible.
Jump to: