
Topic: Crypto Compression Concept Worth Big Money - I Did It! (Read 13906 times)

full member
Activity: 168
Merit: 100
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
legendary
Activity: 2646
Merit: 1138
All paid signature campaigns should be banned.
He will be enriched by the experience!
legendary
Activity: 3416
Merit: 1912
The Concierge of Crypto
I thought about this, or very similar to this, several years back. I could never get it to work after throwing everything at it. So I put it aside and worked on solving Fermat's last theorem. When I was close to a solution, someone beat me and published it.

I went back to my ultimate compression algorithm, and the best I could come up with that actually worked was fractal related and only on specific kinds of data.

I don't want to discourage you, but so far, your table method does not make sense to me, or I can't see how it can work, at least on normal data.

Try it on the blockchain and see if you can losslessly compress 14 gigabytes to something significantly smaller. Give it away for free. You'll get rich somehow.
sr. member
Activity: 476
Merit: 250
You need to accept the basic problem:
X distinct input files must generate X distinct output files, otherwise you cannot retrieve the original files.
Do you agree or disagree with this?

If I were to accept your basic premise, I would have to say that one word cannot possibly mean anything more than just one thing.  Therefore, the word "bow" cannot mean to bend over at the waist, when it clearly means a piece of ceremonial cloth tied at the neck.  And there goes the other half of the "____ and arrows" equation.  One input can in fact stand in place of many outputs; it's all based on relative meaning.  Space.  Whether that space is in our imaginations or in the real world.  We can clearly and simultaneously understand that "bow" means bow and arrow, bow tie, and take a bow.  One input, many outputs.  There are millions of guys named "Smith"  ... by your logic, there cannot be more than one Smith, since 1 input = 1 possible output.  

You are almost there.
What you have shown is that "Smith" is not a unique compression output for a person whose surname is Smith, because many many people would also be compressed to the same name. So Person->Surname is a lossy compression method, because it is impossible to retrieve the original person from just the surname.
Similarly, natural language is a lossy compression of the objects or ideas it represents. Given just the word "bow", I cannot know if you meant bow and arrow or take a bow.
Lossy compression schemes can reduce sizes for all input files, but in doing so they lose the ability to exactly reconstruct the input file from the output file.
MP3/AAC for music and XVID/MP4 for video are common examples of lossy compression schemes.
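A quick way to see the difference: a scheme can only be decoded exactly if no two inputs share an output. The Python sketch below checks that on a handful of inputs. `is_lossless` and `surname` are made-up helper names for illustration, and passing the check on a sample is necessary for a scheme to be lossless, not a proof that it is.

```python
def is_lossless(encode, inputs):
    """Return True if `encode` gives every distinct input a distinct output,
    which is the condition for decoding without ambiguity."""
    outputs = [encode(x) for x in inputs]
    return len(set(outputs)) == len(set(inputs))


def surname(full_name):
    # Person -> Surname: keep only the last word of the name.
    return full_name.split()[-1]


people = ["John Smith", "Mary Smith", "Alan Turing"]

print(is_lossless(surname, people))    # False: two different people both become "Smith"
print(is_lossless(str.upper, people))  # True here: upper-casing keeps these names distinct
```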

What you have claimed to be able to do is create a lossless compression scheme that can compress [transform to less than their original size] all input files. That is simply not possible.

There are 256 possible different one byte input files.
If there were less than 256 different output files, then two input files would be mapped to the same output file, and it would not be possible when decompressing that output file to know which of the two input files was meant. Information would have been lost, so this would be a lossy compression scheme.
So our 256 input files must map to 256 different output files. Each file must be at least one byte long, so the total size of all files must be at least 256 bytes, so no space has been saved.

There are 256x256 possible different two byte input files.
If one of them was to map to a single byte output file, it would have to be the same as one of the output files created by compressing one of our single byte input files, which would mean that we could not differentiate between those two files when decompressing, so this would be a lossy compression scheme again.
So there must be 256x256 output files, each of which is at least two bytes long. So no space has been saved.

By induction, the same proof shows that for any input file size, the total set of all files of that size or smaller must map to a set of files of at least the same total size. Hence the average compressed size of any one file must be at least as large as the size of the file itself.
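The counting argument above can be checked directly. This Python sketch (the helper names are hypothetical) counts how many distinct files exist of exactly n bytes and how many non-empty files are strictly shorter. The second number is always smaller, so there are never enough shorter outputs to give every n-byte input its own.

```python
def count_files(length, alphabet=256):
    """Number of distinct files of exactly `length` symbols (bytes by default)."""
    return alphabet ** length


def count_shorter_files(length, alphabet=256):
    """Number of distinct non-empty files strictly shorter than `length`."""
    return sum(count_files(k, alphabet) for k in range(1, length))


for n in range(1, 5):
    same = count_files(n)
    shorter = count_shorter_files(n)
    # Even if every shorter file were used as an output, there are not
    # enough of them to give each n-byte input a distinct shorter output.
    print(n, same, shorter, shorter < same)
```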
sr. member
Activity: 452
Merit: 250
I also came up with a solution for the decompression time, so here is my solution and the next problem.

This may not be the best explanation, but I'll try. Imagine a user downloads the tiny compressed file that represents 10 GB of data. Now imagine writing that 10 GB of data to a hard drive hundreds of times in order to get back the true 10 GB of data. During that time you're using the full resources of your computer or device, rendering it somewhat useless.

So of course I couldn't get that far and not come up with a solution. So what if in the new world we had devices that intercept the information and decompress 1 layer at a time, with each layer of decompression streaming its output to the next layer? Now we are no longer compressing files, but internet data as it enters your home, making it possible to download the 10 GB file as if it's much smaller.

Well, the problem here is... The tech has already been invented and utilized.
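For reference, streaming decompression of this kind is standard; HTTP responses sent with `Content-Encoding: gzip`, for example, can be decoded as they arrive. A minimal Python sketch using the standard-library `zlib.decompressobj` follows. The `stream_decompress` helper and the 1 KiB chunk size are illustrative choices, and this only saves transfer time in proportion to how well the data actually compressed.

```python
import zlib


def stream_decompress(chunks):
    """Decompress zlib data piece by piece as it arrives, yielding output
    incrementally instead of waiting for the whole download."""
    decomp = zlib.decompressobj()
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    tail = decomp.flush()  # emit anything still buffered at the end
    if tail:
        yield tail


original = b"example payload " * 10_000
compressed = zlib.compress(original)

# Simulate a network delivering the compressed bytes in 1 KiB pieces.
pieces = (compressed[i:i + 1024] for i in range(0, len(compressed), 1024))
restored = b"".join(stream_decompress(pieces))
assert restored == original
```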
newbie
Activity: 28
Merit: 0

And you have inadvertently undermined your own point.
"Jimovious" takes more space than "Jim" twice.



No, you just got confused as to the label.  Two "aa" become "i"   This is a 50% reduction.  I purposely made the Jim name larger as Jimovious to point this fact out and you fell into my trap.  But the point here is that in Layer 2, we will ONLY have labels that refer to combinations of 2 which came from Table 1, so they are unique like Jimovious, it wasn't about the size of the name I assigned it, it was about it being unique.
If you have X possible input characters then you have X^2 possible combinations.
So you have reduced the number of characters, but increased the storage required for each one.
There is no overall saving.

You need to accept the basic problem:
X distinct input files must generate X distinct output files, otherwise you cannot retrieve the original files.
Do you agree or disagree with this?



I don't believe that I am able, at this time, to accept your basic problem (premise).  It would be like the Wright Brothers agreeing with people who said flight was altogether impossible, a thing meant only for birds, and that if God had wanted Man to fly, he would have given him wings.  

If I were to accept your basic premise, I would have to say that one word cannot possibly mean anything more than just one thing.  Therefore, the word "bow" cannot mean to bend over at the waist, when it clearly means a piece of ceremonial cloth tied at the neck.  And there goes the other half of the "____ and arrows" equation.  One input can in fact stand in place of many outputs; it's all based on relative meaning.  Space.  Whether that space is in our imaginations or in the real world.  We can clearly and simultaneously understand that "bow" means bow and arrow, bow tie, and take a bow.  One input, many outputs.  There are millions of guys named "Smith"  ... by your logic, there cannot be more than one Smith, since 1 input = 1 possible output.  

By using tables, one can generate meanings for given sets of data.  I can say in my Table that "aa" = "i"  ... now let's say there are 50,000 matches in a 100,000-character piece of data.  That implies all 100,000 characters are being compared and reduced from 2 pieces to 1 piece.  I save the data out using software based on this system, and now it's half the size it once was.  The file size is now 50,000 characters in bytes (whatever that amounts to) but was 100,000 characters in bytes.  How can you tell me, using logic, that it was not cut in half by my method, when it's now clearly half the size it was (hypothetically)!?
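The "aa" = "i" substitution runs into trouble as soon as "i" can also appear in the input. This Python sketch (the name `naive_encode` is illustrative) shows two different inputs producing the same output. It then shows what happens if "i" is reserved instead: the alphabet grows from 16 to 17 symbols, so a fixed-width code needs 5 bits per character instead of 4. If the letters are stored as 8-bit ASCII instead, the room for "i" comes from Layer 1 already spending 8 bits on each 4 bits of data.

```python
import math


def naive_encode(text):
    # Replace every "aa" with "i", as in the example above.
    return text.replace("aa", "i")


# If "i" may already occur in the input, two different inputs collide,
# and a decoder cannot know which one was meant.
print(naive_encode("aa"), naive_encode("i"))

# If "i" is reserved as a new symbol instead, the alphabet grows by one,
# so each stored character needs more bits.
layer1 = set("aAbBcCdDeEfFgGhH")
bits_in = math.ceil(math.log2(len(layer1)))           # 16 symbols -> 4 bits
bits_out = math.ceil(math.log2(len(layer1 | {"i"})))  # 17 symbols -> 5 bits
print(bits_in, bits_out)
```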

You say we cannot take 2 bits and represent them with only 1 bit.  Perhaps that's not true.  As things stand, a bit is merely a single register, either on or off.  But some hardcore engineers have demonstrated that a gate can float halfway between open and closed, creating a third state, a fuzzy state.  So what if we designed a voltage subroutine into the programming of the compression/encoding scheme that was able to read the voltage level of a bit?  Now we see a bit can float more to the left or more to the right.  Something like this:   0    |  /  1   being 30/70    or    0  \ |   1  being 70/30.   So now we can say that if the 2 bits are 00, they are 100% right.  If the bits are 01, they are 30/70% in favor of the right.  If they are 10, they are 70/30% in favor of the left.  And if they are 11, they are 100% left.  Thus, that one bit now has 4 states, representing 2 bits in place of 1 bit.  

It's still one bit; it still only holds one bit of info.  But we found a way to read that bit differently (bow and arrow, bow tie, take a bow, bow of the ship) despite the bit taking up no extra space.  We used another method to change what the bit could do.  We created a reference where no reference previously existed.  That's all I'm trying to do with this new theory.
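Reading more than two levels from one cell is real: multi-level-cell (MLC) flash stores 2 bits per cell using four voltage levels. It is a denser way to store bits, not a way to need fewer of them. A short Python sketch of the arithmetic (the helper names are illustrative):

```python
import math


def bits_per_cell(levels):
    """Information one storage cell can hold if it can be read reliably
    at `levels` distinct voltage levels."""
    return math.log2(levels)


def cells_needed(total_bits, levels):
    """Cells required to hold `total_bits` bits of data."""
    return math.ceil(total_bits / bits_per_cell(levels))


print(bits_per_cell(2), bits_per_cell(4))            # 1.0 2.0
print(cells_needed(1000, 2), cells_needed(1000, 4))  # 1000 500
```

A 1000-bit file still carries 1000 bits of information either way; the four-level cell just packs two of them into each cell.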


sr. member
Activity: 452
Merit: 250
I've thought about similar methods of compression... In the end, the issue isn't whether you can compress data down to almost nothing, it's how much time it takes to decompress the data. Compressing data down is only part of the problem.
sr. member
Activity: 476
Merit: 250

And you have inadvertently undermined your own point.
"Jimovious" takes more space than "Jim" twice.



No, you just got confused as to the label.  Two "aa" become "i"   This is a 50% reduction.  I purposely made the Jim name larger as Jimovious to point this fact out and you fell into my trap.  But the point here is that in Layer 2, we will ONLY have labels that refer to combinations of 2 which came from Table 1, so they are unique like Jimovious, it wasn't about the size of the name I assigned it, it was about it being unique.
If you have X possible input characters then you have X^2 possible combinations.
So you have reduced the number of characters, but increased the storage required for each one.
There is no overall saving.
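Here is the arithmetic behind this reply as a Python sketch with fixed-width codes (X = 16 is just an example alphabet size). Pairing halves the number of symbols but doubles the bits each one needs.

```python
import math


def bits_needed(num_symbols, alphabet_size):
    """Bits for `num_symbols` symbols drawn from an alphabet of `alphabet_size`,
    using a fixed-width code of log2(alphabet_size) bits each."""
    return num_symbols * math.log2(alphabet_size)


X = 16       # e.g. the 16 Layer 1 letters
n = 100_000  # number of input characters (even, so they pair up exactly)

single = bits_needed(n, X)            # one symbol per character
paired = bits_needed(n // 2, X ** 2)  # one symbol per pair of characters
print(single, paired)                 # 400000.0 400000.0
```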

You need to accept the basic problem:
X distinct input files must generate X distinct output files, otherwise you cannot retrieve the original files.
Do you agree or disagree with this?
legendary
Activity: 2646
Merit: 1138
All paid signature campaigns should be banned.
It looks like you are still having a lot of fun thinking about this.  Good for you.
hero member
Activity: 686
Merit: 500
Wat
So you invented the Tardis from dr who ? lol
newbie
Activity: 28
Merit: 0
Hello Everyone,

The Achievement:
I have made an amazing discovery and spent a year trying to talk everyone I know into helping me with my solution for compressing 99.8% of all data out of a file, leaving only a basic crypto key containing the thread of how to re-create the entire file from scratch.
-snip-

Patent / Open Source or it didn't happen.

What you describe sounds like a very primitive compression method with a bit of "the engine just knows" magic.

Here is the whole thing, boiled down as succinctly as I can make it.   There's no magic here. 

Import the data and convert it to binary.  Save.  Re-import the binary and convert it to Layer 1, letters a to H (aAbBcC... etc.), according to the Table above.  The file will double in size.  Save.  Re-import and begin Layer 2; this is where encoding begins.  Encode the first 1024K chunk using Crossword Grid alignment, 20 spaces per row: fill an array with the data from Layer 1, then search the current row and the row below it simultaneously, comparing cells 20 spaces apart (essentially on top of each other in the crossword puzzle grid; let's call that the CWG from now on).  When a match is found, replace the topmost row's match with a unique identifier that means the same thing to the Layer 2 engine as the match found.  For example, "aa" is replaced by "i".  Now delete the bottommost match, leaving an empty space (the compression/encoding).  Now shift the whole array to the right, displacing that empty space, which now becomes a zero at Row 1 Column 1.  Now continue forward and find all matches until the last line (the last 20 cells) is reached.  At the last line, no more compression can be done because there isn't a line under it to compare to.  What we are left with here is essentially a boiled-down key.  Nothing more can be done with this chunk.  That key is saved inside the file we are building like this:

1004(0)_GbcDeEafFBAcbeEBDfga_6(0)

Here, the first block is the topmost (last) line up to the halfway point of the line, the second block is the bottommost line up to where it ends halfway (20 spaces total combined), and the final part is a message to the software that 1004 zeros precede those two keys.  Now the engine can get rid of the 0's, leaving that small chunk to retrieve all the data later.

It would do so like this .... 

0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
GbcDeEafFBAcbeEBDfga000000

Now the engine counts how many zeros are in the block before the first actual piece of data, to know how many iterations it performed to reach that final sequence.  It counts the zeros; here we see 120 zeros in total.  The engine is told that the key goes on the last line, plus the number of zeros (empty spaces) that were left over.  Now, having figured out how many iterations to work backwards from, it begins comparing the data back out, starting by re-ordering the entire sequence to the left.  So the key would actually look like this:

0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000000000
0000000000000GbcDeE
afFBAcbeEBDfga000000

We can easily spot the last line because the G and the a are not on top of each other, meaning this is indeed the true last line.  Depending on how good the compression is, the last line can occur anywhere in the block, like this:

0000000000000000000
0000000000000000000
0000000GbcDeEafFBAc
beEBDfga00000000000
0000000000000000000
0000000000000000000
0000000000000000000

As long as the block is totally intact and none of the pieces overlap, it is complete.  The reason we need to know how many empty spaces were left at the end is so we can separate the number of iterations that occurred from the number of empty spaces left over, since not all of the zeros here represent iterations. 
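The zero count described in this post can be reproduced from the first example block. This Python sketch copies that block as drawn (six rows of 19 zeros, then the key row, even though the grid is described as 20 wide). Splitting the total into leading and trailing zeros is the separation the post says the engine has to make, since the total of 120 alone does not say which zeros are which.

```python
# The first example block, as drawn: six rows of 19 zeros, then the key row.
block = ["0" * 19] * 6 + ["GbcDeEafFBAcbeEBDfga" + "0" * 6]
flat = "".join(block)

leading = len(flat) - len(flat.lstrip("0"))   # zeros before the first data symbol
trailing = len(flat) - len(flat.rstrip("0"))  # zeros after the last data symbol
print(leading, trailing, leading + trailing)  # 114 6 120
```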

I hope you can see this as clearly as I see it in my head.  It's efficient and would totally work.  I hope you will be able to see that by studying this.
newbie
Activity: 28
Merit: 0
OP just let it go mate, move on.

I'll tell you what, I will decide when to quit.  And you can decide when to quit telling me to quit.  And we will see who quits first.  Deal?
newbie
Activity: 28
Merit: 0

And you have inadvertently undermined your own point.
"Jimovious" takes more space than "Jim" twice.



No, you just got confused as to the label.  Two "aa" become "i"   This is a 50% reduction.  I purposely made the Jim name larger as Jimovious to point this fact out and you fell into my trap.  But the point here is that in Layer 2, we will ONLY have labels that refer to combinations of 2 which came from Table 1, so they are unique like Jimovious, it wasn't about the size of the name I assigned it, it was about it being unique.

In Layer 2, "i" only means "aa" when extracted out.  And it tells the software where exactly the "aa"s go due to the Crossword Grid alignment.

And yes, you can keep sending the data back into itself recursively (which I know seems impossible), as long as you carefully make sure that there are absolutely NO mistakes in the rulesets that reduce the 2 pieces down to the 1 unique piece in that Layer.  It's about the spatial arrangement: the way space-time exists physically in 3-D space, and in 4-D time-space as slices.   Space-time has physical space separated by time.  But time-space has time locations separated by space.  So what we are doing here is attempting to put the data into space.  It all still exists; it's just spread out in time and only appears to be shrunken down.

The point is that you must have 1 unique character that replaces 2 non-unique characters.  I'm still working on making sure I can do that 100%.  The interesting thing is that if you look at Table 1 above, referring to converting all binary, there are only 16 possible combinations of 4-bit sequences.  So all Binary data can be converted into Ascii first, then arranged into Crossword grid in certain chunk sizes, then crunched Layer by Layer to practically nothing through incursion (or is it called recursion?)  
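Recursion can be tried out with an ordinary compressor. This Python sketch uses the standard-library `zlib` as a stand-in (it is not the Layer scheme described here) and feeds each output back in as the next input. The sample text is arbitrary.

```python
import random
import zlib

# Build some ordinary, compressible text: random sentences from a small vocabulary.
rng = random.Random(42)
words = ["crypto", "compression", "layer", "table", "grid", "key", "data", "bit"]
data = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

sizes = [len(data)]
for _ in range(4):
    # Feed each output straight back in as the next input ("recursion").
    data = zlib.compress(data, 9)
    sizes.append(len(data))

# The first pass removes the redundancy; later passes find almost none left
# and typically grow slightly from zlib's own header and checksum bytes.
print(sizes)
```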
copper member
Activity: 1498
Merit: 1528
No I dont escrow anymore.
Hello Everyone,

The Achievement:
I have made an amazing discovery and spent a year trying to talk everyone I know into helping me with my solution for compressing 99.8% of all data out of a file, leaving only a basic crypto key containing the thread of how to re-create the entire file from scratch.
-snip-

Patent / Open Source or it didn't happen.

What you describe sounds like a very primitive compression method with a bit of "the engine just knows" magic.
legendary
Activity: 1652
Merit: 1016
OP just let it go mate, move on.
newbie
Activity: 28
Merit: 0
Sorry, but the data in my post got shifted and garbled ... so I made another image.  Sorry the others were so big ... this one is a bit smaller.  Now you can see what I was trying to say in the Row-Column part in the above post ...  here it is:

http://imageshack.com/a/img30/3839/tzya.jpg
sr. member
Activity: 476
Merit: 250
However, some of you are just too ardent in believing that this is impossible and cannot be done, and I still believe that encoding can, if done correctly, result in a significantly smaller filesize than the original file.  And I've devised a method of showing it might work.

Oh dear.
You clearly haven't understood the basic point, which is that if you have X total number of input files, each Y bytes in size, then the total size of all of the encoded files must be at least X*Y.
If it is not, then you have lost some data, and two of the encoded files will map back to the same input file.
It doesn't matter what process you come up with or how clever it is, the basic fact is that for a compression system to be lossless, information cannot be lost.

Quote
It's like having a bunch of people, with random normal names, stand in a line.  And then you say if there are two "Jim's" standing in the crowd, we can take them both out and replace them with a "Jimovious" a name no one else will have in that group.  Now we have shrunken the number of bodies in the crowd by 1.  But if asked to reassemble the original crowd, we just say throw Jimovious out and replace with 2 Jim's.

And you have inadvertently undermined your own point.
"Jimovious" takes more space than "Jim" twice.
newbie
Activity: 28
Merit: 0
..what happened to OP? Did he give up on his billion-dollar idea?

No, I am still here.  A while ago this whole board crashed when some kind of hack happened, and I was devastated that my whole thread was lost.  I couldn't find my thread at all; the whole site was down.  Recently, while talking to some others about this wonderful but lost thread where I got schooled by all of you for trying to imagine something awesome, I did an internet search, and by coincidence this very thread showed up in the search results!  I couldn't believe it when I opened the thread and found this website and my thread back.  Now I'm copying out all of this text and saving it for future reference (well, except for the many, many flames I got; who needs to remember that!?)  Your explanations were invaluable to me in understanding.

However, some of you are just too ardent in believing that this is impossible and cannot be done, and I still believe that encoding can, if done correctly, result in a significantly smaller filesize than the original file.  And I've devised a method of showing it might work.   Essentially, it works like this:

Layer 0:
Input any file and convert it, whatever its language, into binary.  Now grab the binary data in chunks of 4 bits.

Layers 1 & 2:
http://imageshack.com/a/img46/6823/b4g0.jpg
Convert Using the Table Above.  Initially, yes, this does double the size of the file (since we'll be taking ONE 8-bit character and turning it into TWO 8-bit characters), but it must be done to begin the process of formatting into a recognizable structure.  Each layer has its own engine and rules that must be followed.

All Binary is converted into Ascii characters (inside a text document): letters from small a "a" to Large H "H", aAbBcCdDeEfFgGhH, for Layer 1.  Now ALL of the Binary data is converted into the letters a to H (small and Large), and that is saved into a text file.
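A sketch of the Layer 1 step in Python. The real Table is only in the linked image, so the nibble-to-letter order used here (0000 → a, 0001 → A, 0010 → b, and so on) is an assumption, as are the function names. It shows that the step is reversible and that, stored as one ASCII byte per letter, the file doubles in size.

```python
# Assumed order: nibble value 0..15 maps to these letters in sequence
# (0000 -> "a", 0001 -> "A", 0010 -> "b", ...). The real table is in the image.
LAYER1 = "aAbBcCdDeEfFgGhH"
DECODE = {ch: i for i, ch in enumerate(LAYER1)}


def to_layer1(data: bytes) -> str:
    """Split each byte into two 4-bit nibbles and write one letter per nibble."""
    return "".join(LAYER1[b >> 4] + LAYER1[b & 0x0F] for b in data)


def from_layer1(text: str) -> bytes:
    """Reverse the mapping: every two letters become one byte again."""
    return bytes(DECODE[hi] << 4 | DECODE[lo] for hi, lo in zip(text[::2], text[1::2]))


raw = b"Hello"
letters = to_layer1(raw)
print(letters)                                 # two letters per input byte
print(len(raw), len(letters.encode("ascii")))  # 5 10: stored as ASCII, the file doubles
assert from_layer1(letters) == raw
```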

Now comes Layer 2, which is actually a group of sub-layers needed to convert all the Layer 1 "text" data out of Layer 1 form into Layer 2 form, so all the data is in an entirely new form: small i "i" to Large Z, going  iIjJkK .. to .. zZ.  It's too hard to explain it all right here, so I'll move on for now; this process is rather complex and lengthy, but it must be done before going on to Layer 3.

Layer 3:  Another Grouping of Layers dedicated to changing all the i-Z data from Layer 2 into the numbers and symbols used in Ascii.
http://imageshack.com/a/img189/6539/9j3f.jpg
This complex system slowly and painstakingly changes all Layer 2 data into Layer 3 data, where all the data is now symbols and numbers.

Now at this point, you have 3-point system for changing all the data interchangeably back and forth, but what does that do?  How can that encode (and shrink data)?

Here is how:  Imagine a Crossword Puzzle 20 spaces wide, where the previous data sets are sorted into the cells so that every 21st cell drops down and back to the leftmost cell of the next row, sitting under the cell 20 spaces before it.  So every 21st space starts a new row.  Now, the software looks for patterns according to the Layer Tables I've drawn out above.  They are not complete tables; as I've said, the tables are multi-layered to account for all of the variables in the data sets.  Let's say you had this:

1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
E  g  A  D  f  D  B  g  H  a  c  A  C  d  c  F  d  E  H  A
D  B  b  C  F  g  d  C  d  a  F  a  a  C  F  c  c  B  c  h
f  b  a  D  A  H  g  h  h  C  b  H  d  c  C  d  G  F  f  E

3 Rows of 20 Columns.  Look at Row 1 Column 10, and then look below it at Row 2 Column 10.  Both are small "a"s.  They can be encoded by replacing Row 1 Column 10 with a reference from Table 2 above.  See Table 2, where it says "aa" and below it there is an "i"?  So then this happens:

1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
E  g  A  D  f  D  B  g  H  i  c  A  C  d  c  F  d  E  H  A
D  B  b  C  F  g  d  C  d  F  a  a  C  F  c  c  B  c  h  f
b  a  D  A  H  g  h  h  C  b  H  d  c  C  d  G  F  f  E

The 2nd row "a" below the new "i" is deleted, and the entire sequence following it is shifted left one space through the entire table.
When all of the data that can be encoded is removed, you begin to see the compression (or encoding whatever you would call it).

The whole thing is based on the Ascii Character Table, which could be used to reference itself.    When the program is running in reverse mode (extracting the data), it comes to the "i" in Row 1 Column 10 and reads the "i" as "aa" (one "a" in the space where the "i" is and one "a" in Row 2 Column 10, directly under it).  First it shifts the whole table forward by one to make a space for the "a" to be added back in there.

This is a really brief explanation, but it should be enough for most of you to judge whether this logic is feasible.  If you are changing two pieces of data into one based on how they align spatially in a crossword puzzle, then you can see how spatial alignment lets you throw out one piece of data and yet still be able to recover it.

It's like having a bunch of people with random, normal names stand in a line.  Then you say that if there are two "Jims" standing in the crowd, we can take them both out and replace them with a "Jimovious", a name no one else in that group will have.  Now we have shrunk the number of bodies in the crowd by 1.  But if asked to reassemble the original crowd, we just throw Jimovious out and replace him with 2 Jims.  The engine knows where to put them in the crossword puzzle, because it would not have encoded them in the first place unless they had been aligned directly one above the other.  So the engine knows to read two lines at a time, comparing  IF Row1Caret=Row2Caret then DO REPLACER().    Something like that.  Do you get the idea? 
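To test the idea on random data, here is a Python sketch of one possible reading of the rule. Only identical, vertically aligned Layer 1 letters are paired (as with "aa" → "i"), the lower cell is deleted, and everything after it shifts left. The function name, the `"pair:"` marker standing in for Layer 2 symbols, and the 5-bit cost are assumptions, and the sketch does not attempt decoding. Even granting that decoding works, only about 1 cell in 16 finds a match, while the alphabet grows from 16 to 32 symbols, so every remaining cell costs 5 bits instead of 4.

```python
import math
import random

LAYER1 = "aAbBcCdDeEfFgGhH"  # the 16 Layer 1 letters
WIDTH = 20                   # crossword-grid row width used in the post


def substitute_pairs(cells, width=WIDTH):
    """Hypothetical reading of the rule: if a Layer 1 cell equals the cell
    directly below it (width positions later), replace the upper cell with a
    pair symbol and delete the lower cell, shifting everything after it left."""
    out = list(cells)
    i = 0
    while i + width < len(out):
        if out[i] in LAYER1 and out[i] == out[i + width]:
            out[i] = "pair:" + out[i]  # stands in for a new symbol such as "i" for "aa"
            del out[i + width]
        i += 1
    return out


rng = random.Random(1)
cells = [rng.choice(LAYER1) for _ in range(20_000)]  # random Layer 1 data
encoded = substitute_pairs(cells)

# Fixed-width cost: 16 letters need 4 bits each; 16 letters + 16 pair symbols need 5.
bits_before = len(cells) * math.ceil(math.log2(16))
bits_after = len(encoded) * math.ceil(math.log2(32))
print(len(cells), len(encoded), bits_before, bits_after)
```

The encoded list comes out shorter than the input, but `bits_after` ends up larger than `bits_before`: fewer cells, each more expensive.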

Please friends ...  let me know your thoughts.

member
Activity: 100
Merit: 10
Vast
If it's really time travel, bringing any information back is going to make you rich. Winning numbers, stock performance, buying into Asicminer, ... Trying to eat the same taco... Oh wait, wrong game. That's Plants versus Zombies 2.

Would you be trying to eat the same taco, or would you need to? For the greater good of the universe