Topic: Proof of compression / blockchain geometric compression

legendary
Activity: 1260
Merit: 1008
Totally just realized this could be done recursively on the leftover strings:

(shape index of original hash stack) - stack 1
(leftover data string) - data string 1

stack data string 1 into a new stack 2
(shape index of stack 2)
(leftover data string) - data string 2

etc.
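
Something like this is what I have in mind - find_shapes() here is just a stand-in for whatever the actual shape search would be, not something that exists:

Code:
# Rough sketch of the recursive idea: each pass turns a stack into
# (shape index, leftover string); the leftover is re-stacked and searched again.

def find_shapes(stack):
    """Placeholder for the actual shape search - returns (shapes, leftover_string)."""
    raise NotImplementedError

def recursive_compress(stack, width):
    layers = []                                   # one shape index per pass
    while True:
        shapes, leftover = find_shapes(stack)
        if not shapes:                            # nothing found on this pass -> stop
            return layers, leftover
        layers.append(shapes)
        # re-stack the leftover string into a new grid of the same width
        stack = [leftover[i:i + width] for i in range(0, len(leftover), width)]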

64 * 320 = 20480 digits, just fyi.
legendary
Activity: 1260
Merit: 1008
On the last point, I agree - it is difficult to determine whether it is the most compressed or not. I may have addressed this in the OP, but yeah - this wouldn't serve as an ideal proof of work: 1) it is unknown whether the "best" compression has been obtained, and 2) there is potential for optimization of the algorithm.

Hrm. Yeah, I definitely thought about trying it on pen and paper - indeed, that's what some of my images were an attempt at. From rough estimates, though, theoretical compression grows with larger patterns (a 16-node shape could theoretically give 1:16 compression).

And just to lay it out there, I'm not furthering the conversation because I think I'm right or think you're wrong, etc. - I'm just trying to make sure I understand the flaws.

"you need more space to store the stuff you store instead ("shapeXXX" instead of "shapeX")"

Right, but that's all in the compression / decompression software, not the actual compressed data, right? Person A and person B have a huge library of shape definitions, but they only need to transfer data that tells them where to build which shapes.


If I take some hash stack, and find that it contains (assume perfect shapes here, no fancy rules):

200 squares, index 1-200
200 hexagons, index 1-200
200 16-gons, index 1-200


Squares are 1:4, hexagons 1:6, 16-gons 1:16 compression.

data are transmitted as

(shape).(index)

so, we'll call each shape by its first letter

s.(index1).(index2).(index3).(index4) ->
h.(index1).(index2).(index3).(index4) ->
1.(index1).(index2).(index3).(index4) ->


Perhaps what you're getting at is the following - I don't fully understand how empty space in a data grid would be stored as data. Does it take as much space to store the fact that a position in a grid is empty as it would to actually store a value?


That kinda makes sense to me, because it turns things into a positional reference - i.e., take the string xxxxAxxxxxxAxxxxx.
If we were to find a shape that included the A's in the string, then in order to reconstruct the hash stack, we might need to transfer the hash stack with "empty" values,

so.

xxxx0xxxxxx0xxxxx

which, I guess yeah, it doesn't really lead to any reduction in data.

However, if this is what you're talking about, this could be overcome just by tweaking the reconstruction process.

The data grid is defined initially, so say we know we're starting with a 20 x 20 grid.
The shape array is read in, and all of the shape coordinate data is mapped onto the grid.
Then the leftover data string (the original data for which there were no shapes) is read into the data grid sequentially, into the empty spots. When the reconstruction hits a spot that's filled, it just skips to the next empty spot.


so..

A B A
C D E
A F A

9 digits

would be transmitted with the following data:
data grid = 3 X 3
square, 1,1; 2; A
remainder data string BCDEF

3 3
s 2 1 1 A
BCDEF

12 digits

data is reconstructed:
data grid built

. . .
. . .
. . .

shape read in

A . A
. . .
A . A

remainder data set read in

A B A
C D E
A F A
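
Here's a minimal Python sketch of that reconstruction, using the toy example above (the function and argument names are just mine for illustration, not any real client):

Code:
# Rebuild the 3x3 grid from "3 3", the shape record "s 2 1 1 A"
# (square, dimension 2, origin row 1 col 1, value A) and the remainder "BCDEF".

def reconstruct(rows, cols, squares, remainder):
    grid = [[None] * cols for _ in range(rows)]
    for dim, r, c, val in squares:              # place each square's four corner nodes
        for dr in (0, dim):
            for dc in (0, dim):
                grid[r - 1 + dr][c - 1 + dc] = val
    filler = iter(remainder)                    # fill the empty spots in reading order
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] is None:
                grid[i][j] = next(filler)
    return grid

print(reconstruct(3, 3, [(2, 1, 1, "A")], "BCDEF"))
# -> [['A', 'B', 'A'], ['C', 'D', 'E'], ['A', 'F', 'A']]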

It seems that, yeah, this is an economy-of-scale thing.

A B A B N
C D E D Q
A B A B R
X D A D H

total digit size: 20


data grid: 5 X 4
shape: square, dimension: 2, coordinates: (1,1,A), (1,2,B), (2,2,D)
remainder data set: NCEQRXAH

which, at its digit minimum, could be represented as this
5 4
s 2 1 1 A 1 2 B 2 2 D
NCEQRXAH

which has a total digit size of 21
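
Just to double-check that count, treating each transmitted token as one "digit" like above:

Code:
tokens = "5 4".split() + "s 2 1 1 A 1 2 B 2 2 D".split() + list("NCEQRXAH")
print(len(tokens))   # 21, vs. 20 digits in the original 5 x 4 grid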

I really do think it's a scale thing though. Could be wrong, but who knows.

A B A B N A F A
C D E D Q B E B
A B A B R A R A
X D A D A B N B
U U H N J U O P
W T A D A D L Q

48 digits

8 6
s 2 1 1 6 1 3 4 A 1 2 6 2 B 2 2 D
NFCEQERRXNUUHNJUOPWTDDLQ

43 digits

43/48 ≈ 0.90, so roughly 10% compression.

Granted, these are just toy datasets, but with grids 64 digits wide by 320 tall (the number of transactions in block 338444), who knows?
legendary
Activity: 2618
Merit: 1007
Imagine you have 10 "shapes" vs. 1000 shapes: with 1000 shapes it is far more likely that you find one of them in the data (e.g. shape 123 shows that 15 "3"s are arranged in a certain pattern). It however takes 3 times more space (1 digit for 10 shapes (0-9), 3 digits for 1000 shapes (0-999)) to store which shape should be used. So while you need less space to store the data itself ("shape123 at coordinates X/Y"), you need more space to store the stuff you store instead ("shapeXXX" instead of "shapeX"). If your data is completely random, then no matter how you choose to describe it, the description will on average take at least as much data as just storing it in cleartext.
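
The index-size growth is easy to see in a couple of lines (assuming a plain decimal encoding of the shape index):

Code:
def index_digits(n_shapes):
    # decimal digits needed to write the largest index (0 .. n_shapes - 1)
    return len(str(n_shapes - 1))

print(index_digits(10), index_digits(1000))   # 1 3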

this is where the application of this proposed "compression" gets called into question - as a compression algorithm, it will probably be slow and unwieldy. As a proof of work, it will be difficult and time consuming - but there will be 1 unique set of shapes that compresses the data the most.
This part is where I would request a formal proof... Wink

Maybe just work on a smaller scale (e.g. a 3x3 field with only 1s and 0s) and move forward from there - it could be a great way to learn a bit of programming too! On that small scale, you can also do it with pen and paper, and it is still relatively easy to explore all 512 possible combinations and find their unique smallest set of shapes.

Edit:
By the way, this kind of "work" would be a poor function for Proof of Work for Bitcoin-related stuff, because it is really hard to know whether you have found the "final" solution. Yes, it is definitely not that hard to find some shapes - however, if someone gives you some data and a few shapes that were found inside of it, how do you know that it actually is the best possible solution without trying to find it yourself by going through all possible shapes one by one? This is why Bitcoin, for example, uses cryptographic hashes that have to start with a bunch of zeroes: they are really difficult to find, but once you have found one, you just need to tell someone the input to the solution and they can very easily verify that the output really starts with a bunch of zeroes. With "shape compression", even if you have a bunch of shapes as the input, it is difficult to know whether there could still be a way to compress the data even more.
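
To make the asymmetry concrete, here is roughly what that cheap verification looks like in Python - simplified to "leading zeroes" as described above, whereas the real protocol compares the double SHA-256 of the block header against a numeric target:

Code:
import hashlib

def looks_solved(header_bytes, difficulty_zeros=8):
    # one double-SHA256 call is all a verifier needs
    digest = hashlib.sha256(hashlib.sha256(header_bytes).digest()).hexdigest()
    return digest.startswith("0" * difficulty_zeros)

# There is no comparably cheap check that a claimed "best shape compression"
# really is the best one possible.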
legendary
Activity: 1260
Merit: 1008
Thanks for the response!

I briefly checked out the wiki and will delve into it further... but in general, it seems that with this compression type you would just need to define your "words" better in order to obtain true compression.

Agreed - the square provides minimal compression, but increases in compression (even with a square) might be possible if the same type of square exists n times. That way, the properties of the square are described once, and you just add indices to that type of square.

I imagine the data format to be positional (or self-deterministic - i.e., the decompression program determines the type of data between delimiters), such that

(type).(coordinates).rule1.rule2.etc
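
For example, a decoder could simply split such a record on the delimiter - the record below is made up for illustration, not a real spec:

Code:
record = "s.(1,1).skip2.jump3"            # hypothetical shape record
shape_type, coords, *rules = record.split(".")
print(shape_type, coords, rules)          # s (1,1) ['skip2', 'jump3']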

I'm not sure I understand what you're saying here (I need to sit down more with it - headed out to a function pretty soon):

"if you use more shapes, you have more possibilities to find something that fits well but you need a larger encoding to decompress."

this is where the application of this proposed "compression" gets called into question - as a compression algorithm, it will probably be slow and unwieldy. As a proof of work, it will be difficult and time consuming - but there will be 1 unique set of shapes that compresses the data the most.

Re: coding - I can barely make my way through Linux Smiley Maybe at some point I'll take another stab at code, but until then... I just broadcast my ideas into the open-source community to see what sticks.

legendary
Activity: 2618
Merit: 1007
As far as I understand information theory:

If the input to this algorithm has very high entropy (e.g. output of a cryptographic hash function, encrypted data...), the amount of data you need to store the index of the shape to be used (e.g. "square with side length 4 starting at coordinate 0,5") will be in total as large as the size benefits you gain through compression. If you use a more limited library of shapes, you have fewer indices, but fewer chances to compress, if you use more shapes, you have more possibilities to find something that fits well but you need a larger encoding to decompress.

The thing that might still "save" you is called http://en.wikipedia.org/wiki/Kolmogorov_complexity - while it is easy to show that you need at most about as many bytes as the input to algorithmically describe an output, it is not clear what the MINIMAL size of such a description would be.

Your idea at least looks very interesting. I am not sure if going forward with this geometric approach is going to work very well in code, but it would be cool to see some small proofs of concept, e.g. in Python or so. Smiley

Also, I would recommend not limiting this to just Bitcoin/block chains - this has more to do with compression. The only thing that seems to fit BTC well is that a lot of the data you're likely to encounter is already quite random, so you'll often see data that can't be compressed further with current methods.
legendary
Activity: 1260
Merit: 1008
So, here's a random idea I had. I have no idea if it even makes sense - a lot of the technical underpinnings of Bitcoin are just black magic to me.

Basically, I got the idea because when I stare at grids of random numbers I see patterns. And at some point, a cryptocurrency block is a stack of hashes, which is a grid of random numbers.

If these hash stacks are indexed, and then considerable processing power is spent trying to find geometric shapes in the hash stacks, the amount of data required to communicate the hash stack can be reduced.

I realize that this type of work violates one of the rules of a proof of work - namely, you shouldn't be closer to completing the puzzle based on the amount of time spent working on the puzzle. So it might be a proof of work, or it might just be a way to compress data.

2 primary suppositions:
1. Big blockchains are not ideal for cryptocurrencies.
2. ASICs are not ideal for decentralization.
(Or you can ignore point 2 - the ASIC debate is happening somewhere else, I'm sure.)

Solution: Incorporate proof of compression into the currency

What this exploits: blocks are made up of stacks of 64-character strings (or other equally sized strings).

These stacks can be encoded by shapes originating from their north-western-most coordinate.

A shape originates at its north-west coordinate (node), and is defined by its number of sides (edges) and the rules of its nodes. Node count starts at 0. As illustrated, there can be many rules (skip = skips a particular node; jump = start at node 0, then jump X nodes and apply the rule, repeat; offset = doesn't apply the rule until a particular node). Shape build types can also exist - the null argument is that all the nodes are the same.
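
One possible way to write such a shape record down in Python - the field names and rule encoding here are my own guesses, not a spec:

Code:
from dataclasses import dataclass, field

@dataclass
class Shape:
    origin: tuple                                # (row, col) of the north-west node
    sides: int                                   # number of edges
    value: str                                   # character at the nodes (null build type)
    rules: list = field(default_factory=list)    # e.g. ["skip:3", "jump:2", "offset:1"]

square_of_As = Shape(origin=(1, 1), sides=4, value="A")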

The PoW might work slightly differently from conventional PoW, because there is no guarantee that a given block can be reduced to a target compression (unlike the current Bitcoin protocol, which can define a particular difficulty).

Thus, instead of a varying difficulty, perhaps at the end of block time (the amount of time the network has to encode a block), all the nodes communicate their solutions, stamped with address, time, and compression ratio. The address with the best solution (the most compression) at the earliest timestamp receives the block reward. The other nodes validate the solution, and the compressed block is added.
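
A toy sketch of that selection rule (the field names and numbers are made up):

Code:
# Best compression wins; earliest timestamp breaks ties.
submissions = [
    {"address": "node_a", "time": 17, "ratio": 0.91},
    {"address": "node_b", "time": 12, "ratio": 0.89},   # lower ratio = more compression
    {"address": "node_c", "time": 25, "ratio": 0.89},
]
winner = min(submissions, key=lambda s: (s["ratio"], s["time"]))
print(winner["address"])   # node_b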

Difficulty can also be modified by allowing and disallowing certain types of shapes.

Actual compression is unknown. Compression increases if there are multiples of the same shape - the same rules apply, but to different coordinates.

Simple, perfect shapes give small compression (a square is roughly 1:4). Greater compression is achieved with many-sided shapes (a 16-sided polygon is potentially 1:16). Shapes can be complex and don't need to be closed - a spiral could be made where the length increases by 1 at every node.


It can be seen that the shape library can be huge. But this is on the fixed side of the equation - i.e., if person A and person B both have the compression / decompression client with the huge shape definition library, they can send smaller shape indexes.