
sr. member
Activity: 378
Merit: 250
February 12, 2014, 08:20:55 AM
#40
MP, I agree. The mathematicians are focused on "proofs", but they assume that the proof is correct, which may not
be the case if someone erred in creating and checking that proof. And the proof may not apply to this particular situation.

They said we could never land on the moon, but we did!  :-D

Keep thinking outside the box.
newbie
Activity: 25
Merit: 0
February 12, 2014, 07:22:55 AM
#39
Your time would be better spent studying basic math. Failing to find an implementation isn't going to teach you any of the things you're missing here.

No need to be insulting; I would wager that most here have studied "basic" math.  The fact remains that compression is used and works every day.  Math is a tool, this is a challenge.  We don't have to compress every number in existence, we only need to find the best way to compress 4.  If they happen to be 4 identical numbers, then the problem is trivial.  The only thing that is ultimately required is to either find or create an exploitable relationship between the numbers.

Perhaps your time would be better spent studying basic history, because history is just as "full" of things that _got_ done as it is of people who made proclamations about what couldn't be done.  Even if *I* can't do it, it still doesn't mean that it can't be done.

Btw, in my first attempt, using a modified Huffman encoding scheme, I already figured out how to consistently represent 128 bits in about 85.

0, 11, 100, 101

MP



legendary
Activity: 1260
Merit: 1168
February 12, 2014, 03:28:05 AM
#38
This message was too old and has been purged
full member
Activity: 144
Merit: 100
February 12, 2014, 02:48:18 AM
#37
Your time would be better spent studying basic math. Failing to find an implementation isn't going to teach you any of the things you're missing here.
newbie
Activity: 25
Merit: 0
February 11, 2014, 09:24:39 PM
#36
I understand the point about the folly of trying to use 2^32 things to represent 2^64 things.  I get that, however the point I come to is that we're not looking for a way to express 2^64 things using 2^32 things.  We are looking for a way to express 4 out of 2^64 things.  If you had a data stream that consisted of 2^64 unique symbols, I agree that there is no way to express that stream using less than 2^64 symbols...

BUT

We have a box of 2^64 symbols, and we are looking for a way to describe 4 of them as independently of the box they came from as we can.  If we can do that, then we ultimately only need 4 symbols.

0 1 2 3 4 5 6 7 8 9 A B C D E F

16 is the size of our box, and we select 1, 5, 9 and D.  If we predetermine between the encoder and the decoder that 4x+1 is the magic incantation, then we can "compress" a 4-bit code into a 2-bit code.  It doesn't matter that the box holds 16 symbols in total.  What matters is the relationship that is "understood" between encoder and decoder.
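
Here is a minimal sketch of that idea in C. The shared relation 4x+1 and the 2-bit index are assumptions for illustration only; the saving comes entirely from the prearranged agreement, since the 2 bits only distinguish among the 4 values the relation already pins down.
Code:
#include <stdio.h>

/* Encoder and decoder both agree beforehand that every symbol
   has the form 4x + 1, so only x (2 bits) needs to be sent. */
static unsigned encode(unsigned symbol) { return (symbol - 1) / 4; }  /* 1,5,9,13 -> 0..3 */
static unsigned decode(unsigned x)      { return 4 * x + 1; }

int main(void) {
    unsigned symbols[] = { 1, 5, 9, 13 };   /* 1, 5, 9, D from the 16-symbol box */
    for (int i = 0; i < 4; i++) {
        unsigned x = encode(symbols[i]);
        printf("symbol %2u -> 2-bit code %u -> decoded %u\n", symbols[i], x, decode(x));
    }
    return 0;
}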

Yes, EK has given us numbers that can range from 0 to 2^32-1, but we only have to "store" 4 of them and if he gives us a different 4, we can identify a different relationship.  So I'm not disputing anyone's proof, I just currently remain unconvinced that the proofs presented necessarily describe and apply to *this* problem.

Mind you, that doesn't mean that I have an answer for this challenge... but it does mean that I'll probably have to programmatically explore some ideas before I convince myself that it can't be done.

MP
full member
Activity: 238
Merit: 100
February 11, 2014, 06:10:03 PM
#35
The set of possible inputs has 2^128 values. You're trying to find a way to map them one-to-one into 2^64 "compressed" representations. This is obviously impossible, so you're trying things that are complicated enough that you can't see why they won't work.

I disagree on that. Let us create a trivial (and stupid) counterexample.

- Imagine you have a super computer with a lot of ram (several petabytes)
- Further imagine you have all 2^128 possible combinations lying around in a binary tree structure in your memory.
- the tree would (according to physical laws) have a height of log2(2^128) = 128.
- So you can reach each of those 2^128 entries in at most 128 steps.

Thus: each of the 2^128 possibilities could be represented by an 8-bit-long number.  Or am I completely misunderstanding something? Not sure, I am having a bottle of wine at the moment.

Each branch takes one bit, so you would need (surprise) 128 bits to specify the final leaf.  (Not an actual surprise :-) )
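
A tiny C sketch of that point (my own illustration, not from the post): recording one bit per branch while walking down the tree just reproduces the binary representation of the leaf's index, so the path description is exactly as long as the number you started with.
Code:
#include <stdio.h>

/* Print the left/right choices needed to reach leaf `index` in a
   perfect binary tree of the given depth: one bit per branch. */
static void print_path(unsigned long long index, int depth) {
    for (int level = depth - 1; level >= 0; level--)
        putchar(((index >> level) & 1) ? 'R' : 'L');
    putchar('\n');
}

int main(void) {
    print_path(5, 8);    /* leaf 5 in a depth-8 tree: LLLLLRLR -> 8 branch bits */
    return 0;
}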
legendary
Activity: 1260
Merit: 1168
February 11, 2014, 06:06:23 PM
#34
This message was too old and has been purged
full member
Activity: 238
Merit: 100
February 11, 2014, 06:04:19 PM
#33
Also, note that neither rxd2's proof by contradiction nor 12648430's reductio ad absurdum depend in any way on the actual method you use to compress the data.  You're trying to fit 40 pounds of manure in a 20 pound sack, failing, and going "Hmm, but what if I filled it with a hose from below..."
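
That counting argument can be checked mechanically on a toy scale. A hedged C sketch (the 4-bit-to-3-bit sizes are my own assumptions for illustration): any function from 16 inputs to 8 outputs must send two inputs to the same output, so no decoder can recover both, regardless of how clever the function is.
Code:
#include <stdio.h>

/* Any "compressor" from 4-bit inputs (16 values) to 3-bit outputs
   (8 values) must collide. Truncation is used as the stand-in here,
   but the pigeonhole argument applies to any such function. */
static unsigned compress(unsigned input) { return input & 0x7; }

int main(void) {
    int seen[8] = {0};
    for (unsigned input = 0; input < 16; input++) {
        unsigned out = compress(input);
        if (seen[out])
            printf("collision: output %u already used before input %u\n", out, input);
        seen[out] = 1;
    }
    return 0;
}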
full member
Activity: 144
Merit: 100
February 11, 2014, 05:50:44 PM
#32
The set of possible inputs has 2^128 values. You're trying to find a way to map them one-to-one into 2^64 "compressed" representations. This is obviously impossible, so you're trying things that are complicated enough that you can't see why they won't work.
newbie
Activity: 25
Merit: 0
February 11, 2014, 12:51:09 PM
#31
Every number along the spiral could be reached using just two numbers: a radius and theta.

That's because the radius and theta are analogue values. You cannot store every one of those infinite points digitally, even if you use a radius and theta. I'm not ready to commit to saying that this bounty is impossible, but it's beginning to feel like it.

My point is that unless the data stream consists of the entire sample space/list of all possible values (which in this case, it doesn't), you will never need to store all of those infinite points digitally.  If, given 4 numbers, you can identify or create a relationship between them, then you can potentially transmit the relationship instead of the data.  In some cases this may be far less data to store or transmit.

For instance, if both sides, or rather the encoder and decoder, "understood" that the data consisted only of powers of 2, then you could send the data element 256 as 8, a reduction from 9 bits to just 4 (8 bits handles values 0-$FF and $100 needs 9 bits; likewise 3 bits handles values 0-$7, so 4 bits are required to express the number 8). To E-K's point concerning compression, I would propose that this is exactly a compression problem, because although your numbers can be in the range 0 to 2^32-1, you're only using 4 numbers from that set.  If perfect knowledge existed between encoder and decoder (which of course it doesn't), then you could ultimately express each number using 2 bits: 00, 01, 10, 11
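
A hedged C sketch of that powers-of-two example (the assumption that both sides know the data are exact powers of two is, of course, doing all the work):
Code:
#include <stdio.h>

/* If encoder and decoder agree the data are powers of two,
   only the exponent needs to be sent: 256 travels as 8. */
static unsigned encode_pow2(unsigned value) {
    unsigned exponent = 0;
    while (value > 1) { value >>= 1; exponent++; }
    return exponent;
}
static unsigned decode_pow2(unsigned exponent) { return 1u << exponent; }

int main(void) {
    unsigned value = 256;
    unsigned e = encode_pow2(value);          /* 8, which fits in 4 bits */
    printf("%u -> exponent %u -> %u\n", value, e, decode_pow2(e));
    return 0;
}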

Anyway, I got the spiral idea contemplating nature, spirals, and fractals.  I understand your point about analog vs digital addressing, but this problem doesn't have to be solved in terms of ones and zeros.  We can use analogs such as "any positive value is 1 and any negative value is 0", or "any value greater than or equal to 0.5 is 1 and any value less than 0.5 is 0".  The attribute of the spiral that I like is that the further away from the center you get, the greater the distance between targets, which means that you can be "closer" to a target with less resolution in your coordinate system.

I'm not willing to say that this will work; I'm just wondering whether we can get around some of the well-established limitations by looking at the problem differently and checking whether the limitations involved MUST apply to our case.

MP
sr. member
Activity: 479
Merit: 252
You, Me, and BTC: *Your* Liberty & Bitcoin Podcast
February 11, 2014, 10:16:59 AM
#30
Every number along the spiral could be reached using just two numbers: a radius and theta.

That's because the radius and theta are analogue values. You cannot store every one of those infinite points digitally, even if you use a radius and theta. I'm not ready to commit to saying that this bounty is impossible, but it's beginning to feel like it.
legendary
Activity: 1260
Merit: 1168
February 11, 2014, 10:12:51 AM
#29
This message was too old and has been purged
sr. member
Activity: 406
Merit: 250
February 11, 2014, 08:31:45 AM
#28
Here is how to store 2 integers in 1, taken off a site. It's programmed in C.
It can be modified to store 4 as 2. The only problem is that we have to assume the integers are no greater than 65535.
Code:
void take2IntegersAsOne(int x)
{
   // int1 is stored in the bottom half of x, so take just that part.
   int int1 = x & 0xFFFF;

   // int2 is stored in the top half of x, so slide that part of the number
   // into the bottom half, and take just that part.
   int int2 = (x >> 16) & 0xFFFF;

   // use int1 and int2 here. They must both fit in 16 bits (0 to 0xFFFF, i.e. 65535).

}


void pass2()
{
  int int1 = 345;
  int int2 = 2342;
  take2IntegersAsOne( int1 | (int2 << 16) );
}
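
The post above says this can be modified to store 4 as 2; here is one possible C sketch of that, assuming each value fits in 16 bits (my own illustration, not taken from the original site):
Code:
#include <stdio.h>
#include <stdint.h>

/* Pack two 16-bit values into one 32-bit word; four values
   therefore fit in two 32-bit words. */
static uint32_t pack2(uint16_t lo, uint16_t hi) { return (uint32_t)lo | ((uint32_t)hi << 16); }
static uint16_t low16(uint32_t w)  { return (uint16_t)(w & 0xFFFF); }
static uint16_t high16(uint32_t w) { return (uint16_t)(w >> 16); }

int main(void) {
    uint16_t a = 345, b = 2342, c = 17, d = 65000;
    uint32_t first  = pack2(a, b);
    uint32_t second = pack2(c, d);
    printf("%d %d %d %d\n", low16(first), high16(first), low16(second), high16(second));
    return 0;
}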


Another solution:
Let's say I have a function that takes an X and Y coordinate, and returns a longint representing that Point's linear value. I tend to call this linearization of 2d data:
Code:
public long asLong(int x, int y) {
  // mask y to 32 bits so its sign bit can't smear into the top half
  return ( ((long)x) << 32 ) | (y & 0xFFFFFFFFL);
}

public int getX(long location) {
  return (int)((location >> 32) & 0xFFFFFFFFL);
}

public int getY(long location) {
  return (int)(location & 0xFFFFFFFFL);
}
Forgive me if I'm paranoid about order of operations, sometimes other operations are greedier than <<, causing things to shift further than they should.

Why does this work? When might it fail? It's convenient that integers tend to be exactly half the size of longints. What we're doing is casting x to a long, shifting it left until it sits entirely to the left of y, and then doing a union operation (OR) to combine the bits of both.

Let's pretend they're 4-bit numbers being combined into an 8-bit number:
Code:
x = 14     :      1110
y =  5     :      0101

x = x << 4 : 1110 0000

p = x | y  : 1110 0000
           OR     0101
             ---------
             1110 0101
Meanwhile, the reverse:
Code:
p = 229    : 1110 0101  
x = p >> 4 : 1111 1110  //depending on your language and data type, sign extension
                        //can cause the bits to smear on the left side as they're
                        //shifted, as shown here. Doesn't happen in unsigned types
x = x & 0xF:
             1111 1110
         AND 0000 1111
         -------------
             0000 1110  //AND selects only the bits we have in common

y = p & 0xF:
             1110 0101
         AND 0000 1111
         -------------
             0000 0101  //AND strikes again
This sort of approach came into being a long time ago, in environments that needed to squeeze every bit out of their storage or transmission space. If you're not on an embedded system or immediately packing this data for transmission over a network, the practicality of this whole procedure starts to break down really rapidly:

- It's way too much work just for boxing a return value that almost always immediately needs to be unboxed and read by the caller. That's kind of like digging a hole and then filling it in.
- It greatly reduces your code readability. "What type is returned?" Uh... an int... and another int... in a long.
- It can introduce hard-to-trace bugs down the line. For instance, suppose you use unsigned types and ignore the sign extension, then later migrate to a platform where those types go two's complement. If you save off the longint and try to read it later in another part of your code, you might hit an off-by-one error on the bitshift and spend an hour debugging your function only to find out it's the parameter that's wrong.
If it's so bad, what are the alternatives?

This is why people were asking you about your language. Ideally, if you're in something like C or C++, it'd be best to say
Code:
struct Point { int x; int y; };

struct Point getPosition() {
  struct Point result = { 14, 5 };
  return result;
}
Otherwise, in HLLs like Java, you might wind up with an inner class to achieve the same functionality:

Code:
public class Example {
  public class Point {
    public int x; public int y;
    public Point(int x, int y) { this.x = x; this.y = y; }
  }

  public Point getPosition() {
    return new Point(14, 5);
  }
}
In this case, getPosition returns an Example.Point; if you keep using Point often, promote it to a full class of its own. In fact, the standard library has several Point classes already, including java.awt.Point and java.awt.geom.Point2D.Float.

Finally, many modern languages now have syntactic sugar for either boxing multiple values into tuples or directly returning multiple values from a function. This is kind of a last resort. In my experience, any time you pretend that data isn't what it is, you wind up with problems down the line. But if your method absolutely must return two numbers that really aren't part of the same data at all, tuples or arrays are the way to go.

The reference for the C++ standard library tuple can be found at http://www.cplusplus.com/reference/std/tuple/
sr. member
Activity: 378
Merit: 250
February 11, 2014, 06:47:31 AM
#27
Well, even if the orig challenge is ultimately not possible, at least it spurred a good discussion.  :-D
newbie
Activity: 25
Merit: 0
February 11, 2014, 12:35:48 AM
#26
A radius and theta specify INFINITE points. You save as much space as the information you throw away.

You are literally trying to prove that a certain set of integers is a strict subset of itself.


All I'm saying is that if you impose a pattern on the data set:


        E D C
     F   3  2  B
       4      1  A
       5    0   9
         6 7 8

Starting at the center of the spiral, every marked position representing a possible data output can be uniquely "touched"/addressed using an angle and a distance.  The spiral can be wound tightly or loosely.  Furthermore, since it can be computed on both sides, as a dictionary, it doesn't have to be transmitted with the data stream.  Whether dealing with 2 values or 4 values, they could be composited into a single value that has an address on the spiral.

I simply haven't proven to myself that expressing data as addresses on a spiral (for instance) won't yield a significant savings.  I have thought about the fact that I may end up losing space to account for the level of precision I might need, but such a system is still somewhat forgiving, because my (r, theta), as measured from the center, need only land me at a point where the target value is the value closest to where I end up.  So if my desired output is "F" (15), then the angle and distance I communicate don't need to land me on top of "F"; they only need to land me closer to F than E, 3, 4, or any other value on the spiral.  I like this approach because it is graphic (I can see and attempt to wrap my mind around it) and fuzzy (it can yield integer values without using integer addresses).
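
For what it's worth, here is a rough C sketch of that spiral addressing (the constants A, B and the angular step are my own assumptions): each integer n gets a point (r, theta) on r = A + B*theta, and decoding inverts the mapping and rounds to the nearest marked position. It shows the addressing works, but note that r is fully determined by theta, so the pair carries no more information than n itself.
Code:
#include <stdio.h>
#include <math.h>

#define A    0.0      /* spiral offset in r = A + B*theta (assumed constants) */
#define B    1.0
#define STEP 0.5      /* angular spacing between consecutive marked positions */

/* Place integer n on the spiral. */
static void encode(unsigned n, double *r, double *theta) {
    *theta = n * STEP;
    *r = A + B * (*theta);
}

/* Recover n from (r, theta); rounding lets a slightly-off address
   still land on the nearest marked position. */
static unsigned decode(double r, double theta) {
    (void)r;                       /* r is redundant given theta in this sketch */
    return (unsigned)llround(theta / STEP);
}

int main(void) {
    double r, theta;
    encode(13, &r, &theta);
    printf("13 -> (r=%.2f, theta=%.2f) -> %u\n", r, theta, decode(r, theta));
    return 0;
}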

Anyway,

Just thinking out loud...

MP
full member
Activity: 144
Merit: 100
February 10, 2014, 10:39:23 PM
#25
A radius and theta specify INFINITE points. You save as much space as the information you throw away.

You are literally trying to prove that a certain set of integers is a strict subset of itself.
newbie
Activity: 25
Merit: 0
February 10, 2014, 08:37:18 PM
#24
The most promising approach in my opinion is dictionary based compression which I believe is the basis of Lempel-Ziv compression.  There however the compression is ultimately limited by the size of the dictionary.

There is already a minimal-sized dictionary of the first 2^128 integers: the first 2^128 integers.

Compression feels like it should work because in everyday use, it usually does -- but that's because files generally have patterns, which can be taken advantage of. Compression cannot reduce space taken for some inputs without increasing space taken for others -- it can reduce the minimum required space, but cannot reduce the maximum without restricting the domain.

People have already posted multiple proofs that this cannot be done. The gist is this: there are 2^128 possible sets of 4 integers. If you "compressed" them into 2^64 possible compressed versions, you cannot have a compressed version for every input.

Yes, I get your point, but you are correct that compression does "feel" right.  We're ultimately talking about storage.  One might say that "11111111" stored as characters occupies more storage than "255" stored as characters, which in turn is more than "FF" stored as characters, yet the same value stored as bits occupies less space than all of these.

The equation 3x + 2y = 14 describes an infinite number of points.  So I don't think it's necessarily about finding ways to represent 4 numbers using 2, but about finding, for *each* of the 4 numbers, a way to represent it using 2 numbers that can be reversed.  When first contemplating the problem I imagined a spiral

R= a+b*theta

Imagine all of the possible integers in question evenly spaced along the spiral as it wraps around.

Every number along the spiral could be reached using just two numbers: a radius and theta.

Perhaps in the end there wouldn't be much savings because of the resolution you'd need to express tiny angles,  but as a thought exercise, it seemed reasonable.

MP
sr. member
Activity: 434
Merit: 250
February 10, 2014, 08:20:18 PM
#23
I doubt this is possible. If someone could make it work, he/she could sell it for much more.
He is asking for a perfect lossless compression method, which is a holy grail.
Imagine you could do that: by applying it repeatedly, you would be able to make 10TB fit in as few as 2 unsigned integers.
The request here is basically a guaranteed 50%-or-more lossless compression.
(A philosopher's stone is a better analogy.)

full member
Activity: 144
Merit: 100
February 10, 2014, 04:53:03 PM
#22
The most promising approach in my opinion is dictionary based compression which I believe is the basis of Lempel-Ziv compression.  There however the compression is ultimately limited by the size of the dictionary.

There is already a minimal-sized dictionary of the first 2^128 integers: the first 2^128 integers.

Compression feels like it should work because in everyday use, it usually does -- but that's because files generally have patterns, which can be taken advantage of. Compression cannot reduce space taken for some inputs without increasing space taken for others -- it can reduce the minimum required space, but cannot reduce the maximum without restricting the domain.

People have already posted multiple proofs that this cannot be done. The gist is this: there are 2^128 possible sets of 4 integers. If you "compressed" them into 2^64 possible compressed versions, you cannot have a compressed version for every input.
newbie
Activity: 25
Merit: 0
February 10, 2014, 07:45:46 AM
#21
Hi all,

While I agree that it _probably_ can't be done, part of me entertains that it possibly can be done.  To me this seems like a basic compression problem.  At first blush, I contemplated using variable-length codes organized according to a frequency distribution of the data taken two bits at a time.  This would basically have been Huffman encoding, where, upon evaluating the bit patterns 00, 01, 10, 11, the variable-length codes would have looked like 0, 11, 100, 101.  Thus the most frequent bit pattern would be coded as 0, the next most frequent as 11, and the two least frequent as 100 and 101.

Unfortunately, this had a best-case scenario of 50% compression, plus the overhead of the frequency table required to map the 2-bit tokens to their variable-length equivalents.
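
A small C sketch of that scheme (the fixed code table 0, 11, 100, 101 and the sample token stream are assumptions for illustration): each 2-bit token is replaced by its variable-length code, so the output only shrinks when the 1-bit code is by far the most common token, and it can never shrink below half the input size.
Code:
#include <stdio.h>
#include <string.h>

/* Fixed Huffman-style table for the four 2-bit tokens 00,01,10,11,
   ordered from most to least frequent: 0, 11, 100, 101. */
static const char *CODE[4] = { "0", "11", "100", "101" };

int main(void) {
    /* token stream: mostly token 0, which maps to the 1-bit code */
    unsigned tokens[] = { 0, 0, 0, 1, 0, 2, 0, 3 };   /* 8 tokens = 16 input bits */
    size_t out_bits = 0;
    for (size_t i = 0; i < sizeof tokens / sizeof tokens[0]; i++) {
        fputs(CODE[tokens[i]], stdout);
        out_bits += strlen(CODE[tokens[i]]);
    }
    printf("\n16 input bits -> %zu output bits (plus the table overhead)\n", out_bits);
    return 0;
}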

The most promising approach in my opinion is dictionary based compression which I believe is the basis of Lempel-Ziv compression.  There however the compression is ultimately limited by the size of the dictionary.

All cases, however, revolve around identifying patterns that allow you to shrink the communicated sample space, such that you are counting patterns of data as opposed to individual data items.

MP