Non-bitcoin cryptography question - page 2.

DannyHamilton

legendary

Activity: 3528

Merit: 4945

Quote from: CIYAM on February 09, 2016, 09:10:13 AM

- snip -
If you don't want Alice to keep the data after broadcasting it to Brian then I think you're probably SOL
- snip -

Quote from: piotr_n on February 09, 2016, 09:33:35 AM

- snip -
well, good luck with that!

IMHO, it can't be done.

That's what I figured. I personally couldn't think of a way that it could be done, but thought it was worth asking in case there was something I wasn't aware of.

My only hope was there was some sort of function that I was unaware of such that:

Given:
DataSet = DataSet_A + DataSet_B + DataSet_C + DataSet_D

We could say that:
Func(DataSet) = Func(DataSet_A) + Func(DataSet_B) + Func(DataSet_C) + Func(DataSet_D)

Where:
The size of Func(X) is significantly less than X
Func(X) results in the same output everytime you use the same input
When X and Y are not identical, there is an extremely low chance that Func(X) would ever have the same result as Func(Y)

Notice that the "Where" section sounds exactly like a cryptographically strong hash, but I'd need a hash function that also had the attribute listed under "We could say that".

I can think of a very weak "hash" that meets the needs of the first part, but it isn't strong enough for the second part:

Convert the data to a binary representation
Add up all the 1's in the dataset
Sign the resulting value

Now if you do the same for the subsets, you'll find that I only need to store the quantity of 1's in each subset. I can now show that the sum of the signed quantities of 1's in the subsets is equal to the quantity of 1's that the data provider signed.

Unfortunately, this weak example is useless because it's to easy to move 1's around in the data instead of adding or removing them. Which means it fails the "When X and Y are not identical, there is an extremely low chance that Func(X) would ever have the same result as Func(Y)" test.

I need something better, but I don't know if something better exists. I've just got this nagging feeling in the back of my head that I saw something like what I want back in the early 90's. I'm probably worng and thinking of something else, but had to ask just in case.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: DannyHamilton on February 09, 2016, 08:57:20 AM

Quote from: piotr_n on February 09, 2016, 04:00:50 AM

Make the provider to sign each N-KB block of the data - publish these signatures.
Then split the data using (a multiple of) the N-KB sizes.

There is a set of rules that need to be followed when providing the subsets. I can't count on those rules always resulting in alignment with N-KB blocks. The data provider doesn't know or care what the rules are for creating the subsets. They won't be breaking it up for me, that's what I'm supposed to be doing.

OK.. so you need to
1) get a big file created by the source party and signed with the source's private key.
2) cut it into pieces, signing each of the pieces with your private key, then distribute these signatures, as well as the content of the pieces.
3) figure out how a third party that would receive any of these pieces can make sure that it represents an actual piece of the original file, while not trusting you, nor having an access to the rest of the pieces

and all that without any cooperation from the source party...?

well, good luck with that!

IMHO, it can't be done.

CIYAM

legendary

Activity: 1890

Merit: 1089

Ian Knowles - CIYAM Lead Developer

Quote from: DannyHamilton on February 09, 2016, 08:57:20 AM

I'm trying to protect Brain against Alice lying. In your example Alice could fix the data on her end, and then send a hash of the fixed data. Cathy would think that Brian altered the data since the data she received doesn't match the data that Alice claims she sent to Brian. Brian needs to be able to prove that Alice changed the data.

Hmm... what I think you'd need to have happen is that Alice would need to be given the information about how the data was divided up and then sign a merkle tree of that information (which basically means that Alice is also doing the dividing up the same way to check the Merkle tree, however, assuming the work that determined the dividing up of the data up was much harder than processing a list of offsets then perhaps that isn't too much of a problem as Alice has all the data in the first place).

If you don't want Alice to keep the data after broadcasting it to Brian then I think you're probably SOL but if Alice can keep the information just long enough to sign off on the Merkle tree then you should be okay.

DannyHamilton

legendary

Activity: 3528

Merit: 4945

I thought that I explained the issue pretty well in my OP, but I'm getting answers that don't meet the requirements. Perhaps I didn't do as good of a job explaining as I thought I did.

I really don't think there is a solution where I don't need to store the original data, but I just wanted to make sure.

Here's why the 3 most recent suggestions don't meet my needs:

Quote from: xqus on February 09, 2016, 03:30:47 AM

- snip -
Then to prove it was not altered you would receive the original data, chunk it like you did, and prove that the signature matches
- snip -

Yes, I'm already aware that I can accomplish what I'm trying to do if I have access to all the original data. That's going to be a huge amount of data. If I need to store the original data forever, that's going to be a bit of a hassle. I'm trying to figure out if there's a way to store less.

I *think* if the data provider knew ahead of time (or if I could commnicate back to them) how the subset would be created, then I could just have them generate a merkle tree and sign the root. However, at the moment I'm unlikely to be able to have them generate that for me.

Quote from: CIYAM on February 09, 2016, 03:41:08 AM

Alice provides X amount of data which is encrypted and sent to
Brian who then takes a decrypted portion Y of this data and encrypts that and sends it to
Cathy who wants to know whether or not Alice really did send this data Y to Brian in the first place?

That sounds about right. In this case, I'm Brian. I want to be able to prove to Cathy, Alice, and anyone else that the data I gave Cathy was received in a larger set from Alice.

Quote from: CIYAM on February 09, 2016, 03:41:08 AM

Assuming that I have understood this then approach that I would take would involve Alice and Cathy communicating (if you do not wish that to occur then I think it is going to be very tricky and especially if the data is being broken up arbitrarily).

The data is being broken into subsets according to a set of rules. The rules are different for each type of data received, but they are always consistent for that type of data. So it isn't entirely arbitrary, but Alice doesn't know or care what the rules are for creating the subsets. She won't be breaking it up for me, that's what I'm supposed to be doing.

Quote from: CIYAM on February 09, 2016, 03:41:08 AM

If Brian gave you

I am Brian in this example, so he doesn't need to "give me" anything.

Quote from: CIYAM on February 09, 2016, 03:41:08 AM

some meta data (such as the offset in X that Y comes from) then without revealing the data Cathy could send this meta data to Alice (along with the length of Y) who could then send back a hash of Y which Cathy verifies.

I'm trying to protect Brain against Alice lying. In your example Alice could fix the data on her end, and then send a hash of the fixed data. Cathy would think that Brian altered the data since the data she received doesn't match the data that Alice claims she sent to Brian. Brian needs to be able to prove that Alice changed the data.

Quote from: piotr_n on February 09, 2016, 04:00:50 AM

Make the provider to sign each N-KB block of the data - publish these signatures.
Then split the data using (a multiple of) the N-KB sizes.

There is a set of rules that need to be followed when providing the subsets. I can't count on those rules always resulting in alignment with N-KB blocks. The data provider doesn't know or care what the rules are for creating the subsets. They won't be breaking it up for me, that's what I'm supposed to be doing.

piotr_n

legendary

Activity: 2058

Merit: 1416

aka tonikt

Quote from: DannyHamilton on February 08, 2016, 05:22:30 PM

tl;dr
So, if the provider doesn't know how the data will be broken up into subsets, is there any way I can store proof that the subset data came from the source data without needing to store all the source data itself?

Make the provider to sign each N-KB block of the data - publish these signatures.
Then split the data using (a multiple of) the N-KB sizes.

CIYAM

legendary

Activity: 1890

Merit: 1089

Ian Knowles - CIYAM Lead Developer

Let me see if I've understood the problem correctly:

Alice provides X amount of data which is encrypted and sent to
Brian who then takes a decrypted portion Y of this data and encrypts that and sends it to
Cathy who wants to know whether or not Alice really did send this data Y to Brian in the first place?

Assuming that I have understood this then approach that I would take would involve Alice and Cathy communicating (if you do not wish that to occur then I think it is going to be very tricky and especially if the data is being broken up arbitrarily).

If Brian gave you some meta data (such as the offset in X that Y comes from) then without revealing the data Cathy could send this meta data to Alice (along with the length of Y) who could then send back a hash of Y which Cathy verifies.

xqus

full member

Activity: 172

Merit: 100

I think it would be difficult. As long as they sign the whole set of data with one signature, and you break it up in parts before you sign it again.

Is it possible that you collect the data back from the receipts and then assemble it again, and then you can prove that it was bad when you received it?

If not I think you will need to store all the data, and prove it that way.

As far as I know there is no way to prove that for example 4 hashes would have to origin from signature X, or hash Y.

Maybe you could create a HMAC signature of the original data, signed with the hash of the chunked data. And then include that hash, along with some meta data and send that (signed) to both parts.
Then to prove it was not altered you would receive the original data, chunk it like you did, and prove that the signature matches, and that the chunked data was not altered.

DannyHamilton

legendary

Activity: 3528

Merit: 4945

Not sure if this is the right sub-forum for this thread, but I couldn't find a better place and when I asked in Meta, this is where I was directed to...

Lets say I want to receive a large collection of data from a data provider on a regular basis (perhaps daily).

We've both generated asymmetric keys and are certain that we have each other's public key (exchanged it in person).

They will be encrypting the data with my public key so that the only two entities that have control over who sees that data are themselves and me.

They will also be signing a SHA256 hash of the data with their private key each time they provide it so that I can be certain that the data I receive is actually from them and not an impostor.

So far, so good.

Now, lets say I'm authorized to de-crypt the data, break it up into smaller chunks (according to a set of rules), sign each chunk with my own private key, and re-encrypt each of these smaller chunks of data with the public keys of people with whom I'll be sharing these sub-sets of data.

If I've got this right, I can verify that the data I received was from the sender, and the final recipients can all verify that the data they receive is actually from me. Each recipient of the data I send can only see the data that has been encrypted with their public key and can't see any of the data that I provide to anyone else (unless that other entity or the originating entity chooses to share it).

Great, now, here's the problem I'm trying to solve and I'm wondering if there are any cryptographic techniques (tricks?) that can help.

Lets say the data provider sends me some sort of bad data. I never look at the contents of the data and have no way of knowing if it's accurate or not even if I did. I simply pass the bad data on in the sub-sets that I send out. The recipient of a bad sub-set incurs a significant loss due to decisions made based on this bad data. They contact (sue?) the data provider to attempt to get compensated for their loss.

Under the process that I've described so far, if I don't store all the data that the data provider ever sends me in perpetuity, the provider could theoretically claim that they gave me the correct data and that some bug in my process must have introduced that bad data. If they falsify their source data, they might even be successful in hiding the fact that they were the source of bad data. Unless I can prove that the data I sent was actually the data that I recieved, I could end up looking like the party that messed up.

tl;dr
So, if the provider doesn't know how the data will be broken up into subsets, is there any way I can store proof that the subset data came from the source data without needing to store all the source data itself?

Knowing too little about cryptography to know if my following example even makes sense, my thought was that maybe there'd be some other hash instead of SHA256 that I could use. Something where I could ask the provider to sign an XYZ hash value of the full set before sending it to me, and then as long as I store an XYZ hash of each of the subsets that I create, it could be proven that my XYZ hash was based on data that was also included in the providers XYZ hash.

If the provider knew how the subsets were going to be created, then I think a merkle tree of SHA256 hashes would work. The merkle root being the representation of all the data, with a separate SHA256 of each sub-set. However, I don't have the ability to go back and request the signature AFTER I've broken up the data, and the provider won't know the rules I'm using to break it up.

So, I think I'm looking for something similar to a merkle tree without the creator of the "root" knowing exactly how the branches will be broken up, although I'm open to any other concepts.

Topic: Non-bitcoin cryptography question - page 2. (Read 2328 times)