
Topic: Non-bitcoin cryptography question (Read 2327 times)

copper member
Activity: 2996
Merit: 2374
February 20, 2016, 05:34:01 PM
#28
(I skipped the replies after #15, as they are too technical for me to follow.)


The problem described in the OP sounds a lot like the problem torrent clients face when they want to download a large file from a number of "servers" (I don't think that is the proper terminology for torrents; by "server" I mean a host that is uploading a file to other peers) with unknown reputations. When a torrent client wants to download a file "x", it downloads chunks of the file from a number of servers, and it needs to know that the individual chunks of file "x", when put together, will actually reconstruct file "x" rather than a corrupt version of it.

I would suggest that you look into how the BitTorrent protocol works, and I believe you will be closer to your solution. My brief research led me to this research paper, and I believe the references below from that paper may give you additional information on what you are looking for:
Quote
[17] S. Knox, D. O’Cearuil, N. S. Holland, and L. Skrba. “Technology Survey: BitTorrent.”
http://ntrg.cs.tcd.ie/undergrad/4ba2.05/group4/index.html

[30] J. A. Pouwelse, P. Garbacki, D. H. Epema, and H. J. Sips, "The BitTorrent P2P File-sharing System: Measurements and Analysis," 4th Int'l Workshop on Peer-to-Peer Systems (IPTPS'05), 2005.




One other thing I would point out from a non-technical perspective: a non-technical way around your problem is to make an agreement with the entities you are providing the data to, stating that you will not be held liable for any losses resulting from the use of such data and/or that you are providing the data on an "as is" basis.

I am unsure whether your data provider would be willing to accept potential liability from your customers in the event they were to make some mistake in sending data to you.
sr. member
Activity: 420
Merit: 262
February 19, 2016, 09:16:36 AM
#27
The hashing of the raw data will probably take 90%+ of the resources.

Not if each chunk is no larger than the hash bit width, e.g. 128 or 256 bits. I had stated the chunks could optionally even be individual characters or words.

Your Merkle tree would save space, at the cost of more complexity and CPU time.

For verification by the final recipient, the Merkle tree should be faster if the hash computation time is on par with your fmul+expand, since fewer operations are needed: only the subtree path has to be proved, not every leaf as in your design.

especially the use of Merkle trees for encrypted data transmission

Yeah, Merkle trees are another important algorithm in the toolset.

Apparently Ralph Merkle also invented public key cryptography in the 1970s, but the journal editors rejected his paper:

My first paper (and, in fact, the first paper) on public key cryptography was submitted in 1974 and initially rejected by an unknown "cryptography expert" because it was "...not in the main stream of present cryptography thinking...."
legendary
Activity: 1176
Merit: 1134
February 18, 2016, 08:56:12 PM
#26
I wrote blake2, not sha256. Blake2 is 3 - 5X faster than SHA256. I doubt one needs 256-bit security for this, so 128-bit hashes probably suffice; that would be another doubling in speed.

I think some CPUs now have hardware acceleration for SHA256.

Btw, I wasn't discrediting your contribution. I was just offering another option and comparing the tradeoffs.
The hashing of the raw data will probably take 90%+ of the resources. How exactly the signing and verification of the hashes is done is not time-sensitive, unless you use really, really slow methods.

My 100 nanoseconds of processing per hash is really, really fast, at the cost of 32 bytes per sblock. Your Merkle tree would save space, at the cost of more complexity and CPU time. Which method is better cannot be determined in a vacuum; it all depends on the specific requirements of the user.

Also note that my method will work with any hash function that produces 256 bits per hash; it is totally agnostic as to whether it is sha256, blake or rot13.

TB data -> hashing -> fmuls

I just did the "-> fmuls" part, with a stub sha256 as the input, since the OP stated sha256 was used.

James

P.S. This is an intriguing problem and gives me ideas for other use cases, especially the use of Merkle trees for encrypted data transmission
sr. member
Activity: 420
Merit: 262
February 18, 2016, 07:46:51 PM
#25
I wrote blake2, not sha256. Blake2 is 3 - 5X faster than SHA256. I doubt one needs 256-bit security for this, so 128-bit hashes probably suffice; that would be another doubling in speed.

I think some CPUs now have hardware acceleration for SHA256.

Btw, I wasn't discrediting your contribution. I was just offering another option and comparing the tradeoffs.
legendary
Activity: 1176
Merit: 1134
February 18, 2016, 12:10:25 PM
#24
in the curve25519 field the fmul is the fast operator, millions per second, and is typically called the "addition". Both addition and multiplication are order-independent with a unit element, and I avoid the zero, so there are no issues with additive zero vs multiplicative zero

the "multiply" is cmult (complex multiply with montgomery)
even with cmult it does hundreds of thousands per second.

also, by keeping the 320-bit polynomial form for the fmul, there is only one fexpand per field element and one fcontract at the end

My guess is that the fmul will be speed-competitive with the sha256 of the Merkle leaves

Blake2 can apparently do several hundred thousand hashes per second per thread, roughly guesstimated for an Intel Xeon.

Since you have to hash each leaf before you multiply, the multiplies only need to be faster than the non-leaf part of the Merkle tree. Also, my other point (you quoted me before I edited my prior post) is that for the intermediary and the recipient, the Merkle tree is more space efficient because not all the leaves have to be provided with the proof.

So it seems roughly on par speed-wise, but the Merkle tree is more space efficient when only a portion of the leaves (i.e. the chunks) is being proved/provided to the recipient.
Even though I am pretty good at estimating the speed of algorithms, I prefer to benchmark actual running code rather than hypotheticals. It is very easy to overlook the effects of the various levels of caching in the CPU, the system and the HDD.

without compiler optimizations:
10000000 expand/fmuls in 2559.064 milliseconds
10000000 sha256(32bytes) in 23867.496 milliseconds

with -O2
10000000 expand/fmuls in 1067.811 milliseconds, about 100 nanoseconds each
10000000 sha256(32bytes) in 7372.751 milliseconds, about 737 nanoseconds
1000 sha256(1MBbytes) in 8808.173 milliseconds, about 8.8 million nanoseconds

With -O2 optimizations, I am getting almost 1.4 million sha256(32 bytes) per second on my blazing fast 1.4GHz i5, and 10 million expand/fmuls per second. The sha256 of a 1MB sblock takes almost 9 milliseconds, so just a bit more than 100 per second.

"seems roughly on par speed wise" not sure what you mean by on par, but 7x to 10x faster seems a different game of golf. And it is literally 4 orders of magnitude faster than the sha256 of 1MB, so clearly some faster hash function is needed for doing TB datasets, if the blake is much faster and is cryptographically strong enough regarding collisions, sounds like a good improvment. But I just cant see anything approaching the 100 nanoseconds speed.

As I estimated, the cost of doing all the field mults is insignificant compared to the sha256 of only 32 bytes(!), let alone the HDD data transfer.

As usual it is a speed vs space tradeoff. If you have plenty of time, not much HDD space, and don't mind things being 10x slower, then use the Merkle tree. [Merkle trees add a layer of complication and a chance for bugs, and also make it harder to explain.] If you don't mind using 32 bytes per sblock (1MB) and want the 10 million ops per second, then use the fmuls. Keep in mind this was just a lunch-break first pass. If space is really that important, then I think I can make it smaller, but at this point I really need to hear from the OP whether this even solves the problem at hand, and being 10x faster than the sha256 of 32 bytes, I don't think anybody can say the multiplies are too slow.

This is the problem with assuming things that everybody "knows". Sure, use that as a guideline, but I trust my experimental results far more than what is supposed to happen.

That is how I achieve a 30 minute sync time for the BTC blockchain from scratch. Not a week, not a day, just 30 minutes. It is on a 32GB VPS server with a 1gbps connection, but I still think it is a very good result. I used to think the BTC blockchain size was a very big problem; now it seems to be a long-term issue with enough time to solve it. Moore's law won't help, but time to create more efficient solutions will.

At first I thought some sort of ZK with Pedersen commitments was needed for this (probably because that is what I was looking at for something else). But since I only had half an hour for lunch, I tried to come up with something I could code in that time, and a simple set of multiplies was my limit. Hopefully it works. For now I don't have any more time for this, at least until I hear from the OP.

James

http://docs.supernet.org/ has first-pass docs on the SuperNET API; I used the code from it to create the above solution. It is still a work in progress, but it gives a good overview of the scope.
sr. member
Activity: 420
Merit: 262
February 18, 2016, 03:57:09 AM
#23
in the curve25519 field the fmul is the fast operator, millions per second, and is typically called the "addition". Both addition and multiplication are order-independent with a unit element, and I avoid the zero, so there are no issues with additive zero vs multiplicative zero

the "multiply" is cmult (complex multiply with montgomery)
even with cmult it does hundreds of thousands per second.

also, by keeping the 320-bit polynomial form for the fmul, there is only one fexpand per field element and one fcontract at the end

My guess is that the fmul will be speed-competitive with the sha256 of the Merkle leaves

Blake2 can apparently do several hundred thousand hashes per second per thread, roughly guesstimated for an Intel Xeon.

Since you have to hash each leaf before you multiply, the multiplies only need to be faster than the non-leaf part of the Merkle tree. Also, my other point (you quoted me before I edited my prior post) is that for the intermediary and the recipient, the Merkle tree is more space efficient because not all the leaves have to be provided with the proof.

So it seems roughly on par speed-wise, but the Merkle tree is more space efficient when only a portion of the leaves (i.e. the chunks) is being proved/provided to the recipient.
legendary
Activity: 1176
Merit: 1134
February 17, 2016, 05:10:46 PM
#22
I added some benchmarking:

Code:
#include <sys/time.h> // for gettimeofday(); the remaining types and functions come from the full listing in reply #15

double milliseconds()
{
    struct timeval tv; double millis;
    gettimeofday(&tv,NULL);
    millis = ((double)tv.tv_sec * 1000. + (double)tv.tv_usec / 1000.);
    //printf("tv_sec.%ld usec.%d %f\n",tv.tv_sec,tv.tv_usec,millis);
    return(millis);
}

int main(int argc,const char * argv[])
{
    int32_t i,j,sblocksize,len; uint8_t *buf; bits320 prod; bits256 tmp,hash;
    struct sha256_vstate md; FILE *fp,*sblocks; double startmilli = milliseconds();
    memset(tmp.bytes,0,sizeof(tmp)), tmp.bytes[0] = 9;
    prod = fexpand(tmp); // start with valid field element. might want to use the unit element
    for (j=0; j<8; j++)
        hash.uints[j] = rand();
    for (i=0; i<10000000; i++)
    {
        hash.bytes[0] &= 0xf8, hash.bytes[31] &= 0x7f, hash.bytes[31] |= 0x40;
        prod = fmul(prod,fexpand(hash)); // multiply the field element into the running product
        for (j=0; j<8; j++)
            hash.uints[j]++;
    }
    printf("%d expand/fmuls in %.3f milliseconds\n",i,(milliseconds() - startmilli));

on a 1.3GHz Intel i5: 10000000 expand/fmuls in 2497.819 milliseconds,
so about 250 nanoseconds each on an old laptop.
On a server-caliber machine, it should be around 100 nanoseconds per iteration.

I don't think a quarter of a second of overhead for 1TB of data with 1MB sblocks is any issue

James
legendary
Activity: 1176
Merit: 1134
February 17, 2016, 04:57:02 PM
#21
jl777, if the originator signed the root hash of the Merkle tree when he provided it to the intermediary, then the intermediary can prove that any fragment(s) of the document were signed by the originator. The originator is the one who breaks the document up into words or characters at the leaves. A Merkle tree is a very space-efficient structure if the granularity is very high.

I suppose your approach of deterministically mapping each hash to a field element and multiplying them all together (with the originator signing the product) is more space efficient, but I think it is roughly an order of magnitude slower. Why not add instead of multiply, since adding is much faster (one assumes hashes are difficult to preimage)?
I just iterate over fixed-size chunks of the file, do a sha256 and map that hash to a field element

in the curve25519 field the fmul is the fast operator, millions per second, and is typically called the "addition". Both addition and multiplication are order-independent with a unit element, and I avoid the zero, so there are no issues with additive zero vs multiplicative zero

the "multiply" is cmult (complex multiply with montgomery)
even with cmult it does hundreds of thousands per second.

also, by keeping the 320-bit polynomial form for the fmul, there is only one fexpand per field element and one fcontract at the end

My guess is that the fmul will be speed-competitive with the sha256 of the Merkle leaves

James
sr. member
Activity: 420
Merit: 262
February 17, 2016, 04:27:09 PM
#20
jl777, if the originator signed the root hash of the Merkle tree when he provided it to the intermediary, then the intermediary can prove that any fragment(s) of the document were signed by the originator. The originator is the one who breaks the document up into words or characters at the leaves. A Merkle tree is a very space-efficient structure if the granularity is very high.

I suppose your approach of deterministically mapping each hash to a field element and multiplying them all together (with the originator signing the product) is more space efficient, but I think it is roughly an order of magnitude slower. Why not add instead of multiply, since adding is much faster (one assumes hashes are difficult to preimage)?

Correction: the Merkle tree is also more space efficient (as well as being faster), because the intermediary doesn't have to specify the hashes for all the leaves when writing a proof.
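
To make the space argument concrete, here is a minimal sketch of the proof-verification side, assuming the struct sha256_vstate and the sha256_vinit/sha256_vprocess/sha256_vdone helpers from the listing in reply #15 are compiled into the same file; the names merkle_step, sha256_pair and verify_merkle_proof are illustrative, not code from this thread. The intermediary only has to ship the leaf data plus the log2(N) sibling hashes on the path up to the root the originator signed:

Code:
#include <stdint.h>
#include <string.h>

// One step of a Merkle proof: the sibling hash and which side it sits on.
typedef struct { uint8_t hash[32]; int32_t sibling_is_right; } merkle_step;

// Hash the concatenation of two 32-byte child hashes into a parent hash.
static void sha256_pair(const uint8_t left[32],const uint8_t right[32],uint8_t out[32])
{
    struct sha256_vstate md;
    sha256_vinit(&md);
    sha256_vprocess(&md,left,32);
    sha256_vprocess(&md,right,32);
    sha256_vdone(&md,out);
}

// Walk from the leaf hash up to the root using the sibling hashes supplied by
// the intermediary, then compare against the root the originator signed.
int32_t verify_merkle_proof(const uint8_t leafhash[32],const merkle_step *path,
                            int32_t depth,const uint8_t signed_root[32])
{
    uint8_t cur[32],next[32]; int32_t i;
    memcpy(cur,leafhash,32);
    for (i=0; i<depth; i++)
    {
        if ( path[i].sibling_is_right != 0 )
            sha256_pair(cur,path[i].hash,next);
        else
            sha256_pair(path[i].hash,cur,next);
        memcpy(cur,next,32);
    }
    return(memcmp(cur,signed_root,32) == 0);
}

For a million 1MB sblocks, the proof for one leaf is only about 20 sibling hashes (640 bytes), versus publishing all one million leaf hashes in the flat hash-product scheme.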
legendary
Activity: 1176
Merit: 1134
February 17, 2016, 04:20:24 PM
#19
Okay then jl777. Another way to do it is to have the originator sign a hash which is the root of a Merkle tree. Then you can break the document into words, or even individual characters if you want, and you can send any portions of the document and prove the originator signed them.
A Merkle tree adds complication; not much, but compared to 300 lines, measurable. The 300 lines include the SHA256 and the field multiplication.

But with a Merkle tree I am not sure it satisfies the requirements of this problem. The originator can be assumed to be non-cooperative and the only one with the entire dataset. If the data is tampered with, I would think we would need to determine whether it was the intermediary or the final recipient that modified the data.

How can the intermediary, who doesn't have the data, use a Merkle tree to verify that the final recipient has the right sblock? Since the Merkle tree only verifies that all the leaves are correct, it seems all we find out is that the data was corrupted, but not who corrupted it. Maybe that is enough if all parties sign off on the protocol when the sblocks are delivered.

Anyway, interesting problem. I need feedback from the OP on whether the requirements are met. Lunch break is over, back to debugging I go.

James
sr. member
Activity: 420
Merit: 262
February 17, 2016, 04:08:40 PM
#18
Okay then jl777. Another way to do it is to have the originator sign a hash which is the root of a Merkle tree. Then you can break the document into words, or even individual characters if you want, and you can send any portions of the document and prove the originator signed them.
legendary
Activity: 1176
Merit: 1134
February 17, 2016, 04:02:57 PM
#17
Isn't the simple solution to include the signature of the entire document with each chunk you forward? If the signer claims he provided different text, he will need to produce a document which matches the hash of what he signed, but he can't do that and also lie.

There are usually simpler solutions. Just think out of the box and paradigm shift the problem.
I think the issue is that the originator of the data might be antagonistic and not cooperate, putting the burden on the intermediary, but it is a good idea to include the signature in the hash of each sblock.

I think just using field multiplication of N different hashes for each sblock will let you achieve whatever confidence level you want about collisions, at the cost of more space. Curve25519 field multiplication is VERY fast; it will probably do more than a million per second on a single core. The limitation will probably be the HDD speed loading the data, or the sha256 calcs.

James

P.S. Not sure it can get much simpler than 300 lines of C code...
sr. member
Activity: 420
Merit: 262
February 17, 2016, 03:49:41 PM
#16
Isn't the simple solution to include the signature of the entire document with each chunk you forward? If the signer claims he provided different text, he will need to produce a document which matches the hash of what he signed, but he can't do that and also lie.

There are usually simpler solutions. Just think out of the box and paradigm shift the problem.
legendary
Activity: 1176
Merit: 1134
February 17, 2016, 03:36:41 PM
#15
I think the following is a relatively simple way to achieve your goals (or at least get pretty close). It requires the data provider to know how you are splitting up the data, at least at some level of granularity. Each of these blocks of data can't be changed, but you can mix and match any combination of the granular blocks.

So if you have a terabyte of total data and, say, 1MB is the smallest block, you would need 1 million hashes, but with a 30,000:1 reduction in size maybe that is acceptable. It is a tradeoff decision that doesn't affect the calculation. The larger the smallest block (let's call it an sblock), the less flexibility there is to distribute it; the smaller the sblock, the more flexibility, but also the more elements that need to be stored.

The data creator will need to know the size of the sblock, apply the same calculations as you do, and then sign the result so you have proof of what was delivered.

For each sblock worth of data (we assume the last block is zero-filled), do a SHA256 hash and then treat it as a curve25519 field element. Five bits need to be hard-coded, but there are still 251 bits left and the odds of a collision are extremely small.

The following code assumes a 64-bit CPU and should be close to a working sblocks calculation:

Code:
/******************************************************************************
 * Copyright © 2014-2016 The SuperNET Developers.                             *
 *                                                                            *
 * See the AUTHORS, DEVELOPER-AGREEMENT and LICENSE files at                  *
 * the top-level directory of this distribution for the individual copyright  *
 * holder information and the developer policies on copyright and licensing.  *
 *                                                                            *
 * Unless otherwise agreed in a custom licensing agreement, no part of the    *
 * SuperNET software, including this file may be copied, modified, propagated *
 * or distributed except according to the terms contained in the LICENSE file *
 *                                                                            *
 * Removal or modification of this copyright notice is prohibited.            *
 *                                                                            *
 ******************************************************************************/

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sha256_vstate { uint64_t length; uint32_t state[8],curlen; uint8_t buf[64]; };

union _bits256 { uint8_t bytes[32]; uint16_t ushorts[16]; uint32_t uints[8]; uint64_t ulongs[4]; uint64_t txid; };
typedef union _bits256 bits256;

union _bits320 { uint8_t bytes[40]; uint16_t ushorts[20]; uint32_t uints[10]; uint64_t ulongs[5]; uint64_t txid; };
typedef union _bits320 bits320;

// special gcc mode for 128-bit integers
typedef unsigned uint128_t __attribute__((mode(TI)));

// sha256 is ported from libtom, the curve25519 is refactored donna_curve25519.c

#define STORE32L(x, y)                                                                     \
{ (y)[3] = (uint8_t)(((x)>>24)&255); (y)[2] = (uint8_t)(((x)>>16)&255);   \
(y)[1] = (uint8_t)(((x)>>8)&255); (y)[0] = (uint8_t)((x)&255); }

#define LOAD32L(x, y)                            \
{ x = (uint32_t)(((uint64_t)((y)[3] & 255)<<24) | \
((uint32_t)((y)[2] & 255)<<16) | \
((uint32_t)((y)[1] & 255)<<8)  | \
((uint32_t)((y)[0] & 255))); }

#define STORE64L(x, y)                                                                     \
{ (y)[7] = (uint8_t)(((x)>>56)&255); (y)[6] = (uint8_t)(((x)>>48)&255);   \
(y)[5] = (uint8_t)(((x)>>40)&255); (y)[4] = (uint8_t)(((x)>>32)&255);   \
(y)[3] = (uint8_t)(((x)>>24)&255); (y)[2] = (uint8_t)(((x)>>16)&255);   \
(y)[1] = (uint8_t)(((x)>>8)&255); (y)[0] = (uint8_t)((x)&255); }

#define LOAD64L(x, y)                                                       \
{ x = (((uint64_t)((y)[7] & 255))<<56)|(((uint64_t)((y)[6] & 255))<<48)| \
(((uint64_t)((y)[5] & 255))<<40)|(((uint64_t)((y)[4] & 255))<<32)| \
(((uint64_t)((y)[3] & 255))<<24)|(((uint64_t)((y)[2] & 255))<<16)| \
(((uint64_t)((y)[1] & 255))<<8)|(((uint64_t)((y)[0] & 255))); }

#define STORE32H(x, y)                                                                     \
{ (y)[0] = (uint8_t)(((x)>>24)&255); (y)[1] = (uint8_t)(((x)>>16)&255);   \
(y)[2] = (uint8_t)(((x)>>8)&255); (y)[3] = (uint8_t)((x)&255); }

#define LOAD32H(x, y)                            \
{ x = (uint32_t)(((uint64_t)((y)[0] & 255)<<24) | \
((uint32_t)((y)[1] & 255)<<16) | \
((uint32_t)((y)[2] & 255)<<8)  | \
((uint32_t)((y)[3] & 255))); }

#define STORE64H(x, y)                                                                     \
{ (y)[0] = (uint8_t)(((x)>>56)&255); (y)[1] = (uint8_t)(((x)>>48)&255);     \
(y)[2] = (uint8_t)(((x)>>40)&255); (y)[3] = (uint8_t)(((x)>>32)&255);     \
(y)[4] = (uint8_t)(((x)>>24)&255); (y)[5] = (uint8_t)(((x)>>16)&255);     \
(y)[6] = (uint8_t)(((x)>>8)&255); (y)[7] = (uint8_t)((x)&255); }

#define LOAD64H(x, y)                                                      \
{ x = (((uint64_t)((y)[0] & 255))<<56)|(((uint64_t)((y)[1] & 255))<<48) | \
(((uint64_t)((y)[2] & 255))<<40)|(((uint64_t)((y)[3] & 255))<<32) | \
(((uint64_t)((y)[4] & 255))<<24)|(((uint64_t)((y)[5] & 255))<<16) | \
(((uint64_t)((y)[6] & 255))<<8)|(((uint64_t)((y)[7] & 255))); }

// Various logical functions
#define RORc(x, y) ( ((((uint32_t)(x)&0xFFFFFFFFUL)>>(uint32_t)((y)&31)) | ((uint32_t)(x)<<(uint32_t)(32-((y)&31)))) & 0xFFFFFFFFUL)
#define Ch(x,y,z)       (z ^ (x & (y ^ z)))
#define Maj(x,y,z)      (((x | y) & z) | (x & y))
#define S(x, n)         RORc((x),(n))
#define R(x, n)         (((x)&0xFFFFFFFFUL)>>(n))
#define Sigma0(x)       (S(x, 2) ^ S(x, 13) ^ S(x, 22))
#define Sigma1(x)       (S(x, 6) ^ S(x, 11) ^ S(x, 25))
#define Gamma0(x)       (S(x, 7) ^ S(x, 18) ^ R(x, 3))
#define Gamma1(x)       (S(x, 17) ^ S(x, 19) ^ R(x, 10))
#define MIN(x, y) ( ((x)<(y))?(x):(y) )

int32_t sha256_vcompress(struct sha256_vstate * md,uint8_t *buf)
{
    uint32_t S[8],W[64],t0,t1,i;
    for (i=0; i<8; i++) // copy state into S
        S[i] = md->state[i];
    for (i=0; i<16; i++) // copy the state into 512-bits into W[0..15]
        LOAD32H(W[i],buf + (4*i));
    for (i=16; i<64; i++) // fill W[16..63]
        W[i] = Gamma1(W[i - 2]) + W[i - 7] + Gamma0(W[i - 15]) + W[i - 16];
    
#define RND(a,b,c,d,e,f,g,h,i,ki)                    \
t0 = h + Sigma1(e) + Ch(e, f, g) + ki + W[i];   \
t1 = Sigma0(a) + Maj(a, b, c);                  \
d += t0;                                        \
h  = t0 + t1;
    
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],0,0x428a2f98);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],1,0x71374491);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],2,0xb5c0fbcf);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],3,0xe9b5dba5);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],4,0x3956c25b);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],5,0x59f111f1);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],6,0x923f82a4);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],7,0xab1c5ed5);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],8,0xd807aa98);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],9,0x12835b01);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],10,0x243185be);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],11,0x550c7dc3);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],12,0x72be5d74);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],13,0x80deb1fe);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],14,0x9bdc06a7);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],15,0xc19bf174);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],16,0xe49b69c1);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],17,0xefbe4786);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],18,0x0fc19dc6);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],19,0x240ca1cc);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],20,0x2de92c6f);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],21,0x4a7484aa);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],22,0x5cb0a9dc);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],23,0x76f988da);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],24,0x983e5152);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],25,0xa831c66d);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],26,0xb00327c8);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],27,0xbf597fc7);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],28,0xc6e00bf3);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],29,0xd5a79147);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],30,0x06ca6351);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],31,0x14292967);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],32,0x27b70a85);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],33,0x2e1b2138);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],34,0x4d2c6dfc);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],35,0x53380d13);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],36,0x650a7354);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],37,0x766a0abb);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],38,0x81c2c92e);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],39,0x92722c85);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],40,0xa2bfe8a1);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],41,0xa81a664b);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],42,0xc24b8b70);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],43,0xc76c51a3);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],44,0xd192e819);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],45,0xd6990624);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],46,0xf40e3585);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],47,0x106aa070);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],48,0x19a4c116);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],49,0x1e376c08);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],50,0x2748774c);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],51,0x34b0bcb5);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],52,0x391c0cb3);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],53,0x4ed8aa4a);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],54,0x5b9cca4f);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],55,0x682e6ff3);
    RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],56,0x748f82ee);
    RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],57,0x78a5636f);
    RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],58,0x84c87814);
    RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],59,0x8cc70208);
    RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],60,0x90befffa);
    RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],61,0xa4506ceb);
    RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],62,0xbef9a3f7);
    RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],63,0xc67178f2);
#undef RND
    for (i=0; i<8; i++) // feedback
        md->state[i] = md->state[i] + S[i];
    return(0);
}

#undef RORc
#undef Ch
#undef Maj
#undef S
#undef R
#undef Sigma0
#undef Sigma1
#undef Gamma0
#undef Gamma1

void sha256_vinit(struct sha256_vstate * md)
{
    md->curlen = 0;
    md->length = 0;
    md->state[0] = 0x6A09E667UL;
    md->state[1] = 0xBB67AE85UL;
    md->state[2] = 0x3C6EF372UL;
    md->state[3] = 0xA54FF53AUL;
    md->state[4] = 0x510E527FUL;
    md->state[5] = 0x9B05688CUL;
    md->state[6] = 0x1F83D9ABUL;
    md->state[7] = 0x5BE0CD19UL;
}

int32_t sha256_vprocess(struct sha256_vstate *md,const uint8_t *in,uint64_t inlen)
{
    uint64_t n; int32_t err;
    if ( md->curlen > sizeof(md->buf) )
        return(-1);
    while ( inlen > 0 )
    {
        if ( md->curlen == 0 && inlen >= 64 )
        {
            if ( (err= sha256_vcompress(md,(uint8_t *)in)) != 0 )
                return(err);
            md->length += 64 * 8, in += 64, inlen -= 64;
        }
        else
        {
            n = MIN(inlen,64 - md->curlen);
            memcpy(md->buf + md->curlen,in,(size_t)n);
            md->curlen += n, in += n, inlen -= n;
            if ( md->curlen == 64 )
            {
                if ( (err= sha256_vcompress(md,md->buf)) != 0 )
                    return(err);
                md->length += 8*64;
                md->curlen = 0;
            }
        }
    }
    return(0);
}

int32_t sha256_vdone(struct sha256_vstate *md,uint8_t *out)
{
    int32_t i;
    if ( md->curlen >= sizeof(md->buf) )
        return(-1);
    md->length += md->curlen * 8; // increase the length of the message
    md->buf[md->curlen++] = (uint8_t)0x80; // append the '1' bit
    // if len > 56 bytes we append zeros then compress.  Then we can fall back to padding zeros and length encoding like normal.
    if ( md->curlen > 56 )
    {
        while ( md->curlen < 64 )
            md->buf[md->curlen++] = (uint8_t)0;
        sha256_vcompress(md,md->buf);
        md->curlen = 0;
    }
    while ( md->curlen < 56 ) // pad upto 56 bytes of zeroes
        md->buf[md->curlen++] = (uint8_t)0;
    STORE64H(md->length,md->buf+56); // store length
    sha256_vcompress(md,md->buf);
    for (i=0; i<8; i++) // copy output
        STORE32H(md->state[i],out+(4*i));
    return(0);
}

void store_limb(uint8_t *out,uint64_t in)
{
    int32_t i;
    for (i=0; i<8; i++,in>>=8)
        out[i] = (in & 0xff);
}

uint64_t load_limb(uint8_t *in)
{
    return
    ((uint64_t)in[0]) |
    (((uint64_t)in[1]) << 8) |
    (((uint64_t)in[2]) << 16) |
    (((uint64_t)in[3]) << 24) |
    (((uint64_t)in[4]) << 32) |
    (((uint64_t)in[5]) << 40) |
    (((uint64_t)in[6]) << 48) |
    (((uint64_t)in[7]) << 56);
}

// Take a little-endian, 32-byte number and expand it into polynomial form
bits320 fexpand(bits256 basepoint)
{
    bits320 out;
    out.ulongs[0] = load_limb(basepoint.bytes) & 0x7ffffffffffffLL;
    out.ulongs[1] = (load_limb(basepoint.bytes+6) >> 3) & 0x7ffffffffffffLL;
    out.ulongs[2] = (load_limb(basepoint.bytes+12) >> 6) & 0x7ffffffffffffLL;
    out.ulongs[3] = (load_limb(basepoint.bytes+19) >> 1) & 0x7ffffffffffffLL;
    out.ulongs[4] = (load_limb(basepoint.bytes+24) >> 12) & 0x7ffffffffffffLL;
    return(out);
}

void fcontract_iter(uint128_t t[5],int32_t flag)
{
    int32_t i; uint64_t mask = 0x7ffffffffffffLL;
    for (i=0; i<4; i++)
        t[i+1] += t[i] >> 51, t[i] &= mask;
    if ( flag != 0 )
        t[0] += 19 * (t[4] >> 51);
    t[4] &= mask; // t[4] is masked in both cases; only the carry back into t[0] is conditional
}

// Take a fully reduced polynomial form number and contract it into a little-endian, 32-byte array
bits256 fcontract(const bits320 input)
{
    uint128_t t[5]; int32_t i; bits256 out;
    for (i=0; i<5; i++)
        t[i] = input.ulongs[i];
    fcontract_iter(t,1), fcontract_iter(t,1);
    // donna: now t is between 0 and 2^255-1, properly carried.
    // donna: case 1: between 0 and 2^255-20. case 2: between 2^255-19 and 2^255-1.
    t[0] += 19, fcontract_iter(t,1);
    // now between 19 and 2^255-1 in both cases, and offset by 19.
    t[0] += 0x8000000000000 - 19;
    for (i=1; i<5; i++)
        t[i] += 0x8000000000000 - 1;
    // now between 2^255 and 2^256-20, and offset by 2^255.
    fcontract_iter(t,0);
    store_limb(out.bytes,t[0] | (t[1] << 51));
    store_limb(out.bytes+8,(t[1] >> 13) | (t[2] << 38));
    store_limb(out.bytes+16,(t[2] >> 26) | (t[3] << 25));
    store_limb(out.bytes+24,(t[3] >> 39) | (t[4] << 12));
    return(out);
}

// Multiply two numbers: output = in2 * in
// output must be distinct to both inputs. The inputs are reduced coefficient form, the output is not.
// Assumes that in[i] < 2**55 and likewise for in2. On return, output[i] < 2**52
bits320 fmul(const bits320 in2,const bits320 in)
{
    uint128_t t[5]; uint64_t r0,r1,r2,r3,r4,s0,s1,s2,s3,s4,c; bits320 out;
    r0 = in.ulongs[0], r1 = in.ulongs[1], r2 = in.ulongs[2], r3 = in.ulongs[3], r4 = in.ulongs[4];
    s0 = in2.ulongs[0], s1 = in2.ulongs[1], s2 = in2.ulongs[2], s3 = in2.ulongs[3], s4 = in2.ulongs[4];
    t[0]  =  ((uint128_t) r0) * s0;
    t[1]  =  ((uint128_t) r0) * s1 + ((uint128_t) r1) * s0;
    t[2]  =  ((uint128_t) r0) * s2 + ((uint128_t) r2) * s0 + ((uint128_t) r1) * s1;
    t[3]  =  ((uint128_t) r0) * s3 + ((uint128_t) r3) * s0 + ((uint128_t) r1) * s2 + ((uint128_t) r2) * s1;
    t[4]  =  ((uint128_t) r0) * s4 + ((uint128_t) r4) * s0 + ((uint128_t) r3) * s1 + ((uint128_t) r1) * s3 + ((uint128_t) r2) * s2;
    r4 *= 19, r1 *= 19, r2 *= 19, r3 *= 19;
    t[0] += ((uint128_t) r4) * s1 + ((uint128_t) r1) * s4 + ((uint128_t) r2) * s3 + ((uint128_t) r3) * s2;
    t[1] += ((uint128_t) r4) * s2 + ((uint128_t) r2) * s4 + ((uint128_t) r3) * s3;
    t[2] += ((uint128_t) r4) * s3 + ((uint128_t) r3) * s4;
    t[3] += ((uint128_t) r4) * s4;
    r0 = (uint64_t)t[0] & 0x7ffffffffffffLL; c = (uint64_t)(t[0] >> 51);
    t[1] += c;      r1 = (uint64_t)t[1] & 0x7ffffffffffffLL; c = (uint64_t)(t[1] >> 51);
    t[2] += c;      r2 = (uint64_t)t[2] & 0x7ffffffffffffLL; c = (uint64_t)(t[2] >> 51);
    t[3] += c;      r3 = (uint64_t)t[3] & 0x7ffffffffffffLL; c = (uint64_t)(t[3] >> 51);
    t[4] += c;      r4 = (uint64_t)t[4] & 0x7ffffffffffffLL; c = (uint64_t)(t[4] >> 51);
    r0 +=   c * 19; c = r0 >> 51; r0 = r0 & 0x7ffffffffffffLL;
    r1 +=   c;      c = r1 >> 51; r1 = r1 & 0x7ffffffffffffLL;
    r2 +=   c;
    out.ulongs[0] = r0, out.ulongs[1] = r1, out.ulongs[2] = r2, out.ulongs[3] = r3, out.ulongs[4] = r4;
    return(out);
}

int main(int argc,const char * argv[])
{
    int32_t sblocksize=0,len; uint8_t *buf; bits320 prod; bits256 tmp,hash; // sblocksize 0 -> use the 1MB default below
    struct sha256_vstate md; FILE *fp,*sblocks;
    if ( argc < 2 )
        printf("usage: %s datafile [sblocksize]\n",argv[0]);
    else if ( argc > 2 && (sblocksize= atoi(argv[2])) < 32 )
        printf("usage: %s datafile [sblocksize] # min sblocksize is 32\n",argv[0]);
    else if ( (fp= fopen(argv[1],"rb")) == 0 )
        printf("usage: %s datafile [sblocksize] # cant find %s\n",argv[0],argv[1]);
    else
    {
        if ( (sblocks= fopen("sblocks","wb")) == 0 )
        {
            printf("error creating sblocks output file\n");
            exit(-1);
        }
        if ( sblocksize == 0 )
            sblocksize = 1024*1024;
        printf("calculating sblocks for %s\n",argv[1]);
        memset(tmp.bytes,0,sizeof(tmp)), tmp.bytes[0] = 9;
        prod = fexpand(tmp); // start with valid field element. might want to use the unit element
        buf = calloc(1,sblocksize);
        while ( (len= (int32_t)fread(buf,1,sblocksize,fp)) > 0 )
        {
            sha256_vinit(&md), sha256_vprocess(&md,buf,sblocksize), sha256_vdone(&md,hash.bytes); // hash the full sblock (final partial read is zero-padded)
            fwrite(hash.bytes,1,sizeof(hash),sblocks); // save sha256 of sblock
            hash.bytes[0] &= 0xf8, hash.bytes[31] &= 0x7f, hash.bytes[31] |= 0x40;
            prod = fmul(prod,fexpand(hash)); // multiply the field element into the running product
            memset(buf,0,sblocksize);
        }
        tmp = fcontract(prod);
        fwrite(tmp.bytes,1,sizeof(tmp),sblocks); // this is the value of all the data
        free(buf);
        fclose(fp);
        fclose(sblocks);
        return(0);
    }
    return(-1);
}


I didn't have a chance to verify it, but all it does is calculate the SHA256 hash of each sblock, map it to a field element, and multiply it into a combined value. The field operations work at the 320-bit size, so the 256-bit hash needs to be expanded and contracted.

Since it is purely a field multiplication, it has no other protections; signatures and whatnot will need to be added on top of this. Note that it is unordered, as the fmul is just a pure multiply; if you need it to be ordered, then just include the sequence number in the sha256 calculation.
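
For the ordered variant, here is a hypothetical tweak to the hashing loop in main() above that binds each sblock to its position by hashing a sequence number in front of the data; the seqno variable is illustrative and not part of the listing, and everything else stays the same:

Code:
uint32_t seqno = 0; // illustrative; the raw bytes of seqno make the hash endianness-dependent
while ( (len= (int32_t)fread(buf,1,sblocksize,fp)) > 0 )
{
    sha256_vinit(&md);
    sha256_vprocess(&md,(uint8_t *)&seqno,sizeof(seqno)); // bind the sblock's position
    sha256_vprocess(&md,buf,sblocksize);                  // then the zero-padded sblock data
    sha256_vdone(&md,hash.bytes);
    seqno++;
    fwrite(hash.bytes,1,sizeof(hash),sblocks); // save sha256 of sblock, as before
    hash.bytes[0] &= 0xf8, hash.bytes[31] &= 0x7f, hash.bytes[31] |= 0x40;
    prod = fmul(prod,fexpand(hash)); // multiply the field element into the running product
    memset(buf,0,sblocksize);
}

With this change, swapping two sblocks changes both of their hashes and therefore the combined product.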

It is totally self-contained portable C, so save it to a file and gcc it to make the executable. The output is an sblocks file that has the corresponding hash for each sblock, along with the total at the end.

Hopefully it works for what you need. It is just quick proof-of-concept code, but if any bit is changed in any of the sblocks, that sblock's hash will change, which will change the combined product. By committing to the sblock hashes, you can prove that you didn't corrupt the data you received, as long as the data provider also runs the same program and signs the result. Any subset that is delivered can be verified independently against your posted set of sblock hashes.

I think that satisfies your requirement?

James

P.S. If the chance of a collision (around 1 in 2^120) is too high, then a second hash can be used to create two elements per sblock. The odds of two different hash collisions with the same data are vanishingly small.
legendary
Activity: 1176
Merit: 1134
February 17, 2016, 02:34:56 AM
#14
I think the hashes need to have some sort of algebraic structure and not just be random, so Pedersen commitments seem to be part of the solution.

The part I am unclear on: if you are not storing the data but at some point need to provide all the data, those requirements seem mutually exclusive.

But if you just need to be able to prove you didn't change the data, then showing that your hashes add up to zero and match the combined hash from the data source seems possible.

The data source would need to sign the combined hash.

http://crypto.stackexchange.com/questions/11923/difference-between-pedersen-commitment-and-commitment-based-on-elgamal?rq=1

Also, it seems unrelated, but the maths behind the Dining Cryptographers problem might be helpful.

James
hero member
Activity: 718
Merit: 545
February 12, 2016, 08:53:51 AM
#13
Onkel - I like.

2. Data recipients could claim that you colluded with the recipient of the last block so that he signs an untrue statement about the final hash value. Since the data integrity is only proven when the final hash of the last block is equal to the hash signed by the original data source..

How about if he kept the complete last block of data, as well as the last signed statement, so it would not be possible for him to cheat? He can show that it does actually hash to the final hash which is signed by Alice, and he can't change that block of data without performing a successful pre-image attack.

Actually, I'm not sure how much of the final block of data he would need to keep. Maybe just the final 64 bytes, since that alone would also require a pre-image attack to break?
legendary
Activity: 1039
Merit: 1005
February 11, 2016, 04:28:06 PM
#12
What about the following approach:
Assume that you split the data into relatively few chunks without reordering parts, and that the original source of the data computed and signed a SHA-256 hash over the whole data.
Then you could run SHA-256 over the whole dataset, recording the intermediate state of the hashing algorithm wherever you split the data.
SHA-256 runs on 512-bit blocks, so if you split at a block boundary, you only need to note the values in the hashing registers at that point.
If you split the data within a block, you need to note the hashing register values before the block, plus the 512 bit block itself.
Each data recipient receives the states before and after their block in addition to the block data, so they can check whether their piece is ok. They then need to provide you with a signed statement about this fact.
Now you only have to archive these signed statements and the block boundary states of the hashing engine. Since the last boundary state is equal to the hash signed by the original author, these pieces together can be used to prove that you did not tamper with the data.
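
A minimal sketch of the bookkeeping this requires, assuming the struct sha256_vstate and the sha256_vinit/sha256_vprocess/sha256_vdone helpers from the listing in reply #15 are compiled into the same file; hash_with_boundaries is an illustrative name, not an existing function. It hashes the whole dataset once and snapshots the eight 32-bit state words at every chunk boundary, covering the case where the chunk size is a multiple of the 64-byte SHA-256 block size:

Code:
#include <stdint.h>
#include <string.h>

// Hash the whole dataset once, recording the hashing-register state after
// each full chunk. chunklen must be a multiple of 64 so every boundary falls
// exactly between compression calls.
int32_t hash_with_boundaries(const uint8_t *data,uint64_t datalen,uint64_t chunklen,
                             uint32_t (*boundaries)[8],uint8_t finalhash[32])
{
    struct sha256_vstate md; uint64_t off,len; int32_t n = 0;
    if ( chunklen == 0 || (chunklen % 64) != 0 )
        return(-1);
    sha256_vinit(&md);
    for (off=0; off<datalen; off+=chunklen)
    {
        len = (datalen - off < chunklen) ? (datalen - off) : chunklen;
        sha256_vprocess(&md,data + off,len);
        if ( len == chunklen ) // only full chunks end on a 64-byte boundary
            memcpy(boundaries[n++],md.state,sizeof(md.state)); // state after this chunk
    }
    sha256_vdone(&md,finalhash); // must equal the hash the original source signed
    return(n);
}

The recipient of chunk n is given the state before their chunk (boundaries[n-1], or the standard SHA-256 initial values for the first chunk), the chunk data, and the state after it; they reload the registers and the running byte count, process their chunk, and check that they land on the recorded state. If the data does not end on a 64-byte boundary, the trailing partial block has to be kept alongside the register state, as described above.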

I see two possible issues with this approach, though:
1. You would disclose a little more of the data to each recipient since you have to cut at 512-bit boundaries. I don't know whether that's a problem, or whether you can nudge your data source to deliver the data in 512-bit chunks, for example by padding each record to a multiple of 64 bytes.
2. Data recipients could claim that you colluded with the recipient of the last block so that he signs an untrue statement about the final hash value. Since the data integrity is only proven when the final hash of the last block is equal to the hash signed by the original data source, this might be a problem. It is also possible to collude with a recipient that got an earlier block, but then you could only cheat the recipients before the one you colluded with.

I don't know whether there's an easy way of fixing the last issue, and maybe there are more issues than I see at the moment :-(

Onkel Paul
hero member
Activity: 718
Merit: 545
February 10, 2016, 11:29:22 AM
#11
By the way, when you say gmax, do you mean gmaxwell?  And what is a CT?
(Honestly, I was really hoping that gmaxwell would see and take an interest in this thread. I suspect that if there is a solution to my problem, he knows exactly what it is.)

Yep.

CT is Confidential Transactions. It's his system for showing that the sum of the inputs of a transaction adds up to the sum of the outputs, without revealing any of the values. He's implemented it in his Elements sidechain.

https://bitcointalksearch.org/topic/confidential-transactions-content-privacy-for-bitcoin-transactions-1085273

And the bit that I thought might be helpful to you is :

Quote
..
A Pedersen commitment works like the above but with an additional property: commitments can be added, and the sum of a set of commitments is the same as a commitment to the sum of the data (with a blinding key set as the sum of the blinding keys):

  C(BF1, data1) + C(BF2, data2) == C(BF1 + BF2, data1 + data2)
  C(BF1, data1) - C(BF1, data1) == 0

In other words, the commitment preserves addition and the commutative property applies.

If data_n = {1,1,2} and BF_n = {5,10,15} then:

  C(BF1, data1) + C(BF2, data2) - C(BF3, data3) == 0

and so on.

..

Yes he would. He's bad ass.
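
To make the additive property in the quote concrete, here is a toy demonstration in a small multiplicative group, where commit(bf, data) = g^bf * h^data mod p and combining commitments means multiplying them (the quote writes the same group operation additively because CT works over elliptic-curve points). The constants P, G, H and the function names are made up for illustration only and provide no security:

Code:
#include <stdio.h>
#include <stdint.h>

#define P 1000003ULL // small prime, demo only
#define G 2ULL
#define H 3ULL

// modular exponentiation by squaring
static uint64_t powmod(uint64_t b,uint64_t e,uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while ( e > 0 )
    {
        if ( (e & 1) != 0 )
            r = (r * b) % m;
        b = (b * b) % m, e >>= 1;
    }
    return(r);
}

// toy Pedersen-style commitment: g^blinding * h^data mod p
static uint64_t commit(uint64_t bf,uint64_t data)
{
    return((powmod(G,bf,P) * powmod(H,data,P)) % P);
}

int main(void)
{
    // combining two commitments gives a commitment to the sums:
    // commit(5,1) * commit(10,1) == commit(5+10, 1+1)
    uint64_t lhs = (commit(5,1) * commit(10,1)) % P;
    uint64_t rhs = commit(15,2);
    printf("homomorphic? %s\n",lhs == rhs ? "yes" : "no"); // prints yes
    return(0);
}

The same identity is what would let per-chunk commitments be combined and checked against a single value signed by the data source.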
legendary
Activity: 3472
Merit: 4801
February 10, 2016, 09:07:58 AM
#10
What about some kind of homomorphic hash function?

Hash(X+Y) = Hash(X) + Hash(Y) (Doesn't Paillier do something like this.. ? Or whatever hash function gmax uses in his CT.)

Then Alice signs the Hash(X+Y+...) and you can show that the sum of all your little hashed pieces adds up to the hash of the big piece, which is signed by Alice.

This is the reason I started this thread.

I've not heard of "Paillier", but I'm going to go Google it right now.

I believe that what I'm looking for is exactly what you're describing where:
Hash(X+Y+...) = Hash(X) + Hash(Y) + Hash(...)

I didn't know if such a hash existed, but if it does I think that would be exactly what I need.

By the way, when you say gmax, do you mean gmaxwell?  And what is a CT?
(Honestly, I was really hoping that gmaxwell would see and take an interest in this thread. I suspect that if there is a solution to my problem, he knows exactly what it is.)
hero member
Activity: 718
Merit: 545
February 10, 2016, 06:34:47 AM
#9
What about some kind of homomorphic hash function?

Hash(X+Y) = Hash(X) + Hash(Y) (Doesn't Paillier do something like this.. ? Or whatever hash function gmax uses in his CT.)

Then Alice signs the Hash(X+Y+...) and you can show that the sum of all your little hashed pieces adds up to the hash of the big piece, which is signed by Alice.