SER_DISK vs SER_NETWORK | Bitcointalksearch.org

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Sure, contributing back the blk parser would be welcome.

My concern with your heuristics is not that they will always be wrong (they won't), but that people will use whatever statistics you come up with to make judgements or even investment decisions, without understanding the quite serious caveats that go along with your methodologies. See: the Silk Road study, which is now being quoted as fact in various news sources even though it was based on a VERY shaky set of assumptions.

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

Quote from: apetersson on September 27, 2012, 07:58:20 PM

i would rather postpone the decision about "what was change" and merge addresses to entities. once you see a transaction that signs multiple inputs at once, you can "assume" that it was one entity and assign change status retroactively

Interesting. That would cover all spent outputs...
Also I can use that information to validate the naive method I suggested, and see the false positives/negatives ratio.
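To make the false positive/negative comparison concrete, here is a minimal Python sketch. It assumes you already have two label sets keyed by (txid, output index): the naive guesses, and reference labels derived, for example, from the multi-input merging apetersson describes. The function names and key format are hypothetical. Since the reference labels come from another heuristic, the resulting rates measure agreement between two heuristics, not accuracy against ground truth.

```python
from typing import Dict, Tuple

OutputKey = Tuple[str, int]  # (txid, output index); hypothetical key format


def confusion_counts(guessed: Dict[OutputKey, bool],
                     reference: Dict[OutputKey, bool]) -> Dict[str, int]:
    """Count agreement between guessed and reference change labels.

    True means "this output is change". Outputs missing from either map are skipped.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for key, is_change in reference.items():
        if key not in guessed:
            continue
        guess = guessed[key]
        if guess and is_change:
            counts["tp"] += 1
        elif guess:
            counts["fp"] += 1  # guessed change, reference says payment
        elif is_change:
            counts["fn"] += 1  # missed a change output
        else:
            counts["tn"] += 1
    return counts


def error_rates(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate); 0.0 when undefined."""
    negatives = counts["fp"] + counts["tn"]
    positives = counts["tp"] + counts["fn"]
    fp_rate = counts["fp"] / negatives if negatives else 0.0
    fn_rate = counts["fn"] / positives if positives else 0.0
    return fp_rate, fn_rate
```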

apetersson

hero member

Activity: 668

Merit: 501

i would rather postpone the decision about "what was change" and merge addresses to entities. once you see a transaction that signs multiple inputs at once, you can "assume" that it was one entity and assign change status retroactively
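The merging step can be sketched with a union-find (disjoint-set) structure: every transaction that signs several inputs merges the addresses of those inputs into one entity. This is a minimal Python sketch; addresses are plain strings, the class name is hypothetical, and "inputs signed together belong to one owner" is an assumption, not a guarantee. Because clusters keep growing as you scan the chain, change labels are best assigned only after the full pass, which is the retroactive step suggested here.

```python
from typing import Dict, Iterable


class AddressClusters:
    """Union-find (disjoint set) over addresses; hypothetical helper class."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, addr: str) -> str:
        """Return the representative address of addr's cluster."""
        self.parent.setdefault(addr, addr)
        root = addr
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited address directly at the root.
        while addr != root:
            nxt = self.parent[addr]
            self.parent[addr] = root
            addr = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add_transaction(self, input_addresses: Iterable[str]) -> None:
        """Assumption: all inputs signed in one transaction have the same owner."""
        addrs = list(input_addresses)
        for addr in addrs:
            self.find(addr)  # register single-input addresses too
        for other in addrs[1:]:
            self.union(addrs[0], other)

    def same_entity(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)
```

After the whole chain has been processed, an output of some earlier transaction can be labelled change if its address ended up in the same cluster as that transaction's input addresses.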

SgtSpike

legendary

Activity: 1400

Merit: 1005

Quote from: Sergio_Demian_Lerner on September 27, 2012, 01:44:08 PM

Quote from: Pieter Wuille on September 25, 2012, 05:17:08 PM

That assumption will be wrong 50% of the time...

Yes! I forgot the change position randomization!

But still it's generally possible to guess which output is the change, since :

1. The payment amount is always greater than the sum of the input amounts excluding the smallest input.
2. The change amount is always smaller than any of the inputs.

The only case where this guessing fails is when there is a single input. In that case the payment amount is generally a round (integer) value and the change is not, so you can still guess with some accuracy.

Best regards, Pieter!

I'm glad someone is finally doing analysis based on these assumptions! No, they aren't exact, but they would generally be pretty close. I am excited to see what you come up with.

Quote from: Mike Hearn on September 27, 2012, 02:25:04 PM

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change output; nothing says it has to be smaller than the payment, and often it won't be.

2) You cannot assume payments are round numbers, as often they will have been converted through an exchange rate. For instance, many payments I make look like essentially random numbers because they are some round figure in my local currency multiplied by the exchange rate at the time.

Block chain analysis is hard; I doubt there is an accurate way to calculate what you want.

He did say "guess". And this is certainly a good way to get it right a vast majority of the time.

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: Sergio_Demian_Lerner on September 27, 2012, 04:31:47 PM

But you still have to know which addresses belong to you, so there is a chicken-and-egg problem.

But that information, if ever found, then travels back in time and infects every transaction you've ever done, which is bad.

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

But you still have to know which addresses belong to you, so there is a chicken-and-egg problem.

kjj

legendary

Activity: 1302

Merit: 1026

Quote from: Sergio_Demian_Lerner on September 27, 2012, 04:10:17 PM

Quote from: Mike Hearn on September 27, 2012, 02:25:04 PM

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

Yes, if anyone wants it.

Quote from: Mike Hearn on September 27, 2012, 02:25:04 PM

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change output; nothing says it has to be smaller than the payment, and often it won't be.

But surely no client would automatically generate a transaction where the change is greater than one of its inputs? What for?

Do you mean something like (A):

Inputs: 10, 20, 30
Outputs: 15 (change), 45 (payment)

Why not create the tx (B):

Inputs: 20, 30
Outputs: 5 (change), 45 (payment)

Is the client so dumb as to generate a transaction like A, which wastes space, instead of B?

In the future, I would like to see the client attempt to make outputs that are roughly equal in size, with equal probability of being higher or lower. Just to make it harder to guess. But that is hardly a promise of anonymity. Which one was the change will quickly be revealed when it is merged with another address known to belong to you, or with the change you sent before, or with another transaction sent to the same address as one of the inputs, or...

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

Quote from: Mike Hearn on September 27, 2012, 02:25:04 PM

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

Yes, if anyone wants it.

Quote from: Mike Hearn on September 27, 2012, 02:25:04 PM

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change output; nothing says it has to be smaller than the payment, and often it won't be.

But surely no client would automatically generate a transaction where the change is greater than one of its inputs? What for?

Do you mean something like (A):

Inputs: 10, 20, 30
Outputs: 15 (change), 45 (payment)

Why not create the tx (B):

Inputs: 20, 30
Outputs: 5 (change), 45 (payment)

Is the client so dumb as to generate a transaction like A, which wastes space, instead of B?
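Whether a client produces A or B depends on its coin-selection strategy. The sketch below compares two simple, hypothetical strategies (fees ignored, amounts in arbitrary units): taking coins in the order given, such as oldest first, reproduces transaction A, while largest-first reproduces B. Neither is claimed to be what the Satoshi client actually does.

```python
from typing import List, Sequence, Tuple


def select_in_order(coins: Sequence[int], target: int) -> Tuple[List[int], int]:
    """Take coins in the given order until the target is covered.

    Returns (selected coins, change). Fees are ignored.
    """
    chosen: List[int] = []
    total = 0
    for coin in coins:
        if total >= target:
            break
        chosen.append(coin)
        total += coin
    if total < target:
        raise ValueError("insufficient funds")
    return chosen, total - target


def select_largest_first(coins: Sequence[int], target: int) -> Tuple[List[int], int]:
    """Same as select_in_order, but try the biggest coins first."""
    return select_in_order(sorted(coins, reverse=True), target)


print(select_in_order([10, 20, 30], 45))       # ([10, 20, 30], 15): transaction A
print(select_largest_first([10, 20, 30], 45))  # ([30, 20], 5): transaction B
```

With largest-first, the change is always smaller than the smallest selected coin: that coin is the last one added, and before it was added the running total was still short of the target. An in-order strategy gives no such guarantee, which is why the heuristic depends on how the client picks coins.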

Mike Hearn

legendary

Activity: 1526

Merit: 1134

Are you planning to contribute your blkdat parser back to bitcoinj? It sounds useful!

I believe your assumptions are still incorrect:

1) You cannot assume anything about the size of a change output; nothing says it has to be smaller than the payment, and often it won't be.

2) You cannot assume payments are round numbers, as often they will have been converted through an exchange rate. For instance, many payments I make look like essentially random numbers because they are some round figure in my local currency multiplied by the exchange rate at the time.

Block chain analysis is hard; I doubt there is an accurate way to calculate what you want.
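Point 2 can be illustrated with a little arithmetic. The sketch converts a round fiat amount to satoshis at an exchange rate and checks the result against a naive "round number" test (a multiple of 0.01 BTC). The rate of 9.87 per BTC and the 0.01 BTC threshold are hypothetical values chosen only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

SATOSHI_PER_BTC = 100_000_000
ROUND_UNIT = 1_000_000  # hypothetical "round" threshold: 0.01 BTC


def fiat_to_satoshi(fiat_amount: str, fiat_per_btc: str) -> int:
    """Convert a fiat amount to satoshis at the given rate, rounding to the nearest satoshi."""
    btc = Decimal(fiat_amount) / Decimal(fiat_per_btc)
    return int((btc * SATOSHI_PER_BTC).quantize(Decimal(1), rounding=ROUND_HALF_UP))


def looks_round(satoshis: int) -> bool:
    """Naive heuristic: a payment is 'round' if it is a multiple of ROUND_UNIT."""
    return satoshis % ROUND_UNIT == 0


paid = fiat_to_satoshi("20", "9.87")  # a round 20 in local currency, hypothetical rate
print(paid, looks_round(paid))        # 202634245 False
```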

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

Quote from: Pieter Wuille on September 25, 2012, 05:17:08 PM

That assumption will be wrong 50% of the time...

Yes! I forgot the change position randomization!

But still it's generally possible to guess which output is the change, since :

1. The payment amount is always greater than the sum of the input amounts excluding the smallest input.
2. The change amount is always smaller than any of the inputs.

The only case where this guessing fails is when there is a single input. In that case the payment amount is generally a round (integer) value and the change is not, so you can still guess with some accuracy.
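The two rules can be sketched as a function over a two-output transaction, with amounts in satoshis. Note that since inputs = payment + change + fee, "payment > inputs minus the smallest input" is the same as "change + fee < smallest input", so rule 1 and rule 2 coincide up to the fee. This is a minimal Python sketch under stated assumptions: fees are ignored, the client is assumed not to add unnecessary inputs, and "round" is assumed to mean a multiple of 0.01 BTC. The function name and threshold are hypothetical. It returns None when the rules cannot decide.

```python
from typing import List, Optional

ROUND_UNIT = 1_000_000  # hypothetical: "round" means a multiple of 0.01 BTC


def guess_change_index(inputs: List[int], outputs: List[int]) -> Optional[int]:
    """Guess which of two outputs is the change (amounts in satoshis, fees ignored).

    Returns 0 or 1, or None when the rules cannot decide.
    """
    if len(outputs) != 2 or not inputs:
        return None
    if len(inputs) > 1:
        # Rule: change is smaller than every input (assumes no unnecessary inputs).
        smallest = min(inputs)
        candidates = [i for i, value in enumerate(outputs) if value < smallest]
        return candidates[0] if len(candidates) == 1 else None
    # Single input: assume the payment is round and the change is not.
    is_round = [value % ROUND_UNIT == 0 for value in outputs]
    if is_round.count(True) == 1:
        return is_round.index(False)
    return None
```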

Best regards, Pieter!

Pieter Wuille

legendary

Activity: 1072

Merit: 1189

Quote from: Sergio_Demian_Lerner on September 25, 2012, 05:07:12 PM

(Note that I had to assume that the last output from a transaction is the change).

That assumption will be wrong 50% of the time...
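This is easy to check with a quick simulation. Assuming the client places the change uniformly at random among two outputs (an assumption for this sketch), the rule "the last output is the change" is right only about half the time.

```python
import random


def last_output_accuracy(n_tx: int, seed: int = 1) -> float:
    """Fraction of simulated two-output transactions where 'last output is change' holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tx):
        change_index = rng.randrange(2)  # assumed: change placed at index 0 or 1 uniformly
        if change_index == 1:            # the naive rule always picks the last output
            hits += 1
    return hits / n_tx


print(last_output_accuracy(100_000))  # close to 0.5
```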

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

Thanks! I finished implementing the blk0001.dat parser for Bitcoinj.

For me, it was the simplest way to get statistics out of the blockchain. Tomorrow I will post a histogram of the average volume transacted by amount range (e.g. 0 to 1 BTC, 10 to 100 BTC, 100 to 1K BTC, etc.).
(Note that I had to assume that the last output from a transaction is the change).
This reveals interesting information regarding the average use.
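A histogram by amount range can be sketched by bucketing output amounts into decade ranges (0 to 1 BTC, 1 to 10 BTC, 10 to 100 BTC, ...) and accumulating count and total volume per bucket; the average per bucket is then total / count. This is a minimal Python sketch: the input is assumed to be a list of output amounts in satoshis with change outputs already removed, which in practice requires one of the change guesses discussed in this thread. Bucket boundaries are half-open, so exactly 1 BTC falls into the 1 to 10 BTC bucket.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

SATOSHI_PER_BTC = 100_000_000


def bucket_label(satoshis: int) -> str:
    """Decade bucket for an amount: [0, 1) BTC, [1, 10) BTC, [10, 100) BTC, ..."""
    lower, upper = 0, SATOSHI_PER_BTC
    while satoshis >= upper:
        lower, upper = upper, upper * 10
    return f"{lower // SATOSHI_PER_BTC}-{upper // SATOSHI_PER_BTC} BTC"


def volume_histogram(amounts: Iterable[int]) -> Dict[str, Tuple[int, int]]:
    """Map each bucket to (number of outputs, total satoshis)."""
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for amount in amounts:
        label = bucket_label(amount)
        count[label] += 1
        total[label] += amount
    return {label: (count[label], total[label]) for label in count}
```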

If someone wants to experiment with it, send me a message.

Best regards,
Sergio.

kjj

legendary

Activity: 1302

Merit: 1026

I was just grepping through the source, and those enums get passed around a lot, but appear only to be consumed in the IMPLEMENT_SERIALIZE functions of various classes.

For example, CAddress::IMPLEMENT_SERIALIZE in protocol.h adds the nVersion and nTime if called with SER_DISK, but does not otherwise. The others look mostly similar.
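A minimal sketch of what type-dependent serialization looks like, following this simplified description of CAddress: the disk form prepends a version and a timestamp, the network form does not. The flag values are assumed to mirror the bit flags in serialize.h, the field layout is a simplification, and any version-dependent rules in the real code are omitted; treat the function as hypothetical.

```python
import struct

# Assumed to mirror the bit flags in the Satoshi client's serialize.h.
SER_NETWORK = 1 << 0
SER_DISK = 1 << 1
SER_GETHASH = 1 << 2


def serialize_address(n_type: int, n_version: int, n_time: int,
                      services: int, ip16: bytes, port: int) -> bytes:
    """Hypothetical, simplified CAddress-like serializer driven by SER_* flags."""
    if len(ip16) != 16:
        raise ValueError("ip16 must be 16 bytes (IPv6 or IPv4-mapped IPv6)")
    out = b""
    if n_type & SER_DISK:
        # Disk-only fields, per the simplified description above.
        out += struct.pack("<i", n_version)
        out += struct.pack("<I", n_time)
    out += struct.pack("<Q", services)
    out += ip16
    out += struct.pack(">H", port)  # port is stored big-endian
    return out
```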

SER_DISK only seems to be consumed in protocol.h and wallet.cpp, neither of which involve the block chain, so there should be no differences in the block format there.

If you want to go looking for them yourself, don't forget to also look for SER_GETHASH.

jgarzik

legendary

Activity: 1596

Merit: 1100

Quote from: Sergio_Demian_Lerner on September 25, 2012, 12:32:47 PM

In the Satoshi client, every object can be serialized either to disk or to the network.
Nevertheless, I haven't found any difference between the SER_DISK and SER_NETWORK serializations of the blockchain.

Which classes are sensitive to SER_* serialization?

I'm writing a Bitcoinj class to read and process Satoshi blockchain (blk*.dat) files, and I want to know whether I should care about the SER_* flags.

The python implementation pynode does not have any notion of serialization differences between the two, either. pynode successfully imports bitcoin-generated blk000?.dat files, as well as talking on the network.

Perhaps this was for future expansion? I would love to know any differences, myself.

Sergio_Demian_Lerner

hero member

Activity: 555

Merit: 654

In the Satoshi client, every object can be serialized either to disk or to the network.
Nevertheless, I haven't found any difference between the SER_DISK and SER_NETWORK serializations of the blockchain.

Which classes are sensitive to SER_* serialization?

I'm writing a Bitcoinj class to read and process Satoshi blockchain (blk*.dat) files, and I want to know whether I should care about the SER_* flags.
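For the blk*.dat side, here is a minimal Python sketch of a reader (not Sergio's Java/bitcoinj parser). It assumes the usual file layout: each record is the 4-byte network magic (f9 be b4 d9 on mainnet), a 4-byte little-endian block length, then the block in the same serialization used on the network, consistent with pynode handling both without distinction. It parses only a few 80-byte header fields and the transaction count; the function names are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def read_compact_size(stream: BinaryIO) -> int:
    """Read Bitcoin's variable-length integer (CompactSize)."""
    first = stream.read(1)
    if not first:
        raise EOFError("unexpected end of data")
    prefix = first[0]
    if prefix < 0xFD:
        return prefix
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[prefix]
    return int.from_bytes(stream.read(width), "little")


def iter_block_records(stream: BinaryIO, magic: bytes = MAINNET_MAGIC) -> Iterator[bytes]:
    """Yield raw serialized blocks from a blk*.dat-style stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8 or header[:4] != magic:
            return  # end of data, or something that is not a block record
        (length,) = struct.unpack("<I", header[4:])
        yield stream.read(length)


def parse_header_and_tx_count(block: bytes) -> Tuple[int, bytes, int]:
    """Return (version, previous block hash in display order, transaction count)."""
    s = io.BytesIO(block)
    (version,) = struct.unpack("<i", s.read(4))
    prev_hash = s.read(32)[::-1]  # hashes are stored byte-reversed relative to display
    s.read(32 + 4 + 4 + 4)        # skip merkle root, time, bits, nonce
    return version, prev_hash, read_compact_size(s)
```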

Thanks, Sergio.
