
Topic: Wallet Label Export Format: A Proposal by Craig Raw (Read 428 times)

legendary
Activity: 2870
Merit: 7490
Crypto Swap Exchange
Following discussion with several wallet developers, I have come to the conclusion that the secondary goal of managing labels in non-specialized applications must be sacrificed in order to achieve the primary goal of wide implementation across different wallets. While this tradeoff was perhaps inevitable, it was worth a try!

As such I have rewritten the specification to use JSON, specifically the JSON Lines format.

I understand that JSON Lines (JSONL) has some benefits over plain JSON. But FWIW, Microsoft Excel (an example of a non-specialized application) supports importing JSON files without macros.
hero member
Activity: 882
Merit: 5834
not your keys, not your coins!
The rewritten BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki

It is perhaps simplest to understand it by looking at an example export:
{ "type": "tx", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd", "label": "Transaction" }
{ "type": "addr", "ref": "bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c", "label": "Address" }
{ "type": "pubkey", "ref":"0283409659355b6d1cc3c32decd5d561abaac86c37a353b52895a5e6c196d6f448", "label": "Public Key" }
{ "type": "input", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:0", "label": "Input" }
{ "type": "output", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:1", "label": "Output" }
{ "type": "xpub", "ref":"xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8Nq...", "label": "Extended Public Key" }
Looks good to me! Good luck with the BIP. :)

If you want, you can make a 3rd party JSON-Lines to CSV generator to interface with your original design goal of business editing of label spreadsheets.
Agreed, as written above!

I guess a purely software-to-software oriented format could also facilitate writing a simple script that generates an excel file or some other human-friendly representation.
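To give an idea of how small such a script could be, here is a minimal Python sketch that turns a JSON Lines export like the example above into CSV that a spreadsheet can open. The function name `jsonl_to_csv` and the column order are my own assumptions and not part of the BIP; it also assumes every record carries `type` and `ref` fields.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_text: str) -> str:
    """Convert label records (one JSON object per line) into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["type", "ref", "label"])  # assumed column order
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        writer.writerow([record["type"], record["ref"], record.get("label", "")])
    return out.getvalue()


example = "\n".join([
    '{ "type": "tx", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd", "label": "Transaction" }',
    '{ "type": "addr", "ref": "bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c", "label": "Address" }',
])
csv_text = jsonl_to_csv(example)
print(csv_text)
```

The `csv` module takes care of quoting, so labels containing commas or quotes should survive the round trip into a spreadsheet.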
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
(Copied from bitcoin-dev):

Following discussion with several wallet developers, I have come to the conclusion that the secondary goal of managing labels in non-specialized applications must be sacrificed in order to achieve the primary goal of wide implementation across different wallets. While this tradeoff was perhaps inevitable, it was worth a try!

Approach ACK. This revised BIP looks much more focused.

If you want, you can make a 3rd party JSON-Lines to CSV generator to interface with your original design goal of business editing of label spreadsheets.

Good luck getting it numbered. :)
newbie
Activity: 4
Merit: 39
(Copied from bitcoin-dev):

Following discussion with several wallet developers, I have come to the conclusion that the secondary goal of managing labels in non-specialized applications must be sacrificed in order to achieve the primary goal of wide implementation across different wallets. While this tradeoff was perhaps inevitable, it was worth a try!

As such I have rewritten the specification to use JSON, specifically the JSON Lines format. This allows documents to be split or streamed, and is convenient for command-line processing. The format is also now self describing via a type field, permitting simple type identification. Public keys and xpubs have been added as types following further suggestions. To keep the specification simple, compression and encryption have been removed - with the strong recommendation to consider protecting the data in a way suitable to its application.

The rewritten BIP can be found at https://github.com/craigraw/bips/blob/master/bip-wallet-labels.mediawiki

It is perhaps simplest to understand it by looking at an example export:

{ "type": "tx", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd", "label": "Transaction" }
{ "type": "addr", "ref": "bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c", "label": "Address" }
{ "type": "pubkey", "ref":"0283409659355b6d1cc3c32decd5d561abaac86c37a353b52895a5e6c196d6f448", "label": "Public Key" }
{ "type": "input", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:0", "label": "Input" }
{ "type": "output", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:1", "label": "Output" }
{ "type": "xpub", "ref":"xpub661MyMwAqRbcFtXgS5sYJABqqG9YLmC4Q1Rdap9gSE8Nq...", "label": "Extended Public Key" }

This proposal is intended as a foundational BIP for wallet label interchange - further BIPs may add label synchronization protocols etc.
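As a rough illustration of the streaming property, the following Python sketch reads an export one line at a time and keeps only the address labels. It is only a sketch under my own assumptions: the helper name `iter_labels` is hypothetical, and the in-memory `io.StringIO` stands in for a real file or pipe.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def iter_labels(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one label record per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an open file; each line is a complete JSON object.
export = io.StringIO(
    '{ "type": "tx", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd", "label": "Transaction" }\n'
    '{ "type": "addr", "ref": "bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c", "label": "Address" }\n'
    '{ "type": "output", "ref":"f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd:1", "label": "Output" }\n'
)

# The type field lets a consumer pick out the records it cares about.
address_labels = {
    record["ref"]: record["label"]
    for record in iter_labels(export)
    if record["type"] == "addr"
}
print(address_labels)
```

Because every line stands alone, the same loop works whether the input is a whole file, one part of a split file, or the output of another command.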
hero member
Activity: 882
Merit: 5834
not your keys, not your coins!
The focus should remain on this being a bare-bones software-to-software interchange format.
I guess a purely software-to-software oriented format could also facilitate writing a simple script that generates an excel file or some other human-friendly representation.
newbie
Activity: 3
Merit: 0

Specifically related to the last sentence, do you think that adding a prefix at the beginning of some data will hamper the human usability? I'm starting to think that might be the case, assuming people autofill such data from other sheets.


No, but I think it's an unnecessary addition to the format that attempts to fix a problem that shouldn't be/can't be solved in the first place (i.e. making it friendly for humans to directly work with the raw CSV or manually add valid entries in Excel).

The focus should remain on this being a bare-bones software-to-software interchange format.

That it is reasonably not too terrible for a human to look up a single one-off value is a nice bonus. But I think any human requirement beyond that is impossible to satisfy.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
I wouldn't worry too much about human readability so much as human usability.

The proposed two-column format allows for a trivial CTRL-F to locate any specific desired one-off info, should someone want/need to find something directly in the raw CSV file (most likely looking up by addr or by label text).

And I don't think it's a reasonable use-case/design constraint to try to ensure that the format be easy for humans to directly edit or add to themselves. Manually logging inputs or outputs is going to be incredibly error-prone regardless of how the format is defined. Excel or Google Sheets is fine as a convenience viewer, but manual editing only really makes sense within some kind of dedicated Bitcoin-savvy helper UI that can look up each txid entered, etc.

Ongoing management via direct human edits isn't practical (or at least wouldn't be a recommended practice) so let's not agonize over it.

Specifically related to the last sentence, do you think that adding a prefix at the beginning of some data will hamper the human usability? I'm starting to think that might be the case, assuming people autofill such data from other sheets.
newbie
Activity: 3
Merit: 0
I wouldn't worry too much about human readability so much as human usability.

The proposed two-column format allows for a trivial CTRL-F to locate any specific desired one-off info, should someone want/need to find something directly in the raw CSV file (most likely looking up by addr or by label text).

And I don't think it's a reasonable use-case/design constraint to try to ensure that the format be easy for humans to directly edit or add to themselves. Manually logging inputs or outputs is going to be incredibly error-prone regardless of how the format is defined. Excel or Google Sheets is fine as a convenience viewer, but manual editing only really makes sense within some kind of dedicated Bitcoin-savvy helper UI that can look up each txid entered, etc.

Ongoing management via direct human edits isn't practical (or at least wouldn't be a recommended practice) so let's not agonize over it.
newbie
Activity: 4
Merit: 39
I actually sent an email related to this a few days ago, commenting that a 3rd column can be made unnecessary by simply prefixing each data format with its own name, such as "address:" for addresses, "transaction:" for transactions, and so on.

Eliminates what I view as "dirty tricks" which you need to check for to identify a data type.

This just changes the algorithm from character and length matching, to string matching. Also, similar to as I pointed out above, you introduce the possibility of typos causing difficult to detect omissions in data imports, and resultant variations in how implementations handle this. Canonical representations of references are preferred for this reason.

The proposed algorithm is complete and non-ambiguous for all considered data types. If new types need to be added in future, a new BIP would be required in any case.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Have you tried 'coding' the rather simple parsing logic in Excel? I'm not a very skilled Excel user, but just tried it and it works like a charm.
Code:
=IF(LEN(A2)<64; "address"; IF(LEN(A2)=64; "transaction"; IF(ISNUMBER(SEARCH("<";A2)); "input"; "output")))
No need to add an extra column for this, indeed!

I highly doubt the average Excel user with no programming language background whatsoever will be able to reproduce this query or understand what it does.
Then it can just be added into the BIP and that's it. It could even be exported with the data as a comment at the end of the file.
A third column would instead increase the file size by 50% as you'll add a 3rd field for each dataset.

I actually sent an email related to this a few days ago, commenting that a 3rd column can be made unnecessary by simply prefixing each data format with its own name, such as "address:" for addresses, "transaction:" for transactions, and so on.

Eliminates what I view as "dirty tricks" which you need to check for to identify a data type.
hero member
Activity: 882
Merit: 5834
not your keys, not your coins!
Have you tried 'coding' the rather simple parsing logic in Excel? I'm not a very skilled Excel user, but just tried it and it works like a charm.
Code:
=IF(LEN(A2)<64; "address"; IF(LEN(A2)=64; "transaction"; IF(ISNUMBER(SEARCH("<";A2)); "input"; "output")))
No need to add an extra column for this, indeed!

I highly doubt the average Excel user with no programming language background whatsoever will be able to reproduce this query or understand what it does.
Then it can just be added into the BIP and that's it. It could even be exported with the data as a comment at the end of the file.
A third column would instead increase the file size by 50% as you'll add a 3rd field for each dataset.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Have you tried 'coding' the rather simple parsing logic in Excel? I'm not a very skilled Excel user, but just tried it and it works like a charm.
Code:
=IF(LEN(A2)<64; "address"; IF(LEN(A2)=64; "transaction"; IF(ISNUMBER(SEARCH("<";A2)); "input"; "output")))
No need to add an extra column for this, indeed!

I highly doubt the average Excel user with no programming language background whatsoever will be able to reproduce this query or understand what it does.
hero member
Activity: 882
Merit: 5834
not your keys, not your coins!
If I'm going to use Excel, I'd rather see an additional field which tells the type of the data so I could filter/sort it with less effort.
This is an interesting point. When designing this I considered it to be more work for everyone manually editing files to add a 3rd field, which in addition increases the export file size without aiding the parsing of the file in any material way (given it's currently possible to disambiguate from the reference alone). Further, it creates additional complexity and increases the potential for mistakes due to typos. But easily sorting to differentiate between types is a good counterpoint - I'm just not sure it's worth the cost.
Have you tried 'coding' the rather simple parsing logic in Excel? I'm not a very skilled Excel user, but just tried it and it works like a charm.
Code:
=IF(LEN(A2)<64; "address"; IF(LEN(A2)=64; "transaction"; IF(ISNUMBER(SEARCH("<";A2)); "input"; "output")))
No need to add an extra column for this, indeed!
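For anyone who prefers a script to a spreadsheet formula, here is a rough Python equivalent of the Excel formula above. It applies the same checks in the same order: shorter than 64 characters means address, exactly 64 means transaction, and anything longer is an input if it contains "<" and an output otherwise. The function name `classify_reference` and the ">" separator used in the sample output reference are my own assumptions, not taken from the BIP.

```python
def classify_reference(ref: str) -> str:
    """Guess the data type of a reference from its length and content alone."""
    if len(ref) < 64:
        return "address"
    if len(ref) == 64:
        return "transaction"
    # Longer than a txid: the formula treats "<" as the marker for an input.
    return "input" if "<" in ref else "output"


TXID = "f91d0a8a78462bc59398f2c5d7a84fcff491c26ba54c4833478b202796c8aafd"

for ref in (
    "bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c",
    TXID,
    TXID + "<0",
    TXID + ">1",  # assumed output notation, for illustration only
):
    print(classify_reference(ref), ref)
```

Like the formula, this does no validation: any string shorter than 64 characters is reported as an address, whether or not it really is one.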
newbie
Activity: 4
Merit: 39
Do you mean wallet developer or user?

I mean user. For example, typos like 'Addres' would mean labels are skipped on import, which might be difficult to detect in a file with many labels. Of course the wallet could try to determine what the user meant from the reference itself, but then we are back at square one (and some implementations might not do this).
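To make the concern concrete, here is a small Python sketch of an importer for a hypothetical three-column CSV (type, reference, label). The type names in `KNOWN_TYPES` and the function name `import_rows` are assumptions for illustration only. Rows with an unrecognised type are collected rather than silently dropped, which is the behaviour an implementation would need in order to make such typos detectable.

```python
import csv
import io

# Hypothetical type names; the actual spelling would be fixed by the specification.
KNOWN_TYPES = {"address", "transaction", "input", "output"}


def import_rows(csv_text):
    """Split rows into accepted labels and rejected rows (with their line numbers)."""
    accepted, rejected = [], []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != 3:
            rejected.append((line_no, row))
            continue
        type_name, ref, label = row
        type_name = type_name.strip().lower()
        if type_name in KNOWN_TYPES:
            accepted.append((type_name, ref, label))
        else:
            rejected.append((line_no, row))  # e.g. a typo such as 'Addres'
    return accepted, rejected


sample = (
    "Address,bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c,Savings\n"
    "Addres,bc1q34aq5drpuwy3wgl9lhup9892qp6svr8ldzyy7c,Exchange\n"
)
accepted, rejected = import_rows(sample)
print(len(accepted), "imported;", len(rejected), "skipped:", rejected)
```

An importer that does not report the rejected list would simply lose the second label, and in a file with many labels that is easy to miss.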
legendary
Activity: 3668
Merit: 6382
Looking for campaign manager? Contact icopress!
One more thing: I've learned that it's best for a file to contain, in one way or another, the version of the protocol/documentation, so that if anything is changed or added in the future, an importer can still handle everything correctly or know from the start that the version is not supported. Of course, use-in-excel may then no longer work.

This is generally good advice and has been suggested elsewhere. However, the 'use-in-excel' goal makes this tricky. If such a version must be specified but is not present (and all other data is fine), should the import fail? In general users won't know which version number to use, even if it is readily possible to add it (say in the column headers). Also, it's difficult to see how this format could be extended in a way that required a version number to differentiate, so again I'm not sure it's worth the cost.

Indeed, if use-in-excel is so important, this can become tricky (there can be a column with the version, but it's imho more a hack than a proper solution).

If the version is missing, then again, it depends. One direction would be for the "version info" to also contain something (a name) that ensures one really imports what one expects, not just a random CSV.
Another direction could be to assume it's at least version 1 (if version info is missing). Of course, this kind of approach may just "pass the responsibility" for finding a solution to those proposing an extension (or a new version).
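Both directions could be combined, roughly as in the Python sketch below. Everything in it is hypothetical: the format name `wallet-labels`, the idea of putting the name and version on the first line, and the function name `split_version` are my own assumptions to show the fallback logic, not something the proposal defines.

```python
from typing import List, Tuple

MAGIC = "wallet-labels"  # hypothetical format name, not part of any specification
DEFAULT_VERSION = 1      # assumed when the file carries no version line


def split_version(lines: List[str]) -> Tuple[int, List[str]]:
    """Return (version, data_lines), falling back to version 1 without a header."""
    if lines and lines[0].startswith(MAGIC + ","):
        version = int(lines[0].split(",", 1)[1])
        return version, lines[1:]
    return DEFAULT_VERSION, lines


print(split_version(["wallet-labels,2", "ref,label"]))
print(split_version(["ref,label"]))
```

An importer could then refuse versions it does not know while still accepting older files that never carried a version line.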
newbie
Activity: 4
Merit: 39
Thanks all for the useful feedback - it's particularly useful to hear from users, as opposed to just developers.

As noted, the main reason I didn't use descriptors is because txids and inputs aren't supported. In addition, it's less than user friendly to read and edit them.

If I'm going to use Excel, I'd rather see an additional field which tells the type of the data so I could filter/sort it with less effort.

This is an interesting point. When designing this I considered it to be more work for everyone manually editing files to add a 3rd field, which in addition increases the export file size without aiding the parsing of the file in any material way (given it's currently possible to disambiguate from the reference alone). Further, it creates additional complexity and increases the potential for mistakes due to typos. But easily sorting to differentiate between types is a good counterpoint - I'm just not sure it's worth the cost.

One more thing: I've learned that it's best for a file to contain, in one way or another, the version of the protocol/documentation, so that if anything is changed or added in the future, an importer can still handle everything correctly or know from the start that the version is not supported. Of course, use-in-excel may then no longer work.

This is generally good advice and has been suggested elsewhere. However, the 'use-in-excel' goal makes this tricky. If such a version must be specified but is not present (and all other data is fine), should the import fail? In general users won't know which version number to use, even if it is readily possible to add it (say in the column headers). Also, it's difficult to see how this format could be extended in a way that required a version number to differentiate, so again I'm not sure it's worth the cost.
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
Quote
Descriptors are great but they're not human-readable, which is probably why the BIP doesn't make use of them.
What is not readable?
Code:
importdescriptors "[{\"desc\":\"addr(1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg)#d5ts4kht\",\"timestamp\":\"now\",\"label\":\"Withdraw address for Binance account\"}]"

Well, for addresses they are easy to see, but I was mainly referring to transactions, inputs, outputs etc. which have a non-trivial representation. In fact, the first two don't even have descriptors of their own if I recall correctly.
newbie
Activity: 21
Merit: 16
Quote
Descriptors are great but they're not human-readable, which is probably why the BIP doesn't make use of them.
What is not readable?
Code:
importdescriptors "[{\"desc\":\"addr(1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg)#d5ts4kht\",\"timestamp\":\"now\",\"label\":\"Withdraw address for Binance account\"}]"
Also note that Bitcoin Core has an "Export" button that produces a CSV file, so a format for that is already established:
Code:
"Confirmed","Date","Type","Label","Address","Amount (BTC)","ID"
"true","2015-02-14T13:26:20.000","Sent to","Withdraw from Binance at 01-01-2021","1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg","-21.61679877","c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"
However, importing labels for transactions is not implemented. You can change them, if you set them for addresses.
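For what it's worth, a CSV export with the header shown above is easy to read back with a script. The Python sketch below collects the address labels from it. It assumes the column names are exactly those in the sample, and the variable names are only for illustration.

```python
import csv
import io

# The sample export from above, used here as in-memory text instead of a file.
export = (
    '"Confirmed","Date","Type","Label","Address","Amount (BTC)","ID"\n'
    '"true","2015-02-14T13:26:20.000","Sent to","Withdraw from Binance at 01-01-2021",'
    '"1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg","-21.61679877",'
    '"c3bdad6e7dcd7997e16a5b7b7cf4d8f6079820ff2eedd5fcbb2ad088f767b37b"\n'
)

labels_by_address = {}
for row in csv.DictReader(io.StringIO(export)):
    if row["Label"]:  # skip rows that carry no label
        labels_by_address[row["Address"]] = row["Label"]

print(labels_by_address)
```

This only recovers labels keyed by address, which matches the limitation mentioned above.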
legendary
Activity: 1568
Merit: 6660
bitcoincleanup.com / bitmixlist.org
So far, I used the most portable format I can think of: the command-line-based format:
Code:
importdescriptors "[{\"desc\":\"tr(cMahea7zqjxrtgAbB7LSGbcQUr1uX1ojuat9jZodMN87JcbXMTcA)#tnrke5yz\",\"timestamp\":\"now\",\"label\":\"taproot\"}]"
importdescriptors "[{\"desc\":\"tr(cMahea7zqjxrtgAbB7LSGbcQUr1uX1ojuat9jZodMN87K7XCyj5v)#xpd75frm\",\"timestamp\":\"now\",\"label\":\"taproot2\"}]"
It is standardized across wallets, it is extensible, and it is reasonably compatible between different versions of Bitcoin Core. Also, in case of incompatibility, it is quite easy to fix and convert for another version of Bitcoin Core.
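The backslash escaping in those commands is easy to get wrong by hand, so here is a small Python sketch that builds the JSON argument and lets the standard library handle the quoting. The helper name `importdescriptors_arg` is my own, and the sketch assumes the descriptor string already includes its "#checksum" suffix; it does not compute or verify the checksum.

```python
import json
import shlex


def importdescriptors_arg(descriptor: str, label: str) -> str:
    """Build the JSON array argument for a single labelled descriptor."""
    request = [{"desc": descriptor, "timestamp": "now", "label": label}]
    return json.dumps(request)


arg = importdescriptors_arg(
    "addr(1A69TXnEM2ms9fMaY9UuiJ7415X7xZaUSg)#d5ts4kht",
    "Withdraw address for Binance account",
)
# shlex.quote makes the JSON safe to paste into a POSIX shell.
print("bitcoin-cli importdescriptors " + shlex.quote(arg))
```

The printed line is meant for a POSIX shell; the Bitcoin Core console takes the escaped form shown in the commands above.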

Descriptors are great but they're not human-readable, which is probably why the BIP doesn't make use of them.

Of course, it is a good idea to make the first column of the CSV records a descriptor instead of an assorted collection of addresses/transactions which require another column to differentiate. In fact, if descriptors are used, we can completely do away with the 3rd "type" column.
legendary
Activity: 3668
Merit: 6382
Looking for campaign manager? Contact icopress!
But I question the decision where the first field (called "Reference" in the test vector) may contain multiple types of data. If I'm going to use Excel, I'd rather see an additional field which tells the type of the data so I could filter/sort it with less effort.

That's what I've been thinking too: without more info, that "one field that can contain anything" can become overly confusing.
Then I saw that he uses/proposes different formatting for different types of data.
But why not a different field for each of those types? Instead of having what is pretty much a syntax, we could just have the tx in the tx column and the input in the input column, for example (leaving the unneeded columns empty).

One more thing: I've learned that it's best for a file to contain, in one way or another, the version of the protocol/documentation, so that if anything is changed or added in the future, an importer can still handle everything correctly or know from the start that the version is not supported. Of course, use-in-excel may then no longer work. On the other hand, I would suggest enforcing (or "strongly suggesting") the use of a password when exporting those files, for the sake of one's privacy...