Topic: Encoding bug in JSON-RPC handling of account/label names? (Read 2645 times)

newbie
Activity: 51
Merit: 0
Note the \u00C3\u00A5 instead of the correct \u00E5.
newbie
Activity: 1
Merit: 0
Try this:
rename one of your labels to Ã
open the debug console in bitcoin-qt
type listreceivedbyaccount

Then you can see this:
"account" : "Ã\u0083"

Yes, and this is the problem, as Ã should be \u00c3.

I have the same problem when trying to use UTF-8 account names in bitcoind. It is the same from the console, the bitcoin-qt debug window and JSON-RPC. I do not fully understand how this is supposed to work, but I assume that if I pass a UTF-8 string to bitcoind over JSON-RPC, it can store it any way it likes but should return the same UTF-8 string. In all cases, however, it returns a double-encoded string.



In [17]: a = 'Ã'

In [18]: json.dumps(a)
Out[18]: '"\\u00c3"'

That is how the character 'Ã' should be escaped.


But bitcoind returns \u00C3\u0083 instead:

In [19]: b = "\u00C3\u0083"

In [20]: b.encode('latin-1').decode()
Out[20]: 'Ã'

So I have to decode it myself before I can use it.
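
A minimal sketch of that client-side workaround, assuming the server has applied exactly one extra ISO 8859-1 to UTF-8 step. The helper name `undo_double_encoding` is hypothetical and is not part of bitcoind or any JSON-RPC library; it returns the string unchanged when the round trip is not possible.

```python
import json


def undo_double_encoding(text: str) -> str:
    """Reverse one Latin-1 -> UTF-8 mis-decoding step, if there is one.

    Hypothetical helper, not part of bitcoind or any JSON-RPC library.
    """
    try:
        # Map each character back to the byte it came from, then read
        # those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double encoded (or not recoverable): leave it unchanged.
        return text


# Raw JSON text as reported elsewhere in this thread.
raw = '{"account": "Fr\\u00C3\\u00A5n MultiBit"}'
account = json.loads(raw)["account"]
fixed = undo_double_encoding(account)

# Print the repaired name in escaped form so the code point is visible.
print(json.dumps(fixed))
```

This is a heuristic, not a fix: a label that legitimately contains a sequence such as "Ã" followed by "¥" would be altered too, so it is only safe while the server is known to double encode everything.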

hero member
Activity: 675
Merit: 514
Try this:
rename one of your labels to Ã
open the debug console in bitcoin-qt
type listreceivedbyaccount

Then you can see this:
"account" : "Ã\u0083"
legendary
Activity: 4542
Merit: 3393
Vile Vixen and Miss Bitcointalk 2021-2023
Quote:
If you really did see \u00C3\u00A5 then it appears that you are trying to program in Java without understanding the inner Buddha-nature of the char type in Java.
I'm not trying to program in Java at all. That is the raw output of the JSON-RPC interface, which I am showing because it makes the source of the bug clear (if you want it in hex, it's 22 61 63 63 6f 75 6e 74 22 3a 22 46 72 5c 75 30 30 43 33 5c 75 30 30 41 35 6e 20 4d 75 6c 74 69 42 69 74 22). The application is expected to translate the escape sequences into the appropriate (or, in this case, inappropriate) Unicode characters.

As you can clearly see, these characters are U+00C3 (LATIN CAPITAL LETTER A WITH TILDE) and U+00A5 (YEN SIGN), which are correctly displayed thus: Ã¥. If you're displaying these characters any other way, you're doing it wrong.

However, while the application is displaying these characters correctly, the characters themselves are incorrect. Obviously, the intended character is U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE), which in UTF-8 is represented by the byte sequence C3 A5, which is also the ISO 8859-1 representation of the above (incorrect) characters. Interpreting this byte sequence as though it were ISO 8859-1 instead of UTF-8 is what is causing the bug. This is happening to the text before it is output by the JSON-RPC interface, so clearly the bug is in bitcoind or one of its libraries, rather than the application making use of this faulty output.
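
The mechanism described in this post can be reproduced in a few lines of Python. This is only an illustration of the encoding mix-up, under the assumption that exactly one extra ISO 8859-1 decoding step happens somewhere before the JSON is written; it says nothing about where in bitcoind or its libraries that step occurs. The hex string is the dump quoted in this post.

```python
import json

# Hex dump of the raw JSON-RPC output quoted in this post.
hex_dump = (
    "22 61 63 63 6f 75 6e 74 22 3a 22 46 72 5c 75 30 30 43 33 "
    "5c 75 30 30 41 35 6e 20 4d 75 6c 74 69 42 69 74 22"
)
raw = bytes.fromhex(hex_dump).decode("ascii")
print(raw)

intended = "Fr\u00e5n MultiBit"          # U+00E5 is the intended character
utf8_bytes = intended.encode("utf-8")    # contains the bytes C3 A5

# The suspected bug: the UTF-8 bytes are read as if they were ISO 8859-1,
# so each byte becomes its own character (U+00C3 and U+00A5).
misread = utf8_bytes.decode("latin-1")

buggy_json = json.dumps(misread)      # escapes the two wrong characters
correct_json = json.dumps(intended)   # escapes the single intended one
print(buggy_json)
print(correct_json)
```

Apart from the case of the hex digits, the `buggy_json` string carries the same escapes as the dump, which is consistent with the double-encoding explanation.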
legendary
Activity: 2128
Merit: 1073
Quote:
Would you guys at least try to reproduce the bug before assuming it's pilot error? Because I did, and it's not. This is what I get (in bitcoind 0.7.0 and 0.8.4):
Code:
"account" : "Fr\u00C3\u00A5n MultiBit"

Note the \u00C3\u00A5 instead of the correct \u00E5. It appears that bitcoind (and Bitcoin-Qt, but only in the debug console) is performing an ISO 8859-1 to UTF-8 conversion on a string that was already UTF-8 to begin with, even though neither bitcoind nor Bitcoin-Qt ever actually encode anything in ISO 8859-1 or anything other than UTF-8. A terminal (or other application) properly configured for Unicode will correctly display the resulting mess as "FrÃ¥n MultiBit".
If you really did see \u00C3\u00A5 then it appears that you are trying to program in Java without understanding the inner Buddha-nature of the char type in Java. The following koan applies to you:
Quote from: Jargon file
A novice was trying to fix a broken Lisp machine by turning the power off and on.

Knight, seeing what the student was doing, spoke sternly: "You cannot fix a machine by just power-cycling it with no understanding of what is going wrong."

Knight turned the machine off and on.

The machine worked.
To understand what you're doing wrong you'll need to do the following:

1) grab the culprit JSON-RPC packets off the wire using Ethereal/Wireshark
2) display their hex dump
3) locate the documentation for the JSON-RPC class you've used as well as the internal TextStreamReader/TextStreamWriter classes used by the HTTP classes
4) print the JavaDoc of the entire inheritance hierarchy of the above, all the way down to 'char' & 'String', on recycled/biodegradable paper with a vegetable-based ink
5) consume the above printout by mouth while intensely staring at the above hex dump.

Sometime during step 5) the internal Buddha-nature of Java's char&String types will illuminate your brain. You'll then easily fix your erroneous program and you'll never have any more problems of this type in your life.
legendary
Activity: 4542
Merit: 3393
Vile Vixen and Miss Bitcointalk 2021-2023
Would you guys at least try to reproduce the bug before assuming it's pilot error? Because I did, and it's not. This is what I get (in bitcoind 0.7.0 and 0.8.4):
Code:
"account" : "Fr\u00C3\u00A5n MultiBit"

Note the \u00C3\u00A5 instead of the correct \u00E5. It appears that bitcoind (and Bitcoin-Qt, but only in the debug console) is performing an ISO 8859-1 to UTF-8 conversion on a string that was already UTF-8 to begin with, even though neither bitcoind nor Bitcoin-Qt ever actually encode anything in ISO 8859-1 or anything other than UTF-8. A terminal (or other application) properly configured for Unicode will correctly display the resulting mess as "FrÃ¥n MultiBit".
administrator
Activity: 5222
Merit: 13032
It's a problem with your terminal, probably. Bitcoin just accepts whatever bytes you give it IIRC.

Quote:
This forum is using ISO-8859-1.

The HTML is sent in ISO-8859-1, but Unicode is fully supported via HTML entities.
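
As a rough illustration of that last point, Python's `xmlcharrefreplace` error handler produces the kind of numeric character references a page served as ISO-8859-1 can use for characters outside that encoding. This is only a sketch of the general technique; how the forum software itself does the conversion is not stated here.

```python
import html

text = "Fr\u00e5n \u20bf"   # U+00E5 exists in ISO-8859-1, U+20BF does not

# Characters that ISO-8859-1 cannot represent become &#NNNN; references.
page_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(page_bytes)

# A browser reading the page as ISO-8859-1 resolves the references again.
restored = html.unescape(page_bytes.decode("iso-8859-1"))
print(restored == text)
```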
legendary
Activity: 2128
Merit: 1073
Quote:
Anyone seen anything similar?
Yes, you are mixing character encodings: UTF-8 and ISO-8859-1. This forum is using ISO-8859-1. I manually forced it to UTF-8 and your listreceivedbyaccount example displayed correctly in my browser. You need to configure your OS and your terminal program and your HTTP library for the correct character encodings.
member
Activity: 93
Merit: 11
Hi!

I just bumped into something that might be an encoding bug. I sent a few millibitcoins to my plain vanilla 0.8.3 wallet and labeled the address "Från MultiBit" ("From MultiBit" in Swedish). Then I shut down the Qt client, started bitcoind in -daemon mode, called "listreceivedbyaccount" and got the following result:

listreceivedbyaccount = [{"account":"FrÃ¥n MultiBit","amount":0.07,"confirmations":133}]

It *could* be just me doing something wrong in my Java commons-httpclient or net.sf.json-lib code, haven't really dug deep into that just yet.

Anyone seen anything similar?