Topic: [BOUNTY 2.0 BTC] Python2.X encoding problems in windows - Please Help

newbie
Activity: 24
Merit: 0
Beware that the arguments passed in sys.argv[] are "ascii" byte strings (so you must decode() them with the locale encoding first to get unicode), and every unicode string you pass to Windows in non-obvious ways, such as Popen arguments, or even CreateProcessW, must be encoded to multibyte (sys.getfilesystemencoding() is reasonably portable for that). Doing:

Code:
Popen(['msg.exe', '*', '/server:127.0.0.1', unicode(sys.argv[1].decode(locale.getpreferredencoding()))])

is wrong (Python will try to convert to mbcs, but with ascii as the source encoding). Generally one should be careful with win32api args and environment variables (including the command line). Everything else in Python is unicode...

To demonstrate the dialog bug:

Code:
Traceback (most recent call last):
  File "armoryqt.py", line 716, in openSettings
    dlgSettings = DlgSettings(self, self)
  File "C:\BitcoinArmory-master\BitcoinArmory-master\qtdialogs.py", line 10073, in __init__
    '(%s)' % BTC_HOME_DIR, size=2)
  File "C:\BitcoinArmory-master\BitcoinArmory-master\qtdefines.py", line 212, in __init__
    self.setText(txt, **kwargs)
  File "C:\BitcoinArmory-master\BitcoinArmory-master\qtdefines.py", line 215, in setText
    text = unicode(text)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 51: ordinal not in range(128)

Once again, we're trying to convert an mbcs string without specifying the source encoding (i.e. wherever the string comes from, it should go through decode('mbcs') first).
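A minimal sketch of that point, not Armory's actual code. The helper name to_text and the sample path are made up for illustration, and 'cp1252' stands in for 'mbcs' because the mbcs codec only exists on Windows. It is written so it runs on Python 3 byte strings; in Python 2 the same decode() call applies to a plain str.

```python
def to_text(raw, encoding='cp1252'):
    """Decode a byte string with an explicit encoding instead of relying on
    the implicit ASCII default that unicode(text) uses in Python 2."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, nothing to do


# Hypothetical path; 0xe9 is the cp1252 byte for 'é', as in the traceback.
raw_path = b'C:\\Users\\vbox\\Bitcoin\xe9'

# This is what the implicit conversion attempts, and why it blows up.
try:
    raw_path.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the source encoding makes the same bytes decode cleanly.
text_path = to_text(raw_path)
```

The idea is to decode once, at the point where the bytes enter the program, and to hand only unicode to things like setText().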

Seems like Qt suffers from the same behaviour (mbcs strings are treated as ascii).

All of this madness probably stems from the fact that mbcs can represent only a subset of what utf16 can.
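To make the subset point concrete, here is a small sketch assuming cp1252 as the code page (the value reported for locale.getpreferredencoding() elsewhere in this thread). The helper can_encode is a made-up name, not a library function.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


name_ok = u'Bitcoin\xe9'         # 'é' has a slot in cp1252
name_bad = u'Bitcoin\xe9\u015b'  # 's with acute' has no slot in cp1252

fits_ok = can_encode(name_ok, 'cp1252')
fits_bad = can_encode(name_bad, 'cp1252')
fits_utf16 = can_encode(name_bad, 'utf-16-le')
```

A check like this could be run before handing a path to anything that goes through the narrow (code page) API, so the failure shows up as a clear message rather than a codec error deep inside Popen.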

Partial fix for command line:
https://github.com/wyuzhe/BitcoinArmory/commit/fd7ff04bd0b343ad119980c85996840803771a1d
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
I was able to get it to work in some contexts but not others. When I got it to work from the command line, I wasn't able to get it working from the settings file, when set from the File->Settings menu. But also I wasn't sure if the encoding was hitting the file correctly. There were just so many combinations...

Also, it worked with some unicode, and not others.
newbie
Activity: 24
Merit: 0
In my case it appears to be utf8/codepage confusion:

After setting:

DEFAULT_ENCODING = locale.getpreferredencoding()

in armoryengine.py:89, it no longer complains about non-existent --satoshi-datadir

Furthermore, popen works as expected:

Code:
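import sys
import locale
from subprocess import Popen

# sys.argv[1] arrives as a byte string: decode it with the locale encoding,
# then re-encode it with the filesystem encoding before handing it to Popen
arg = sys.argv[1].decode(locale.getpreferredencoding())
Popen(['msg.exe', '*', '/server:127.0.0.1', arg.encode(sys.getfilesystemencoding())])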

Shows it exactly in popup as on commandline.
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
The problem is not the source-file encoding.  I think that's what you're talking about, and would only matter if the source file itself had non-ASCII in it.  Is this correct?

The problem is not the source file, but rather, strings and filesystem objects that are handled by the code.
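A small sketch of the distinction, with made-up names (SOURCE_LINE, dir_name, is_ascii). The source text below is pure ASCII because the non-ASCII characters are written as escapes, so a coding declaration would change nothing, yet the string that exists at runtime is not ASCII.

```python
# The assignment as it appears in the source file, kept as a raw string so
# the escapes stay literal text.
SOURCE_LINE = r"dir_name = u'Bitcoin\xe9\u015b'"

# The same assignment actually executed: the escapes become real characters.
dir_name = u'Bitcoin\xe9\u015b'


def is_ascii(text):
    """Return True if text contains only code points below 128."""
    return all(ord(ch) < 128 for ch in text)


source_is_ascii = is_ascii(SOURCE_LINE)
value_is_ascii = is_ascii(dir_name)
```

The same holds for strings that never appear in the source at all, such as paths read from sys.argv, the environment, or a settings file.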
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
For reference, QuantumFoam might have found the answer already.  He pointed me to using the win32process::CreateProcessW method which actually looks like it will work.  I haven't tried it yet, but I did a little googling about it and it looks like it's the correct answer.  I just want to get his answer here so no one posts "first", instead of him.
legendary
Activity: 1176
Merit: 1280
May Bitcoin be touched by his Noodly Appendage
I thought it was # -*- coding: utf-8 -*-

Looks like it's an OS bug though
legendary
Activity: 1092
Merit: 1016
760930
Hi etotheipi,

I don't have access to a PC right now but I did have to deal with these kinds of annoyances in the past...
Can you simply try to add the below comment as the first line of your test file?
#encoding=utf-8

Does this fix the issue? If not, I'll dig some more into it tomorrow evening...
legendary
Activity: 1428
Merit: 1093
Core Armory Developer
So, lots of reports of unicode issues on non-US systems trying to run Armory. I had tested unicode support by setting my Satoshi datadir to "Bitcoiné" and then letting Armory try to figure out the rest. This was tested both in Windows and Linux. But now I have reports of this failing. I realize that I didn't do parts of it right, but now I see that there are parts of it I can't figure out in the slightest.

If I instead use this directory name:  Bitcoinéś , everything now falls apart. The ś apparently can't be converted to the encoding used by subprocess.Popen, even though it succeeds everywhere else. Having that filename in pure unicode works fine for os.path.exists() and I can even open a file inside and write data to it. I think it's because the os module knows how to talk to Windows. But I don't.

So here I am:

Code:
import os
import sys
import locale
import subprocess

# Backslashes are doubled so that "\b" is not read as a backspace escape
pathUni = u'C:\\Users\\vbox\\ArmoryCheckout\\Bitcoin\xe9\u015b\\bitcoin.conf'
os.path.exists(pathUni)  # True
open(pathUni, 'w').write(...)  # works

print locale.getpreferredencoding()  # cp1252
print sys.getfilesystemencoding()  # mbcs

# Errors out trying to convert to ASCII
subprocess.Popen(['something.exe', pathUni])

# Fails to find path
subprocess.Popen(['something.exe', pathUni.encode('utf-8')])

# Fails to find path
subprocess.Popen(['something.exe', pathUni.encode(locale.getpreferredencoding())])

# 'charmap' codec can't encode u'\u015b': character maps to <undefined>
subprocess.Popen(['something.exe', pathUni.encode(sys.getfilesystemencoding())])

The single post I could find on stackexchange with this exact problem was resolved by modifying "something.exe", because it was an app they controlled. That doesn't solve my problem, since I don't have control of it.

I don't even know how to ask for help.  But if someone has experience with this and can help me fix it, it's worth 2 BTC to me.  I've wasted almost a full day on this!  (p.s. this doesn't seem to be a problem in Linux, for which preferred and fs encoding are all UTF-8... it's only a problem in Windows).

I suppose you can just create a directory or file in Windows with a ton of crazy unicode characters, and then attempt to run a Popen command using that file or directory as an argument.  It will fail.  
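The reproduction described above could be sketched roughly like this. The names CHILD_CODE and child_sees_path are made up, and a child Python interpreter stands in for "something.exe". As far as I know, Python 3's subprocess hands arguments to Windows through the wide-character API, so there the check is expected to pass; under Python 2 on Windows it is the case reported as failing in this thread.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Child script: exit code 0 if the path it received exists, 1 otherwise.
CHILD_CODE = "import os, sys; sys.exit(0 if os.path.isdir(sys.argv[1]) else 1)"


def child_sees_path(dir_name):
    """Create a directory named dir_name inside a temp dir, pass its full
    path to a child Python process, and report whether the child found it."""
    base = tempfile.mkdtemp()
    try:
        target = os.path.join(base, dir_name)
        os.mkdir(target)
        exit_code = subprocess.call([sys.executable, '-c', CHILD_CODE, target])
        return exit_code == 0
    finally:
        # Clean up the temp dir whether or not the child succeeded.
        shutil.rmtree(base, ignore_errors=True)


if __name__ == '__main__':
    print(child_sees_path(u'Bitcoin\xe9\u015b'))
```

Using an exit code rather than printed output keeps the console encoding out of the picture, so only the argument passing is being tested.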