Convert bytes to a string?

I'm using this code to get the standard output of an external program:

    >>> from subprocess import *
    >>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

    >>> command_stdout
    b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'

However, I'd like to work with the output as a normal Python string, so that I could print it like this:

    >>> print(command_stdout)
    -rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
    -rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2

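I thought that's what the binascii.b2a_qp() method is for, but when I tried it, I just got the same byte array again:

    >>> binascii.b2a_qp(command_stdout)
    b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n'

Does anybody know how to convert the bytes value back to a string? I mean, using the "batteries" instead of doing it by hand. And I'd like it to work with Python 3.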

You need to decode the bytes object to produce a string:

    >>> b"abcde"
    b'abcde'

    # utf-8 is used here because it is a very common encoding, but you
    # need to use the encoding your data is actually in.
    >>> b"abcde".decode("utf-8")
    'abcde'
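Applied to the question's subprocess call, the decode step might look like the sketch below. It runs the current Python interpreter as the child process (an assumption, chosen so the example does not depend on ls being installed), and it assumes the child writes UTF-8.

```python
import subprocess
import sys

# Run a child process and capture its standard output as bytes.
# sys.executable is used only so the sketch runs anywhere; the
# question's ['ls', '-l'] would be captured the same way on Unix.
raw = subprocess.check_output(
    [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'total 0\\nfile1\\n')"]
)

# Decode the bytes into str. UTF-8 is an assumption about the child's output.
text = raw.decode("utf-8")

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
print(text, end="")
```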

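I think this way is easy:

    >>> byte_values = [112, 52, 52]
    >>> # chr() treats each value as a code point, so this equals a Latin-1 decode.
    >>> "".join(map(chr, byte_values))
    'p44'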

You need to decode the byte string and turn it into a character (Unicode) string.

    b'hello'.decode(encoding)

or

    str(b'hello', encoding)
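A small sketch of the two spellings side by side, assuming UTF-8 data. It also shows a common pitfall: calling str() on bytes without an encoding does not decode anything and returns the printable representation instead.

```python
data = "naïve".encode("utf-8")   # b'na\xc3\xafve'

# Both forms decode the same way when given the same encoding.
via_method = data.decode("utf-8")
via_constructor = str(data, "utf-8")
assert via_method == via_constructor == "naïve"

# Without an encoding, str() returns the repr of the bytes object.
not_decoded = str(data)
print(not_decoded)  # b'na\xc3\xafve'
```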

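If you don't know the encoding, then to read binary input into a string in a way that is compatible with both Python 3 and Python 2, use the ancient MS-DOS cp437 encoding:

    import sys

    PY3K = sys.version_info >= (3, 0)

    lines = []
    for line in stream:
        if not PY3K:
            lines.append(line)
        else:
            lines.append(line.decode('cp437'))

Because the encoding is unknown, expect non-English symbols to be translated to cp437 characters (English characters are not translated, because they match in most single-byte encodings and in UTF-8).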

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this:

    >>> b'\x00\x01\xffsd'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in the code page layout; that is where Python chokes with the infamous "ordinal not in range" error.

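UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss and crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.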
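The [binary] -> [str] -> [binary] round trip mentioned in this update can be checked with a short sketch like the one below (Python 3 only). It covers correctness for one sample input and says nothing about performance.

```python
raw = b"\x00\x01\xffsd"   # not valid UTF-8 because of the 0xff byte

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler turns the surrogates back into the original bytes.
restored = text.encode("utf-8", "surrogateescape")

print(ascii(text))      # '\x00\x01\udcffsd'
print(restored == raw)  # True
```

Note that the resulting string contains lone surrogates, so encoding it with the default strict handler (for example when printing to a UTF-8 terminal) raises UnicodeEncodeError.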

UPDATE 20170116: Thanks to the comment by Nearoo, there is also a possibility to slash-escape all unknown bytes with the backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions:

    import sys

    PY3K = sys.version_info >= (3, 0)

    lines = []
    for line in stream:
        if not PY3K:
            lines.append(line)
        else:
            lines.append(line.decode('utf-8', 'backslashreplace'))

See https://docs.python.org/3/howto/unicode.html#python-s-unicode-support for details.
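To make the effect of backslashreplace concrete, here is a minimal sketch on a single sample value (Python 3.5 or later, where the handler is also accepted for decoding). It also shows why the result is not reversible: an escaped byte and a literal backslash sequence end up as the same text.

```python
raw = b"\x80abc"                      # 0x80 is not a valid UTF-8 start byte
text = raw.decode("utf-8", "backslashreplace")
print(text)                           # \x80abc

# Input that already contains the four characters \, x, 8, 0 decodes to the
# same string, so the original bytes cannot be recovered from the text alone.
literal = b"\\x80abc".decode("utf-8", "backslashreplace")
print(literal == text)                # True
```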

UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.

    # --- preparation

    import codecs

    def slashescape(err):
        """Codecs error handler. err is a UnicodeDecodeError instance.

        Return a tuple with a replacement for the undecodable part of the
        input and the position where decoding should continue.
        """
        # print(err, dir(err), err.start, err.end, err.object[:err.start])
        thebyte = err.object[err.start:err.end]
        repl = u'\\x' + hex(ord(thebyte))[2:]
        return (repl, err.end)

    codecs.register_error('slashescape', slashescape)

    # --- processing

    stream = [b'\x80abc']

    lines = []
    for line in stream:
        lines.append(line.decode('utf-8', 'slashescape'))

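I think what you actually want is this:

    >>> from subprocess import *
    >>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
    >>> command_text = command_stdout.decode(encoding='windows-1252')

Aaron's answer was correct, except that you need to know which encoding to use. I believe that Windows uses 'windows-1252'. It only matters if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.

By the way, the fact that it does matter is the reason Python moved to two different types for binary and text data: it can't convert magically between them, because it doesn't know the encoding unless you tell it! The only way you would know is to read the Windows documentation (or read it here).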

Set universal_newlines to True, i.e.

    command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]
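The difference the flag makes can be seen by running the same command twice. This is a sketch that uses the current interpreter as the child process (an assumption, so that it does not rely on ls being available).

```python
import subprocess
import sys

child = [sys.executable, "-c", "print('hello')"]

# Default: the pipe is binary, so communicate() returns bytes.
as_bytes = subprocess.Popen(child, stdout=subprocess.PIPE).communicate()[0]

# universal_newlines=True: the pipe is opened in text mode, so the output is
# decoded to str and line endings are normalized to '\n'.
as_text = subprocess.Popen(
    child, stdout=subprocess.PIPE, universal_newlines=True
).communicate()[0]

print(type(as_bytes).__name__)  # bytes
print(repr(as_text))            # 'hello\n'
```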

In Python 3, you can directly use:

    b'hello'.decode()

which is equivalent to

    b'hello'.decode(encoding="utf-8")

Here the default encoding is "utf-8", or you can check it with:

    >>> import sys
    >>> sys.getdefaultencoding()

While @Aaron Maenpaa's answer just works, a user recently asked:

"Is there any simpler way? 'fhand.read().decode("ASCII")' [...] It's so long!"

You can use:

    command_stdout.decode()

decode() has standard default arguments:

    codecs.decode(obj, encoding='utf-8', errors='strict')
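The errors argument in that signature controls what happens when a byte cannot be decoded. A brief sketch of three of the built-in handlers, using a Latin-1 encoded sample that is invalid as UTF-8:

```python
raw = b"caf\xe9"  # 'café' encoded as Latin-1, which is invalid as UTF-8

try:
    raw.decode("utf-8")                 # errors='strict' is the default
except UnicodeDecodeError as exc:
    print("strict raised:", type(exc).__name__)

ignored = raw.decode("utf-8", errors="ignore")    # silently drops the bad byte
replaced = raw.decode("utf-8", errors="replace")  # substitutes U+FFFD
print(ignored)          # caf
print(ascii(replaced))  # 'caf\ufffd'
```

Both 'ignore' and 'replace' lose information, so they suit display purposes better than data that must be written back unchanged.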

To interpret a byte sequence as text, you have to know the corresponding character encoding:

    unicode_text = bytestring.decode(character_encoding)

Example:

    >>> b'\xc2\xb5'.decode('utf-8')
    'µ'

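The ls command may produce output that can't be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

    >>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

Trying to decode such byte soup using the utf-8 encoding raises UnicodeDecodeError.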

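It can be worse. The decoding may fail silently and produce mojibake if you use a wrong, incompatible encoding:

    >>> '—'.encode('utf-8').decode('cp1252')
    'â€”'

The data is corrupted, but your program remains unaware that a failure has occurred.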
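When the wrong encoding is known and no byte was dropped along the way, this kind of damage can sometimes be undone by reversing the wrong step. The sketch below does that for the cp1252 case only; it is not a general repair tool.

```python
original = "\u2014"                                   # EM DASH

# The mistake: UTF-8 bytes decoded as cp1252.
garbled = original.encode("utf-8").decode("cp1252")

# The repair: re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")

print(ascii(garbled))        # '\xe2\u20ac\u201d'
print(repaired == original)  # True
```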

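In general, the character encoding to use is not embedded in the byte sequence itself. You have to communicate this information out-of-band. Some outcomes are more likely than others, and therefore the chardet module exists, which can guess the character encoding. A single Python script may use multiple character encodings in different places.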
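Guessing can be illustrated without third-party packages by trying a short list of candidate encodings in order. This is a deliberately naive sketch, not how chardet works; the helper name decode_with_candidates and the candidate list are assumptions made for the example. A successful decode only means the bytes are valid in that encoding, not that the guess is right.

```python
def decode_with_candidates(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes without error."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_candidates("µ".encode("utf-8")))  # ('µ', 'utf-8')
print(decode_with_candidates(b"caf\xe9"))           # ('café', 'cp1252')
```

Strict encodings such as UTF-8 should come first, because permissive single-byte encodings such as latin-1 accept any input.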

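ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable file names (on Unix it uses sys.getfilesystemencoding() and the surrogateescape error handler):

    import os
    import subprocess

    output = os.fsdecode(subprocess.check_output('ls'))

To get the original bytes back, you can use os.fsencode().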
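The round trip between os.fsdecode() and os.fsencode() can be sketched as follows. The sample file name is made up, and the example assumes a Unix system, where the surrogateescape handler is in effect; on Windows the undecodable byte would be rejected.

```python
import os

raw_name = b"report-\xff.txt"   # 0xff is not valid UTF-8

# On Unix the bad byte is carried through as a lone surrogate.
name = os.fsdecode(raw_name)

# os.fsencode() reverses the mapping and yields the original bytes.
round_tripped = os.fsencode(name)

print(type(name).__name__)        # str
print(round_tripped == raw_name)  # True
```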

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode the bytes; for example, it can be cp1252 on Windows.

To decode the byte stream on the fly, io.TextIOWrapper() can be used.
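A minimal sketch of decoding a child's output line by line with io.TextIOWrapper() while the process is still running. The child command (the current interpreter printing two lines) and the UTF-8 encoding are assumptions made for the example.

```python
import io
import subprocess
import sys

child = [sys.executable, "-c", "print('line 1'); print('line 2')"]

lines = []
with subprocess.Popen(child, stdout=subprocess.PIPE) as proc:
    # Wrap the binary pipe in a text layer; errors="replace" keeps the loop
    # alive if the child emits bytes that are not valid in the assumed encoding.
    reader = io.TextIOWrapper(proc.stdout, encoding="utf-8", errors="replace")
    for line in reader:
        lines.append(line.rstrip("\n"))

print(lines)  # ['line 1', 'line 2']
```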

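Different commands may use different character encodings for their output; for example, the dir internal command (cmd) may use cp437. To decode its output, you can pass the encoding explicitly (Python 3.6+):

    output = subprocess.check_output('dir', shell=True, encoding='cp437')

The file names may differ from os.listdir() (which uses the Windows Unicode API); for example, '\xb6' can be substituted with '\x14', because Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). To support file names with arbitrary Unicode characters, see "Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string".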

I made a function to clean a list:

    def cleanLists(self, lista):
        lista = [x.strip() for x in lista]
        lista = [x.replace('\n', '') for x in lista]
        lista = [x.replace('\b', '') for x in lista]
        lista = [x.encode('utf8') for x in lista]
        lista = [x.decode('utf8') for x in lista]
        return lista

For Python 3, this is a much safer and more Pythonic approach to converting from bytes to a string:

    def byte_to_str(bytes_or_str):
        if isinstance(bytes_or_str, bytes):  # check whether it is a bytes object
            print(bytes_or_str.decode('utf-8'))
        else:
            print("Object not of byte type")

    byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2\n')

Output:

    total 0
    -rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file1
    -rw-rw-r-- 1 thomas thomas 0 Mar 3 07:03 file2

From http://docs.python.org/3/library/sys.html:

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
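The buffer attribute described in that quote can be tried on an in-memory stand-in for sys.stdout, which makes the written bytes easy to inspect. The helper name write_bytes is an assumption made for this sketch; it flushes pending text first so that text and raw bytes stay in order.

```python
import io
import sys


def write_bytes(data, stream=None):
    """Write raw bytes to a text stream's underlying binary buffer."""
    stream = sys.stdout if stream is None else stream
    stream.flush()             # push out pending text first to keep output ordered
    stream.buffer.write(data)
    stream.buffer.flush()


# An in-memory text stream that behaves like sys.stdout for this purpose.
sink = io.BytesIO()
fake_stdout = io.TextIOWrapper(sink, encoding="utf-8")

fake_stdout.write("text, ")
write_bytes(b"abc\xff", fake_stdout)   # bytes pass through without any encoding step

print(sink.getvalue())  # b'text, abc\xff'
```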
