Python: convert bytes to a string

python string python-3.x

I use this code to get the standard output of an external program:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]

The communicate() method returns an array of bytes:

>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

However, I would like to work with the output as a normal Python string, so that I could print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I thought that is what the binascii.b2a_qp() method is for, but when I tried it, I got the same byte array again:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

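How do I convert the bytes value back to a string? I mean, using the "batteries" instead of doing it manually. And I would like it to work with Python 3.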

You need to decode the bytes object to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'
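What happens when the chosen encoding does not match the data can be sketched as follows. The sample text and variable names are illustrative assumptions, not part of the original answer; the `errors` argument of `bytes.decode()` is standard.

```python
# Sample data (an assumption for illustration): UTF-8 encoded text.
raw = "naïve café".encode("utf-8")

# With the correct encoding the text round-trips unchanged.
assert raw.decode("utf-8") == "naïve café"

# With the wrong encoding, the default errors="strict" raises an exception.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError as exc:
    strict_failed = True
    print("strict decoding failed:", exc.reason)

# errors="replace" substitutes U+FFFD for every undecodable byte instead.
lossy = raw.decode("ascii", errors="replace")
print(lossy)
```

The lossy variant never raises, but the replaced bytes cannot be recovered afterwards, so it is best reserved for display or logging.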

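You need to decode the byte string and turn it into a character (Unicode) string.

On Python 2

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)

On Python 3

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)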

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

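Aaron's answer was correct, except that you need to know which encoding to use. I believe that Windows uses 'windows-1252'. It only matters if your content contains some unusual (non-ASCII) characters, but then it does make a difference.

By the way, the fact that it matters is the reason Python moved to two different types for binary and text data: it cannot convert between them magically, because it does not know the encoding unless you tell it. The only way you would know is to read the Windows documentation (or read it here).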

I think this way is easy:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
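As a hedged aside, mapping `chr` over byte values gives the same result as decoding the bytes as Latin-1, so it only yields the intended text when the data really is ASCII or Latin-1. The helper names below are hypothetical and exist only for the comparison.

```python
def bytes_to_text_chr(values):
    """Join one character per byte value (the approach shown above)."""
    return "".join(map(chr, values))


def bytes_to_text_latin1(values):
    """Same result, expressed as an explicit Latin-1 decode."""
    return bytes(values).decode("latin-1")


data = [112, 52, 52]
print(bytes_to_text_chr(data))     # p44
print(bytes_to_text_latin1(data))  # p44

# UTF-8 data with multi-byte characters is garbled by either variant:
utf8_values = list("é".encode("utf-8"))   # [195, 169]
garbled = bytes_to_text_chr(utf8_values)  # two characters, not 'é'
```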

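From the documentation of the sys module:

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use

sys.stdout.buffer.write(b'abc')

Set universal_newlines to True, i.e.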

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]
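A portable way to see the difference is sketched below. It assumes a child Python process in place of `ls` so that it runs on any platform; the `child` command line is an illustrative assumption.

```python
import subprocess
import sys

# A child process that prints one line (stands in for `ls -l`).
child = [sys.executable, "-c", "print('hello')"]

# Without universal_newlines the pipe yields bytes.
raw = subprocess.Popen(child, stdout=subprocess.PIPE).communicate()[0]

# With universal_newlines=True the pipe is decoded to str
# and line endings are normalized to "\n".
text = subprocess.Popen(
    child, stdout=subprocess.PIPE, universal_newlines=True
).communicate()[0]

print(type(raw), type(text))
```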

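If you don't know the encoding, then to read binary input into a string in a way that is compatible with both Python 3 and Python 2, use the ancient MS-DOS CP437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)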

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

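Because the encoding is unknown, expect non-English symbols to be translated to cp437 characters (English characters are not translated, because they match in most single-byte encodings and in UTF-8).

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this: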

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

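UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.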

# --- preparation

import codecs

def slashescape(err):
    """ codecs error handler. err is UnicodeDecode instance. return
    a tuple with a replacement for the unencodable part of the input
    and a position where encoding should continue"""
    #print err, dir(err), err.start, err.end, err.object[:err.start]
    thebyte = err.object[err.start:err.end]
    repl = u'\\x'+hex(ord(thebyte))[2:]
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))
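If only Python 3.5 or later needs to be supported, the built-in `backslashreplace` error handler appears to give a comparable result for decoding without registering a custom handler, and `surrogateescape` allows a lossless round trip. This is a sketch under that version assumption, not the answer's own method.

```python
raw = b"\x80abc\xff"

# Built-in handler: undecodable bytes become literal \xNN escapes.
escaped = raw.decode("utf-8", errors="backslashreplace")
print(escaped)

# surrogateescape keeps undecodable bytes as lone surrogates,
# so the original bytes can be restored exactly.
roundtrip = raw.decode("utf-8", errors="surrogateescape")
restored = roundtrip.encode("utf-8", errors="surrogateescape")
```

The escaped form is readable but ambiguous (a literal backslash sequence in the input looks the same), while the surrogate form is reversible but cannot be printed or encoded without the same error handler.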

While the answer above just works, a user asked:

Is there any simpler way? 'fhand.read().decode("ASCII")' [...] It's so long!

You can use:

command_stdout.decode()

decode() has default arguments:

codecs.decode(obj, encoding='utf-8', errors='strict')

I made a function to clean a list:

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

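In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

In Python 2, on the other hand, the encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Support for keyword arguments was added in Python 2.7.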

To interpret a byte sequence as text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that cannot be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

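Trying to decode such byte soup using the utf-8 encoding raises UnicodeDecodeError.

It can be worse. The decoding may fail silently and produce mojibake if you use a wrong, incompatible encoding: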

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

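The data is corrupted, but your program remains unaware that a failure has occurred.

In general, the character encoding to use is not embedded in the byte sequence itself. You have to communicate this information out of band. Some outcomes are more likely than others, which is why the chardet module exists: it can guess the character encoding. A single Python script may use multiple character encodings in different places.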
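When no out-of-band information is available and a third-party detector is not an option, a crude standard-library fallback is to try a short list of candidate encodings in order. This is only a sketch: the function name and the candidate list are assumptions, it is not how chardet works, and a successful decode does not prove the guess is right.

```python
def decode_with_fallbacks(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding_name) for the first candidate that decodes.

    Falls back to latin-1, which maps every byte to a character and
    therefore never raises (but may produce mojibake).
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    return data.decode("latin-1"), "latin-1"


print(decode_with_fallbacks("µ".encode("utf-8")))
print(decode_with_fallbacks(b"caf\xe9"))
```

Strict encodings such as UTF-8 should come first in the list, because permissive single-byte encodings accept almost any input.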

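The output of ls can be converted to a Python string using os.fsdecode(), which succeeds even for undecodable filenames (on Unix it uses the filesystem encoding with the surrogateescape error handler).

To get the original bytes back, you can use os.fsencode().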
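The round trip through os.fsdecode() and os.fsencode() can be sketched as follows. The byte string is a made-up file name containing a byte that is invalid in UTF-8, and the round-trip guarantee is assumed to hold on Unix only.

```python
import os

# Hypothetical file name with a byte (0xff) that is not valid UTF-8.
raw_name = b"report-\xff.txt"

if os.name == "posix":
    # Decoding succeeds; the bad byte is kept as a lone surrogate.
    name = os.fsdecode(raw_name)
    # Encoding with the same handler restores the original bytes.
    restored = os.fsencode(name)
    print(repr(name), restored == raw_name)
```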

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode the bytes; for example, that can be cp1252 on Windows.

To decode the byte stream on the fly, io.TextIOWrapper() can be used.
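One way to decode a byte stream incrementally is to wrap it in io.TextIOWrapper. The sketch below uses an in-memory io.BytesIO as a stand-in for a pipe or socket file object; the sample text is an assumption.

```python
import io

# Stand-in for a binary stream such as a subprocess pipe.
byte_stream = io.BytesIO("first line\nsecond µ line\n".encode("utf-8"))

# The wrapper decodes as data is read, handling multi-byte
# characters that straddle read boundaries.
text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8")

lines = [line.rstrip("\n") for line in text_stream]
print(lines)
```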

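Different commands may use different character encodings for their output. For example, the output of dir, an internal command of cmd, may use cp437. To decode its output, you can pass the encoding explicitly (Python 3.6+).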
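Passing the encoding explicitly might look like the sketch below. To stay runnable outside Windows it assumes a child Python process that writes a cp437-encoded byte, standing in for `dir`; on Windows the real call would target the command itself.

```python
import subprocess
import sys

# Child writes the raw byte 0x81, which is 'ü' in cp437.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'\\x81ber\\n')",
]

# encoding= (Python 3.6+) makes check_output return str
# decoded with the named codec.
output = subprocess.check_output(child, encoding="cp437")
print(output)
```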

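The filenames may differ from those returned by os.listdir() (which uses the Windows Unicode API). For example, '\xb6' can be substituted with '\x14': Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). Filenames with arbitrary Unicode characters need separate handling.

For Python 3, this is a much safer and more Pythonic approach to convert from bytes to string: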

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes): # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

Output:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

If you get the following when trying decode():

AttributeError: 'str' object has no attribute 'decode'

you can also specify the encoding directly in a cast:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'
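A common pitfall with this form is leaving out the encoding argument: `str()` then returns the printable representation of the bytes object rather than decoding it. A minimal illustration (variable names are assumptions):

```python
my_byte_str = b"Hello World"

decoded = str(my_byte_str, "utf-8")  # decodes the bytes
not_decoded = str(my_byte_str)       # just the repr, including the b'' wrapper

print(decoded)
print(not_decoded)
```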

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

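All your line endings will be doubled (to \r\r\n, because text-mode writing on Windows turns each \n into \r\n), leading to extra empty lines. Python's text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,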

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
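As an alternative sketch, the same normalization can be delegated to Python's text layer by wrapping the binary data in io.TextIOWrapper with `newline=None`, which enables universal-newline translation on read. The sample bytes are an assumption.

```python
import io

raw = b"line one\r\nline two\r\n"

# Manual normalization, as in the answer above.
manual = raw.decode("utf-8").replace("\r\n", "\n")

# Universal-newline translation done by the text wrapper.
wrapped = io.TextIOWrapper(
    io.BytesIO(raw), encoding="utf-8", newline=None
).read()

print(manual == wrapped)
```

The two differ for a lone \r, which the wrapper also translates to \n while the manual replace leaves it in place.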

Since this question is actually asking about subprocess output, the decoding can be requested directly:

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
<class 'str'>
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

>>> b'abcde'.decode()
'abcde'

>>> b'caf\xe9'.decode('cp1250')
'café'

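def toString(string):
    # Return text for bytes input; pass str (or undecodable data) through unchanged.
    try:
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        # AttributeError: already a str; ValueError covers UnicodeDecodeError.
        return string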

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))

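import base64
import json

with open("bytesfile", "rb") as infile:
    # b85encode() returns ASCII-only bytes, so decoding them as ASCII is safe.
    str1 = base64.b85encode(infile.read()).decode("ascii")

with open("bytesfile", "rb") as infile:
    str2 = json.dumps(list(infile.read()))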

import subprocess

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

bytes.fromhex('c3a9').decode('utf-8')