Python 3.x: Reading a TSV file that contains byte strings with pandas


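I have a TSV file in which one column contains UTF-8 encoded byte strings (e.g., b'La croisi\xc3\xa8re'). I am trying to read this file with the pandas method read_csv, but I get a column of regular strings rather than byte strings (e.g., "b'La croisi\xc3\xa8re'").

In Python 3, how can I read that column as byte strings instead of regular strings? I tried passing dtype={'my_bytestr_col': bytes} to read_csv, but it did not work.

Put another way: how do I go from something like "b'La croisi\xc3\xa8re'" to b'La croisi\xc3\xa8re'?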

Example file:

    First Name  Last Name   bytes
0   foo          bar        b'La croisi\xc3\xa8re'

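Then try this:

import ast
import pandas as pd

# Read the tab-separated file; the bytes column comes back as plain str.
df = pd.read_csv('file.tsv', sep='\t')
# Safely evaluate each "b'...'" literal to get a real bytes object.
df['bytes'] = df['bytes'].apply(ast.literal_eval)
print(df['bytes'])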

Output:

0    b'La croisi\xc3\xa8re'
Name: bytes, dtype: object
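The ast.literal_eval approach can be checked end to end without a file on disk. The following is a minimal sketch that assumes the file content looks like the example above; the in-memory tsv_text and the variable names are illustrative, not part of the original post.

```python
import ast
import io

import pandas as pd

# Hypothetical in-memory stand-in for file.tsv; the raw string keeps the
# backslashes as literal characters, exactly as they appear in the file.
tsv_text = (
    "First Name\tLast Name\tbytes\n"
    "foo\tbar\t" + r"b'La croisi\xc3\xa8re'" + "\n"
)

df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Straight after reading, the cell is a str that merely looks like a bytes literal.
raw_value = df["bytes"].iloc[0]

# literal_eval parses the literal and returns a real bytes object.
converted = df["bytes"].apply(ast.literal_eval)

# Decoding as UTF-8 recovers the original text.
decoded = converted.iloc[0].decode("utf-8")
print(type(raw_value).__name__, type(converted.iloc[0]).__name__, decoded)
```

ast.literal_eval only accepts Python literals, so it is a safer choice here than eval, but it will raise an exception on cells that are empty or not valid literals.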

To go from something like "b'La croisi\xc3\xa8re'" to b'La croisi\xc3\xa8re', you can do

data[2:-1].encode()

Unfortunately, this returns the following incorrectly encoded byte string:

b'La croisi\xc3\x83\xc2\xa8re'
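One plausible explanation for the result reported above is that slicing and calling encode() never interprets the \x escape sequences; it only encodes whatever characters the string already holds. The sketch below, which uses only the standard library and illustrative variable names, contrasts the two cases and shows one way to undo the escapes by hand. It assumes the cell holds only ASCII text plus \x escapes.

```python
# Case 1: the text as read from the file, with literal backslashes.
from_file = r"b'La croisi\xc3\xa8re'"
naive = from_file[2:-1].encode()
# The backslashes survive as ordinary characters, so no real 0xC3 byte appears.

# Case 2: the escapes were already interpreted as the characters U+00C3 and
# U+00A8 (for example, by typing the string without the r prefix).
interpreted = "La croisi\xc3\xa8re"
double_encoded = interpreted.encode()
# UTF-8 encodes each of those two characters as two bytes.

# Undoing the escapes by hand: interpret \xNN as code points, then map each
# code point below 256 back to a single byte with latin-1.
fixed = (
    from_file[2:-1]
    .encode("ascii")
    .decode("unicode_escape")
    .encode("latin-1")
)
print(naive, double_encoded, fixed, fixed.decode("utf-8"))
```

The manual route breaks down if the cell contains characters outside ASCII or quoting subtleties, which is why ast.literal_eval is usually the simpler option.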

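This doesn't work, because it opens the UTF-8 encoded byte string column in my TSV file as a plain string. Maybe there is a set of arguments I can pass to this function to fix it?

bytes(str(df['my_bytestr_col']), 'utf-8') returns b"b'La croisi\\xc3\\xa8re'" in Python 3.

OK, this should work for you: df['bytes'].apply(ast.literal_eval)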

0    b'La croisi\xc3\xa8re'
Name: bytes, dtype: object
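On the question of passing arguments to read_csv itself: its converters parameter takes a dict that maps column labels to functions applied to each cell while parsing, so the conversion can happen during the read. This is a sketch under the same assumptions as the example file; the in-memory tsv_text is illustrative.

```python
import ast
import io

import pandas as pd

# Hypothetical in-memory stand-in for the TSV file from the question.
tsv_text = (
    "First Name\tLast Name\tbytes\n"
    "foo\tbar\t" + r"b'La croisi\xc3\xa8re'" + "\n"
)

# converters applies ast.literal_eval to every cell of the 'bytes' column
# as it is parsed, so no separate apply() step is needed afterwards.
df = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    converters={"bytes": ast.literal_eval},
)

value = df.loc[0, "bytes"]
print(type(value).__name__, value)
```

As with apply, any cell that is not a valid Python literal will make ast.literal_eval raise, so a file with missing values would need a small wrapper function instead.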