Reading a byte string from a file in Python 3

The file contents are as follows, and the file is encoded as UTF-8:

cd232704-a46f-3d9d-97f6-67edb897d65f    b'this Friday, Gerda Scheuers will be excited \xe2\x80\x94 but she\xe2\x80\x99s most excited about the merchandise the movie will bring.'
Here is my code:

with open(file, 'r') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        print(tokens[1])
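I want to get the correct result: the sentence above as readable text, with \xe2\x80\x94 shown as a dash and \xe2\x80\x99 as an apostrophe, rather than the raw escape sequences.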

But I can't read the bytes from the file. If I open the file in bytes mode, I need to decode each line before I can split it.

You can use ast.literal_eval to convert the text of the bytes literal into a bytes object.

Then decode it to get a string object:

>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'")
b'excited \xe2\x80\x94 but she\xe2\x80\x99s'
>>> ast.literal_eval(r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'").decode('utf-8')
'excited — but she’s'
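The two steps above can be wrapped in a small helper. This is only a sketch: the name decode_bytes_literal and its encoding parameter are hypothetical, and it assumes the field always holds the text of a Python bytes literal.

```python
import ast

def decode_bytes_literal(text, encoding='utf-8'):
    """Turn the text of a bytes literal (e.g. "b'caf\\xc3\\xa9'") into a str."""
    # literal_eval only evaluates literals, so it is safe on untrusted input.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError('expected a bytes literal, got %s' % type(value).__name__)
    return value.decode(encoding)

sample = r"b'excited \xe2\x80\x94 but she\xe2\x80\x99s'"
decoded = decode_bytes_literal(sample)
print(decoded)
```

The isinstance check is there because ast.literal_eval also accepts other literals (strings, numbers, lists), which have no decode method.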

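Applied to the loop from the question:

import ast

with open(file, 'r', encoding='utf-8') as f_in:
    for line in f_in:
        tokens = line.split('\t')
        # if len(tokens) < 2:
        #    continue
        # tokens[1] is the text of a bytes literal, e.g. "b'...'"
        bytes_part = ast.literal_eval(tokens[1])
        s = bytes_part.decode('utf-8')  # Decode the bytes to convert to a string
        print(s)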
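For a version that can be run end to end, here is a minimal sketch that writes a small sample file and parses it. The function name read_decoded_lines and the sample data are made up for illustration, and it assumes the two fields are separated by a tab, as in the question's code. Lines without a second field, or whose second field is not a bytes literal, are skipped.

```python
import ast
import os
import tempfile

def read_decoded_lines(path):
    """Yield (record_id, text) pairs from a tab-separated file of id / bytes literal."""
    with open(path, 'r', encoding='utf-8') as f_in:
        for line in f_in:
            tokens = line.rstrip('\n').split('\t')
            if len(tokens) < 2:
                continue  # skip blank or malformed lines
            try:
                raw = ast.literal_eval(tokens[1])
            except (ValueError, SyntaxError):
                continue  # second field is not a valid Python literal
            if isinstance(raw, bytes):
                yield tokens[0], raw.decode('utf-8')

# Build a small sample file; the second line is deliberately malformed.
sample = (
    "cd232704-a46f-3d9d-97f6-67edb897d65f\tb'excited \\xe2\\x80\\x94 but she\\xe2\\x80\\x99s'\n"
    "no-tab-on-this-line\n"
)
with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False, encoding='utf-8') as tmp:
    tmp.write(sample)
    path = tmp.name

records = list(read_decoded_lines(path))
os.remove(path)
print(records)
```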