python: getting UnicodeError when trying to parse a list

I'm trying to take a list of movies and look up their IMDB IDs through an API, creating logs of the movies I can and can't find, like so:

import requests

OMDBPath = "http://www.omdbapi.com/"

movieFile = open("movies.txt")
foundLog = open("log_found.txt", 'w')
notFoundLog = open("log_not_found.txt", 'w')

####

for line in movieFile:
    name = line.split('(')[0].decode('utf8')
    print name
    year = False
    if line.find('(') != -1:
        year = line[line.find('(')+1 : line.find(')')].decode('utf8')
        OMDBQuery = {'t': name, 'y': year}
    else:
        OMDBQuery = {'t': name}

    req = requests.get(OMDBPath, params=OMDBQuery)
    if req.json()[u'Response'] == "False":
        if year:
            notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
        else:
            notFoundLog.write("Couldn't find " + name + "\n")
    # else:
    #     print req.json()
    #     foundLog.write(req.text.decode('utf8').encode('latin1') + ",")
movieFile.close()
foundLog.close()
notFoundLog.close()

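I've read a lot about unicode encoding and decoding, and it looks like this happens because I'm not encoding the file the right way? I'm not sure what's going wrong here. I hit a problem when I get to "Caché":

Caché
Traceback (most recent call last):
  File "app.py", line 34, in <module>
    notFoundLog.write("Couldn't find " + name + " (" + year + ")" + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 18: ordinal not in range(128)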

Here is a working solution. It relies on the codecs module to provide transparent utf-8 encoding/decoding for the various files you open:

import requests
import codecs

OMDBPath = "http://www.omdbapi.com/"

with codecs.open("movies.txt", encoding='utf-8') as movieFile, \
     codecs.open("log_found.txt", 'w', encoding='utf-8') as foundLog, \
     codecs.open("log_not_found.txt", 'w', encoding='utf-8') as notFoundLog:
    for line in movieFile:
        name = line.split('(')[0]
        print(name)
        year = False
        if line.find('(') != -1:
            year = line[line.find('(')+1 : line.find(')')]
            OMDBQuery = {'t': name, 'y': year}
        else:
            OMDBQuery = {'t': name}

        req = requests.get(OMDBPath, params=OMDBQuery)
        if req.json()[u'Response'] == "False":
            if year:
                notFoundLog.write(u"Couldn't find {} ({})\n".format(name, year))
            else:
                notFoundLog.write(u"Couldn't find {}\n".format(name))
        #else:
            #print(req.json())
            #foundLog.write(u"{},".format(req.text))

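Note that the codecs module is only needed in Python 2.x. In Python 3.x, the built-in open function should handle this correctly by default, although passing encoding='utf-8' explicitly avoids relying on the platform's default encoding.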
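For the Python 3.x case mentioned above, the following is a minimal sketch rather than a drop-in replacement for the script. The helper names split_title_and_year and write_not_found_log are hypothetical, the sample titles and the year are illustrative, and the OMDb request is left out so the example runs offline (every title is simply treated as not found). Unlike the original code, the sketch also strips surrounding whitespace from the title.

```python
import os
import tempfile


def split_title_and_year(line):
    """Split a line such as 'Title (1999)' into (title, year).

    Returns (title, None) when the line has no parenthesised year.
    Hypothetical helper, not part of the original script.
    """
    line = line.strip()
    open_paren = line.find("(")
    close_paren = line.find(")", open_paren + 1)
    if open_paren == -1 or close_paren == -1:
        return line, None
    return line[:open_paren].strip(), line[open_paren + 1:close_paren]


def write_not_found_log(movie_lines, log_path):
    """Write one "Couldn't find ..." entry per line, encoded as UTF-8.

    Assumption: every title counts as not found, because the API
    lookup is omitted from this sketch.
    """
    # Built-in open with an explicit encoding replaces codecs.open.
    with open(log_path, "w", encoding="utf-8") as log:
        for line in movie_lines:
            title, year = split_title_and_year(line)
            if year:
                log.write("Couldn't find {} ({})\n".format(title, year))
            else:
                log.write("Couldn't find {}\n".format(title))


# Illustrative input; \u00e9 is "é" and \u00bd is "½".
sample_lines = ["Cach\u00e9 (2005)\n", "8\u00bd\n"]

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "log_not_found.txt")
    write_not_found_log(sample_lines, log_path)
    # Read the log back with the same encoding to check the round trip.
    with open(log_path, encoding="utf-8") as log:
        log_text = log.read()
```

Because the text is only encoded at the file boundary, the rest of the code can concatenate and format titles without any encode or decode calls.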

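I've read this answer, and I'm not sure it helps in my case. For example, adding .encode('ascii', 'ignore') turns 8½ into 8. I'd like to keep as much fidelity as possible, so that the titles remain usable as search terms, while still being able to write the output to a file.

I think what you need is name.encode('utf-8').

Now I'm getting UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128) when I reach the 8½ line in the movie file, with this code:

name = line.split("(")[0]
name = name.encode('utf-8')
print name

I meant that you should call name.encode('utf-8') when you write to the log file, not when you read from movies.txt.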
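The two errors in this thread are mirror images of each other, which a short sketch can make concrete. It is written for Python 3, where the ASCII conversions that Python 2 performs implicitly have to be spelled out. The helper names ascii_encode_failure and ascii_decode_failure are hypothetical, and the message omits the year for brevity.

```python
def ascii_encode_failure(text):
    """Return (index, character) where ASCII encoding fails, else None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start]
    return None


def ascii_decode_failure(raw):
    """Return (index, byte value) where ASCII decoding fails, else None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start]
    return None


# Writing text to a byte-oriented file in Python 2 encodes it as ASCII.
# \u00e9 is "é", the character reported in the question's traceback.
message = "Couldn't find Cach\u00e9"
encode_failure = ascii_encode_failure(message)

# Calling .encode() on a Python 2 byte string first decodes it as ASCII.
# \u00bd is "½"; its UTF-8 form is the two bytes 0xc2 0xbd.
raw_title = "8\u00bd".encode("utf-8")
decode_failure = ascii_decode_failure(raw_title)

# Encoding explicitly as UTF-8 at write time keeps the character intact.
log_bytes = message.encode("utf-8")
```

Here encode_failure points at index 18 and decode_failure at byte 0xc2 in index 1, matching the positions in the two error messages above.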
