Python UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-17: ordinal not in range(128)


python python-3.x


I'm having trouble running the following code:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import re
import csv

file = open("Test.CSV", "r")
reader = csv.reader(file)
for line in reader:
    text = line[5]
    lst = re.findall('(http.?://[^\s]+)', text)

    if not lst: print('Empty List')
    else:
        try:
            for url in lst:
                html = urllib.request.urlopen(url, context=ctx).read()
                soup = BeautifulSoup(html, 'html.parser')
                title = soup.title.string
                str_title = str (title)
                if 'Twitter' in str_title:
                    if len(lst) > 1: break
                    else: continue
                else:
                    print (str_title, ',', url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                print ('Invalid Twitter Link')

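The code above reads a CSV file, picks one column, and parses that column with a regular expression to extract every hyperlink in a row. It then opens each hyperlink and uses BeautifulSoup to get the page's title string.

Whenever I run this code, it stops at one particular row and throws the error "UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-17: ordinal not in range(128)".

How can I work with Unicode strings here?

Any help would be much appreciated.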

The error message shows that the problem occurs in

urllib.request.urlopen(url, context=ctx)

It looks like at least one of the URLs contains non-ASCII characters.
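A minimal sketch of how the offending characters could be located before the request is made. The URL below is a made-up example, not one from the question's CSV, and non_ascii_positions is a hypothetical helper name.

```python
def non_ascii_positions(text):
    """Return (index, character) pairs for every character outside the ASCII range."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


# Hypothetical URL used only for illustration; it is not from the question's CSV.
url = "http://example.com/caf\u00e9"

try:
    # Encoding to ASCII fails here for the same reason the request fails.
    url.encode("ascii")
except UnicodeEncodeError as err:
    # err.start and err.end delimit the slice that could not be encoded.
    print("cannot encode", ascii(url[err.start:err.end]), "at index", err.start)

# ascii() keeps the output printable even on an ASCII-only console.
print(ascii(non_ascii_positions(url)))
```

The "position 15-17" in the question's error message is the same kind of index range, counted within the string that was being encoded.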

What to do

You can try quoting the URL:

html = urllib.request.urlopen(urllib.parse.quote(url, errors='ignore'), context=ctx).read()

This prevents the UnicodeEncodeError, but it silently builds a wrong URL, which is likely to cause problems later.
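One reason the quoted URL comes out wrong is that urllib.parse.quote, with its default safe='/', also escapes the ':' after the scheme. A minimal sketch of an alternative, assuming the non-ASCII characters sit in the path, query, or fragment: split the URL and quote each component separately. The helper name to_ascii_url and the example URL are hypothetical, and the host name is deliberately left untouched.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component; "%" is kept so that
# already-escaped sequences such as %20 are not escaped a second time.
PATH_SAFE = "/%:@!$&'()*+,;="
QUERY_SAFE = "/%:@!$&'()*+,;=?"


def to_ascii_url(url):
    """Percent-encode non-ASCII characters in the path, query and fragment as UTF-8.

    Hypothetical helper: the host name is passed through unchanged, so an
    internationalized domain name would still need separate handling.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=PATH_SAFE),
        quote(parts.query, safe=QUERY_SAFE),
        quote(parts.fragment, safe=QUERY_SAFE),
    ))


# Made-up URL for illustration only.
print(to_ascii_url("http://example.com/caf\u00e9?q=\u00fc"))
```

The result could then be passed to urllib.request.urlopen in place of the raw url. Whether the server accepts UTF-8 percent-encoding is an assumption that holds for most sites but not all.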

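My advice is to catch the UnicodeError and display an error message. That helps you understand what is happening under the hood and how to actually fix it: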

for url in lst:
    try:
        html = urllib.request.urlopen(url, context=ctx).read()
        soup = BeautifulSoup(html, 'html.parser')
        title = soup.title.string
        ...
    except UnicodeEncodeError as e:
        print("Incorrect URL {}".format(url.encode('ascii', errors='backslashreplace')))

The errors='backslashreplace' option dumps the codes of the offending characters.
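A small illustration of what that option produces, using a made-up URL rather than one from the question:

```python
# Hypothetical URL used only for illustration.
url = "http://example.com/caf\u00e9/\u4e2d"

# Characters that cannot be encoded are replaced by \x.., \u.... or
# \U........ escapes instead of raising UnicodeEncodeError.
escaped = url.encode("ascii", errors="backslashreplace")

# The result is pure ASCII, so it is safe to print on any console.
print(escaped.decode("ascii"))
```

The escapes show exactly which code points are in the URL, which makes it easier to decide how each one should be encoded.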

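It is better to give the exact line of the error, along with the full error message and stack trace. Also, UnicodeEncodeError means the error occurs when writing (not when reading)…

Here is the exact error:
File "C:\Users\asaxena\Desktop\py4e\Gartner\crawler_new.py", line 29, in <module>
    html = urllib.request.urlopen(url, context=ctx).read()
File "C:\Users\asaxena\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
File "C:\Users\asaxena\AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-17: ordinal not in range(128)

The error is better (more readable) in the question itself than in a comment.

Great, let me try this too. By the way, can the code have more than one try/except block? As you can see, my code already has one. Sorry for asking such a basic question, I'm very new to Python.

try/except blocks can be nested. The exception is passed to the inner block first. If that block cannot catch it, it is propagated to the enclosing block.

Hi, I solved that error but ran into some others, such as "urllib.error.URLError". I solved that one too, but then got another error:
Traceback (most recent call last):
File "C:\Users\asaxena\Desktop\py4e\Gartner\crawler_new.py", line 32, in <module>
    title = soup.title.string
AttributeError: 'NoneType' object has no attribute 'string'
Is there really a way to get past any kind of error that comes up? Even ones I haven't seen yet?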
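The nesting behaviour described in the comments can be sketched as follows. This is only an illustration under stated assumptions: page_title is a hypothetical function, and fetch_title is a hypothetical callable standing in for the urlopen and BeautifulSoup steps so that the sketch runs without network access.

```python
def page_title(url, fetch_title):
    """Return a status string for url.

    fetch_title is a hypothetical stand-in for the download and parsing
    steps; it returns the page title, or None if the page has no title.
    """
    try:  # outer block
        try:  # inner block
            title = fetch_title(url)
        except UnicodeEncodeError:
            # Caught by the inner block, so the outer block never sees it.
            return "bad url: " + ascii(url)
        # soup.title is None when a page has no <title> element, so check
        # for None before using the value.
        if title is None:
            return "no title"
        return "ok: " + title
    except OSError as err:
        # Not handled by the inner block, so it propagates out to here.
        # urllib.error.URLError is a subclass of OSError.
        return "network error: {}".format(err)


def needs_ascii(url):
    # Mimics the request line being encoded as ASCII.
    url.encode("ascii")
    return "Example"


def unreachable(url):
    raise OSError("down")


print(page_title("http://example.com/caf\u00e9", needs_ascii))
print(page_title("http://example.com/", unreachable))
print(page_title("http://example.com/", lambda u: None))
```

Catching specific exceptions like this tends to be safer than a blanket except Exception, which would also hide genuine bugs in the code.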