Python 'utf-8' codec can't decode byte 0xf6 in position 139604: invalid start byte

python web web-crawler

I'm working on a knowledge engineering project.

This error occurred while I was crawling some scientists' personal websites:

import html2text
import requests
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
import urllib


homepage = "http://angom.myweb.cs.uwindsor.ca"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=homepage, headers=headers)
print(req)
c = urlopen(req).read()   # raw bytes of the page
print(type(c))            # <class 'bytes'>

# Decoding the same page as UTF-8 raises the error below
content = urlopen(req).read().decode("utf-8")

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 139604: invalid start byte
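The error can be reproduced offline. The snippet below uses a made-up byte string (not taken from the crawled page) that contains 0xf6. That byte can never begin a UTF-8 sequence, but in windows-1252 it is simply "ö".

```python
raw = b"Schr\xf6dinger"  # hypothetical sample bytes, not taken from the crawled page

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # 0xf6 is not a legal UTF-8 lead byte, hence "invalid start byte"
    print(err)  # 'utf-8' codec can't decode byte 0xf6 in position 4: invalid start byte

# In windows-1252 the same byte maps to "ö" (U+00F6)
print(raw.decode("windows-1252"))  # Schrödinger
```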

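The page declares its own encoding in a meta tag:

<meta http-equiv=Content-Type content="text/html; charset=windows-1252">

so decoding with windows-1252 instead of UTF-8 will work in this case.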

If you plan to use BeautifulSoup, …

Printing the bytes around the failing position might give a hint:

print(c[139600:139610])
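The same inspection can be done without hard-coding the offset, because UnicodeDecodeError carries the failing position in its start and end attributes. This is a minimal sketch; show_bad_bytes is a hypothetical helper name and the sample bytes are made up.

```python
def show_bad_bytes(raw, encoding="utf-8", context=5):
    """Hypothetical helper: return (offset, surrounding bytes) for the first
    byte that fails to decode, or None if raw decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset the traceback reports (139604 in the question)
        lo = max(err.start - context, 0)
        return err.start, raw[lo:err.end + context]
    return None


sample = b"Schr\xf6dinger"  # hypothetical bytes, not the real page
print(show_bad_bytes(sample))  # (4, b'Schr\xf6dinge')
```

Called on the question's c, this would print the offset and a window of bytes around it, which is usually enough to recognise a single-byte encoding such as windows-1252.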

content = urlopen(req).read().decode("windows-1252")
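Hard-coding windows-1252 fixes this one site, but a crawler visiting many homepages will meet other encodings. A more general approach, sketched here as an assumption rather than what the answers above do, is to take the charset from the HTTP Content-Type header when the server sends one, fall back to the <meta> declaration, and only then to UTF-8. guess_charset and fetch_text are hypothetical helper names, and the regex handles both meta forms but is not a full HTML5 encoding sniffer.

```python
import codecs
import re
from urllib.request import Request, urlopen

# Matches <meta charset="..."> and the older
# <meta http-equiv=Content-Type content="text/html; charset=..."> form.
META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([A-Za-z0-9_-]+)""", re.IGNORECASE)


def guess_charset(raw, header_charset=None, default="utf-8"):
    """Hypothetical helper: header charset, then <meta> charset, then default.
    Names Python does not recognise are skipped."""
    candidates = [header_charset]
    # Assumption: the declaration sits near the top of the document.
    match = META_CHARSET.search(raw[:4096])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    for name in candidates:
        if not name:
            continue
        try:
            codecs.lookup(name)  # raises LookupError for unknown codec names
        except LookupError:
            continue
        return name
    return default


def fetch_text(url, headers=None):
    """Hypothetical helper: download url and decode it with the guessed charset."""
    req = Request(url=url, headers=headers or {})
    with urlopen(req) as response:
        raw = response.read()
        # charset parameter of the Content-Type header, or None if absent
        header_charset = response.headers.get_content_charset()
    charset = guess_charset(raw, header_charset)
    # errors="replace" keeps a few mis-declared bytes from aborting the crawl
    return raw.decode(charset, errors="replace")
```

With the question's variables this would be called as fetch_text(homepage, headers). Letting the header win over the meta tag mirrors how browsers resolve conflicting declarations; errors="replace" trades a few replacement characters for a crawl that does not stop on one badly labelled page.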