Python: encoding error when downloading HTML from a URL
I'm having a problem running the following Python code:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#url1='https://www.nytimes.com/store/west-side-highway-and-piers-manhattan-1937-nypl482645-nypl482645p.html'
url2='https://www.nytimes.com/1978/06/21/archives/jordan-wary-of-interim-role-in-west-bank-and-gaza-jordan-accepted.html'
response = requests.get(url2, headers=headers)
fileout="outputTest.html"
obj=open(fileout,"w")
obj.write(response.text)
obj.close()
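When I use url2 I get an error while saving the downloaded HTML (the same code works with url1):
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 34060: character maps to <undefined>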
How can I fix the error for url2?

Use
obj.write(str(response.text.encode('utf-8')))
instead of
obj.write(response.text)
Thanks, it works, but the HTML formatting is lost. The output contains many literal "\n" sequences, like: b'\n\n\n…. How can I solve this?
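One possible way to address the follow-up above, offered as a sketch rather than a definitive fix: the 'charmap' error suggests the file was opened with a platform default encoding (likely a Windows code page such as cp1252) that has no mapping for U+2010. Passing encoding="utf-8" to open() lets the text be written directly, so the str(... .encode(...)) workaround and its b'\n\n...' output are no longer needed. The helper name save_html and the sample string are illustrative assumptions; the sample stands in for response.text so the snippet runs without network access.

```python
import os
import tempfile


def save_html(text, path):
    """Write a str to disk as UTF-8, independent of the platform default encoding."""
    # newline="" writes line endings exactly as they appear in the text
    # (no "\n" -> "\r\n" translation on Windows).
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Stand-in for response.text (assumption): contains U+2010, which cp1252 cannot encode.
sample_html = "<html>\n<body>\n<p>West Bank\u2010Gaza</p>\n</body>\n</html>\n"

out_path = os.path.join(tempfile.gettempdir(), "outputTest.html")
save_html(sample_html, out_path)

# Why the earlier workaround printed b'\n\n...': str() on a bytes object returns
# its repr, so each newline becomes the two characters backslash and "n".
as_repr = str(sample_html.encode("utf-8"))
```

In the original script this would amount to opening the file with open(fileout, "w", encoding="utf-8") and keeping obj.write(response.text) unchanged. An alternative, if the bytes should be saved exactly as the server sent them, is to open the file in "wb" mode and write response.content instead.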