Encoding error when downloading HTML from a URL in Python

I'm having trouble running the following Python code:

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
#url1='https://www.nytimes.com/store/west-side-highway-and-piers-manhattan-1937-nypl482645-nypl482645p.html'
url2='https://www.nytimes.com/1978/06/21/archives/jordan-wary-of-interim-role-in-west-bank-and-gaza-jordan-accepted.html'
response = requests.get(url2, headers=headers)
fileout="outputTest.html"
obj=open(fileout,"w")
obj.write(response.text)
obj.close()
With url2 I get an error when writing the downloaded HTML to the file (the same code works with url1):

return codecs.charmap_encode(input, self.errors, encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 34060: character maps to <undefined>
How can I fix the error for url2?

Use

obj.write(str(response.text.encode('utf-8')))
instead of

obj.write(response.text)

Thanks, that works, but the HTML formatting is lost. The output file now contains many literal "\n" sequences, like: b'\n\n\n…. How can I fix this?
obj.write(response.text)
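
A minimal sketch of one way to address the follow-up, assuming the 'charmap' error comes from the platform's default file encoding (typical on Windows). Calling str() on a bytes object returns its repr, which is where the b'…' prefix and the literal "\n" sequences come from. Opening the file with an explicit encoding="utf-8" lets obj.write(response.text) work unchanged. The helper name save_html and the sample string are hypothetical; the sample stands in for response.text so the snippet runs without network access.

```python
import os
import tempfile


def save_html(text: str, path: str) -> None:
    """Write already-decoded HTML text to `path` as UTF-8 (hypothetical helper)."""
    # An explicit encoding avoids falling back to the OS default code page,
    # which may not be able to represent characters such as U+2010.
    # newline="" writes the text's line endings through unchanged.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Stand-in for response.text: contains U+2010 (HYPHEN) and a real newline.
sample = "<p>west\u2010side</p>\n"

# str() on bytes yields the repr (b'...'), with the newline shown as backslash + n.
as_repr = str(sample.encode("utf-8"))

# Write the text itself, then read the raw bytes back to check the round trip.
out_path = os.path.join(tempfile.mkdtemp(), "outputTest.html")
save_html(sample, out_path)
with open(out_path, "rb") as f:
    roundtrip = f.read().decode("utf-8")
```

In the original script this would amount to obj=open(fileout,"w",encoding="utf-8") followed by obj.write(response.text). Another option is to open the file in "wb" mode and write response.content, which stores the bytes exactly as the server sent them.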