Python newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url


I am trying to download the text of an article that I can open without problems in a web browser (for example, Safari).

The error is:

newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830
The code is as follows:

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
config = Config()

config.browser_user_agent = user_agent
url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830".strip()



page = Article(url, config=config)


page.download()
page.parse()
print(page.text)
As you can see, I tried this solution (setting a custom browser user agent), but it did not work.

Full error log:

/Users/mona/anaconda3/bin/python /Users/mona/multimodal/newspaper_pg.py
Traceback (most recent call last):
  File "/Users/mona/multimodal/newspaper_pg.py", line 18, in <module>
    page.parse()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 532, in throw_if_not_downloaded_verbose
    (self.download_exception_msg, self.url))
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830

Process finished with exit code 1

I got my user agent string from this website:

The user agent that worked for me is:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0

You can find yours here:

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()

config.browser_user_agent = user_agent
url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830".strip()



page = Article(url, config=config)


page.download()
page.parse()
print(page.text)
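
Since one user agent was rejected with a 403 while another was accepted, it may help to try several in turn. The following is only a minimal sketch of that idea, not part of newspaper3k itself: `fetch_with_fallback` and `default_fetch` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and the list of user agents is whatever you decide to pass in. A site can still refuse all of them.

```python
from typing import Callable, Iterable


def default_fetch(url: str, user_agent: str) -> str:
    """Download and parse one article with newspaper3k using the given user agent."""
    # Imported lazily so fetch_with_fallback can be exercised without newspaper3k.
    from newspaper import Article, Config

    config = Config()
    config.browser_user_agent = user_agent
    config.request_timeout = 10  # seconds; arbitrary value chosen for this sketch

    article = Article(url, config=config)
    article.download()
    article.parse()  # raises ArticleException if the download failed (e.g. 403)
    return article.text


def fetch_with_fallback(
    url: str,
    user_agents: Iterable[str],
    fetch: Callable[[str, str], str] = default_fetch,
) -> str:
    """Try each user agent in order and return the first successful result."""
    last_error = None
    for user_agent in user_agents:
        try:
            return fetch(url, user_agent)
        except Exception as exc:  # broad on purpose: any failure moves on to the next agent
            last_error = exc
    raise RuntimeError(f"all user agents failed for {url}") from last_error
```

It could be called as `fetch_with_fallback(url, [safari_user_agent, firefox_user_agent])`, where the two strings are the user agents shown in the question and in this answer.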