Python requests: MissingSchema exception for an invalid URL
I am trying to scrape content from a website, but I get the error shown below.

Method:
def scrape_newtimes():
    """Scrapes content from the NewTimes"""
    url = 'https://www.newtimes.co.rw/'
    r = requests.get(url, headers=HEADERS)
    tree = fromstring(r.content)
    links = tree.xpath('//div[@class="x-small-push clearfix"]/a/@href')
    for link in links:
        r = requests.get(link, headers=HEADERS)
        blog_tree = fromstring(r.content)
        paras = blog_tree.xpath('//div[@class="article-content"]/p')
        para = extract_paratext(paras)
        text = extract_text(para)
        if not text:
            continue
        yield '"%s" %s' % (text, link)
The error I get is:
>>> sc = scrape_newtimes()
>>> string_1 = next(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Projects\bird\bird-env\bot.py", line 58, in scrape_newtimes
    r = requests.get(link, headers=HEADERS)
  File "D:\Projects\bird\venv\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "D:\Projects\bird\venv\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "D:\Projects\bird\venv\lib\site-packages\requests\sessions.py", line 519, in request
    prep = self.prepare_request(req)
  File "D:\Projects\bird\venv\lib\site-packages\requests\sessions.py", line 462, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "D:\Projects\bird\venv\lib\site-packages\requests\models.py", line 313, in prepare
    self.prepare_url(url, params)
  File "D:\Projects\bird\venv\lib\site-packages\requests\models.py", line 387, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '/news/londons-kings-college-launch-civil-service-programme-rwanda': No schema supplied. Perhaps you meant http:///news/londons-kings-college-launch-civil-service-programme-rwanda?
>>>
The exception basically tells you what is wrong:
requests.exceptions.MissingSchema: Invalid URL '/news/londons-kings-college-launch-civil-service-programme-rwanda': No schema supplied. Perhaps you meant http:///news/londons-kings-college-launch-civil-service-programme-rwanda?
Or, with line wrapping:
Invalid URL '/news/londons-kings-college-launch-civil-service-programme-rwanda':
No schema supplied. Perhaps you meant
http:///news/londons-kings-college-launch-civil-service-programme-rwanda?
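To see why requests rejects this value, the link can be inspected with the standard library's `urllib.parse.urlparse`. This is a small illustrative sketch, not part of the original answer; the variable names `link` and `parts` are chosen here for the example.

```python
from urllib.parse import urlparse

# The value that appears in the error message above.
link = '/news/londons-kings-college-launch-civil-service-programme-rwanda'
parts = urlparse(link)

# A relative link carries neither a scheme ('http', 'https') nor a host,
# only a path, so requests cannot tell where to send the request.
print(repr(parts.scheme))
print(repr(parts.netloc))
print(parts.path)
```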
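Your link does not contain a full URL.

Please copy and paste the text of the error traceback instead of posting a screenshot of it. That said, the error is telling you that the URL you are requesting is missing the protocol scheme. It looks like a relative URL. Perhaps you need to join the variable url with each link before requesting the link URL.

@ChidG Check, I have added the screenshot. What should I do? How do I join the above url to link?

@RwandaNkunda Try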
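A minimal sketch of the joining step discussed in the comments, assuming the standard library's `urllib.parse.urljoin` is an acceptable way to combine the base URL with each relative link. The helper name `absolutize` and the constant `BASE_URL` are hypothetical and exist only for this illustration.

```python
from urllib.parse import urljoin

# Same value as `url` in the question's scrape_newtimes().
BASE_URL = 'https://www.newtimes.co.rw/'


def absolutize(base, links):
    """Yield an absolute URL for each (possibly relative) link.

    Hypothetical helper, not part of the original code.
    """
    for link in links:
        # urljoin resolves relative links such as '/news/...' against the
        # base and returns already-absolute URLs unchanged.
        yield urljoin(base, link)


# The relative link taken from the error message.
links = ['/news/londons-kings-college-launch-civil-service-programme-rwanda']
for full_url in absolutize(BASE_URL, links):
    print(full_url)
```

Inside `scrape_newtimes()` the same idea would amount to calling `requests.get(urljoin(url, link), headers=HEADERS)` in the loop, so that every request receives a URL with a scheme and host.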