Python: No schema supplied when looping through a URL list read from a file

Tags: python, beautifulsoup, python-requests

I'm working on a web scraping project with BeautifulSoup, and in one step I need to compile a list of links from another list of links that has been saved to a file. The loop seems to run fine until it reaches the last line of the file, at which point it throws the error
requests.exceptions.MissingSchema: Invalid URL 'h': No schema supplied. Perhaps you meant http://h?
The full code and traceback are below.
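For reference, requests raises MissingSchema whenever the URL string lacks an http:// or https:// scheme, so a single stray character reproduces the error exactly:

import requests

requests.get('h')  # raises requests.exceptions.MissingSchema: Invalid URL 'h'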

Could this be related to Python reading each line of my .txt file as a list? I also tried using just one for loop:

for link in season_links:
    response_loop = requests.get(link[0]) 
but it didn't resolve the error.
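For context, str.split() always returns a list, even when the line holds a single URL, which is why each row reads back as a one-item list and link[0] works:

line = 'https://rugby.statbunker.com/competitions/LastMatches?comp_id=98&limit=10&offs=UTC\n'
print(line.strip().split())  # ['https://rugby.statbunker.com/competitions/LastMatches?comp_id=98&limit=10&offs=UTC']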

Here is my code:

Contents of file:
https://rugby.statbunker.com/competitions/LastMatches?comp_id=98&limit=10&offs=UTC
https://rugby.statbunker.com/competitions/LastMatches?comp_id=99&limit=10&offs=UTC

import codecs
import time

import requests
from bs4 import BeautifulSoup

# for reading season links from file
season_links = []
season_links_file = codecs.open('season_links_unpag_tst2.txt', 'r')
for line in season_links_file:
    stripped_line = line.strip()
    line_list = stripped_line.split()
    season_links.append(line_list)
season_links_file.close()
print('Season links file read complete' + '\n')
print(season_links)

# handling for pagination within each season
for link in season_links:
    t0 = time.time()
    for item in link: # for some reason it reads each row in my .txt as a list, so I have to loop over it again
        response_loop = requests.get(item)
        html_loop = response_loop.content
        soup_loop = BeautifulSoup(html_loop, 'html.parser')

        for p in soup_loop.find_all('p', text='›'):
            season_links.append(p.find_parent('a').get('href'))
        print('Season link: ' + item)
        response_delay = time.time() - t0
        print('Loop duration: ' + str(response_delay))
        time.sleep(4*response_delay)
        print('Sleep: ' + str(response_delay*4) + '\n')
Traceback:

Season link: https://rugby.statbunker.com/competitions/LastMatches?comp_id=1&limit=10&offs=UTC
Loop duration: 2.961906909942627
Sleep: 11.847627639770508

Season link: https://rugby.statbunker.com/competitions/LastMatches?comp_id=103&limit=10&offs=UTC
Loop duration: 1.6234941482543945
Sleep: 6.493976593017578

Traceback (most recent call last):
  File "/Users/claycrosby/Desktop/coding/projects/gambling/scraper/sb_compile_games.py", line 103, in <module>
    response_loop = requests.get(item)
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/requests/sessions.py", line 516, in request
    prep = self.prepare_request(req)
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/requests/sessions.py", line 449, in prepare_request
    p.prepare(
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/requests/models.py", line 314, in prepare
    self.prepare_url(url, params)
  File "/opt/miniconda3/envs/ds383/lib/python3.8/site-packages/requests/models.py", line 388, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'h': No schema supplied. Perhaps you meant http://h?
[Finished in 23.3s with exit code 1]

Edit: I tried printing each item and found that the third item was h. There are no blank lines or stray h characters in my file; the problem stems from appending to the original list from inside the loop that iterates over it. The appended hrefs are plain strings rather than one-item lists, so once the outer loop reaches them, the inner for item in link iterates the string character by character, and requests.get receives 'h'. I switched to a separate list and everything processed without errors.
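A minimal sketch of the pitfall (hypothetical URLs, no network access) shows how an appended string ends up iterated character by character:

# each line read from the file becomes a one-item list
links = [['https://example.com/a']]
for link in links:
    for item in link:
        print(item)  # first pass prints the URL; second pass prints 'h', 't', 't', 'p', ...
    links.append('https://example.com/b')  # appending a plain string while iterating
    if len(links) > 2:  # stop the demo before the list grows further
        break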

# for reading season links from file
season_links_unpag = []
season_links_file = codecs.open('season_links_unpag_tst2.txt', 'r')
for line in season_links_file:
    stripped_line = line.strip()
    line_list = stripped_line.split()
    season_links_unpag.append(line_list)
season_links_file.close()
print('Season links file read complete' + '\n')
print(season_links_unpag)

# handling for pagination within each season
season_links = []
for link in season_links_unpag:
    t0 = time.time()
    for item in link:
        print(item)
        response_loop = requests.get(item)
        html_loop = response_loop.content
        soup_loop = BeautifulSoup(html_loop, 'html.parser')

        for p in soup_loop.find_all('p', text='›'):
            season_links.append(p.find_parent('a').get('href'))
        print('Season link: ' + item)
        response_delay = time.time() - t0
        print('Loop duration: ' + str(response_delay))
        time.sleep(4*response_delay)
        print('Sleep: ' + str(response_delay*4) + '\n')
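
As a side note, a sketch of a further simplification, assuming one URL per line in the same file: reading each stripped line as a plain string removes the need for split() and for the nested loop entirely.

import requests
from bs4 import BeautifulSoup

# read one URL per line as a plain string
with open('season_links_unpag_tst2.txt') as f:
    season_links_unpag = [line.strip() for line in f if line.strip()]

season_links = []
for url in season_links_unpag:
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')
    # collect the '›' pagination links into a separate list
    for p in soup.find_all('p', text='›'):
        season_links.append(p.find_parent('a').get('href'))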