Python: web scraping across multiple URLs
I have code that gets the content I need, but I want it to run through all the game IDs rather than just the one in the URL. I want to change 2017020001 so that it goes through 2017021272, or until the end of the season, which I believe is about 1272 games. How can I do this with the code below?
import csv
import requests
import os

req = requests.get('https://statsapi.web.nhl.com/api/v1/game/2017020001/feed/live?site=en_nhl')
data = req.json()

my_data = []
pk = data['gameData']['game']['pk']
for item in data['liveData']['plays']['allPlays']:
    players = item.get('players')
    if players:
        player_a = players[0]['player']['fullName'] if len(players) > 0 else None
        player_b = players[1]['player']['fullName'] if len(players) > 1 else None
    else:
        player_a, player_b = None, None
    event = item['result']['event']
    time = item['about']['periodTime']
    triCode = item.get('team', {}).get('triCode')
    coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
    my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]
with open("NHL_2017020001.csv", "a", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(my_data)
f.close()
You should iterate over your code with a for loop. Something like this should work:
import csv
import requests
import os

# range() excludes its stop value, so 2017021273 makes 2017021272 the last ID requested.
for x in range(2017020001, 2017021273):
    req = requests.get('https://statsapi.web.nhl.com/api/v1/game/%s/feed/live?site=en_nhl' % x)
    data = req.json()

    my_data = []
    pk = data['gameData']['game']['pk']
    for item in data['liveData']['plays']['allPlays']:
        players = item.get('players')
        if players:
            player_a = players[0]['player']['fullName'] if len(players) > 0 else None
            player_b = players[1]['player']['fullName'] if len(players) > 1 else None
        else:
            player_a, player_b = None, None
        event = item['result']['event']
        time = item['about']['periodTime']
        triCode = item.get('team', {}).get('triCode')
        coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
        my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

    headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]
    # Every game is appended to this one file, so the header row is repeated once per game.
    with open("NHL_2017020001.csv", "a", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(my_data)
    f.close()
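Because the loop above appends every game to the same file, the header row ends up in the CSV once per game. If a single combined file is what you want, one possible way around that is to write the header only when the file is new or empty. This is just a sketch: append_rows and HEADERS are names made up for this example, not part of the original code.

```python
import csv
import os

# Column names taken from the question's code.
HEADERS = ["pk", "player_a", "player_b", "event", "time",
           "triCode", "coordinates_x", "coordinates_y"]


def append_rows(path, rows, headers=HEADERS):
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(headers)
        writer.writerows(rows)
```

Inside the for loop you would then call append_rows("NHL_2017.csv", my_data) in place of the with block (the file name here is only a placeholder).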
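If the game IDs are numbered sequentially, this is very simple: nest all of the code under a for loop that iterates over the game numbers, and use str.format() to add the necessary zero padding to each number. In that case, a few parts change: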
import csv
import requests
import os

for i in range(1, 1273):
    url = 'https://statsapi.web.nhl.com/api/v1/game/201702{:04d}/feed/live?site=en_nhl'.format(i)
    req = requests.get(url)
    req.raise_for_status()
    data = req.json()

    my_data = []
    pk = data['gameData']['game']['pk']
    for item in data['liveData']['plays']['allPlays']:
        players = item.get('players')
        if players:
            player_a = players[0]['player']['fullName'] if len(players) > 0 else None
            player_b = players[1]['player']['fullName'] if len(players) > 1 else None
        else:
            player_a, player_b = None, None
        event = item['result']['event']
        time = item['about']['periodTime']
        triCode = item.get('team', {}).get('triCode')
        coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
        my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

    headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]
    with open("NHL_201702{:04d}.csv".format(i), "a", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(my_data)
The last fix is to rely on with ... as, which means you do not need to close the file explicitly.
Additional information on using str.format() can be found in the Python documentation.
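To make the padding and the range bounds easier to see in isolation, here is a small sketch that only builds the URLs and makes no requests. The helper names game_id and game_urls and the constant BASE_URL are invented for this illustration, and the 201702 prefix is simply the one from the question's URL.

```python
# URL pattern from the question, with a placeholder for the full game ID.
BASE_URL = 'https://statsapi.web.nhl.com/api/v1/game/{}/feed/live?site=en_nhl'


def game_id(number, prefix='201702'):
    """Build a full game ID such as 2017020001 from a game number."""
    # {:04d} pads the number with leading zeros to a width of four digits.
    return '{}{:04d}'.format(prefix, number)


def game_urls(first, last):
    """Return the feed URLs for game numbers first..last, both inclusive."""
    # range() excludes its stop value, hence last + 1.
    return [BASE_URL.format(game_id(n)) for n in range(first, last + 1)]


urls = game_urls(1, 1272)
print(len(urls))
print(urls[0])
print(urls[-1])
```

Printing the first and last URL before starting a long scrape is a cheap way to confirm that the range covers exactly the games you expect.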
Note the stop value passed to range(): it is not included in the resulting sequence.
This does not work with the code above. Did you happen to test it? It says the name data is not defined. When I change it, it says the Response object is not subscriptable.
Have you tried checking whether the request succeeded before trying to access the data? You can do that by making sure the following evaluates to true after calling requests.get(): req.status_code == 200
Not sure how to do that, I am new to this.
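Since the last comment asks how to do that check, here is one way it might look. Treat it as a sketch rather than a tested fix for the errors above: fetch_json is a hypothetical helper, and its get parameter exists only so the function can be tried with a stand-in instead of a real network call. As a side note, a "not subscriptable" error usually means the Response object itself was indexed instead of the result of req.json().

```python
def fetch_json(url, get=None, timeout=10):
    """Return the decoded JSON body, or None if the status code is not 200."""
    if get is None:
        # Imported lazily so the helper can be exercised without the network.
        import requests
        get = requests.get
    resp = get(url, timeout=timeout)
    if resp.status_code != 200:
        # The request did not succeed, so there is no data to index into.
        return None
    # Index into this return value, not into the Response object.
    return resp.json()


# Possible use inside the loop:
#     data = fetch_json(url)
#     if data is None:
#         continue  # skip game IDs that did not return a successful response
```

Returning None and skipping the game is only one design choice; calling req.raise_for_status(), as in the answer above, stops the whole run at the first failed request instead.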