Python: web scraping through multiple URLs

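I have code that gets the content I need, but I want to run it through all of the game IDs, not just the one in the URL. I want to change 2017020001 so that it goes through 2017021272, or until the end of the season, which I believe is around 1272. How can I achieve this with the code below?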

import csv
import requests
import os

req = requests.get('https://statsapi.web.nhl.com/api/v1/game/2017020001/feed/live?site=en_nhl')
data = req.json()

my_data = []
pk = data['gameData']['game']['pk']
for item in data['liveData']['plays']['allPlays']:
    players = item.get('players')
    if players:
        player_a = players[0]['player']['fullName'] if len(players) > 0 else None
        player_b = players[1]['player']['fullName'] if len(players) > 1 else None
    else:
        player_a, player_b = None, None
    event = item['result']['event']
    time = item['about']['periodTime']
    triCode = item.get('team', {}).get('triCode')
    coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
    my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]

with open("NHL_2017020001.csv", "a", newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(my_data)
f.close()

You should use a for loop to iterate over your code.

Something like this should work:

import csv
import requests
import os

for x in range(2017020001, 2017021273):
    req = requests.get('https://statsapi.web.nhl.com/api/v1/game/%s/feed/live?site=en_nhl' % x)
    data = req.json()

    my_data = []
    pk = data['gameData']['game']['pk']
    for item in data['liveData']['plays']['allPlays']:
        players = item.get('players')
        if players:
            player_a = players[0]['player']['fullName'] if len(players) > 0 else None
            player_b = players[1]['player']['fullName'] if len(players) > 1 else None
        else:
            player_a, player_b = None, None
        event = item['result']['event']
        time = item['about']['periodTime']
        triCode = item.get('team', {}).get('triCode')
        coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
        my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

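    headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]

    # Every game is appended to the same file, so write the header only
    # when the file does not exist yet (otherwise it repeats per game).
    out_path = "NHL_2017020001.csv"
    write_header = not os.path.exists(out_path)
    with open(out_path, "a", newline='') as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(headers)
        writer.writerows(my_data)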
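
As a possible refinement of the loop above, the per-game parsing can be pulled into a helper so that it can be checked without touching the network, and so that one header row is written for the whole output. This is only a sketch: the names rows_from_feed and write_rows are hypothetical, the sample feed is hand-made rather than real NHL data, and the feed layout is assumed to match the keys used in the question's code.

```python
import csv
import io

HEADERS = ["pk", "player_a", "player_b", "event", "time",
           "triCode", "coordinates_x", "coordinates_y"]


def rows_from_feed(data):
    """Turn one game's JSON feed (already decoded to a dict) into CSV rows."""
    pk = data['gameData']['game']['pk']
    rows = []
    for item in data['liveData']['plays']['allPlays']:
        players = item.get('players') or []
        names = [p['player']['fullName'] for p in players[:2]]
        names += [None] * (2 - len(names))  # pad to exactly two names
        coords = item.get('coordinates', {})
        rows.append([pk, names[0], names[1],
                     item['result']['event'],
                     item['about']['periodTime'],
                     item.get('team', {}).get('triCode'),
                     coords.get('x'), coords.get('y')])
    return rows


def write_rows(f, feeds):
    """Write the header once, then the rows of every feed, to open file f."""
    writer = csv.writer(f)
    writer.writerow(HEADERS)
    for data in feeds:
        writer.writerows(rows_from_feed(data))


# Tiny hand-made feed (not real NHL data) to show the output shape.
sample = {
    'gameData': {'game': {'pk': 2017020001}},
    'liveData': {'plays': {'allPlays': [
        {'result': {'event': 'Faceoff'},
         'about': {'periodTime': '00:00'},
         'players': [{'player': {'fullName': 'A'}},
                     {'player': {'fullName': 'B'}}],
         'team': {'triCode': 'TOR'},
         'coordinates': {'x': 0.0, 'y': 0.0}},
        {'result': {'event': 'Period Start'},
         'about': {'periodTime': '00:00'},
         'coordinates': {}},
    ]}},
}

buf = io.StringIO()
write_rows(buf, [sample])
print(buf.getvalue())
```

In the real script, f would be a file opened once before the loop over game IDs, and each decoded response would be passed through rows_from_feed inside the loop.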

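If the game IDs are numbered sequentially, this is straightforward: nest all of the code under a for loop that runs through every game ID, and use str.format() to add the necessary zero padding to the number. A few parts change in that case: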

import csv
import requests
import os

for i in range(1, 1273):
    url = 'https://statsapi.web.nhl.com/api/v1/game/201702{:04d}/feed/live?site=en_nhl'.format(i)
    req = requests.get(url)
    req.raise_for_status()
    data = req.json()
    my_data = []
    pk = data['gameData']['game']['pk']
    for item in data['liveData']['plays']['allPlays']:
        players = item.get('players')
        if players:
            player_a = players[0]['player']['fullName'] if len(players) > 0 else None
            player_b = players[1]['player']['fullName'] if len(players) > 1 else None
        else:
            player_a, player_b = None, None
        # These assignments must run for every play, not only in the else branch.
        event = item['result']['event']
        time = item['about']['periodTime']
        triCode = item.get('team', {}).get('triCode')
        coordinates_x, coordinates_y = item['coordinates'].get('x'), item['coordinates'].get('y')
        my_data.append([pk, player_a, player_b, event, time, triCode, coordinates_x, coordinates_y])

    headers = ["pk", "player_a", "player_b", "event", "time", "triCode", "coordinates_x", "coordinates_y"]

    with open("NHL_201702{:04d}.csv".format(i), "a", newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(my_data)
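
One last fix is the use of with ... as, which means you do not need to close the file explicitly.
See the documentation for str.format() for additional information on using it.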
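
To make the padding and the loop bounds concrete, here is a minimal sketch that only builds the URLs and makes no requests. The helper name game_urls and the constant URL_TEMPLATE are hypothetical, and the sketch assumes, as the question does, that the season's games are numbered 0001 through 1272.

```python
URL_TEMPLATE = ('https://statsapi.web.nhl.com/api/v1/game/'
                '201702{:04d}/feed/live?site=en_nhl')


def game_urls(first, last):
    """Yield feed URLs for game numbers first..last, inclusive."""
    # range() excludes its stop value, hence last + 1.
    for i in range(first, last + 1):
        # {:04d} pads the integer with leading zeros to four digits.
        yield URL_TEMPLATE.format(i)


urls = list(game_urls(1, 1272))
print(len(urls))
print(urls[0])
print(urls[-1])
```

Because the stop value is excluded, range(1, 1273) in the answer and game_urls(1, 1272) here cover the same game numbers.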

Note the stop value in range(): it has to be one past the last game ID, because range() does not include the stop value in the resulting sequence.

This does not work with the code above. Did you happen to test it? It says the name data is not defined. When I change it, it says the Response object is not subscriptable.

Have you tried checking whether the request succeeded before trying to access the data? You can do that by making sure the following evaluates to true after calling requests.get():
req.status_code == 200

Not sure how to do that, I'm new to this.
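
For anyone unsure how to apply that check, here is a minimal sketch. The function name fetch_feed, the get parameter, and the FakeResponse class are hypothetical and exist only so the example runs without a network connection; in the real script you would call fetch_feed(url) and let it fall back to requests.get. The "not subscriptable" error comes from indexing the response object itself (req['gameData']) instead of the dictionary returned by req.json().

```python
def fetch_feed(url, get=None):
    """Return the decoded JSON for url, or None if the request failed.

    `get` is any callable shaped like requests.get; it is a parameter
    only so the function can be exercised without a network.
    """
    if get is None:
        import requests  # third-party; imported lazily
        get = requests.get
    req = get(url)
    if req.status_code != 200:
        return None          # caller can skip this game ID
    return req.json()        # index into this dict, not into req


class FakeResponse:
    """Stand-in for a requests response (hypothetical, for illustration)."""
    def __init__(self, status_code, payload=None):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


ok = fetch_feed('https://example.invalid/ok',
                get=lambda url: FakeResponse(200, {'gameData': {}}))
missing = fetch_feed('https://example.invalid/missing',
                     get=lambda url: FakeResponse(404))
print(ok, missing)
```

Inside the loop over game IDs, a None result can be handled with continue so that one missing game does not stop the whole run.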