Python: Scraping web data with BeautifulSoup


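I'm trying to scrape the chance of rain and the temperature/wind speed for each baseball game from rotowire.com. Once I have the data, I'll convert it into three columns: rain, temperature, and wind. Thanks to another user I got close to extracting the data, but I can't quite get there. I've tried two approaches.

First approach: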

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.rotowire.com/baseball/daily-lineups.php'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

weather = []

for i in soup.select(".lineup__bottom"):
    
    forecast = i.select_one('.lineup__weather-text').text
    weather.append(forecast)
This returns:

['\n100% Rain\r\n                66°\xa0\xa0Wind 8 mph In                        ', '\n0% Rain\r\n                64°\xa0\xa0Wind 4 mph L-R                        ', '\n0% Rain\r\n                69°\xa0\xa0Wind 7 mph In                        ', '\nDome\r\n                In Domed Stadium\r\n                        ', '\n0% Rain\r\n                75°\xa0\xa0Wind 10 mph Out                        ', '\n0% Rain\r\n                68°\xa0\xa0Wind 9 mph R-L                        ', '\n0% Rain\r\n                82°\xa0\xa0Wind 9 mph                         ', '\n0% Rain\r\n                81°\xa0\xa0Wind 5 mph R-L                        ', '\nDome\r\n                In Domed Stadium\r\n                        ', '\n1% Rain\r\n                75°\xa0\xa0Wind 4 mph R-L                        ', '\n1% Rain\r\n                71°\xa0\xa0Wind 6 mph Out                        ', '\nDome\r\n                In Domed Stadium\r\n                        ']
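The raw strings above already contain all three values, so one option is to parse them after scraping. Below is a minimal sketch, assuming every non-dome entry follows the `N% Rain`, `T°`, `Wind ...` pattern seen in the output. The helper name `parse_forecast` and the choice to leave temperature and wind as `None` for domed stadiums are illustrative and not part of the original code.

```python
import re

# Sample strings copied from the output above.
raw = [
    '\n100% Rain\r\n                66°\xa0\xa0Wind 8 mph In                        ',
    '\n0% Rain\r\n                82°\xa0\xa0Wind 9 mph                         ',
    '\nDome\r\n                In Domed Stadium\r\n                        ',
]

# Assumed pattern: "<n>% Rain", then "<t>°", then "Wind <rest>".
PATTERN = re.compile(r"(\d+%)\s*Rain\s+(-?\d+)°\s*Wind\s+(.*)")


def parse_forecast(text):
    """Split one forecast string into rain, temperature and wind fields."""
    # Collapse every run of whitespace (including \xa0) into one space.
    cleaned = " ".join(text.split())
    match = PATTERN.match(cleaned)
    if match is None:
        # Domed stadiums have no forecast; keep the label in the rain column.
        return {"Rain": cleaned.split()[0], "Temp": None, "Wind": None}
    rain, temp, wind = match.groups()
    return {"Rain": rain, "Temp": temp, "Wind": wind}


rows = [parse_forecast(text) for text in raw]
```

Passing `rows` to `pd.DataFrame` would then give the three columns directly.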
The second approach I tried:

from bs4 import BeautifulSoup
import requests
import pandas as pd


url = 'https://www.rotowire.com/baseball/daily-lineups.php'

r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

#weather = []

for i in soup.select(".lineup__bottom"):
    
    forecast = i.select_one('.lineup__weather-text').text
    weather.append(forecast)
    #print(forecast)
    rain = i.select_one('.lineup__weather-text b:contains("Rain") ~ span').text

This raises an
AttributeError: 'NoneType' object has no attribute 'text'
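The error arises because `select_one` returns `None` when nothing matches the selector, and `None` has no `.text`. The output of the first approach suggests that the temperature and wind follow the `<b>` element as plain text rather than inside a `<span>`, which would explain why `b ~ span` finds nothing. Here is a minimal sketch of guarding against that, using a hand-written HTML fragment that is only an assumption about the page's structure:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled on the strings returned by the first approach.
html = """
<div class="lineup__bottom">
  <div class="lineup__weather-text"><b>100% Rain</b> 66°&nbsp;&nbsp;Wind 8 mph In</div>
</div>
<div class="lineup__bottom"></div>
"""

soup = BeautifulSoup(html, "html.parser")
results = []

for bottom in soup.select(".lineup__bottom"):
    # select_one returns None when the selector matches nothing.
    span = bottom.select_one(".lineup__weather-text b ~ span")
    bold = bottom.select_one(".lineup__weather-text > b")
    if bold is None:
        results.append(None)  # this card has no weather block
        continue
    # The text node after <b> holds temperature and wind.
    sibling = bold.next_sibling or ""
    rest = " ".join(str(sibling).split())
    results.append((span, bold.get_text(strip=True), rest))
```

In this fragment `span` is `None` for the first card, so calling `.text` on it would reproduce the error, while the `<b>` lookup succeeds.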

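You can find the cards with the game info and then look up the weather data at the bottom of each one (if present):


To get all of the data, see this example: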

import pandas as pd
import requests
from bs4 import BeautifulSoup


url = "https://www.rotowire.com/baseball/daily-lineups.php"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

weather = []

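for tag in soup.select(".lineup__bottom"):
    # Team names come from the preceding element with class "lineup__teams".
    header = tag.find_previous(class_="lineup__teams").get_text(
        strip=True, separator=" vs "
    )
    # The <b> element holds the rain chance (or "Dome").
    rain = tag.select_one(".lineup__weather-text > b")
    # The text right after <b> holds temperature and wind, split on whitespace.
    forecast_info = rain.next_sibling.split()
    temp = forecast_info[0]
    wind = forecast_info[2]

    weather.append(
        {"Header": header, "Rain": rain.text.split()[0], "Temp": temp, "Wind": wind}
    )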


df = pd.DataFrame(weather)
print(df)
Output:

        header       rain       temperature              wind
0   PHI vs CIN  100% Rain                66          8 mph In
1   CWS vs CLE    0% Rain                64         4 mph L-R
2    SD vs CHC    0% Rain                69          7 mph In
3   NYM vs ARI       Dome  In Domed Stadium  In Domed Stadium
4   MIN vs BAL    0% Rain                75         9 mph Out
5    TB vs NYY    0% Rain                68         9 mph R-L
6   MIA vs TOR    0% Rain                81         6 mph L-R
7   WAS vs ATL    0% Rain                81         4 mph R-L
8   BOS vs HOU       Dome  In Domed Stadium  In Domed Stadium
9   TEX vs COL    0% Rain                76             6 mph
10  STL vs LAD    0% Rain                73         4 mph Out
11  OAK vs SEA       Dome  In Domed Stadium  In Domed Stadium
        Header  Rain Temp     Wind
0   PHI vs CIN  100%  66°        8
1   CWS vs CLE    0%  64°        4
2    SD vs CHC    0%  69°        7
3   NYM vs ARI  Dome   In  Stadium
4   MIN vs BAL    0%  75°        9
5    TB vs NYY    0%  68°        9
6   MIA vs TOR    0%  81°        6
7   WAS vs ATL    0%  81°        4
8   BOS vs HOU  Dome   In  Stadium
9   TEX vs COL    0%  76°        6
10  STL vs LAD    0%  73°        4
11  OAK vs SEA  Dome   In  Stadium

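Damn. Beat me to it. + I used
for lineup in soup.select('.lineup:not(.is-ad,.is-tools)'):
and the maxsplit argument of split to separate the temperature from the wind.
@QHarr It took me a while too :-) I'd like to see your approach as well. (Unless it's the same.)
It's not different enough. I like that you added who's playing. I was undecided about that.
@QHarr You can still post it as an answer. I'll upvote.
@ShawnSchreier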
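The `maxsplit` idea mentioned in the comments can be sketched as follows. This is not the commenter's actual code, which is not shown here. The helper name `split_temp_wind` and the way dome entries are handled are assumptions made for illustration.

```python
def split_temp_wind(text):
    """Split '66°  Wind 8 mph In' into ('66°', '8 mph In')."""
    # str.split() with no separator also treats \xa0 as whitespace.
    parts = text.strip().split(maxsplit=2)
    if len(parts) == 3 and parts[1] == "Wind":
        return parts[0], parts[2]
    # Anything else (e.g. 'In Domed Stadium') is kept whole in both fields.
    whole = " ".join(text.split())
    return whole, whole


samples = [
    "66°\xa0\xa0Wind 8 mph In   ",
    "82°\xa0\xa0Wind 9 mph ",
    "In Domed Stadium\r\n  ",
]
pairs = [split_temp_wind(s) for s in samples]
```

Limiting the split to two cuts keeps the wind description (speed and direction) together as one string instead of breaking it into separate words.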