Python: scraping data from a web page

python web-scraping

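I am trying to get data from the following web page. I need the scorecard in tabular form. Can anyone help? I am using Python 3. I am new to web scraping and not very familiar with the internal structure of web pages. Thanks in advance.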

I tried using BeautifulSoup together with urllib2 and the like, but got nowhere.

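You can use pandas' read_html(). It returns a list of DataFrames, one per table found on the page. What you do with them from there is up to you. You will probably need to tidy the data, but I have simply concatenated everything into one big table to show you what comes back.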

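import pandas as pd

url = 'https://m.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12'
# read_html() returns one DataFrame per <table> element found on the page
dfs = pd.read_html(url)

# Stack every table into a single DataFrame (each table keeps its own row index)
result = pd.concat(dfs)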

print(result.to_string())

Output:
                      0                     1                     2                     3                     4
0               Batting                     R                     B                    4s                    6s
0              Ed Cowan                    68                   177                     7                     0
1  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin  c M Dhoni b R Ashwin
0          David Warner                    37                    49                     4                     1
1   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav   c M Dhoni b U Yadav
0           Shaun Marsh                     0                     6                     0                     0
1   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav   c V Kohli b U Yadav
0         Ricky Ponting                    62                    94                     6                     0
1  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav  c V Laxman b U Yadav
0        Michael Clarke                    31                    68                     5                     0
1              b Z Khan              b Z Khan              b Z Khan              b Z Khan              b Z Khan
0        Michael Hussey                     0                     1                     0                     0
1    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan
0           Brad Haddin                    27                    69                     1                     0
1   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan   c V Sehwag b Z Khan
0          Peter Siddle                    41                   100                     4                     0
1    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan    c M Dhoni b Z Khan
0       James Pattinson                    18                    54                     2                     0
1               not out               not out               not out               not out               not out
0        Ben Hilfenhaus                    19                    32                     3                     0
1  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin  c V Kohli b R Ashwin
0           Nathan Lyon                     6                    11                     1                     0
1            b R Ashwin            b R Ashwin            b R Ashwin            b R Ashwin            b R Ashwin
0                Bowler                     O                     M                     R                     W
1           Zaheer Khan                    31                     6                    77                     4
2         Ishant Sharma                    24                     7                    48                     0
3           Umesh Yadav                    26                     5                   106                     3
4   Ravichandran Ashwin                    29                     3                    81                     3
0                  Home           Live Scores                   NaN                   NaN                   NaN
1              Schedule                  News                   NaN                   NaN                   NaN
2            Editorials                Photos                   NaN                   NaN                   NaN
3              Archives               Players                   NaN                   NaN                   NaN
4              Rankings                Series                   NaN                   NaN                   NaN
5                  Poll                Videos                   NaN                   NaN                   NaN
6          Points Table            Contact Us                   NaN                   NaN                   NaN
7       Cricbuzz TV Ads    Careers @ Cricbuzz                   NaN                   NaN                   NaN
8           Mobile Apps    This day that year                   NaN                   NaN                   NaN
9          Wickets Zone                   NaN                   NaN                   NaN                   NaN
0           Mobile Apps       Social Channels                   NaN                   NaN                   NaN
1                iPhone              facebook                   NaN                   NaN                   NaN
2               Android               twitter                   NaN                   NaN                   NaN
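
The output above shows why some tidying is needed: each batsman arrives as its own two-row table (a stats row, then the dismissal text repeated across all five columns), followed by the bowling table and the site's navigation links. Here is a minimal sketch of one way to collapse the batsman tables into one row each. The function name `tidy_batting` and the output column names are my own choices, and the layout check (2 rows by 5 columns, identical cells in the second row) is an assumption based only on the output printed above, so it may need adjusting for other pages.

```python
import pandas as pd

STAT_COLUMNS = ["R", "B", "4s", "6s"]


def tidy_batting(dfs):
    """Collapse the two-row-per-batsman tables into one row per batsman.

    `dfs` is a list of DataFrames such as the one returned by pd.read_html().
    """
    records = []
    for df in dfs:
        # Assumed layout: a batsman table is exactly 2 rows x 5 columns.
        if df.shape != (2, 5):
            continue
        # Assumed layout: the second row repeats the dismissal in every cell.
        dismissal_cells = df.iloc[1].astype(str)
        if dismissal_cells.nunique() != 1:
            continue
        # The first row holds the name followed by four numeric stats.
        stats = pd.to_numeric(df.iloc[0, 1:], errors="coerce")
        if stats.isna().any():
            continue  # not a stats row, so not a batsman table
        record = {"Batting": df.iat[0, 0], "Dismissal": dismissal_cells.iloc[0]}
        record.update(zip(STAT_COLUMNS, stats.astype(int)))
        records.append(record)
    return pd.DataFrame(records, columns=["Batting", "Dismissal"] + STAT_COLUMNS)


# Hand-built frames that mimic a few of the tables shown in the output above.
sample = [
    pd.DataFrame([["Batting", "R", "B", "4s", "6s"]]),
    pd.DataFrame([["Ed Cowan", "68", "177", "7", "0"],
                  ["c M Dhoni b R Ashwin"] * 5]),
    pd.DataFrame([["David Warner", "37", "49", "4", "1"],
                  ["c M Dhoni b U Yadav"] * 5]),
    pd.DataFrame([["Home", "Live Scores", None, None, None],
                  ["Schedule", "News", None, None, None]]),
]

batting = tidy_batting(sample)
print(batting.to_string(index=False))
```

With the real page you would pass the list from `pd.read_html(url)` instead of `sample`. The navigation and bowling tables are skipped because they do not match the assumed batsman layout.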

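Please include your code and explain what is not working.

Wow, thanks a lot! pandas makes this so easy, and I will explore it further. That said, the following code worked for me:

request = requests.get(score)
soup = BeautifulSoup(request.text, 'html.parser')
ids = ["innings_1", "innings_2", "innings_3", "innings_4"]  # these are the div ids I was looking for
for id in ids:
    s = soup.find(id=id)
    if s is not None:
        text = s.get_text(separator=" ")
        # then parse text (similar to the lines above)

Yeah, pandas' read_html() can use BeautifulSoup under the hood (by default it tries lxml first and falls back to BeautifulSoup with html5lib). It really depends on what you need. In this particular case, using pandas and then cleaning up the tables might be more work than parsing directly with BeautifulSoup as you just described. It is a bit like using a sledgehammer to crack a nut. Does it get the job done? Sure. Is it the most efficient way? Not always. But it is good to know you have the tool in case you need it for other sites.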
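
For anyone who wants to try the BeautifulSoup route from the comments without hitting the network, here is a small self-contained sketch. Note the assumptions: `SAMPLE_HTML` is a made-up stand-in, since the real page markup was never posted in this thread, and the sketch assumes each innings div contains an ordinary table. It also walks the table rows rather than flattening the div with get_text(), which avoids having to re-split the text afterwards. The helper name `innings_rows` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the div ids come from the comment above.
SAMPLE_HTML = """
<div id="innings_1">
  <table>
    <tr><th>Batting</th><th>R</th><th>B</th><th>4s</th><th>6s</th></tr>
    <tr><td>Ed Cowan</td><td>68</td><td>177</td><td>7</td><td>0</td></tr>
    <tr><td>David Warner</td><td>37</td><td>49</td><td>4</td><td>1</td></tr>
  </table>
</div>
<div id="nav"><table><tr><td>Home</td><td>Live Scores</td></tr></table></div>
"""

INNINGS_IDS = ["innings_1", "innings_2", "innings_3", "innings_4"]


def innings_rows(html):
    """Return {innings_id: rows}, where each row is a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for innings_id in INNINGS_IDS:
        container = soup.find(id=innings_id)
        if container is None:
            continue  # this innings is not present on the page
        rows = []
        for tr in container.find_all("tr"):
            cells = [cell.get_text(strip=True)
                     for cell in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        result[innings_id] = rows
    return result


scorecard = innings_rows(SAMPLE_HTML)
for innings_id, rows in scorecard.items():
    print(innings_id)
    for row in rows:
        print("  ", row)
```

Because only the listed ids are looked up, the navigation table is never touched, which is the main advantage over grabbing every table on the page and filtering afterwards.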