Python: scraping data from a web page
I am trying to get data from the following web page. I need the scorecard in table format. Can anyone help? I am using Python 3. I am new to web scraping and not very familiar with the internal structure of web pages. Thanks in advance.
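I tried using BeautifulSoup with urllib2 and so on, but didn't get anywhere.

You can use pandas' read_html(). It returns a list of DataFrames, one for each table it finds on the page. What you do with them from there is up to you. You will probably need to tidy up the data, but I have simply put everything into one big table to show you.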
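import pandas as pd

# Scorecard page to scrape
url = 'https://m.cricbuzz.com/live-cricket-scorecard/10711/aus-vs-ind-1st-test-india-in-australia-test-series-2011-12'
# read_html needs lxml (or bs4 + html5lib) installed; it returns one DataFrame per table
dfs = pd.read_html(url)
# Stack every table into a single DataFrame
result = pd.concat(dfs)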
Output:
print(result.to_string())
0 1 2 3 4
0 Batting R B 4s 6s
0 Ed Cowan 68 177 7 0
1 c M Dhoni b R Ashwin c M Dhoni b R Ashwin c M Dhoni b R Ashwin c M Dhoni b R Ashwin c M Dhoni b R Ashwin
0 David Warner 37 49 4 1
1 c M Dhoni b U Yadav c M Dhoni b U Yadav c M Dhoni b U Yadav c M Dhoni b U Yadav c M Dhoni b U Yadav
0 Shaun Marsh 0 6 0 0
1 c V Kohli b U Yadav c V Kohli b U Yadav c V Kohli b U Yadav c V Kohli b U Yadav c V Kohli b U Yadav
0 Ricky Ponting 62 94 6 0
1 c V Laxman b U Yadav c V Laxman b U Yadav c V Laxman b U Yadav c V Laxman b U Yadav c V Laxman b U Yadav
0 Michael Clarke 31 68 5 0
1 b Z Khan b Z Khan b Z Khan b Z Khan b Z Khan
0 Michael Hussey 0 1 0 0
1 c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan
0 Brad Haddin 27 69 1 0
1 c V Sehwag b Z Khan c V Sehwag b Z Khan c V Sehwag b Z Khan c V Sehwag b Z Khan c V Sehwag b Z Khan
0 Peter Siddle 41 100 4 0
1 c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan c M Dhoni b Z Khan
0 James Pattinson 18 54 2 0
1 not out not out not out not out not out
0 Ben Hilfenhaus 19 32 3 0
1 c V Kohli b R Ashwin c V Kohli b R Ashwin c V Kohli b R Ashwin c V Kohli b R Ashwin c V Kohli b R Ashwin
0 Nathan Lyon 6 11 1 0
1 b R Ashwin b R Ashwin b R Ashwin b R Ashwin b R Ashwin
0 Bowler O M R W
1 Zaheer Khan 31 6 77 4
2 Ishant Sharma 24 7 48 0
3 Umesh Yadav 26 5 106 3
4 Ravichandran Ashwin 29 3 81 3
0 Home Live Scores NaN NaN NaN
1 Schedule News NaN NaN NaN
2 Editorials Photos NaN NaN NaN
3 Archives Players NaN NaN NaN
4 Rankings Series NaN NaN NaN
5 Poll Videos NaN NaN NaN
6 Points Table Contact Us NaN NaN NaN
7 Cricbuzz TV Ads Careers @ Cricbuzz NaN NaN NaN
8 Mobile Apps This day that year NaN NaN NaN
9 Wickets Zone NaN NaN NaN NaN
0 Mobile Apps Social Channels NaN NaN NaN
1 iPhone facebook NaN NaN NaN
2 Android twitter NaN NaN NaN
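The tidying mentioned in the answer could look roughly like this. It is a minimal sketch that assumes the layout seen in the output above: the scorecard tables have five columns, there is a header-only "Batting" table, one two-row table per batsman (stats first, then the dismissal repeated across the row), and a "Bowler" table whose first row is its header. The names split_scorecard and BATTING_COLS and the column label "Dismissal" are made up for illustration; a different page, or a redesign of this one, would need different rules.

```python
import pandas as pd

# Hypothetical column layout for the tidied batting table.
BATTING_COLS = ["Batsman", "Dismissal", "R", "B", "4s", "6s"]


def split_scorecard(dfs):
    """Split the frames returned by pd.read_html into (batting, bowling)."""
    batting_rows = []
    bowling = pd.DataFrame()
    for df in dfs:
        # Assumption: navigation/footer tables never have five columns.
        if df.shape[1] != 5:
            continue
        label = str(df.iloc[0, 0])
        if label == "Batting":
            continue  # header-only table, nothing to keep
        if label == "Bowler":
            # First row holds the column names, the rest are bowlers.
            bowling = df.iloc[1:].reset_index(drop=True)
            bowling.columns = [str(c) for c in df.iloc[0]]
            for col in bowling.columns[1:]:
                bowling[col] = pd.to_numeric(bowling[col], errors="coerce")
            continue
        if len(df) == 2:
            # Row 0: name and stats; row 1: dismissal text repeated per cell.
            name, runs, balls, fours, sixes = df.iloc[0].tolist()
            batting_rows.append([name, df.iloc[1, 0], runs, balls, fours, sixes])
    batting = pd.DataFrame(batting_rows, columns=BATTING_COLS)
    for col in ["R", "B", "4s", "6s"]:
        batting[col] = pd.to_numeric(batting[col], errors="coerce")
    return batting, bowling


if __name__ == "__main__":
    # Small in-memory stand-in for the list returned by pd.read_html(url).
    demo = [
        pd.DataFrame([["Batting", "R", "B", "4s", "6s"]]),
        pd.DataFrame([["Ed Cowan", "68", "177", "7", "0"],
                      ["c M Dhoni b R Ashwin"] * 5]),
        pd.DataFrame([["Bowler", "O", "M", "R", "W"],
                      ["Zaheer Khan", "31", "6", "77", "4"]]),
        pd.DataFrame([["Home", "Live Scores"], ["Schedule", "News"]]),
    ]
    batting, bowling = split_scorecard(demo)
    print(batting.to_string())
    print(bowling.to_string())
```

With the real page you would pass the dfs list from read_html straight to split_scorecard(dfs) instead of concatenating everything.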
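Please include your code and explain what isn't working.

Wow, thanks a lot! Pandas makes this so easy; let me explore it. That said, the following code also worked for me:

import requests
from bs4 import BeautifulSoup

request = requests.get(score)
soup = BeautifulSoup(request.text, 'html.parser')
ids = ["innings_1", "innings_2", "innings_3", "innings_4"]  # the div ids I was looking for
for div_id in ids:
    s = soup.find(id=div_id)
    if s is not None:
        text = s.get_text(separator=" ")
        # then parse the text (similar to the lines above)

Yeah. pandas' read_html relies on an HTML parser under the hood (lxml, or BeautifulSoup with html5lib). It really depends on what you need. In this particular case, using pandas and then cleaning up the tables may be more work than parsing directly with BeautifulSoup as you just described. It's a bit like using a sledgehammer to crack a nut. Does it get the job done? Sure. Is it the most efficient way? Not always. But it's good to know you have this tool for other websites.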
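The BeautifulSoup approach from the comments can be wrapped in a small function so it is easy to try without hitting the site. This is only a sketch: the innings_1 to innings_4 ids come from the comment above, while the function name innings_text and the sample HTML are hypothetical stand-ins, since the real page markup is not shown here.

```python
from bs4 import BeautifulSoup

# div ids quoted in the comment above; the live page may use others.
INNINGS_IDS = ["innings_1", "innings_2", "innings_3", "innings_4"]


def innings_text(html, ids=INNINGS_IDS):
    """Return {div_id: flattened text} for each innings div present in html."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for div_id in ids:
        node = soup.find(id=div_id)
        if node is not None:
            # Join all text fragments inside the div with single spaces.
            found[div_id] = node.get_text(separator=" ", strip=True)
    return found


if __name__ == "__main__":
    # Made-up markup, only to show the shape of the result.
    sample = (
        '<div id="innings_1"><span>Ed Cowan</span><span>68</span></div>'
        '<div id="footer">Contact Us</div>'
    )
    print(innings_text(sample))
```

Against the live page you would pass requests.get(url).text as the html argument and then split each returned string into fields yourself.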