Python: formatting output from BeautifulSoup
After reading the BeautifulSoup documentation, I managed to write a short Python script that scrapes a table and prints it out, but I can't figure out how to format the output as a table. The end goal is to get football match predictions from a website and save them to a text file. Here is the code I have written so far:
import urllib.request

from bs4 import BeautifulSoup


def make_soup(url):
    # Download the page and parse it with Python's built-in HTML parser.
    thepage = urllib.request.urlopen(url)
    soupdata = BeautifulSoup(thepage, "html.parser")
    return soupdata


soup = make_soup("https://afootballreport.com/predictions/over-1.5-goals/")

# Print every cell of every table row, one cell per line.
for record in soup.find_all('tr'):
    for data in record.find_all('td'):
        print(data.text.strip())
This is the output:
03/28
17:30
Iceland Reykjavik Youth Cup
Fjölnir / Vængir U19
Valur / KH U19
Over 1.5
Valur / KH U19 have over 1.5 goals in 100% of their games in the last 2 months (total games 6).
03/28
17:30
Saudi Arabia Pro League
Al Ittifaq
Al Quadisiya
Over 1.5
Al Ittifaq have over 1.5 goals in 100% of their games in the last 2 months (total games 8).
I would like each match to be on one row with the columns Date, Time, Football League, HomeTeam, AwayTeam, Tip, Description.
Like this:
Date, Time, Football League, HomeTeam, AwayTeam, Tip, Description
03/28, 17:30, Iceland Reykjavik Youth Cup, Fjölnir / Vængir U19, Valur / KH U19, Over 1.5, Valur / KH U19 have over 1.5 goals in 100% of their games in the last 2 months (total games 6).
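Can someone help me?

You're doing a lot of work. Whenever I see <table> tags, I try pandas' .read_html() first. It does most of the work for you, and you can then manipulate the DataFrame as needed: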
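import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page.
tables = pd.read_html('https://afootballreport.com/predictions/over-1.5-goals/')
table = tables[0]

# The page's headers do not line up with its cells, hence the odd source column names.
table[['Date', 'Time']] = table['Home team - Away team'].str.split(' ', expand=True)
table = table.drop(['Home team - Away team'],axis=1)
table = table.rename(columns={"Unnamed: 3":"Description"})

# ASSUMPTION: the separator is two spaces. A single space cannot yield exactly three
# columns because league and team names contain spaces; adjust to the real cell text.
table[['Football League', 'Home Team', 'Away Team']] = table['Tip'].str.split('  ', expand=True)
table = table.drop(['Tip'],axis=1)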
Output:
print (table.head(5).to_string())
Logic Description Date Time Football League Home Team Away Team
0 Over 1.5 Valur / KH U19 have over 1.5 goals in 100% of ... 03/28 17:30 Iceland Reykjavik Youth Cup Fjölnir / Vængir U19 Valur / KH U19
1 Over 1.5 Al Ittifaq have over 1.5 goals in 100% of thei... 03/28 17:30 Saudi Arabia Pro League Al Ittifaq Al Quadisiya
2 Over 1.5 Sarreguemines have over 1.5 goals in 100% of t... 03/28 19:00 France National 3 Sarreguemines Strasbourg II
3 Over 1.5 Mons Calpe have over 1.5 goals in 100% of thei... 03/28 19:29 Gibraltar Premier Division Mons Calpe Glacis United
4 Over 1.5 Glacis United have over 1.5 goals in 100% of t... 03/28 19:29 Gibraltar Premier Division Mons Calpe Glacis United
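If you would rather stay with BeautifulSoup, the row-per-line layout asked for in the question can be had by collecting the cells of each <tr> before printing, instead of printing every <td> on its own line. The following is only a minimal sketch: SAMPLE_HTML is a made-up stand-in for the real page (it assumes one <tr> per match with seven <td> cells), and HEADER and format_rows are illustrative names, not anything from the site or from bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumed structure).
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Time</th><th>League</th><th>Home</th>
      <th>Away</th><th>Tip</th><th>Description</th></tr>
  <tr>
    <td>03/28</td><td>17:30</td><td>Saudi Arabia Pro League</td>
    <td>Al Ittifaq</td><td>Al Quadisiya</td><td>Over 1.5</td>
    <td>Al Ittifaq have over 1.5 goals in 100% of their games in the last 2 months (total games 8).</td>
  </tr>
</table>
"""

HEADER = ["Date", "Time", "Football League", "HomeTeam", "AwayTeam", "Tip", "Description"]


def format_rows(html):
    """Return a header line plus one comma-separated line per table row."""
    soup = BeautifulSoup(html, "html.parser")
    lines = [", ".join(HEADER)]
    for record in soup.find_all("tr"):
        # Gather all cells of this row first, then join them into one line.
        cells = [data.text.strip() for data in record.find_all("td")]
        if cells:  # rows made only of <th> cells produce an empty list
            lines.append(", ".join(cells))
    return lines


lines = format_rows(SAMPLE_HTML)
print("\n".join(lines))
```

A plain ", ".join matches the format shown in the question, but it breaks as soon as a cell itself contains a comma; the standard library's csv module handles quoting in that case.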
Edit:
If you are using pandas version 0.24.2:
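import pandas as pd

# read_html returns a list of DataFrames, one per <table> found on the page.
tables = pd.read_html('https://afootballreport.com/predictions/over-1.5-goals/')
table = tables[0]

# In this version the duplicated header is parsed as 'Home team - Away team.1'.
table[['Date', 'Time']] = table['Home team - Away team'].str.split(' ', expand=True)
table = table.drop(['Home team - Away team'],axis=1)
table = table.rename(columns={"Logic":"Description"})

# ASSUMPTION: the separator is two spaces. A single space cannot yield exactly three
# columns because league and team names contain spaces; adjust to the real cell text.
table[['Football League', 'Home Team', 'Away Team']] = table['Home team - Away team.1'].str.split('  ', expand=True)
table = table.drop(['Home team - Away team.1'],axis=1)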
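Since the stated goal is a text file, here is one way the finished DataFrame could be written out with DataFrame.to_csv. This is a sketch under assumptions: the small table built here is a hand-made stand-in for the scraped one, and the file name predictions.txt and the temporary directory are hypothetical choices made only so the example runs without touching your working directory.

```python
import os
import tempfile

import pandas as pd

# Hand-made stand-in for the scraped `table` (assumed column names).
table = pd.DataFrame(
    {
        "Date": ["03/28", "03/28"],
        "Time": ["17:30", "17:30"],
        "Football League": ["Iceland Reykjavik Youth Cup", "Saudi Arabia Pro League"],
        "Home Team": ["Fjölnir / Vængir U19", "Al Ittifaq"],
        "Away Team": ["Valur / KH U19", "Al Quadisiya"],
        "Tip": ["Over 1.5", "Over 1.5"],
    }
)

# Hypothetical output location inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "predictions.txt")

# index=False leaves out the 0, 1, 2... row labels;
# utf-8 keeps names such as "Fjölnir" intact.
table.to_csv(out_path, index=False, encoding="utf-8")

# Read the file back to check what was written.
with open(out_path, encoding="utf-8") as handle:
    saved_lines = handle.read().splitlines()
print(saved_lines[0])
```

to_csv quotes any field that contains a comma, so the file can be read back safely with pd.read_csv or the csv module.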
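Comments:
Take a look at pprint. I like to call it the "pretty printer".
Thank you, that looks much easier. However, the home team and away team columns don't show up? It would be perfect if they did.
Oh, yes. I didn't notice that. I'll see if I can fix it.
Actually they are there, just offset. I'll print it out and show you in the edit above.
If you give me a few minutes, I'll fix the table so that it holds the data the way you want it.
Great, thanks. How would I then name the columns? Rename them to Date, Time, Football League, HomeTeam, AwayTeam, Tip, Logic? I want to be able to use the data in another script, so I need to be able to search by HomeTeam and AwayTeam.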
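On the last comment's question about naming the columns and searching by team: DataFrame.rename takes a mapping from old to new names, and a boolean mask selects the matching rows. The snippet below is only a sketch. The DataFrame is a hand-made stand-in for the scraped table, and find_matches is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Hand-made stand-in for the scraped `table` (assumed column names).
table = pd.DataFrame(
    {
        "Date": ["03/28", "03/28", "03/28"],
        "Time": ["17:30", "17:30", "19:29"],
        "Home Team": ["Fjölnir / Vængir U19", "Al Ittifaq", "Mons Calpe"],
        "Away Team": ["Valur / KH U19", "Al Quadisiya", "Glacis United"],
    }
)

# rename returns a new DataFrame; columns not listed keep their names.
table = table.rename(columns={"Home Team": "HomeTeam", "Away Team": "AwayTeam"})


def find_matches(frame, team):
    """Return the rows where `team` plays at home or away (exact match)."""
    mask = (frame["HomeTeam"] == team) | (frame["AwayTeam"] == team)
    return frame[mask]


# Search by home team only, then by either side.
home_games = table[table["HomeTeam"] == "Al Ittifaq"]
glacis_games = find_matches(table, "Glacis United")
print(glacis_games.to_string(index=False))
```

Column names without spaces are also reachable as attributes (table.HomeTeam), which can be convenient in another script.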