Python: convert a website table to a pandas df (BeautifulSoup does not recognize the table)
I want to convert a website table into a pandas df, but BeautifulSoup does not recognize the table (screenshot below). Below is the code I tried with no luck.

I also tried the code below, but without success:
df = pd.read_html('https://www.ndbc.noaa.gov/ship_obs.php')
print(df)
Your table is not inside a <table> tag; the data is in multiple <span> tags within a <pre> block.
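As a minimal illustration (using a toy HTML snippet, not the live page), a <table> lookup comes back empty while the <pre>/<span> lookup finds the data rows:

```python
import bs4

# Toy snippet mimicking the page layout: data rows are <span>s inside a <pre>
html = "<pre><span>SHIP 19 46.5 -72.3</span><span>SHIP 19 46.8 -71.2</span></pre>"
soup = bs4.BeautifulSoup(html, "html.parser")

print(soup.find("table"))                      # None: nothing for read_html to see
print(len(soup.find("pre").find_all("span")))  # 2 data rows
```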
You can parse it into a DataFrame like this:
import pandas as pd
import requests
import bs4

url = "https://www.ndbc.noaa.gov/ship_obs.php"
# The data lives in <span> runs inside a <pre> block, not in a <table>
spans = bs4.BeautifulSoup(requests.get(url).text, 'html.parser').find('pre').find_all("span")
# Each <span> holds one whitespace-separated row
print(pd.DataFrame([r.getText().split() for r in spans]))
Output:
0 1 2 3 4 5 ... 40 41 42 43 44 45
0 SHIP HOUR LAT LON WDIR WSPD ... °T ft sec °T Acc Ice
1 SHIP 19 46.5 -72.3 260 5.1 ... None None None None None None
2 SHIP 19 46.8 -71.2 110 2.9 ... None None None None None None
3 SHIP 19 47.4 -61.8 40 18.1 ... None None None None None None
4 SHIP 19 47.7 -53.2 40 8.0 ... None None None None None None
.. ... ... ... ... ... ... ... ... ... ... ... ... ...
170 SHIP 19 17.6 -62.4 100 20.0 ... None None None None None None
171 SHIP 19 25.8 -78.0 40 24.1 ... None None None None None None
172 SHIP 19 1.5 104.8 20 22.0 ... None None None None None None
173 SHIP 19 57.9 1.2 180 - ... None None None None None None
174 SHIP 19 35.1 -10.0 310 24.1 ... None None None None None None
[175 rows x 46 columns]
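Since row 0 of this frame carries the column names, one follow-up step (sketched here on a hypothetical miniature of the frame, not the full 46 columns) is to promote that row to the header:

```python
import pandas as pd

# Hypothetical miniature of the scraped frame: row 0 holds the header labels
df = pd.DataFrame([
    ["SHIP", "HOUR", "LAT", "LON"],
    ["SHIP", "19", "46.5", "-72.3"],
    ["SHIP", "19", "46.8", "-71.2"],
])

# Promote the first row to column labels, then drop it from the data
df.columns = df.iloc[0]
df = df.drop(index=0).reset_index(drop=True)
print(df.columns.tolist())  # ['SHIP', 'HOUR', 'LAT', 'LON']
print(len(df))              # 2
```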
A slightly different approach; you can also check the column count. I skipped the header rows at the top, so you will have to build the column headers yourself and clean up the last row.
import io
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.ndbc.noaa.gov/ship_obs.php'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
# All of the data sits inside a single <pre> block
tablecontent = soup.find('pre')
# Parse its text as a whitespace-delimited table, skipping the 3 header lines
s = io.StringIO(tablecontent.text)
df = pd.read_csv(s, sep=r'\s+', engine='python', skiprows=3, header=None)
Output (apologies, it did not copy over cleanly from Jupyter)
OK. The data is not stored in a table; it sits inside a <pre> tag as a bunch of text. I think this will help you.
import io
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.ndbc.noaa.gov/ship_obs.php'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
# All of the data sits inside a single <pre> block
tablecontent = soup.find('pre')
# Parse its text as a whitespace-delimited table, skipping the 3 header lines
s = io.StringIO(tablecontent.text)
df = pd.read_csv(s, sep=r'\s+', engine='python', skiprows=3, header=None)
0 1 2 3 4 5 6 7 8 9 ... 14 15 16 17 18 19 20 21 22 23
0 SHIP 19 47.4 -61.8 40 18.1 - - - 29.82 ... - - - - - - - - ---- -----
1 SHIP 19 47.7 -53.2 40 8.0 - - - 29.76 ... - - - - - - - - ---- -----
2 SHIP 19 47.8 -54.1 50 13.0 - - - 29.75 ... - - - - - - - - ---- -----
3 SHIP 19 48.2 -53.4 50 13.0 - - - 29.78 ... - - - - - - - - ---- -----
4 SHIP 19 46.8 -71.2 110 2.9 - - - 30.03 ... - - - - - - - - ---- -----
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
178 SHIP 19 25.8 -78.0 40 24.1 - 4.9 4.0 30.08 ... 11 5 - - - - - - ---- -----
179 SHIP 19 1.5 104.8 20 22.0 - - - 29.87 ... 11 5 - - - - - - ---- -----
180 SHIP 19 57.9 1.2 180 - - - - 29.35 ... 5 - - - - - - - ---- -----
181 SHIP 19 35.1 -10.0 310 24.1 - 6.6 6.0 29.68 ... 5 8 14.8 10.0 310 - - - ---- -----
182 182 ship observations reported for 1900 GMT None None None ... None None None None None None None None None None
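Note that the last row here is a summary line ("182 ship observations reported for 1900 GMT"), not data. A sketch of the cleanup on a hypothetical miniature of the frame, with illustrative column names in place of the header rows dropped by skiprows=3:

```python
import pandas as pd

# Hypothetical miniature: two data rows plus the trailing summary line
df = pd.DataFrame([
    ["SHIP", "19", "47.4", "-61.8"],
    ["SHIP", "19", "47.7", "-53.2"],
    ["182", "182", "ship", "observations"],
])

# Drop the trailing summary row and attach headers skipped during parsing
df = df.iloc[:-1].copy()
df.columns = ["SHIP", "HOUR", "LAT", "LON"]  # illustrative names from the site's header row
print(len(df))             # 2
print(df["LAT"].tolist())  # ['47.4', '47.7']
```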