Python: Convert a website table to a pandas DataFrame (BeautifulSoup does not recognize the table)

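I want to convert a website table into a pandas DataFrame, but BeautifulSoup does not recognize the table (screenshot below). Here is the code I tried, without success.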

I also tried the code below, again without success:

df = pd.read_html('https://www.ndbc.noaa.gov/ship_obs.php')
print(df)

Your table is not inside a <table> tag; the rows sit in multiple <span> tags (within a <pre> block).
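To see why table-based parsing finds nothing, the structure can be checked on a small sample. This is only a sketch: SAMPLE_HTML below is a hypothetical, simplified stand-in for the page markup (rows as <span> elements inside <pre>), not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup: rows are <span>
# elements inside a <pre> block, and there is no <table> element at all.
SAMPLE_HTML = """
<html><body>
<pre>
<span>SHIP  HOUR   LAT    LON  WDIR  WSPD</span>
<span>SHIP    19  46.5  -72.3   260   5.1</span>
<span>SHIP    19  46.8  -71.2   110   2.9</span>
</pre>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No <table> element exists, so a table-based parser has nothing to find.
has_table = soup.find("table") is not None

# The rows live in <span> elements under <pre>; split each on whitespace.
rows = [span.get_text().split() for span in soup.find("pre").find_all("span")]

print(has_table)
print(rows[1])
```

If has_table is False on the real page as well, looking for <pre> and <span> instead of <table> is the natural next step.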

You can parse it into a DataFrame as follows:

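import pandas as pd
import requests
import bs4

url = "https://www.ndbc.noaa.gov/ship_obs.php"
# Each row of the listing is a <span> inside the page's <pre> block
spans = bs4.BeautifulSoup(requests.get(url).text, 'html.parser').find('pre').find_all("span")
# Split each row on whitespace to get one list of fields per row
print(pd.DataFrame([r.getText().split() for r in spans]))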
Output:

      0     1     2      3     4     5   ...    40    41    42    43    44    45
0    SHIP  HOUR   LAT    LON  WDIR  WSPD  ...    °T    ft   sec    °T   Acc   Ice
1    SHIP    19  46.5  -72.3   260   5.1  ...  None  None  None  None  None  None
2    SHIP    19  46.8  -71.2   110   2.9  ...  None  None  None  None  None  None
3    SHIP    19  47.4  -61.8    40  18.1  ...  None  None  None  None  None  None
4    SHIP    19  47.7  -53.2    40   8.0  ...  None  None  None  None  None  None
..    ...   ...   ...    ...   ...   ...  ...   ...   ...   ...   ...   ...   ...
170  SHIP    19  17.6  -62.4   100  20.0  ...  None  None  None  None  None  None
171  SHIP    19  25.8  -78.0    40  24.1  ...  None  None  None  None  None  None
172  SHIP    19   1.5  104.8    20  22.0  ...  None  None  None  None  None  None
173  SHIP    19  57.9    1.2   180     -  ...  None  None  None  None  None  None
174  SHIP    19  35.1  -10.0   310  24.1  ...  None  None  None  None  None  None

[175 rows x 46 columns]
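Every field in the output above is a string, and "-" appears where an observation is missing. A minimal sketch of one way to turn such columns into numbers is shown here; the sample rows are hypothetical values in the same shape, and using pd.to_numeric is an assumption for illustration rather than part of the original answer.

```python
import pandas as pd

# Hypothetical rows in the shape of the parsed output: all strings,
# with "-" standing in for a missing observation.
rows = [
    ["SHIP", "19", "46.5", "-72.3", "260", "5.1"],
    ["SHIP", "19", "57.9", "1.2", "180", "-"],
]
columns = ["SHIP", "HOUR", "LAT", "LON", "WDIR", "WSPD"]  # names from the header row
df = pd.DataFrame(rows, columns=columns)

numeric_cols = ["HOUR", "LAT", "LON", "WDIR", "WSPD"]
# errors="coerce" turns anything that is not a number (here "-") into NaN.
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df)
```

Negative values such as "-72.3" still parse as numbers; only the bare "-" placeholder becomes NaN.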

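A slightly different approach; it is worth checking the column count as well. I skipped the rows at the top, so you will have to build the column headers yourself and clean up the last row.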

import io
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.ndbc.noaa.gov/ship_obs.php'
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
tablecontent = soup.find('pre')
data = BeautifulSoup(tablecontent.text, "html.parser")
s = io.StringIO(data.text)
# Fields are separated by runs of whitespace; skip the first 3 lines of the listing
df = pd.read_csv(s, sep=r'\s+', engine='python', skiprows=3, header=None)
Output (sorry, it did not copy cleanly from Jupyter):


OK. The data is not stored in a table; it is a bunch of tags. I think this will help you.
        0     1             2         3    4     5    6     7     8      9  ...    14    15    16    17    18    19    20    21    22     23
0    SHIP    19          47.4     -61.8   40  18.1    -     -     -  29.82  ...     -     -     -     -     -     -     -     -  ----  -----
1    SHIP    19          47.7     -53.2   40   8.0    -     -     -  29.76  ...     -     -     -     -     -     -     -     -  ----  -----
2    SHIP    19          47.8     -54.1   50  13.0    -     -     -  29.75  ...     -     -     -     -     -     -     -     -  ----  -----
3    SHIP    19          48.2     -53.4   50  13.0    -     -     -  29.78  ...     -     -     -     -     -     -     -     -  ----  -----
4    SHIP    19          46.8     -71.2  110   2.9    -     -     -  30.03  ...     -     -     -     -     -     -     -     -  ----  -----
...   ...   ...           ...       ...  ...   ...  ...   ...   ...    ...  ...   ...   ...   ...   ...   ...   ...   ...   ...   ...    ...
178  SHIP    19          25.8     -78.0   40  24.1    -   4.9   4.0  30.08  ...    11     5     -     -     -     -     -     -  ----  -----
179  SHIP    19           1.5     104.8   20  22.0    -     -     -  29.87  ...    11     5     -     -     -     -     -     -  ----  -----
180  SHIP    19          57.9       1.2  180     -    -     -     -  29.35  ...     5     -     -     -     -     -     -     -  ----  -----
181  SHIP    19          35.1     -10.0  310  24.1    -   6.6   6.0  29.68  ...     5     8  14.8  10.0   310     -     -     -  ----  -----
182   182  ship  observations  reported  for  1900  GMT  None  None   None  ...  None  None  None  None  None  None  None  None  None   None
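The cleanup mentioned with this approach (building the column headers and dropping the trailing summary row) could be sketched roughly as follows. The text below is a hypothetical, shortened excerpt shaped like the <pre> content, with a single header line; filtering lines by field count is one possible technique, assumed here for illustration, and the real page may need its header lines handled differently.

```python
import io
import pandas as pd

# Hypothetical excerpt shaped like the <pre> text: one header line, data
# rows, and a trailing summary line like the last row of the output above.
text = """SHIP  HOUR   LAT    LON  WDIR  WSPD
SHIP    19  47.4  -61.8    40  18.1
SHIP    19  57.9    1.2   180     -
2 ship observations reported for 1900 GMT
"""

lines = text.splitlines()
header = lines[0].split()

# Keep only lines with exactly as many fields as the header; the summary
# line has a different field count and is dropped.
body = [ln for ln in lines[1:] if len(ln.split()) == len(header)]

df = pd.read_csv(io.StringIO("\n".join(body)), sep=r"\s+", header=None, names=header)
print(df)
```

A summary line that happened to have the same number of fields as the header would slip through this filter, so dropping the last row explicitly with df.iloc[:-1] is a reasonable alternative when the summary is always the final line.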