Python: scraping an EDGAR HTML file and converting it to a DataFrame
I'm not very experienced with web scraping.
import pandas as pd

url = 'https://www.sec.gov/Archives/edgar/data/1383094/000095013120003579/d33910dex991.htm'
df = pd.read_html(url, parse_dates=[0])[0]
print(df.head())
This is my code. I want to extract all of the data from this page, but the result is always just the first "body":
                              0    1    2           3    4
0                           NaN  NaN  NaN         NaN  NaN
1  Collection Period Beginning:  NaN  NaN  08/01/2020  NaN
2     Collection Period Ending:  NaN  NaN  08/31/2020  NaN
3  Previous Payment/Close Date:  NaN  NaN  08/17/2020  NaN
4                  Payment Date  NaN  NaN  09/15/2020  NaN
How do I get the rest of the tables?
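pd.read_html returns a list of all the tables on the page. Your code indexes that list with [0], so it reads only the first table and gives you a single DataFrame.
Keep the whole list instead and access each table by its index (df[0], df[1], and so on). Note that df then holds the list, not a single DataFrame.
Try: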
df = pd.read_html(url, parse_dates=[0])
df1 = df[0]
df2 = df[1]
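
Instead of naming each table by hand, you can loop over the list. The following is a minimal sketch, not a definitive solution: the inline HTML is a made-up stand-in for the filing so the example runs without network access, and the names tables and cleaned are only illustrative. Dropping the all-NaN rows and columns is an assumption about what you want, based on the empty cells in the output shown in the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the filing: a small page with two tables.
html = """
<table>
  <tr><td>Collection Period Beginning:</td><td>08/01/2020</td></tr>
  <tr><td>Collection Period Ending:</td><td>08/31/2020</td></tr>
</table>
<table>
  <tr><td>Payment Date</td><td>09/15/2020</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element.
tables = pd.read_html(StringIO(html))
print(len(tables))

# Drop rows and columns that contain nothing but NaN (layout padding).
cleaned = [t.dropna(how="all").dropna(axis=1, how="all") for t in tables]

# Walk through every table by index instead of writing df1, df2, ...
for i, table in enumerate(cleaned):
    print(i, table.shape)
```

With the real page, pass the URL (or the downloaded HTML text wrapped in StringIO) to pd.read_html in place of the sample string. If you want everything in one DataFrame, pd.concat(cleaned, ignore_index=True) stacks the tables, though that only makes sense when they share a layout.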