Python: read data from BeautifulSoup and arrange it in a DataFrame
I want to arrange the output from BeautifulSoup into a pandas DataFrame.
import pandas as pd
import requests
import bs4
import urllib, json
Cik = '824142'
url = 'https://api.ustals.com/v1/indicators/xbrl?indicators=EarningsPerShareDiluted,NetIncomeLoss'\
',Revenues,ProfitLoss,DividendsCommonStockCash,Assets,Liabilities'\
'&frequency=q&period_type=end_date&companies={s}&token=KUNwBJE78kDQMUfoC3g'
response = requests.get(url.format(s=Cik))
page_data = bs4.BeautifulSoup(response.text, "html.parser")
print(page_data)
Output of page_data:
company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28,2012-07-28,2012-10-27,2013-05-04,2013-08-03,2013-11-02,2014-02-01,2014-05-03,2014-11-01,2015-05-02,2015-08-01,2015-10-31,2016-01-30,2016-04-30,2016-07-30,2016-10-29,2017-01-28,2017-04-29,2017-07-29,2017-10-28
1318008,Assets,343367000,357805000,378926000,418145000,438136000,416984000,450963000,465777000,443403000,454455000,499572000,505547000,457355000,441070000,414695000,422148000,432561000,453028000,426683000,447436000,468867000,496269000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14,0.07,0.4,0.08,0.16,0.39,0.89,0.09,0.54,0.09,0.11,0.36,0.48,-0.08,-0.03,0.43,0.72,-0.18,-0.02,0.48
1318008,Liabilities,106880000,106092000,98507000,135708000,137777000,115743000,141548000,140583000,107749000,130316000,155372000,141121000,152237000,141540000,117738000,132848000,152314000,163597000,119632000,141867000,154362000,169686000
1318008,NetIncomeLoss,2591000,14137000,4527000,2086000,12667000,2498000,4739000,11860000,26851000,2496000,15727000,2770000,3213000,9653000,13149000,-2137000,-838000,10695000,18184000,-4448000,-608000,11922000
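How can I arrange this into a tidy DataFrame? Dates as one DataFrame, Assets as one DataFrame, Liabilities as one DataFrame, and so on.

I think you need a solution like the one @MaxU mentioned in the comments, but with the first and second columns also set as a MultiIndex.

A little data cleaning is also possible: create the index from the second column, drop the repeated first column, and transpose: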
df = pd.read_csv(url.format(s=Cik), index_col=[1]).iloc[:, 1:].T
print (df)
indicator_id       Assets  DividendsCommonStockCash  EarningsPerShareDiluted  \
2011-06-30    186360000.0                       NaN                     0.15
2011-09-30    182254000.0                       NaN                     0.23
2012-03-31    184765000.0                       NaN                     0.18
2012-06-30    203554000.0                       NaN                     0.38
2012-09-30    196254000.0                       NaN                     0.24
2012-12-31    193493000.0                       NaN                     0.31
2013-03-31    194473000.0                       NaN                     0.29
2013-06-30    221214000.0                       NaN                     0.33
2013-09-30    220138000.0                       NaN                     0.28
2013-12-31    215444000.0                       NaN                     0.11
2014-03-31    228719000.0                       NaN                     0.26
2014-06-30    241652000.0                       NaN                     0.20
2014-09-30    247509000.0                       NaN                     0.22
2014-12-31    233117000.0                       NaN                     0.12
2015-03-31    236759000.0                       NaN                     0.15
2015-06-30    250012000.0                       NaN                     0.20
2015-09-30    255098000.0                       NaN                     0.24
2015-12-31    232854000.0                       NaN                     0.25
2016-03-31    236669000.0                       0.0                     0.20
2016-06-30    257527000.0                       NaN                     0.27
2016-09-30    257277000.0                       NaN                     0.29
2016-12-31    256530000.0                       NaN                     0.24
2017-03-31    265283000.0                       NaN                     0.19
2017-06-30    285011000.0                       NaN                     0.26
2017-09-30    303138000.0                       NaN                     0.28

indicator_id  NetIncomeLoss
2011-06-30        3839000.0
2011-09-30        5626000.0
2012-03-31        4567000.0
2012-06-30        9297000.0
2012-09-30        6007000.0
2012-12-31        7578000.0
2013-03-31        7140000.0
2013-06-30       12119000.0
2013-09-30       10522000.0
2013-12-31        7766000.0
2014-03-31        9822000.0
2014-06-30       11363000.0
2014-09-30       12440000.0
2014-12-31       10533000.0
2015-03-31        8399000.0
2015-06-30       11130000.0
2015-09-30       13251000.0
2015-12-31       12948000.0
2016-03-31       10806000.0
2016-06-30       14341000.0
2016-09-30       15682000.0
2016-12-31       12547000.0
2017-03-31       10217000.0
2017-06-30       13794000.0
2017-09-30       14717000.0
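
To get from the transposed frame to "one DataFrame per indicator", as the question asks, something like the following may work. It is only a sketch: `SAMPLE_CSV` is a hypothetical stand-in for the API response (a few values copied from the question's output), and the names `multi`, `tidy` and `frames` are made up for illustration. It also shows the MultiIndex variant, assuming that means passing both identifier columns to `index_col`.

```python
import io
import pandas as pd

# Hypothetical sample: a few columns copied from the question's output,
# standing in for the CSV text the API returns.
SAMPLE_CSV = """company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28
1318008,Assets,343367000,357805000,378926000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14
1318008,Liabilities,106880000,106092000,98507000
1318008,NetIncomeLoss,2591000,14137000,4527000
"""

# Variant 1: keep both identifier columns as a MultiIndex (dates stay as columns).
multi = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=[0, 1])

# Variant 2: index by indicator only, drop company_id, then transpose
# so that dates become the rows and indicators become the columns.
tidy = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=[1]).iloc[:, 1:].T
tidy.index = pd.to_datetime(tidy.index)  # date strings -> DatetimeIndex

# One single-column DataFrame per indicator, keyed by indicator name.
frames = {name: tidy[[name]] for name in tidy.columns}

print(multi.loc[(1318008, "Assets")])
print(frames["Liabilities"])
```

With real data, `io.StringIO(SAMPLE_CSV)` would be replaced by the formatted URL, as in the answer. Keeping everything in one date-indexed frame and selecting columns on demand (`tidy["Assets"]`) is often simpler than holding many separate frames.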
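Comments:
Don't underestimate pandas: pd.read_csv(url.format(s=Cik)) will do the job. (MaxU)
Eventually you could even use pd.read_html.
@user32185, that would be overkill. We need pd.read_html for parsing HTML tables...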
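
As the comments suggest, the endpoint returns plain CSV, so no HTML parser is needed. If the response has already been fetched with requests (for example to check the status code first), its text can be handed to pandas through `io.StringIO`. The sketch below assumes that setup; `csv_text_to_frame` is a hypothetical helper, and `sample_text` stands in for `response.text` so the example runs without network access.

```python
import io
import pandas as pd

def csv_text_to_frame(text):
    """Parse CSV text (e.g. response.text) into a date-indexed DataFrame.

    Hypothetical helper for illustration, not part of pandas or requests.
    """
    df = pd.read_csv(io.StringIO(text), index_col="indicator_id")
    df = df.drop(columns="company_id").T   # dates become rows
    df.index = pd.to_datetime(df.index)    # date strings -> DatetimeIndex
    return df

# Stand-in for response.text; values copied from the question's output.
sample_text = (
    "company_id,indicator_id,2011-07-30,2011-10-29\n"
    "1318008,Assets,343367000,357805000\n"
    "1318008,NetIncomeLoss,2591000,14137000\n"
)

frame = csv_text_to_frame(sample_text)
print(frame)
```

With a live request this would be called as `csv_text_to_frame(response.text)`, ideally after `response.raise_for_status()`.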