Python: reading data from BeautifulSoup and arranging it in a DataFrame


I want to arrange the output from BeautifulSoup into a pandas DataFrame.

import pandas as pd
import requests
import bs4

Cik = '824142'
# {s} in the query string is filled in with the company's CIK below
url = 'https://api.ustals.com/v1/indicators/xbrl?indicators=EarningsPerShareDiluted,NetIncomeLoss'\
    ',Revenues,ProfitLoss,DividendsCommonStockCash,Assets,Liabilities'\
    '&frequency=q&period_type=end_date&companies={s}&token=KUNwBJE78kDQMUfoC3g'
response = requests.get(url.format(s=Cik))
page_data = bs4.BeautifulSoup(response.text, "html.parser")
print(page_data)
Output of page_data:

    company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28,2012-07-28,2012-10-27,2013-05-04,2013-08-03,2013-11-02,2014-02-01,2014-05-03,2014-11-01,2015-05-02,2015-08-01,2015-10-31,2016-01-30,2016-04-30,2016-07-30,2016-10-29,2017-01-28,2017-04-29,2017-07-29,2017-10-28
    1318008,Assets,343367000,357805000,378926000,418145000,438136000,416984000,450963000,465777000,443403000,454455000,499572000,505547000,457355000,441070000,414695000,422148000,432561000,453028000,426683000,447436000,468867000,496269000
    1318008,EarningsPerShareDiluted,0.08,0.45,0.14,0.07,0.4,0.08,0.16,0.39,0.89,0.09,0.54,0.09,0.11,0.36,0.48,-0.08,-0.03,0.43,0.72,-0.18,-0.02,0.48
    1318008,Liabilities,106880000,106092000,98507000,135708000,137777000,115743000,141548000,140583000,107749000,130316000,155372000,141121000,152237000,141540000,117738000,132848000,152314000,163597000,119632000,141867000,154362000,169686000
    1318008,NetIncomeLoss,2591000,14137000,4527000,2086000,12667000,2498000,4739000,11860000,26851000,2496000,15727000,2770000,3213000,9653000,13149000,-2137000,-838000,10695000,18184000,-4448000,-608000,11922000
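The printed text is plain CSV rather than HTML, so BeautifulSoup is not really needed to read it. A minimal sketch, assuming `response.text` holds exactly this CSV, passes the text straight to `pandas.read_csv` through `io.StringIO`. To keep it runnable offline, a three-date excerpt of the rows above stands in for the API response, and `csv_text_to_frame` is a hypothetical helper name:

```python
import io

import pandas as pd

# Excerpt of the CSV printed above (first three dates only); in the real
# script this text would come from response.text instead.
SAMPLE_CSV = """company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28
1318008,Assets,343367000,357805000,378926000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14
1318008,Liabilities,106880000,106092000,98507000
1318008,NetIncomeLoss,2591000,14137000,4527000
"""


def csv_text_to_frame(text):
    """Hypothetical helper: parse CSV text into a DataFrame, no BeautifulSoup."""
    # io.StringIO wraps the string so read_csv can read it like a file
    return pd.read_csv(io.StringIO(text))


raw = csv_text_to_frame(SAMPLE_CSV)
print(raw.shape)  # (4, 5)
```

This reuses the response that was already downloaded instead of requesting the URL a second time.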

How can I arrange this into a tidy DataFrame, with the dates in one column, Assets in another, Liabilities in another, and so on?

I think you need a solution like the one @MaxU mentioned in the comments, but with the first and second columns set as a MultiIndex:
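A minimal sketch of that MultiIndex variant, run on a three-date excerpt of the question's CSV instead of the live URL (the `SAMPLE_CSV` name is illustrative). `index_col=[0, 1]` turns `company_id` and `indicator_id` into a two-level row index; selecting one company then drops the first level:

```python
import io

import pandas as pd

# Excerpt of the question's CSV output, used here in place of the API URL
SAMPLE_CSV = """company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28
1318008,Assets,343367000,357805000,378926000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14
1318008,Liabilities,106880000,106092000,98507000
1318008,NetIncomeLoss,2591000,14137000,4527000
"""

# Both leading columns become levels of a row MultiIndex
df = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=[0, 1])

# Pick one company (drops the company_id level), then transpose so the
# dates become rows and the indicators become columns
one_company = df.loc[1318008].T
print(one_company)
```

Keeping `company_id` in the index is mainly useful if the `companies=` parameter is ever given more than one CIK.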

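A little data cleaning is also possible: create the index from the second column, drop the first column (its company_id value is the same on every row), and transpose: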

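# index_col=[1]: indicator_id becomes the row index; .iloc[:, 1:]: drop company_id; .T: dates become rows
df = pd.read_csv(url.format(s=Cik), index_col=[1]).iloc[:, 1:].T
print(df)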

indicator_id       Assets  DividendsCommonStockCash  EarningsPerShareDiluted  \
2011-06-30    186360000.0                       NaN                     0.15   
2011-09-30    182254000.0                       NaN                     0.23   
2012-03-31    184765000.0                       NaN                     0.18   
2012-06-30    203554000.0                       NaN                     0.38   
2012-09-30    196254000.0                       NaN                     0.24   
2012-12-31    193493000.0                       NaN                     0.31   
2013-03-31    194473000.0                       NaN                     0.29   
2013-06-30    221214000.0                       NaN                     0.33   
2013-09-30    220138000.0                       NaN                     0.28   
2013-12-31    215444000.0                       NaN                     0.11   
2014-03-31    228719000.0                       NaN                     0.26   
2014-06-30    241652000.0                       NaN                     0.20   
2014-09-30    247509000.0                       NaN                     0.22   
2014-12-31    233117000.0                       NaN                     0.12   
2015-03-31    236759000.0                       NaN                     0.15   
2015-06-30    250012000.0                       NaN                     0.20   
2015-09-30    255098000.0                       NaN                     0.24   
2015-12-31    232854000.0                       NaN                     0.25   
2016-03-31    236669000.0                       0.0                     0.20   
2016-06-30    257527000.0                       NaN                     0.27   
2016-09-30    257277000.0                       NaN                     0.29   
2016-12-31    256530000.0                       NaN                     0.24   
2017-03-31    265283000.0                       NaN                     0.19   
2017-06-30    285011000.0                       NaN                     0.26   
2017-09-30    303138000.0                       NaN                     0.28   

indicator_id  NetIncomeLoss  
2011-06-30        3839000.0  
2011-09-30        5626000.0  
2012-03-31        4567000.0  
2012-06-30        9297000.0  
2012-09-30        6007000.0  
2012-12-31        7578000.0  
2013-03-31        7140000.0  
2013-06-30       12119000.0  
2013-09-30       10522000.0  
2013-12-31        7766000.0  
2014-03-31        9822000.0  
2014-06-30       11363000.0  
2014-09-30       12440000.0  
2014-12-31       10533000.0  
2015-03-31        8399000.0  
2015-06-30       11130000.0  
2015-09-30       13251000.0  
2015-12-31       12948000.0  
2016-03-31       10806000.0  
2016-06-30       14341000.0  
2016-09-30       15682000.0  
2016-12-31       12547000.0  
2017-03-31       10217000.0  
2017-06-30       13794000.0  
2017-09-30       14717000.0  
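The one-liner above can be broken into its three steps, followed by the per-indicator split the question asks for. This is a sketch on a three-date excerpt of the question's CSV rather than the live URL; converting the index with `pd.to_datetime` and the `per_indicator` dictionary are optional extras, not part of the original answer:

```python
import io

import pandas as pd

# Excerpt of the question's CSV output (first three dates only)
SAMPLE_CSV = """company_id,indicator_id,2011-07-30,2011-10-29,2012-04-28
1318008,Assets,343367000,357805000,378926000
1318008,EarningsPerShareDiluted,0.08,0.45,0.14
1318008,Liabilities,106880000,106092000,98507000
1318008,NetIncomeLoss,2591000,14137000,4527000
"""

# Step 1: index_col=[1] makes the second column (indicator_id) the row index
step1 = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=[1])
# Step 2: drop the first remaining column, company_id, keeping only the dates
step2 = step1.iloc[:, 1:]
# Step 3: transpose so each date is a row and each indicator is a column
df = step2.T

# Optional: turn the date strings into timestamps for date-based slicing
df.index = pd.to_datetime(df.index)

# Each indicator is now one column, e.g. a Series of Assets indexed by date
assets = df["Assets"]
# Or one single-column DataFrame per indicator, keyed by indicator name
per_indicator = {name: df[[name]] for name in df.columns}
print(assets)
```

`step1.drop(columns="company_id")` is an equivalent, more explicit alternative to `.iloc[:, 1:]` in step 2.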

Don't underestimate pandas: pd.read_csv(url.format(s=Cik)) will do the trick.
@MaxU Eventually you could even use pd.read_html.
@user32185, that would be overkill. pd.read_html is for parsing HTML tables...