How to store a list of pandas DataFrames for easy access

python list pandas dataframe


I have a list of DataFrames:

df1 = 
    Stock  Year   Profit  CountPercent
     AAPL  2012    1       38.77
     AAPL  2013    1       33.33
df2 = 
    Stock  Year   Profit  CountPercent
    GOOG   2012    1       43.47
    GOOG   2013    1       32.35

df3 = 
    Stock  Year   Profit  CountPercent
    ABC   2012    1       40.00
    ABC   2013    1       32.35

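The output of my function is a list like [df1, df2, df3, ...]. All the DataFrames have the same columns, but the rows differ.

How can I store these on disk and retrieve them as a list again in the fastest, most efficient way?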

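If the values in the Stock column are the same within each df, you can drop that column and build a dict with a dict comprehension, keyed by the first value of the Stock column in each df.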
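The code for this step did not survive in the answer, so the following is only a minimal sketch of what such a dict comprehension might look like. The names `dfs`, `frames`, and `restored` are illustrative assumptions, and the sample frames are rebuilt from the data in the question.

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df1 = pd.DataFrame({"Stock": ["AAPL", "AAPL"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [38.77, 33.33]})
df2 = pd.DataFrame({"Stock": ["GOOG", "GOOG"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [43.47, 32.35]})
df3 = pd.DataFrame({"Stock": ["ABC", "ABC"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [40.00, 32.35]})
dfs = [df1, df2, df3]  # hypothetical name for the function's output

# Key: first value of the Stock column; value: the frame without that column.
frames = {df["Stock"].iat[0]: df.drop(columns="Stock") for df in dfs}

# Rebuild the original list by putting the key back as the Stock column.
restored = [frame.assign(Stock=name)[list(df1.columns)]
            for name, frame in frames.items()]

print(list(frames))
print(frames["GOOG"])
```

A dict keeps each frame addressable by ticker, which may be more convenient than positional access into a list. This only makes sense if every frame holds a single Stock value.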

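For storing on disk, I think the best approach is to use HDFStore. If the values in each Stock column are the same, you can concatenate all the dfs and store the result: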

df = pd.concat([df1.set_index('Stock'), df2.set_index('Stock'), df3.set_index('Stock')])
print (df)
       Year  Profit  CountPercent
Stock                            
AAPL   2012       1         38.77
AAPL   2013       1         33.33
GOOG   2012       1         43.47
GOOG   2013       1         32.35
ABC    2012       1         40.00
ABC    2013       1         32.35

store = pd.HDFStore('store.h5')
store['df'] = df
print (store)
<class 'pandas.io.pytables.HDFStore'>
File path: store.h5
/df            frame        (shape->[1,4])
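The question also asks how to get a list back. One possible way, sketched here under the assumption that the concatenated frame has already been read back (for example with `df = store['df']`), is to group by the `Stock` index level. The frame below is an in-memory stand-in for that stored frame, so the sketch does not touch the HDF5 file itself.

```python
import pandas as pd

# Stand-in for the frame read back from the store; same layout as the
# concatenated frame printed above (Stock is the index).
df = pd.DataFrame(
    {"Year": [2012, 2013, 2012, 2013, 2012, 2013],
     "Profit": [1, 1, 1, 1, 1, 1],
     "CountPercent": [38.77, 33.33, 43.47, 32.35, 40.00, 32.35]},
    index=pd.Index(["AAPL", "AAPL", "GOOG", "GOOG", "ABC", "ABC"], name="Stock"),
)

# sort=False keeps the groups in order of first appearance;
# reset_index() turns the Stock index back into an ordinary column.
dfs = [group.reset_index() for _, group in df.groupby(level="Stock", sort=False)]

for frame in dfs:
    print(frame, end="\n\n")
```

Each element of `dfs` then has the original `Stock, Year, Profit, CountPercent` layout with a fresh 0-based index.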

I think that if all your DFs have the same shape, it would be more natural to store your data as a pandas.Panel rather than a list of DFs. Here is how it works:

import io
import pandas as pd

df1 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
AAPL,2012,1,38.77
AAPL,2013,1,33.33
"""
))

df2 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
GOOG,2012,1,43.47
GOOG,2013,1,32.35
"""
))

df3 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
ABC,2012,1,40.0
ABC,2013,1,32.35
"""
))


# I had to drop the `Stock` column and make it a Panel axis because of this error
# when saving the Panel to HDFStore:
# TypeError: Cannot serialize the column [%s] because its data contents are [mixed-integer] object dtype
p = pd.Panel({df.iat[0, 0]: df.drop('Stock', axis=1) for df in [df1, df2, df3]})

store = pd.HDFStore('c:/temp/stocks.h5')
store.append('stocks', p, data_columns=True, mode='w')
store.close()

# read panel from HDFStore
store = pd.HDFStore('c:/temp/stocks.h5')
p = store.select('stocks')

Store:

In [18]: store
Out[18]:
<class 'pandas.io.pytables.HDFStore'>
File path: c:/temp/stocks.h5
/stocks            wide_table   (typ->appendable,nrows->6,ncols->3,indexers->[major_axis,minor_axis],dc->[AAPL,ABC,GOOG])
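Note that `pandas.Panel` was deprecated in pandas 0.20 and removed in 0.25, so the code above only runs on older versions. On current pandas, a DataFrame with a MultiIndex is a common substitute. The sketch below is an assumption about how the same idea might be expressed, not part of the original answer; the names `stacked` and `row` are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df1 = pd.DataFrame({"Stock": ["AAPL", "AAPL"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [38.77, 33.33]})
df2 = pd.DataFrame({"Stock": ["GOOG", "GOOG"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [43.47, 32.35]})
df3 = pd.DataFrame({"Stock": ["ABC", "ABC"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [40.00, 32.35]})

# Passing a dict to pd.concat uses its keys as the outer index level,
# which plays the role the Panel's item axis played.
stacked = pd.concat(
    {df["Stock"].iat[0]: df.drop(columns="Stock") for df in [df1, df2, df3]},
    names=["Stock", "row"],
)

# Selecting one ticker returns an ordinary DataFrame.
goog = stacked.loc["GOOG"]
print(stacked)
print(goog)
```

A frame like `stacked` is all numeric apart from its index, so it could presumably be written with `stacked.to_hdf(...)` and read back with `pd.read_hdf(...)` when PyTables is installed, though that step is not shown or tested here.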

Do all your DFs have the same shape (rows and columns)?