In Python, how can I use pivot table output for further analysis?

Sample data

District    Taluka   Circle   Crop   Yield_2006  Yield_2007  Yield_2008  Yield_2009
AHMEDNAGAR  AKOLE    AKOLE    PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    KOTUL    PADDY  637.2       1007.4      919.7       323.9
AHMEDNAGAR  AKOLE    RAJUR    PADDY  857.8       1227.1      1114.5      506.5
AHMEDNAGAR  AKOLE    SAMSHE   PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    BRAMHA   PADDY  637.2       1007.4      919.7       323.9
AHMEDNAGAR  AKOLE    VIRGAO   PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    SHENDI   PADDY  857.8       1227.1      1114.5      506.5
AHMEDNAGAR  AKOLE    SAKWADI  PADDY  857.8       1227.1      1114.5      506.5
AMRAVATI    DHARNI   DHARNI   PADDY  590         888.6       437.8       201.9
AMRAVATI    DHARNI   DHULAT   PADDY  489.7       863.3       277         227.8
AMRAVATI    DHARNI   HARSUL   PADDY  590         888.6       437.8       201.9
AMRAVATI    DHARNI   SIKHEDA  PADDY  489.7       863.3       277         227.8
AMRAVATI    CHIKARA  CHHDARA  PADDY  539.8       698.5       388.9       373.8
AMRAVATI    CHIKARA  SEDOH    PADDY  539.8       698.5       388.9       338.2
AMRAVATI    CHIKARA  CHURNI   PADDY  539.8       698.5       388.9       338.2
Code:

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> Data=pd.read_csv("/home/desktop/Desktop/noonion.csv")
>>> Data1 =Data[['District','Taluka','Circle','Crop', 'Yield_2006', 'Yield_2007','Yield_2008','Yield_2009']]
>>> pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],index=["District","Crop"],aggfunc=[np.mean],fill_value=False)
>>> pivot.head()
                            mean                                     
                      Yield_2006   Yield_2007  Yield_2008  Yield_2009
District   Crop                                                      
AHMEDNAGAR BAJRA      781.804124   884.185567  770.402062  767.814433
           BLACKGRAM  298.888889   517.722222   80.166667  608.166667
           COTTON     722.241667  1000.156250  863.227083  870.489583
           GREENGRAM  514.166667   660.938596  212.971930  512.380702
           GROUNDNUT  843.243590   919.384615  815.717949  842.012821
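Now I want to use this pivot output for further analysis.

For example, I want to create a new column "Average_Yield" that holds each crop's average yield from 2006 to 2009.

How can I create this "Average_Yield" column, with its values (the mean of the 2006 to 2009 yields) rounded to 4 decimal places?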

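You can first remove the [] from aggfunc (aggfunc=np.mean instead of aggfunc=[np.mean]) so that the columns are not returned as a MultiIndex, and then take the mean row-wise with DataFrame.mean(axis=1).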
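
Applied to the question's actual goal (all four years, rounded to 4 decimal places), those two steps could look like the minimal sketch below. It assumes the sample rows above are whitespace-separated exactly as shown and reads them from a string via io.StringIO instead of noonion.csv; SAMPLE and years are illustrative names. It passes aggfunc="mean" (a single function, not a list), which keeps the columns flat just like aggfunc=np.mean.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for noonion.csv.
SAMPLE = """\
District    Taluka   Circle   Crop   Yield_2006  Yield_2007  Yield_2008  Yield_2009
AHMEDNAGAR  AKOLE    AKOLE    PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    KOTUL    PADDY  637.2       1007.4      919.7       323.9
AHMEDNAGAR  AKOLE    RAJUR    PADDY  857.8       1227.1      1114.5      506.5
AHMEDNAGAR  AKOLE    SAMSHE   PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    BRAMHA   PADDY  637.2       1007.4      919.7       323.9
AHMEDNAGAR  AKOLE    VIRGAO   PADDY  875.3       1338.9      894.9       339.2
AHMEDNAGAR  AKOLE    SHENDI   PADDY  857.8       1227.1      1114.5      506.5
AHMEDNAGAR  AKOLE    SAKWADI  PADDY  857.8       1227.1      1114.5      506.5
AMRAVATI    DHARNI   DHARNI   PADDY  590         888.6       437.8       201.9
AMRAVATI    DHARNI   DHULAT   PADDY  489.7       863.3       277         227.8
AMRAVATI    DHARNI   HARSUL   PADDY  590         888.6       437.8       201.9
AMRAVATI    DHARNI   SIKHEDA  PADDY  489.7       863.3       277         227.8
AMRAVATI    CHIKARA  CHHDARA  PADDY  539.8       698.5       388.9       373.8
AMRAVATI    CHIKARA  SEDOH    PADDY  539.8       698.5       388.9       338.2
AMRAVATI    CHIKARA  CHURNI   PADDY  539.8       698.5       388.9       338.2
"""

# sep=r"\s+" treats any run of spaces as a single delimiter.
Data1 = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
years = ["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"]

# A single aggfunc (not a list) keeps the result's columns a flat Index.
pivot = pd.pivot_table(Data1, values=years, index=["District", "Crop"], aggfunc="mean")

# Row-wise (axis=1) mean across the four years, rounded to 4 decimal places.
pivot["Average_Yield"] = pivot[years].mean(axis=1).round(4)
print(pivot)
```

With the sample rows, the rounded averages come out at about 851.219 for AHMEDNAGAR and 495.857 for AMRAVATI, in line with the unrounded groupby result of the alternative solution.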

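To average only selected columns, select them with loc (a label slice) or by subsetting with a list of column names, for example for 2006 and 2007 only: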

# loc label slices include both endpoints, so this averages Yield_2006 and Yield_2007
pivot['Average_Yield'] = pivot.loc[:,'Yield_2006':'Yield_2007'].mean(axis=1).round(4)
print(pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop                                                    
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125   
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000   

                  Average_Yield  
District   Crop                  
AHMEDNAGAR PADDY      1011.6563  
AMRAVATI   PADDY       669.8643  
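
If you would rather keep aggfunc as a list (as in the question's aggfunc=[np.mean]), the columns have two levels: "mean" on top and the year names below. A minimal sketch of two ways to deal with that, using only four of the sample rows for brevity: take the sub-frame under the top level with pivot["mean"], or flatten the columns with columns.droplevel(0). It uses aggfunc=["mean"], which produces the same "mean" top-level label as [np.mean]; the names years and flat are illustrative.

```python
import pandas as pd

# Four of the sample rows, built directly as a dict.
Data1 = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})
years = ["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"]

# A list for aggfunc gives MultiIndex columns such as ('mean', 'Yield_2006').
pivot = pd.pivot_table(Data1, values=years, index=["District", "Crop"], aggfunc=["mean"])

# Option 1: select the sub-frame under the 'mean' level, then average row-wise.
avg_from_level = pivot["mean"].mean(axis=1).round(4)

# Option 2: drop the top column level, then proceed as with a flat pivot.
flat = pivot.copy()
flat.columns = flat.columns.droplevel(0)
flat["Average_Yield"] = flat[years].mean(axis=1).round(4)
print(flat)
```

Dropping the level once is usually simpler if you plan to add several derived columns afterwards.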
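Alternative solution with groupby (numeric_only=True skips the text columns Taluka and Circle; pandas 2.x raises a TypeError when asked to average them):

In [79]: res = Data1.groupby(["District","Crop"]).mean(numeric_only=True)

In [80]: res['Average_Yield'] = res.mean(axis=1)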

In [81]: res
Out[81]:
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  Average_Yield
District   Crop
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125     851.218750
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000     495.857143
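
Because the real file has many columns, listing every year by hand gets tedious, and res.mean(axis=1) above averages every numeric column that survives the groupby. A minimal sketch, assuming all yield columns follow the Yield_YYYY naming seen in the sample: DataFrame.filter(regex=...) picks them out, and grouping on just those columns keeps text (or any other) columns out of the average. Only four sample rows are used, and yield_cols is an illustrative name.

```python
import pandas as pd

# Four of the sample rows, text columns included on purpose.
Data1 = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Taluka": ["AKOLE", "AKOLE", "DHARNI", "DHARNI"],
    "Circle": ["AKOLE", "KOTUL", "DHARNI", "DHULAT"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})

# Every column named "Yield_" followed by exactly four digits.
yield_cols = list(Data1.filter(regex=r"^Yield_\d{4}$").columns)

# Group and average only the yield columns, so text columns are never touched.
res = Data1.groupby(["District", "Crop"])[yield_cols].mean()
res["Average_Yield"] = res[yield_cols].mean(axis=1).round(4)
print(res)
```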

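Comments:

Could you add a sample of the data from /home/desktop/desktop/noonion.csv? Or, if the data is not confidential, share the file?
@jezrael, there are too many columns, so I can't attach it here :(
Hmm, could you share the csv via dropbox, gdocs, wetransfer or something similar?
@jezrael, I have added sample data. One more doubt: suppose we only want the average of Yield_2006 and Yield_2007 in our pivot table; how do we calculate Average_Yield then?
Glad I could help. There are two options for selecting the columns, so I added both.
Thanks for the help.
@e4e5, sure, no worries! :-)

Averaging only Yield_2006 and Yield_2007 with a list of column names gives the same result as the loc slice: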
pivot['Average_Yield'] = pivot[['Yield_2006','Yield_2007']].mean(axis=1).round(4)
print (pivot)
                  Yield_2006  Yield_2007  Yield_2008  Yield_2009  \
District   Crop                                                    
AHMEDNAGAR PADDY  809.212500      1214.1      983.45    398.1125   
AMRAVATI   PADDY  539.828571       799.9      370.90    272.8000   

                  Average_Yield  
District   Crop                  
AHMEDNAGAR PADDY      1011.6563  
AMRAVATI   PADDY       669.8643  