How can I use pivot table output for further analysis in Python?
Sample data:
District Taluka Circle Crop Yield_2006 Yield_2007 Yield_2008 Yield_2009
AHMEDNAGAR AKOLE AKOLE PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE KOTUL PADDY 637.2 1007.4 919.7 323.9
AHMEDNAGAR AKOLE RAJUR PADDY 857.8 1227.1 1114.5 506.5
AHMEDNAGAR AKOLE SAMSHE PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE BRAMHA PADDY 637.2 1007.4 919.7 323.9
AHMEDNAGAR AKOLE VIRGAO PADDY 875.3 1338.9 894.9 339.2
AHMEDNAGAR AKOLE SHENDI PADDY 857.8 1227.1 1114.5 506.5
AHMEDNAGAR AKOLE SAKWADI PADDY 857.8 1227.1 1114.5 506.5
AMRAVATI DHARNI DHARNI PADDY 590 888.6 437.8 201.9
AMRAVATI DHARNI DHULAT PADDY 489.7 863.3 277 227.8
AMRAVATI DHARNI HARSUL PADDY 590 888.6 437.8 201.9
AMRAVATI DHARNI SIKHEDA PADDY 489.7 863.3 277 227.8
AMRAVATI CHIKARA CHHDARA PADDY 539.8 698.5 388.9 373.8
AMRAVATI CHIKARA SEDOH PADDY 539.8 698.5 388.9 338.2
AMRAVATI CHIKARA CHURNI PADDY 539.8 698.5 388.9 338.2
Code:
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> Data=pd.read_csv("/home/desktop/Desktop/noonion.csv")
>>> Data1 =Data[['District','Taluka','Circle','Crop', 'Yield_2006', 'Yield_2007','Yield_2008','Yield_2009']]
>>> pivot=pd.pivot_table(Data1,values=["Yield_2006", "Yield_2007", "Yield_2008", "Yield_2009"],index=["District","Crop"],aggfunc=[np.mean],fill_value=False)
>>> pivot.head()
                             mean
                       Yield_2006   Yield_2007  Yield_2008  Yield_2009
District   Crop
AHMEDNAGAR BAJRA       781.804124   884.185567  770.402062  767.814433
           BLACKGRAM   298.888889   517.722222   80.166667  608.166667
           COTTON      722.241667  1000.156250  863.227083  870.489583
           GREENGRAM   514.166667   660.938596  212.971930  512.380702
           GROUNDNUT   843.243590   919.384615  815.717949  842.012821
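The extra top-level "mean" row in the output above appears because aggfunc was passed as a list. The following is a minimal sketch of that behaviour, assuming a few rows of the sample data typed inline; the names sample, nested, flat and flattened are illustrative and do not come from the question.

```python
import pandas as pd

# A few rows of the sample data, typed inline for illustration only.
sample = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
})
yield_cols = ["Yield_2006", "Yield_2007"]

# aggfunc given as a list: columns get two levels, e.g. ("mean", "Yield_2006").
nested = pd.pivot_table(sample, values=yield_cols,
                        index=["District", "Crop"], aggfunc=["mean"])

# aggfunc given as a single name: columns stay flat, e.g. "Yield_2006".
flat = pd.pivot_table(sample, values=yield_cols,
                      index=["District", "Crop"], aggfunc="mean")

print(nested.columns.nlevels)  # 2
print(flat.columns.nlevels)    # 1

# An already nested result can also be flattened by dropping the top level.
flattened = nested.droplevel(0, axis=1)
```

Flat column labels make later steps such as pivot['Yield_2006'] work without having to spell out the ("mean", ...) level each time.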
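Now I want to work with this pivot output.
For example, I want to create a new column "Average_Yield" that holds the average of the 2006 to 2009 yields for each crop.
How can I create this column so that each "Average_Yield" value is the mean of Yield_2006 through Yield_2009, rounded to 4 decimal places?

You can first remove the [] around the function passed to aggfunc, so that the columns are not returned as a MultiIndex, and then take the mean by rows (axis=1) and round it: pivot.mean(axis=1).round(4).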
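A minimal sketch of that step, assuming four rows of the sample data typed inline. The frame name sample and the list yield_cols are illustrative, and aggfunc is given as the string "mean" rather than np.mean, which recent pandas versions prefer.

```python
import pandas as pd

# Four rows of the sample data, typed inline for illustration only.
sample = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})
yield_cols = [f"Yield_{year}" for year in range(2006, 2010)]

# A single aggfunc (not a list) keeps the result columns flat.
pivot = pd.pivot_table(sample, values=yield_cols,
                       index=["District", "Crop"], aggfunc="mean")

# Average across the yield columns only, so that re-running this line
# never folds an existing Average_Yield column into the mean.
pivot["Average_Yield"] = pivot[yield_cols].mean(axis=1).round(4)
print(pivot)
```

Selecting yield_cols explicitly is a design choice in this sketch: pivot.mean(axis=1) gives the same result the first time, but would include Average_Yield itself if the assignment were executed twice.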
To average only selected columns, you can use loc or a column subset:
pivot['Average_Yield'] = pivot.loc[:,'Yield_2006':'Yield_2007'].mean(axis=1).round(4)
print (pivot)
Yield_2006 Yield_2007 Yield_2008 Yield_2009 \
District Crop
AHMEDNAGAR PADDY 809.212500 1214.1 983.45 398.1125
AMRAVATI PADDY 539.828571 799.9 370.90 272.8000
Average_Yield
District Crop
AHMEDNAGAR PADDY 1011.6563
AMRAVATI PADDY 669.8643
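One detail worth spelling out: a label slice in loc includes both endpoints, so 'Yield_2006':'Yield_2007' selects exactly those two columns here. The sketch below rebuilds a small pivot-like frame from the two rows printed above (an assumption made to keep the example self-contained) and compares the slice with an explicit column list.

```python
import pandas as pd

# Pivot-like frame rebuilt from the two printed rows (illustrative only).
pivot = pd.DataFrame(
    {
        "Yield_2006": [809.2125, 539.828571],
        "Yield_2007": [1214.1, 799.9],
        "Yield_2008": [983.45, 370.90],
        "Yield_2009": [398.1125, 272.8],
    },
    index=pd.MultiIndex.from_tuples(
        [("AHMEDNAGAR", "PADDY"), ("AMRAVATI", "PADDY")],
        names=["District", "Crop"],
    ),
)

# Label slice: both Yield_2006 and Yield_2007 are included.
by_slice = pivot.loc[:, "Yield_2006":"Yield_2007"]
# Explicit list: the same two columns, named one by one.
by_list = pivot[["Yield_2006", "Yield_2007"]]

print(list(by_slice.columns))  # ['Yield_2006', 'Yield_2007']

# Row-wise mean of the two selected columns, rounded to 4 decimals.
avg_06_07 = by_list.mean(axis=1).round(4)
print(avg_06_07)
```

The slice form depends on the column order, while the list form does not, so the list is the safer choice when the columns might be reordered.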
Alternative solution:
In [79]: res = df.groupby(["District","Crop"]).mean()
In [80]: res['Average_Yield'] = res.mean(1)
In [81]: res
Out[81]:
Yield_2006 Yield_2007 Yield_2008 Yield_2009 Average_Yield
District Crop
AHMEDNAGAR PADDY 809.212500 1214.1 983.45 398.1125 851.218750
AMRAVATI PADDY 539.828571 799.9 370.90 272.8000 495.857143
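In the snippet above, df is presumably the frame read from the CSV. Since that frame also contains text columns (Taluka, Circle), pandas 2.0 and later are expected to raise a TypeError when mean() is applied to every column after the groupby. A hedged sketch that sidesteps this by selecting the yield columns first, again using inline rows and an illustrative yield_cols list:

```python
import pandas as pd

# Four rows of the sample data, including the text columns.
df = pd.DataFrame({
    "District": ["AHMEDNAGAR", "AHMEDNAGAR", "AMRAVATI", "AMRAVATI"],
    "Taluka": ["AKOLE", "AKOLE", "DHARNI", "DHARNI"],
    "Circle": ["AKOLE", "KOTUL", "DHARNI", "DHULAT"],
    "Crop": ["PADDY"] * 4,
    "Yield_2006": [875.3, 637.2, 590.0, 489.7],
    "Yield_2007": [1338.9, 1007.4, 888.6, 863.3],
    "Yield_2008": [894.9, 919.7, 437.8, 277.0],
    "Yield_2009": [339.2, 323.9, 201.9, 227.8],
})
yield_cols = [c for c in df.columns if c.startswith("Yield_")]

# Restrict the aggregation to the numeric yield columns.
res = df.groupby(["District", "Crop"])[yield_cols].mean()

# Row-wise mean over the four yearly averages.
res["Average_Yield"] = res[yield_cols].mean(axis=1)
print(res)
```

Passing numeric_only=True to mean() is another way to skip the text columns; the explicit column list is used here because it also documents which columns feed the average.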
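Is it possible to get a data sample from /home/desktop/Desktop/noonion.csv? Or, if the data is not confidential, could you share the file?
@jezrael, there are too many columns, so I cannot attach it here :(
Hmm, could you share the csv via dropbox, gdocs, wetransfer, or something similar?
@jezrael, I have added sample data, but I have a question: suppose we want the average of only Yield_2006 and Yield_2007 in our pivot table. How would we calculate Average_Yield then?
Glad I could help.
If I could choose two options, I would choose both. Thanks for the help.
@e4e5, sure, no worries! :-)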
Column-subset form of the same selection (it gives the same output as the loc version):
pivot['Average_Yield'] = pivot[['Yield_2006','Yield_2007']].mean(axis=1).round(4)