Python pandas: calculate the mean for each row by cycle number
I have a CSV file (mass spec data) that looks like this:
#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452; 1,00; 620
0000000001;00:00:01;0000001452; 1,20; 4730
0000000001;00:00:01;0000001452; 1,40; 4610
... ;..:..:..;..........;.........;...........
I read it with:
df = pd.read_csv(Filename, header=30, delimiter=';', decimal=',')
The result looks like this:
Cycle Time ms mass amu SEM c/s
0 1 00:00:01 1452 1.0 620
1 1 00:00:01 1452 1.2 4730
2 1 00:00:01 1452 1.4 4610
... ... ... ... ... ...
3872 4 00:06:30 390971 1.0 32290
3873 4 00:06:30 390971 1.2 31510
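For anyone who wants to reproduce the read step without the original file, here is a minimal sketch that parses the three sample rows above from an in-memory string. The names `raw` and `sample_df` are made up for illustration, and `comment="#"` is an assumption that only fits this shortened sample; the real file apparently needs `header=30` as shown in the question.

```python
import io

import pandas as pd

# Shortened stand-in for the real file (assumption: only the lines shown in the question).
raw = '''#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452;      1,00;         620
0000000001;00:00:01;0000001452;      1,20;        4730
0000000001;00:00:01;0000001452;      1,40;        4610
'''

sample_df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                # fields are separated by semicolons
    decimal=",",            # "1,20" is parsed as the float 1.2
    comment="#",            # skip the "#..." preamble lines of this sample
    skipinitialspace=True,  # drop the padding after each delimiter
)
print(sample_df.dtypes)
```

The zero-padded `Cycle` and `ms` fields come out as plain integers, while `Time` stays a string column.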
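This data contains several mass spec scans taken with identical parameters. Cycle number 1 denotes scan 1, and so on. I want to compute the mean of the last column, SEM c/s, for each identical mass over all cycles. In the end I would like a new DataFrame containing only: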
ms "mass amu" "SEM c/s(mean over all cycles)"
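Obviously, the mean of the mass itself does not need to be computed. I would like to avoid reading each cycle into its own DataFrame, because that would mean looking up the length of every mass spectrum. The "mass range" and "resonance" obviously differ between measurements.
I suspect doing the calculation directly in numpy would be best, but I am stuck.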
Thanks in advance.
You can use groupby(), something like this:
df.groupby(['ms', 'mass amu'])['SEM c/s'].mean()
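If the goal is one mean per mass over all cycles, returned as a DataFrame rather than a Series, a sketch along these lines may be closer to what the question asks for. It groups on 'mass amu' only, which is an assumption about the intent, and the four rows are a toy subset built from the first and last rows shown in the question; `mean_df` is just an illustrative name.

```python
import pandas as pd

# Toy subset: two masses from cycle 1 and the same two masses from cycle 4.
df = pd.DataFrame({
    "Cycle": [1, 1, 4, 4],
    "ms": [1452, 1452, 390971, 390971],
    "mass amu": [1.0, 1.2, 1.0, 1.2],
    "SEM c/s": [620, 4730, 32290, 31510],
})

mean_df = (
    df.groupby("mass amu", as_index=False)["SEM c/s"]  # one group per mass value
      .mean()                                           # average over all cycles
      .rename(columns={"SEM c/s": "SEM c/s(mean over all cycles)"})
)
print(mean_df)
```

With `as_index=False` the mass stays an ordinary column, so the result already has the two-column shape described in the question.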
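Across all cycles you have different ms values, and you want to compute the mean SEM for each group of identical ms. I will show you a step-by-step example.
You select each group in turn, then collect the means in a dictionary that is converted into a DataFrame.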
ms_uni = df['ms'].unique()  # the unique ms values
new_df_dict = {"ma": [], "SEM": []}  # placeholder keys, renamed below
for cms in ms_uni:
    new_df_dict['ma'].append(cms)
    # mean of 'SEM c/s' over all rows that share this ms value
    # advice: change the column name to something safer, such as SEM-c_s
    new_df_dict['SEM'].append(df[df['ms'] == cms]['SEM c/s'].mean())
new_df = pd.DataFrame(new_df_dict)  # end of the dirty work
# rename() returns a new DataFrame, so keep the result
new_df = new_df.rename(index=str, columns={'ma': "mass amu", "SEM": "SEM c/s(mean over all cycles)"})
Hope it helps.
It works with that function, thanks! But only df.groupby(['mass amu'])['SEM c/s'].mean() worked. Is there a reason for that? And how would I do this in numpy?
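A likely reason, judging only from the sample rows: ms looks like a timestamp that is constant within one cycle and different in the next, so every (ms, mass amu) pair occurs exactly once and the "mean" of each group is just the original value. Grouping on 'mass amu' alone pools the cycles. As for numpy, one possible sketch uses np.unique with return_inverse plus np.bincount; the arrays below are the same toy rows taken from the question, and the variable names are made up.

```python
import numpy as np

# Toy rows from the question: masses 1.0 and 1.2 in cycle 1 and in cycle 4.
mass = np.array([1.0, 1.2, 1.0, 1.2])
sem = np.array([620, 4730, 32290, 31510], dtype=float)

# inverse[i] is the index of mass[i] within unique_mass, i.e. its group id.
unique_mass, inverse = np.unique(mass, return_inverse=True)

sums = np.bincount(inverse, weights=sem)  # sum of SEM per group
counts = np.bincount(inverse)             # number of rows per group
mean_sem = sums / counts                  # mean SEM per unique mass

print(np.column_stack([unique_mass, mean_sem]))
```

This relies on the mass values being exactly equal floats across cycles, which should hold when they are parsed from identical text; otherwise rounding them first would be safer.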