Python: DataFrame means and plotting values


python pandas numpy


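I have implemented a K-fold cross-validation algorithm to study a machine learning problem and to tune the parameters of an SVM (I know about scikit-learn, but I wanted to implement the algorithm myself). I created 5 folds and used them to test the SVM parameters "C" and "Tolerance". I saved the results in a text file and then created a pandas DataFrame from it, which looks like this: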

           C      tol  FP   TN     SPE   TP  FN     SEN
0        100  0.10000  19  261  0.9469  107   6  0.9321
1        100  0.10000  30  250  0.8319   94  19  0.8929
2        100  0.10000  28  252  0.8496   96  17  0.9000
3        100  0.10000  27  253  0.9735  110   3  0.9036
4        100  0.10000  26  254  0.9469  107   6  0.9071
5        100  0.05000  16  264  0.9381  106   7  0.9429
6        100  0.05000  22  258  0.8319   94  19  0.9214
7        100  0.05000  25  255  0.8761   99  14  0.9107
8        100  0.05000  21  259  0.9646  109   4  0.9250
9        100  0.05000  20  260  0.9823  111   2  0.9286

.......
400  1000000  0.00001  21  259  0.9558  108   5  0.9250
401  1000000  0.00001  20  260  0.8850  100  13  0.9286
402  1000000  0.00001  14  266  0.8584   97  16  0.9500
403  1000000  0.00001  17  263  0.9558  108   5  0.9393
404  1000000  0.00001  23  257  0.9735  110   3  0.9179

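It has 405 rows. I need to compute the mean of every group of 5 elements in the columns 'SPE' and 'SEN', repeating the process over the whole DataFrame (e.g. the mean of the 'SPE' and 'SEN' values in rows 0:4, in rows 5:9, in rows 10:14, ... up to rows 400:404). For each iteration, I want to obtain one row of a matrix with the following values: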

['C', 'tol' , 'mean of SPE', 'mean of SEN']

The matrix will have 405/5 = 81 rows and 4 columns.

So, for the portion of the DataFrame shown above, I need a result like:

[[100, 0.10000, 0.90976, 0.90714],
 [100, 0.05000, 0.91860, 0.92572],
 .....
 [1000000, 0.00001, 0.92570, 0.93216]]

I want to obtain this matrix because my goal is to use pyplot to produce two plots: one of 'SPE' vs 'tol' and one of 'SEN' vs 'tol', with a separate curve for each value of 'C'.

Use groupby with an array created by floor division of arange, aggregate with first and mean, and then reorder the columns:

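import numpy as np

# label each block of 5 consecutive rows (0,0,0,0,0,1,1,...) and aggregate per block
# reindex replaces reindex_axis, which was removed in pandas 1.0
df = df.groupby(np.arange(len(df.index)) // 5) \
       .agg({'C':'first', 'tol':'first', 'SPE':'mean','SEN':'mean'}) \
       .reindex(columns=['C','tol','SPE','SEN']) \
       .rename(columns = {'SPE':'mean of SPE','SEN':'mean of SEN'})
print (df)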
         C      tol  mean of SPE  mean of SEN
0      100  0.10000      0.90976      0.90714
1      100  0.05000      0.91860      0.92572
2  1000000  0.00001      0.92570      0.93216
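
To make the grouping key concrete, here is a minimal sketch that rebuilds only the first ten rows of the question's table and prints the labels produced by the floor division. The names `labels` and `means` are introduced here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# First ten rows of the table from the question (two blocks of five folds).
df = pd.DataFrame({
    "C":   [100] * 10,
    "tol": [0.10] * 5 + [0.05] * 5,
    "SPE": [0.9469, 0.8319, 0.8496, 0.9735, 0.9469,
            0.9381, 0.8319, 0.8761, 0.9646, 0.9823],
    "SEN": [0.9321, 0.8929, 0.9000, 0.9036, 0.9071,
            0.9429, 0.9214, 0.9107, 0.9250, 0.9286],
})

# Row positions 0..9 floor-divided by 5 give one label per block of five rows.
labels = np.arange(len(df)) // 5
print(labels)  # [0 0 0 0 0 1 1 1 1 1]

# Grouping by that array averages each block of five rows.
means = df.groupby(labels)[["SPE", "SEN"]].mean()
print(means.round(5))
```

Because the labels depend only on row position, this approach assumes the five folds of each parameter combination are stored in consecutive rows.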

For plotting, you can reshape the result with pivot and then call plot, for example:

df1 = df.pivot(index='C', columns='tol', values='mean of SPE')
print (df1)
tol      0.00001  0.05000  0.10000
C                                 
100          NaN   0.9186  0.90976
1000000   0.9257      NaN      NaN

df1.plot()
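
The question asks for 'SPE' vs 'tol' and 'SEN' vs 'tol' with one curve per 'C'. A possible sketch for that layout puts tol on the index and C on the columns. The frame `res` and all of its numbers are made-up placeholders for the aggregated result, not values from the question, and the log-scaled x axis is only an assumption based on tol spanning several orders of magnitude.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# Placeholder aggregated result (made-up numbers): two C values, three tol values each.
res = pd.DataFrame({
    "C":   [100, 100, 100, 1000, 1000, 1000],
    "tol": [0.1, 0.05, 0.01, 0.1, 0.05, 0.01],
    "mean of SPE": [0.91, 0.92, 0.93, 0.90, 0.94, 0.95],
    "mean of SEN": [0.90, 0.93, 0.92, 0.91, 0.92, 0.94],
})

for col in ["mean of SPE", "mean of SEN"]:
    # Rows are tol values (x axis), columns are C values (one curve each).
    wide = res.pivot(index="tol", columns="C", values=col)
    ax = wide.plot(marker="o", logx=True)
    ax.set_xlabel("tol")
    ax.set_ylabel(col)
```

In an interactive session, calling matplotlib.pyplot.show() afterwards (without the Agg backend line) would display the two figures.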

For a numpy array, add values to the end of the chain:

EDIT:

If the tol values are unique for each group of 5 rows of df, the solution can be a bit different: group by the column tol instead of by the arange labels:

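# group by the tol column itself; sort=False keeps the original order of appearance
# reindex replaces reindex_axis, which was removed in pandas 1.0
df = df.groupby('tol', sort=False) \
       .agg({'C':'first', 'SPE':'mean','SEN':'mean'}) \
       .reset_index() \
       .reindex(columns=['C','tol','SPE','SEN']) \
       .rename(columns = {'SPE':'mean of SPE','SEN':'mean of SEN'})
print (df)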
         C      tol  mean of SPE  mean of SEN
0      100  0.10000      0.90976      0.90714
1      100  0.05000      0.91860      0.92572
2  1000000  0.00001      0.92570      0.93216
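
Grouping by tol alone merges rows whenever the same tol value occurs under more than one C. If that is the case in the full data, one possible variant is to group by both columns. The sketch below uses made-up numbers purely to show the difference; `out` is a name introduced here.

```python
import pandas as pd

# Made-up illustration values: the same tol appears under two different C values.
df = pd.DataFrame({
    "C":   [100, 100, 1000, 1000],
    "tol": [0.1, 0.1, 0.1, 0.1],
    "SPE": [0.90, 0.92, 0.80, 0.84],
    "SEN": [0.91, 0.93, 0.85, 0.87],
})

# Grouping by both keys keeps one row per (C, tol) combination;
# as_index=False leaves C and tol as ordinary columns in the result.
out = (df.groupby(["C", "tol"], sort=False, as_index=False)[["SPE", "SEN"]]
         .mean()
         .rename(columns={"SPE": "mean of SPE", "SEN": "mean of SEN"}))
print(out)
```

Here grouping by 'tol' alone would produce a single row, while grouping by ['C', 'tol'] produces two.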

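Great, thank you very much!! Your solution is great and it works very well!!
It was not easy, but it was interesting ;) If my answer was helpful, don't forget to accept it. Thanks.
Of course! Thank you again!! Bye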

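# numpy array version: same chain, ending in .values instead of rename
# reindex replaces reindex_axis, which was removed in pandas 1.0
df = df.groupby(np.arange(len(df.index)) // 5) \
       .agg({'C':'first', 'tol':'first', 'SPE':'mean','SEN':'mean'}) \
       .reindex(columns=['C','tol','SPE','SEN']) \
       .values
print (df)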
[[  1.00000000e+02   1.00000000e-01   9.09760000e-01   9.07140000e-01]
 [  1.00000000e+02   5.00000000e-02   9.18600000e-01   9.25720000e-01]
 [  1.00000000e+06   1.00000000e-05   9.25700000e-01   9.32160000e-01]]
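
If only the array of means is needed, a different technique from the groupby shown above is to reshape in numpy directly. This is a sketch that assumes the number of rows is an exact multiple of 5 and that each block of five rows is contiguous; `vals` and `block_means` are names introduced here, filled with the SPE and SEN values of the first ten rows from the question.

```python
import numpy as np

# SPE and SEN for the first ten rows of the question's table.
vals = np.array([
    [0.9469, 0.9321], [0.8319, 0.8929], [0.8496, 0.9000],
    [0.9735, 0.9036], [0.9469, 0.9071],
    [0.9381, 0.9429], [0.8319, 0.9214], [0.8761, 0.9107],
    [0.9646, 0.9250], [0.9823, 0.9286],
])

# Reshape to (blocks, 5 rows per block, 2 columns) and average over the fold axis.
block_means = vals.reshape(-1, 5, 2).mean(axis=1)
print(block_means.round(5))
```

The reshape raises a ValueError when the row count is not divisible by 5, which can serve as a cheap sanity check on the input.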
