Python: smoothing by bin boundaries using pandas/numpy

python pandas numpy

I have formed bins using the pandas.cut function. Now, to smooth by bin boundaries, I use groupby to compute the minimum and maximum value of each bin.
Minimum:

           date  births  with noise
bin
A    1959-01-31      23   19.921049
B    1959-01-02      27   25.921175
C    1959-01-01      30   32.064698
D    1959-01-08      35   38.507170
E    1959-01-05      41   45.022163
F    1959-01-13      47   51.821755
G    1959-03-27      56   59.416700
H    1959-09-23      73   70.140119

Maximum:

           date  births  with noise
bin
A    1959-07-12      30   25.161292
B    1959-12-11      35   31.738422
C    1959-12-27      42   38.447807
D    1959-12-20      48   44.919703
E    1959-12-31      56   51.274550
F    1959-12-30      59   57.515927
G    1959-11-05      68   63.970382
H    1959-09-23      73   70.140119
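
For readers who want to reproduce the setup, here is a minimal sketch of how such bins and per-bin statistics might be built. The data is synthetic, and the bin count (4) and labels (A to D) are assumptions, since the question does not show the original call to pandas.cut.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the original dataset is not shown in full.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.date_range("1959-01-01", periods=20, freq="D"),
    "births": rng.integers(23, 74, size=20),
})
df["with noise"] = df["births"] + rng.normal(0, 3, size=20)

# Four equal-width bins labelled A-D (bin count and labels are assumptions).
df["bin"] = pd.cut(df["with noise"], bins=4, labels=list("ABCD"))

# Per-bin boundaries and mean of the noisy column, indexed by bin label.
stats = df.groupby("bin", observed=True)["with noise"].agg(["min", "max", "mean"])
print(stats)
```

Passing observed=True keeps only the bins that actually contain rows, which avoids NaN rows for empty categories.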

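Now I want to replace the values in the original DataFrame. If a value is less than the mean of its bin, it should be replaced with that bin's minimum; if it is greater than the mean, it should be replaced with that bin's maximum.
My DataFrame looks like this: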

         date  births  with noise bin  smooth_val_mean
0  1959-01-01      35   36.964692   C        35.461173
1  1959-01-02      32   29.861393   B        29.592061
2  1959-01-03      30   27.268515   B        29.592061
3  1959-01-04      31   31.513148   B        29.592061
4  1959-01-05      44   46.194690   E        47.850101

How should I do this with pandas/numpy?

Let's try this function:

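import numpy as np

# Assumes df holds the data, and df_mean, df_min, df_max hold the
# per-bin groupby results (indexed by bin label).
def thresh(col):
    # Look up each row's bin statistics by its bin label
    means = df['bin'].replace(df_mean[col])
    mins = df['bin'].replace(df_min[col])
    maxs = df['bin'].replace(df_max[col])

    # 1 above the bin mean, -1 below it, 0 when equal
    signs = np.sign(df[col] - means)

    # Above the mean -> bin max, below -> bin min, otherwise keep the mean
    df[f'{col}_smooth'] = np.select((signs==1, signs==-1), (maxs, mins), means)

for col in ['with noise']:
    thresh(col)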
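
As a self-contained variant, the per-bin statistics can also be broadcast to the rows with groupby(...).transform, which sidesteps the label lookup entirely. This is a sketch of an alternative, not the answer's method; the helper name smooth_by_boundaries and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame shaped like the one in the question.
df = pd.DataFrame({
    "with noise": [36.96, 29.86, 27.27, 31.51, 46.19, 33.10],
    "bin": pd.Categorical(["C", "B", "B", "B", "E", "C"]),
})

def smooth_by_boundaries(frame, col, by="bin"):
    """Return `col` with each value replaced by its bin's min or max."""
    grouped = frame.groupby(by, observed=True)[col]
    # transform() returns one value per row, aligned with frame's index.
    means = grouped.transform("mean")
    mins = grouped.transform("min")
    maxs = grouped.transform("max")
    signs = np.sign(frame[col] - means)
    # Above the mean -> bin max, below -> bin min, exactly equal -> mean.
    return pd.Series(
        np.select([signs == 1, signs == -1], [maxs, mins], default=means),
        index=frame.index,
    )

df["with noise_smooth"] = smooth_by_boundaries(df, "with noise")
print(df)
```

Because transform yields plain float Series, the subtraction never touches a categorical dtype.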

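You show the max/min values for each bin, but I don't see the mean. Also, do you want to replace both columns, including the one with noise?

I calculated the mean earlier, in the same way as the max/min values. And it is just the "with noise" column. (Or both; I am only looking for the procedure.) @QuangHoang

The statement signs = df[col] - means gives an error: "Object with dtype category cannot perform the numpy op subtract"

@satashreroy Try using replace instead of map. See the update.

Thanks, it is working. However, I am confused about the signs. What if the sign is not 1 or -1? (For example, I observed a few data points giving sign = ~2.) I still want the value to be smoothed to one of the boundary values, not the mean. @QuangHoang

@SatashreeRoy Sorry, it should be wrapped in np.sign, which returns 1, 0, and -1 when the value is positive, zero, and negative, respectively. See the update.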
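
To illustrate the last two comments: np.sign only ever yields -1, 0, or 1, and the 0 case (a value exactly equal to its bin mean) is the one that falls through to the mean. If every value must land on a boundary, one possible tie rule is to send ties to the maximum. That rule is an assumption made for this sketch, not something stated in the thread, and the arrays are hypothetical.

```python
import numpy as np

# np.sign maps negative, zero, and positive inputs to -1, 0, and 1.
print(np.sign(np.array([-2.5, 0.0, 3.2])))

# Hypothetical per-row values with their bin statistics already aligned.
values = np.array([27.0, 30.0, 33.0])
means = np.array([30.0, 30.0, 30.0])
mins = np.array([27.0, 27.0, 27.0])
maxs = np.array([33.0, 33.0, 33.0])

# Ties (value == mean) go to the bin maximum; this tie rule is an assumption.
smoothed = np.where(values >= means, maxs, mins)
print(smoothed)
```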