Python: Remove outliers based on two columns in a DataFrame
I have a DataFrame that looks like this:
Year Month Equipment Weight
2017 1 TennisBall 5
2017 1 Football 4
2017 1 TennisBall 6
2017 1 TennisBall 7
2017 1 TennisBall 300
2017 2 TennisBall 300
2018 2 TennisBall 250
2018 2 Football 5
2018 2 TennisBall 6
2018 2 TennisBall 275
...
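In the example above, we normally ship around 300 tennis balls per order in February, so an order of 6 units is an outlier there, whereas in January the normal quantity is ~5, which makes any much larger order that month an outlier. I want to remove outliers based on the Weight within each month. Is there a simple way to do this? I know I can do the following: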
df1[np.abs(df1.Weight-df1.Weight.mean()) <= (5*df1.Weight.std())]
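To see why this global filter is not enough, here is a minimal sketch that rebuilds the ten rows shown above (the rows hidden behind `...` are not included) and applies the same condition. The name `kept` is only for illustration. Because the mean and standard deviation are computed over all rows at once, the 5-standard-deviation band is wider than any value in the sample, so no row is dropped.

```python
import numpy as np
import pandas as pd

# Only the ten rows visible in the question; rows elided by "..." are omitted.
df1 = pd.DataFrame({
    "Year": [2017] * 6 + [2018] * 4,
    "Month": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Equipment": ["TennisBall", "Football", "TennisBall", "TennisBall", "TennisBall",
                  "TennisBall", "TennisBall", "Football", "TennisBall", "TennisBall"],
    "Weight": [5, 4, 6, 7, 300, 300, 250, 5, 6, 275],
})

# Global filter from the question: one mean and one std for the whole column.
kept = df1[np.abs(df1.Weight - df1.Weight.mean()) <= 5 * df1.Weight.std()]

# Compare row counts before and after filtering.
print(len(df1), len(kept))
```

Both the January 300 and the February 6 survive, which is why the statistics need to be computed per group instead.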
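The desired result: the outlier 300 in January is removed (far above normal for January), and the outlier 6 in February is removed (normal for January, but an outlier for February).
This is a groupby problem. You can solve it by creating two new columns, one holding each row's difference from its group mean and one holding the group standard deviation, and then filtering on those columns: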
import numpy as np

# Difference between each Weight and the mean of its (Year, Month, Equipment) group
df['Weight diff'] = df['Weight'].sub(df.groupby(['Year','Month','Equipment'])['Weight'].transform('mean'))
# Standard deviation of each group
df['std'] = df.groupby(['Year','Month','Equipment'])['Weight'].transform('std')
# Keep rows that lie within one standard deviation of their group mean.
# The "or" condition keeps single-value groups, whose std is NaN.
df = df.loc[(np.abs(df['Weight diff']) <= df['std']) | (df['std'].isnull())]
# Remove the helper columns
df = df.drop(['Weight diff', 'std'], axis=1)
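Could you include a sample DataFrame and the desired output?
The sample DataFrame is the first one. I added the desired output at the end. Thanks.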
>>> print(df)
   Year Month Equipment Weight
1 2017 1 TennisBall 5
2 2017 1 Football 4
3 2017 1 TennisBall 6
4 2017 1 TennisBall 7
6 2017 2 TennisBall 300
7 2018 2 TennisBall 250
8 2018 2 Football 5
10 2018 2 TennisBall 275
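The same idea can also be written without temporary columns. The sketch below is one possible variant, not the answer's exact code: the helper name `drop_group_outliers` and its `n_std` parameter are made up for illustration, and the sample frame holds only the ten rows visible in the question. With `n_std=1`, as in the answer, it keeps the same rows as the output above, labelled here with pandas' default 0-based index.

```python
import pandas as pd

def drop_group_outliers(df, keys, col, n_std=1.0):
    """Keep rows whose `col` lies within n_std group standard deviations
    of the group mean. Hypothetical helper, for illustration only."""
    grouped = df.groupby(keys)[col]
    mean = grouped.transform("mean")   # group mean, aligned to the original rows
    std = grouped.transform("std")     # sample std (ddof=1); NaN for one-row groups
    within = (df[col] - mean).abs() <= n_std * std
    # Single-row groups have no defined std, so they are kept as-is.
    return df[within | std.isna()]

# Only the ten rows visible in the question; rows elided by "..." are omitted.
df = pd.DataFrame({
    "Year": [2017] * 6 + [2018] * 4,
    "Month": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "Equipment": ["TennisBall", "Football", "TennisBall", "TennisBall", "TennisBall",
                  "TennisBall", "TennisBall", "Football", "TennisBall", "TennisBall"],
    "Weight": [5, 4, 6, 7, 300, 300, 250, 5, 6, 275],
})

result = drop_group_outliers(df, ["Year", "Month", "Equipment"], "Weight")
print(result)
```

One caveat on the threshold: in a small group a single extreme value inflates that group's own standard deviation. In a group of four values, no point can lie more than 1.5 sample standard deviations from the group mean, so a multiplier such as the 5 used in the question would remove nothing from these groups.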