Replace outliers with the second smallest value by group in R
I am new to R and I have a data table dt:
> library(data.table)
> dt <- data.table(A = c(1,2,3,4,74,6, 7, 8, 9, 75, 11, 12),
+ B=c("P","P","P","P", "P", "P" ,"Q","Q","Q", "Q", "Q", "Q"),
+ C=c("a","b","c","d","e","f", "g", "h", "i", "j", "k", "l"))
> dt
     A B C
 1:  1 P a
 2:  2 P b
 3:  3 P c
 4:  4 P d
 5: 74 P e
 6:  6 P f
 7:  7 Q g
 8:  8 Q h
 9:  9 Q i
10: 75 Q j
11: 11 Q k
12: 12 Q l
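The code that follows relies on a column named out that marks outliers with 1, but the text here does not show how it was created. A minimal sketch, assuming outliers are the points that fall outside the 1.5 x IQR whiskers within each group of B (this rule is an assumption for illustration, not necessarily how the original flag was built):

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
                 B = c("P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q"),
                 C = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"))

# Assumed rule: flag values that boxplot.stats() reports as outliers,
# computed separately for each group of B (1 = outlier, 0 = not).
dt[, out := as.integer(A %in% grDevices::boxplot.stats(A)$out), by = B]
dt
```

With this data the rule flags 74 in group P and 75 in group Q, which matches the out column shown in the output further down.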
We can use replace, taking the second smallest value of A within each group of B as the replacement:
dt[, A := replace(A, out ==1, sort(A)[2]) , by = B]
dt
#     A B C out
# 1:  1 P a   0
# 2:  2 P b   0
# 3:  3 P c   0
# 4:  4 P d   0
# 5:  2 P e   1
# 6:  6 P f   0
# 7:  7 Q g   0
# 8:  8 Q h   0
# 9:  9 Q i   0
#10:  8 Q j   1
#11: 11 Q k   0
#12: 12 Q l   0
Another option, which gives the same result as long as the values in A are non-negative, is
dt[, A := pmax((out==1)*sort(A)[2], (out==0)*A), B]
There also appears to be a partial sorting option documented in ?sort, which may be worth using in this case.
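As a minimal sketch of that idea: the partial argument of sort only guarantees that the requested positions hold the values they would hold after a full sort, which is all that is needed to pick the second smallest value. The out column here is filled in by hand to match the output shown above; that shortcut is an assumption made to keep the example self-contained.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3, 4, 74, 6, 7, 8, 9, 75, 11, 12),
                 B = rep(c("P", "Q"), each = 6))

# Assumption: outlier flags copied from the output above (rows 5 and 10).
dt[, out := c(0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0)]

# sort(A, partial = 2) places the second smallest value at index 2
# without fully ordering the rest of the vector.
dt[, A := replace(A, out == 1, sort(A, partial = 2)[2]), by = B]
dt
```

For groups this small the difference is negligible; partial sorting is mainly of interest when each group holds many rows.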