R 如何替换异常值?

R 如何替换异常值?,r,outliers,R,Outliers,我有以下数据帧 d <- data.frame("Open" = rnorm(10,5,1) ) 现在我想用正常值替换这些异常值 我试着用以前的值替换它,但是如果一个接一个地存在异常值,它就不起作用了。或者第一个元素是异常值 d$Temp <- lag(d$Open ,1) d$Temp <- ifelse(is.na(d$Temp), d$Open, d$Temp) d$Open <- ifelse( ( ( d$Open - d$Temp)

我有以下数据帧

d <- data.frame("Open" = rnorm(10,5,1) )
现在我想用正常值替换这些异常值

我试着用以前的值替换它,但是如果一个接一个地存在异常值,它就不起作用了。或者第一个元素是异常值

d$Temp <- lag(d$Open ,1)
d$Temp <- ifelse(is.na(d$Temp), d$Open, d$Temp)

d$Open <- ifelse(
  ( ( d$Open - d$Temp)/ d$Temp) > 3,
  lag(d$Open, 1) ,
  d$Open
)

d <- d[,1]

在上述数据中,只有一个离群值为2900

为什么在使用rnorm生成值时,离群值取决于之前的值?为什么不使用比中值或平均值多X%的值?然后你可以找出所有的离群值,然后是最大的非离群值,然后用最大的非离群值替换所有离群值?为什么只将其与以前的值进行比较?若你们在和之前的值进行比较,那个么并没有任何情况会导致第一个值成为异常值!此外,如果您的第二个值与第一个值相比是一个离群值,那么如何进行离群值的第三个值测试。建议您在一些复杂情况下定义逻辑,例如c1、300、3000、2、2000,而不是创建随机情况。在这种情况下,你能告诉我你想要的输出是什么吗?你可以看看这里的选项在您修改的数据中,我还有两个问题。1如果您的价值观呈下降趋势,该怎么办。如果所有值均为正值,则不可能存在比其先前值低300%的值?2value后跟2900是31,大大低于2900被视为异常值?您永远不应该简单地替换异常值。这样做会使您的整个数据分析无效。在最后一步中,我可以用最近的非NA值替换NA吗?您好,您可以检查我发布的新数据吗?这样不行。
# Your dataframe
df <- data.frame("Open" = rnorm(10,5,1) )

# Adding outliers
df$Open[4] = 100
df$Open[5] = 100

# Visualize outliers
boxplot(d)

# create a vector of outliers for the numeric factor
outliers <- boxplot(df$Open, plot = FALSE)$out

# Replace the outliers with NA (or whatever you want)
df[df$Open %in% outliers, "Open"] = NA
df
dput(round(d$Open,0))
c(5, 5, 5, 5, 5, 5, 6, 6, 5, 5, 5, 6, 6, 5, 6, 7, 6, 6, 7, 6, 
6, 5, 6, 7, 8, 8, 9, 8, 7, 7, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 
5, 6, 6, 6, 5, 5, 6, 6, 7, 6, 6, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 
5, 5, 5, 5, 5, 5, 5, 6, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 7, 7, 7, 
7, 7, 9, 8, 8, 9, 8, 8, 10, 11, 10, 10, 10, 9, 12, 10, 9, 10, 
9, 11, 11, 11, 11, 10, 10, 10, 8, 8, 9, 8, 8, 8, 8, 7, 7, 8, 
8, 8, 8, 9, 8, 6, 8, 8, 7, 7, 7, 8, 8, 9, 10, 10, 11, 11, 10, 
10, 11, 11, 11, 11, 13, 13, 14, 13, 13, 15, 16, 16, 16, 18, 17, 
17, 17, 17, 18, 16, 15, 14, 17, 16, 17, 16, 16, 17, 18, 23, 19, 
18, 18, 17, 17, 16, 17, 18, 17, 16, 16, 16, 16, 15, 16, 16, 18, 
22, 24, 24, 30, 31, 31, 31, 32, 40, 41, 39, 38, 36, 34, 33, 33, 
38, 38, 36, 34, 34, 36, 33, 35, 34, 33, 33, 34, 33, 33, 34, 33, 
38, 37, 38, 37, 40, 42, 40, 43, 45, 46, 40, 39, 41, 33, 30, 32, 
29, 32, 30, 30, 29, 32, 31, 34, 35, 34, 34, 33, 32, 31, 32, 33, 
31, 30, 31, 36, 36, 34, 35, 36, 33, 32, 33, 32, 35, 35, 40, 40, 
42, 38, 39, 39, 36, 37, 38, 36, 38, 39, 40, 40, 40, 41, 45, 49, 
50, 50, 49, 48, 50, 49, 50, 57, 58, 54, 54, 54, 49, 49, 58, 54, 
50, 52, 70, 75, 77, 78, 82, 98, 102, 121, 129, 126, 125, 124, 
118, 138, 147, 126, 107, 97, 100, 89, 103, 93, 93, 74, 72, 68, 
82, 82, 81, 95, 98, 94, 96, 95, 94, 90, 85, 91, 76, 72, 72, 78, 
81, 83, 85, 84, 80, 80, 79, 78, 65, 63, 46, 55, 41, 36, 32, 40, 
33, 24, 21, 19, 21, 30, 27, 28, 30, 25, 24, 21, 24, 25, 24, 22, 
22, 20, 23, 25, 27, 30, 34, 34, 37, 35, 41, 49, 58, 55, 51, 54, 
52, 51, 50, 41, 44, 51, 54, 48, 52, 54, 54, 57, 77, 70, 69, 68, 
68, 74, 70, 62, 74, 73, 74, 73, 72, 76, 73, 79, 82, 88, 91, 90, 
82, 81, 82, 86, 88, 94, 91, 95, 97, 103, 102, 94, 102, 93, 87, 
79, 79, 80, 77, 80, 83, 86, 81, 82, 80, 84, 85, 83, 83, 84, 85, 
92, 88, 90, 88, 91, 93, 90, 91, 87, 84, 89, 81, 75, 76, 70, 68, 
71, 74, 68, 68, 64, 54, 58, 59, 58, 52, 52, 52, 54, 61, 59, 57, 
64, 63, 64, 60, 58, 54, 55, 55, 56, 51, 54, 56, 52, 51, 49, 41, 
39, 38, 37, 37, 35, 38, 36, 34, 32, 32, 31, 33, 32, 32, 28, 25, 
25, 24, 22, 23, 23, 23, 26, 31, 31, 32, 32, 37, 36, 37, 36, 36, 
34, 32, 32, 33, 31, 30, 30, 27, 24, 25, 25, 25, 24, 25, 30, 30, 
30, 31, 29, 27, 27, 27, 27, 24, 24, 22, 24, 25, 26, 28, 30, 30, 
29, 28, 30, 31, 31, 33, 34, 34, 34, 33, 35, 32, 30, 30, 31, 29, 
27, 27, 26, 26, 23, 21, 24, 25, 26, 25, 24, 27, 25, 25, 24, 24, 
24, 23, 22, 24, 24, 25, 24, 23, 22, 22, 20, 21, 21, 22, 22, 22, 
21, 23, 24, 24, 25, 26, 26, 28, 26, 27, 27, 27, 30, 30, 33, 36, 
33, 31, 30, 32, 31, 33, 30, 30, 29, 34, 35, 37, 40, 37, 37, 37, 
37, 41, 42, 41, 44, 41, 39, 38, 43, 38, 40, 39, 42, 44, 43, 39, 
41, 40, 38, 34, 24, 24, 23, 25, 27, 28, 27, 32, 28, 27, 29, 26, 
27, 26, 29, 28, 28, 29, 31, 28, 28, 31, 29, 28, 25, 26, 23, 23, 
24, 23, 20, 21, 20, 21, 20, 18, 17, 18, 21, 19, 19, 19, 21, 19, 
19, 19, 17, 16, 15, 14, 16, 15, 14, 15, 17, 17, 18, 16, 15, 15, 
14, 13, 14, 12, 12, 12, 12, 11, 9, 8, 9, 9, 8, 8, 7, 10, 11, 
12, 11, 11, 12, 14, 14, 15, 14, 14, 12, 13, 12, 12, 14, 15, 17, 
16, 16, 16, 17, 17, 15, 15, 13, 14, 13, 13, 13, 13, 15, 14, 15, 
17, 16, 14, 12, 15, 16, 15, 16, 15, 15, 18, 21, 20, 19, 19, 18, 
19, 18, 19, 18, 18, 18, 18, 21, 20, 22, 20, 20, 20, 19, 18, 18, 
19, 20, 18, 19, 19, 20, 21, 23, 22, 20, 20, 19, 20, 22, 23, 25, 
25, 28, 26, 27, 26, 25, 25, 24, 23, 22, 23, 21, 23, 24, 26, 30, 
28, 27, 20, 23, 21, 24, 22, 19, 19, 18, 20, 25, 24, 25, 23, 24, 
23, 23, 24, 22, 32, 29, 28, 27, 27, 24, 25, 28, 28, 29, 28, 28, 
30, 34, 33, 32, 29, 26, 29, 27, 27, 41, 43, 42, 40, 41, 34, 36, 
37, 36, 36, 33, 34, 32, 32, 30, 29, 33, 34, 36, 34, 37, 41, 40, 
36, 33, 33, 32, 32, 33, 33, 32, 31, 30, 27, 30, 30, 29, 26, 31, 
26, 26, 23, 23, 27, 27, 29, 26, 27, 27, 26, 28, 29, 31, 33, 31, 
31, 29, 27, 28, 28, 27, 27, 28, 27, 26, 25, 26, 24, 25, 24, 20, 
17, 14, 16, 15, 15, 15, 14, 15, 14, 14, 14, 15, 16, 15, 21, 20, 
19, 19, 18, 18, 18, 22, 26, 26, 28, 27, 28, 24, 25, 24, 22, 22, 
22, 21, 24, 26, 27, 26, 26, 27, 27, 34, 37, 34, 32, 30, 32, 34, 
30, *2900*, 31, 36, 32, 34, 33, 37, 37, 37, 42, 57, 52, 54, 52, 
51)
# Your dataframe
df <- data.frame("Open" = rnorm(10,5,1) )

# Adding outliers
df$Open[4] = 100
df$Open[5] = 100

# Visualize outliers
boxplot(d)

# create a vector of outliers for the numeric factor
outliers <- boxplot(df$Open, plot = FALSE)$out

# Replace the outliers with NA (or whatever you want)
df[df$Open %in% outliers, "Open"] = NA
df
       Open
1  4.589664
2  4.621286
3  7.317407
4        NA
5        NA
6  3.490202
7  3.536626
8  2.825471
9  5.710270
10 5.541880