Replacing data in a data frame using a separate threshold table in R
I have two data frames: the first holds data for several months, the second holds thresholds (a minimum and a maximum, different for each month). I now want to replace every value that falls outside its thresholds with NA. The data frames are structured as follows: the data has columns named "month", "a", "b" and "c"; the thresholds have "month", "a.min", "a.max", "b.min" and "b.max".
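To make the layout described in the question concrete, here is a minimal sketch of the two data frames and the desired result. The question does not show its actual data, so every value below is made up for illustration, and the name `expected` is hypothetical.

```r
# Hypothetical sample data; all values are invented for illustration.
df <- data.frame(
  month = c(1, 1, 2, 2),
  a     = c(3.7, 9.9, 2.6, 0.1),
  b     = c(60, 20, 95, 65),
  c     = c(384.4, 388.0, 362.6, 363.6)
)

# One row of limits per month, with a <col>.min / <col>.max pair
# for each column that should be checked.
thresholds <- data.frame(
  month = c(1, 2),
  a.min = c(1, 2),   a.max = c(5, 4),
  b.min = c(50, 60), b.max = c(70, 90)
)

# Desired result: values outside [min, max] for their month become NA.
# Column c has no limits and is left unchanged.
expected <- data.frame(
  month = c(1, 1, 2, 2),
  a     = c(3.7, NA, 2.6, NA),
  b     = c(60, NA, NA, 65),
  c     = c(384.4, 388.0, 362.6, 363.6)
)
```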
In base R:
# use merge to pull in the thresholds
outcome <- merge(df, thresholds, all.x=TRUE, by="month")
# define the columns to look at, that require a .min, .max column
threshold_cols <- c("a", "b")
# loop and update
for(i in threshold_cols){
  # create a condition vector to highlight ones out of the range
  con <- outcome[[i]] < outcome[[sprintf("%s.min", i)]] |
    outcome[[i]] > outcome[[sprintf("%s.max", i)]]
  # force these as NA
  outcome[[i]][con] <- NA
}
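A possible variation on the loop above, for the case where there are many columns: the sketch below derives the columns to check from the `.min` names in the threshold table and looks up each row's limits with `match()` instead of `merge()`. This keeps the original row order (by default `merge()` sorts the result by the key) and adds no helper columns. The function name `apply_thresholds` and the sample values are assumptions for illustration, not part of the original answer.

```r
# Sketch only: apply_thresholds() is a hypothetical helper, not from the answer.
apply_thresholds <- function(df, thresholds, key = "month") {
  # Columns to check: every data column with a "<col>.min" entry in thresholds
  mins <- grep("\\.min$", names(thresholds), value = TRUE)
  cols <- intersect(sub("\\.min$", "", mins), names(df))

  # For each data row, the index of the matching row in thresholds
  idx <- match(df[[key]], thresholds[[key]])

  for (col in cols) {
    lo <- thresholds[[paste0(col, ".min")]][idx]
    hi <- thresholds[[paste0(col, ".max")]][idx]
    out_of_range <- df[[col]] < lo | df[[col]] > hi
    # which() skips comparisons that are NA (missing values or
    # months without limits), so those entries are left untouched
    df[[col]][which(out_of_range)] <- NA
  }
  df
}

# Made-up sample data, deliberately not sorted by month
df <- data.frame(
  month = c(2, 1, 2, 1),
  a     = c(2.6, 3.7, 0.1, 9.9),
  b     = c(95, 60, 65, 20),
  c     = c(362.6, 384.4, 363.6, 388.0)
)
thresholds <- data.frame(
  month = c(1, 2),
  a.min = c(1, 2),   a.max = c(5, 4),
  b.min = c(50, 60), b.max = c(70, 90)
)

result <- apply_thresholds(df, thresholds)
```

Values exactly equal to a limit are kept here, which matches the `<` and `>` comparisons in the loop above.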
Using dplyr, you can do this:
library(dplyr)
df2 <- df %>%
  left_join(thresholds) %>%
  mutate(a=ifelse(a > a.min & a < a.max, a, NA),
         b=ifelse(b > b.min & b < b.max, b, NA)) %>%
  select(month, a, b, c)
df2
month a b c
1 1 3.693 NA 384.3990
2 1 NA NA 388.0435
3 1 3.068 NA 391.1580
4 1 2.633 NA 394.1089
5 1 3.047 NA 396.2393
6 1 3.072 NA 397.7653
7 1 3.278 NA 405.9039
...
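If there are many columns, writing one `ifelse()` per column becomes tedious. One way to generalize the dplyr approach is `across()` together with `cur_column()`, as in the sketch below. It assumes dplyr 1.0 or later and made-up sample values. Unlike the strict comparisons in the code above, it treats the limits as inclusive, so values exactly equal to a limit are kept; that choice is an assumption.

```r
library(dplyr)

# Made-up sample data for illustration
df <- data.frame(
  month = c(1, 1, 2, 2),
  a     = c(3.7, 9.9, 2.6, 0.1),
  b     = c(60, 20, 95, 65),
  c     = c(384.4, 388.0, 362.6, 363.6)
)
thresholds <- data.frame(
  month = c(1, 2),
  a.min = c(1, 2),   a.max = c(5, 4),
  b.min = c(50, 60), b.max = c(70, 90)
)

# Columns that have a .min / .max pair in thresholds
threshold_cols <- c("a", "b")

df2 <- df %>%
  left_join(thresholds, by = "month") %>%
  mutate(across(
    all_of(threshold_cols),
    # cur_column() is the name of the column being processed, so the
    # matching limits are looked up as "<col>.min" and "<col>.max"
    ~ replace(
      .x,
      .x < .data[[paste0(cur_column(), ".min")]] |
        .x > .data[[paste0(cur_column(), ".max")]],
      NA
    )
  )) %>%
  # drop the helper threshold columns again
  select(all_of(names(df)))
```

Adding another checked column then only requires extending `threshold_cols` and supplying the corresponding limits in `thresholds`.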
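Alternatively, with data.table this can be done through a series of non-equi update joins; the explicit version, together with its output, is listed further down.
Edit
The OP revealed in a comment that he wants to run the solution on a huge data frame with many columns.
The sequence of non-equi update joins can also be executed in a loop: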
threshold_cols <- c("a", "b")
setDT(df)
for(i in threshold_cols){
  df[thresholds, on = c("month", sprintf("%s<%s.min", i, i)), (i) := NA][
    thresholds, on = c("month", sprintf("%s>%s.max", i, i)), (i) := NA]
}
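Comments:
What do you mean by "apply limits"?
I mean using the thresholds: anything outside those thresholds should become NA.
This is good, but if there are more columns, every column has to be handled on its own line. I want to run it on a huge data frame.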
For reference, the non-equi update joins written out column by column, with the resulting output:
library(data.table)
setDT(df)[setDT(thresholds), on = .(month, a < a.min), a := NA][
thresholds, on = .(month, a > a.max), a := NA][
thresholds, on = .(month, b < b.min), b := NA][
thresholds, on = .(month, b > b.max), b := NA][]
a b c month
1: 3.693 NA 384.3990 1
2: NA NA 388.0435 1
3: 3.068 NA 391.1580 1
4: 2.633 NA 394.1089 1
5: 3.047 NA 396.2393 1
6: 3.072 NA 397.7653 1
7: 3.278 NA 405.9039 1
8: 3.533 NA 413.3497 1
9: 3.406 NA 413.8737 1
10: 2.893 NA 412.4252 1
11: 2.722 NA 401.0619 1
12: NA NA 395.5369 1
13: 1.994 63.70 393.3440 1
14: 1.743 NA 390.2218 1
15: 1.958 NA 380.8314 1
16: 2.030 NA 370.9777 1
17: 2.222 56.69 365.3473 1
18: 2.207 NA 365.9187 1
19: 2.393 56.74 362.2083 1
20: 2.731 50.95 368.0958 1
21: NA 65.32 369.2954 1
22: 4.065 NA 369.1633 1
23: 3.458 67.36 367.9333 1
24: 3.142 65.04 364.1945 1
25: 2.705 60.00 359.7283 1
26: NA 53.26 357.4523 1
27: 1.794 NA 357.9721 1
28: 2.139 NA 356.7934 1
29: 2.455 57.16 355.4262 1
30: 2.830 NA 358.4297 1
31: 3.008 63.45 357.7325 1
32: 3.358 52.17 362.7329 1
33: 3.663 56.59 365.4261 1
34: 2.936 54.27 363.8837 1
35: 2.636 NA 362.5658 2
36: 2.420 NA 363.5668 2
37: 3.403 NA 369.6555 2
38: 2.830 NA 366.5757 2
39: 2.740 NA 360.5511 2
40: 3.119 NA 360.7731 2
41: 2.376 NA 360.5672 2
42: 3.285 NA 363.6154 2
43: 3.267 NA 367.0974 2
44: 2.966 NA 363.4489 2
45: 3.675 60.77 373.0476 2
46: 2.803 NA 379.0865 2
47: 3.097 NA 382.3346 2
48: 3.381 NA 386.7982 2
49: 2.774 NA 394.0651 2
50: 3.335 NA 398.8354 2
51: 3.857 NA 398.6193 2
52: 2.854 NA 401.3643 2
53: 3.093 69.54 401.9453 2
54: 2.368 70.30 405.3331 2
55: 2.800 NA 417.1013 2
56: 2.643 NA 425.4676 2
57: 3.047 NA 423.6085 2
58: 2.559 NA 421.9701 2
59: 2.119 NA 410.8265 2
60: NA NA 404.4327 2
61: NA NA 401.7433 2
62: NA NA 397.9707 2
63: NA NA 389.2195 2
64: 2.147 63.16 379.0507 2
65: 2.405 NA 371.2411 2
66: 2.543 61.44 370.1493 2
67: 2.374 NA 365.7072 2
68: 2.962 60.45 367.7261 2
69: 3.375 69.92 370.8189 2
70: 3.002 69.54 368.1045 2
71: 2.785 67.86 365.2104 2
72: 2.643 73.45 366.9838 2
73: 2.304 NA 370.7158 2
74: 2.052 NA 371.3767 2
75: 2.116 NA 370.1482 2
76: 2.203 71.70 367.5164 2
77: 2.574 NA 365.9738 2
78: 2.537 NA 367.5455 2
79: 2.306 NA 368.9097 2
80: NA NA 366.8438 2
81: 2.164 NA 361.4221 2
82: NA NA 363.1824 2
83: NA NA 364.9451 2
84: NA NA 362.9793 2
85: NA NA 364.1421 2
86: NA NA 360.9064 2
87: NA NA 359.4199 2
88: NA NA 358.8081 2
89: NA NA 354.5116 2
90: 1.406 NA 352.8780 3
91: 0.975 NA 351.8854 3
92: 1.480 66.98 354.0268 3
93: 0.473 39.31 364.0585 3
94: 0.689 41.21 368.6769 3
95: 0.046 NA 382.3471 3
96: 0.498 NA 385.0213 3
97: 1.847 NA 385.3837 3
98: 2.079 NA 390.9940 3
99: 2.454 NA 388.8896 3
100: NA NA 386.2610 3
a b c month