Replace data in a data frame using a separate threshold table in R


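I have two data frames: the first holds several months of data, and the second holds thresholds (a minimum and a maximum, different for each month). I now want to replace any value that falls outside its thresholds with NA.

The data frames are structured as follows: the data has columns named "month", "a", "b" and "c". The thresholds table has "month", "a.min", "a.max", "b.min" and "b.max".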

With base R:

# use merge to pull in the thresholds
outcome <- merge(df, thresholds, all.x=TRUE, by="month")

# define the columns to look at, that require a .min, .max column
threshold_cols <- c("a", "b")

# loop and update
for(i in threshold_cols){
  # create a condition vector to highlight ones out of the range
  con <- outcome[[i]] < outcome[[sprintf("%s.min", i)]] |
    outcome[[i]] > outcome[[sprintf("%s.max", i)]]
  # force these as NA
  outcome[[i]][con] <- NA
}
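
As a quick check of the loop above, here is a minimal sketch on made-up data. The values in `df` and `thresholds` are hypothetical and chosen only so that the result can be verified by hand. The sketch also drops the helper `.min`/`.max` columns once they are no longer needed, which the loop above does not do.

```r
# Hypothetical toy data: two months, three measurements each
df <- data.frame(
  month = c(1, 1, 1, 2, 2, 2),
  a     = c(0.5, 2.0, 9.0, 1.0, 3.0, 5.0),
  b     = c(10, 50, 90, 20, 60, 99),
  c     = c(100, 101, 102, 103, 104, 105)
)
thresholds <- data.frame(
  month = c(1, 2),
  a.min = c(1, 2),   a.max = c(5, 4),
  b.min = c(20, 30), b.max = c(80, 70)
)

# attach the monthly thresholds to every data row
outcome <- merge(df, thresholds, all.x = TRUE, by = "month")

threshold_cols <- c("a", "b")
for (i in threshold_cols) {
  out_of_range <- outcome[[i]] < outcome[[paste0(i, ".min")]] |
    outcome[[i]] > outcome[[paste0(i, ".max")]]
  # which() skips comparisons that evaluate to NA
  outcome[[i]][which(out_of_range)] <- NA
}

# keep only the original columns
outcome <- outcome[, names(df)]
print(outcome)
```

In each month only the middle value of `a` and `b` lies inside its range, so every other entry in those two columns becomes NA while `c` is left untouched.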

Using dplyr, you could do this:

library(dplyr)

df2 <- df %>% 
  left_join(thresholds, by = "month") %>% 
  mutate(a=ifelse(a > a.min & a < a.max, a, NA),
         b=ifelse(b > b.min & b < b.max, b, NA)) %>% 
  select(month, a, b, c)

df2
    month     a     b        c
1       1 3.693    NA 384.3990
2       1    NA    NA 388.0435
3       1 3.068    NA 391.1580
4       1 2.633    NA 394.1089
5       1 3.047    NA 396.2393
6       1 3.072    NA 397.7653
7       1 3.278    NA 405.9039
...
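
One detail worth noting: the `mutate()` call above uses strict comparisons, so a value exactly equal to `a.min` or `a.max` is also replaced by NA, whereas the base R loop keeps such values. The following base R sketch illustrates the difference; the numbers and the names `x`, `lo` and `hi` are made up for illustration.

```r
x  <- c(0.5, 1, 3, 5, 9)  # hypothetical measurements
lo <- 1                   # hypothetical lower threshold
hi <- 5                   # hypothetical upper threshold

# bounds excluded: values equal to lo or hi become NA
strict <- ifelse(x > lo & x < hi, x, NA)

# bounds included: values equal to lo or hi are kept
inclusive <- ifelse(x >= lo & x <= hi, x, NA)

print(strict)
print(inclusive)
```

Which variant is appropriate depends on whether the thresholds are meant to be inclusive; the question does not say.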

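Alternatively, this can be done with a sequence of non-equi update joins in data.table; the code and its output appear further down.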

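Edit

The OP has disclosed in a comment that he wants to run the solution on a huge data frame with many columns.

The sequence of non-equi update joins can also be executed in a loop: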

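library(data.table)

threshold_cols <- c("a", "b")
setDT(df)  # convert df to a data.table by reference
for(i in threshold_cols){
  # first join: values below the monthly minimum become NA
  # second join: values above the monthly maximum become NA
  df[thresholds, on = c("month", sprintf("%s<%s.min", i, i)), (i) := NA][
    thresholds, on = c("month", sprintf("%s>%s.max", i, i)), (i) := NA]
}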
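
For comparison, here is a base R sketch of the same idea for many columns. It looks the thresholds up with `match()` instead of joining, so the row order of the data is preserved and no helper columns are added. It rests on two assumptions: each month appears exactly once in `thresholds`, and the threshold columns follow the `<name>.min` / `<name>.max` naming pattern from the question. The function name `apply_thresholds` and the toy data are hypothetical.

```r
# Hypothetical helper, not part of any package
apply_thresholds <- function(df, thresholds, key = "month") {
  # derive the columns to check from the "<name>.min" columns
  cols <- sub("\\.min$", "", grep("\\.min$", names(thresholds), value = TRUE))
  # row of the thresholds table that belongs to each data row
  idx <- match(df[[key]], thresholds[[key]])
  for (col in cols) {
    lo <- thresholds[[paste0(col, ".min")]][idx]
    hi <- thresholds[[paste0(col, ".max")]][idx]
    out_of_range <- df[[col]] < lo | df[[col]] > hi
    # which() skips NA comparisons, e.g. months without thresholds
    df[[col]][which(out_of_range)] <- NA
  }
  df
}

# Made-up data with interleaved months
df <- data.frame(
  month = c(1, 2, 1, 2),
  a     = c(0.5, 3, 2, 5),
  b     = c(50, 20, 90, 60),
  c     = c(100, 101, 102, 103)
)
thresholds <- data.frame(
  month = c(1, 2),
  a.min = c(1, 2),   a.max = c(5, 4),
  b.min = c(20, 30), b.max = c(80, 70)
)

res <- apply_thresholds(df, thresholds)
print(res)
```

Because the columns to check are read from the names in `thresholds`, adding another pair such as `c.min` / `c.max` would extend the check to `c` without changing the function.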

Comments:
What do you mean by "apply limits"?
I mean using the thresholds: anything outside those thresholds should become NA.
This is nice, but if there are more columns, each column has to be handled on its own line. I want to run it on a huge data frame.

The sequence of non-equi update joins with data.table, together with its output:

library(data.table)
setDT(df)[setDT(thresholds), on = .(month, a < a.min), a := NA][
  thresholds, on = .(month, a > a.max), a := NA][
    thresholds, on = .(month, b < b.min), b := NA][
      thresholds, on = .(month, b > b.max), b := NA][]
         a     b        c month
  1: 3.693    NA 384.3990     1
  2:    NA    NA 388.0435     1
  3: 3.068    NA 391.1580     1
  4: 2.633    NA 394.1089     1
  5: 3.047    NA 396.2393     1
  6: 3.072    NA 397.7653     1
  7: 3.278    NA 405.9039     1
  8: 3.533    NA 413.3497     1
  9: 3.406    NA 413.8737     1
 10: 2.893    NA 412.4252     1
 11: 2.722    NA 401.0619     1
 12:    NA    NA 395.5369     1
 13: 1.994 63.70 393.3440     1
 14: 1.743    NA 390.2218     1
 15: 1.958    NA 380.8314     1
 16: 2.030    NA 370.9777     1
 17: 2.222 56.69 365.3473     1
 18: 2.207    NA 365.9187     1
 19: 2.393 56.74 362.2083     1
 20: 2.731 50.95 368.0958     1
 21:    NA 65.32 369.2954     1
 22: 4.065    NA 369.1633     1
 23: 3.458 67.36 367.9333     1
 24: 3.142 65.04 364.1945     1
 25: 2.705 60.00 359.7283     1
 26:    NA 53.26 357.4523     1
 27: 1.794    NA 357.9721     1
 28: 2.139    NA 356.7934     1
 29: 2.455 57.16 355.4262     1
 30: 2.830    NA 358.4297     1
 31: 3.008 63.45 357.7325     1
 32: 3.358 52.17 362.7329     1
 33: 3.663 56.59 365.4261     1
 34: 2.936 54.27 363.8837     1
 35: 2.636    NA 362.5658     2
 36: 2.420    NA 363.5668     2
 37: 3.403    NA 369.6555     2
 38: 2.830    NA 366.5757     2
 39: 2.740    NA 360.5511     2
 40: 3.119    NA 360.7731     2
 41: 2.376    NA 360.5672     2
 42: 3.285    NA 363.6154     2
 43: 3.267    NA 367.0974     2
 44: 2.966    NA 363.4489     2
 45: 3.675 60.77 373.0476     2
 46: 2.803    NA 379.0865     2
 47: 3.097    NA 382.3346     2
 48: 3.381    NA 386.7982     2
 49: 2.774    NA 394.0651     2
 50: 3.335    NA 398.8354     2
 51: 3.857    NA 398.6193     2
 52: 2.854    NA 401.3643     2
 53: 3.093 69.54 401.9453     2
 54: 2.368 70.30 405.3331     2
 55: 2.800    NA 417.1013     2
 56: 2.643    NA 425.4676     2
 57: 3.047    NA 423.6085     2
 58: 2.559    NA 421.9701     2
 59: 2.119    NA 410.8265     2
 60:    NA    NA 404.4327     2
 61:    NA    NA 401.7433     2
 62:    NA    NA 397.9707     2
 63:    NA    NA 389.2195     2
 64: 2.147 63.16 379.0507     2
 65: 2.405    NA 371.2411     2
 66: 2.543 61.44 370.1493     2
 67: 2.374    NA 365.7072     2
 68: 2.962 60.45 367.7261     2
 69: 3.375 69.92 370.8189     2
 70: 3.002 69.54 368.1045     2
 71: 2.785 67.86 365.2104     2
 72: 2.643 73.45 366.9838     2
 73: 2.304    NA 370.7158     2
 74: 2.052    NA 371.3767     2
 75: 2.116    NA 370.1482     2
 76: 2.203 71.70 367.5164     2
 77: 2.574    NA 365.9738     2
 78: 2.537    NA 367.5455     2
 79: 2.306    NA 368.9097     2
 80:    NA    NA 366.8438     2
 81: 2.164    NA 361.4221     2
 82:    NA    NA 363.1824     2
 83:    NA    NA 364.9451     2
 84:    NA    NA 362.9793     2
 85:    NA    NA 364.1421     2
 86:    NA    NA 360.9064     2
 87:    NA    NA 359.4199     2
 88:    NA    NA 358.8081     2
 89:    NA    NA 354.5116     2
 90: 1.406    NA 352.8780     3
 91: 0.975    NA 351.8854     3
 92: 1.480 66.98 354.0268     3
 93: 0.473 39.31 364.0585     3
 94: 0.689 41.21 368.6769     3
 95: 0.046    NA 382.3471     3
 96: 0.498    NA 385.0213     3
 97: 1.847    NA 385.3837     3
 98: 2.079    NA 390.9940     3
 99: 2.454    NA 388.8896     3
100:    NA    NA 386.2610     3
         a     b        c month