
R: Subset a data frame using three threshold conditions that depend on the ID name


I have a data frame like this:

ID <- c ("ABC_10","AZM_11","ABC_11","ABC_12",
         "ABC_13","AZM_12","ABC_14","ABC_15",
         "CZX_10","CZX_11","CZX_12","CZX_13",
         "FIN_10","FIN_11","FIN_12","FIN_13",
         "FNM_10","FNM_11","FXS_10","FXS_11")  
Id.n <- c(345,380,339,361,
          245,390,639,661,
          545,580,539,261,
          345,180,139,261,
          1045,1580,39,161)
df <- data.frame(ID,Id.n)
My expected output is:

       ID Id.n
   ABC_10  345
   AZM_11  380
   ABC_11  339
   ABC_12  361
   AZM_12  390
   ABC_14  639
   ABC_15  661
   CZX_10  545
   CZX_11  580
   CZX_12  539
   FIN_10  345
   FIN_13  261
   FNM_10 1045
   FNM_11 1580
   FXS_11  161
I tried the following, but I can't get it right:

df <- subset(df,ifelse(grepl("FXS",df$ID), df$ID.n > 100,))

Can someone point me in the right direction?

Using dplyr:

library(dplyr)

df2 <- df %>%
  filter((grepl("FXS", ID) & Id.n > 100) | 
           (grepl("FIN", ID) & Id.n > 200) |
           (!grepl("FXS|FIN", ID) & Id.n > 300))

df2
 #     ID Id.n
 # ABC_10  345
 # AZM_11  380
 # ABC_11  339
 # ABC_12  361
 # AZM_12  390
 # ABC_14  639
 # ABC_15  661
 # CZX_10  545
 # CZX_11  580
 # CZX_12  539
 # FIN_10  345
 # FIN_13  261
 # FNM_10 1045
 # FNM_11 1580
 # FXS_11  161
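
The same filter can also be written by first computing one threshold per row and then comparing against it, which may be easier to extend if more prefixes get their own threshold. This is only a sketch: the helper column `th`, the result name `df3`, and the `^` anchors in the patterns are assumptions added here, not part of the answer above.

```r
library(dplyr)

# Same data as in the question
ID <- c("ABC_10","AZM_11","ABC_11","ABC_12",
        "ABC_13","AZM_12","ABC_14","ABC_15",
        "CZX_10","CZX_11","CZX_12","CZX_13",
        "FIN_10","FIN_11","FIN_12","FIN_13",
        "FNM_10","FNM_11","FXS_10","FXS_11")
Id.n <- c(345,380,339,361,
          245,390,639,661,
          545,580,539,261,
          345,180,139,261,
          1045,1580,39,161)
df <- data.frame(ID, Id.n)

df3 <- df %>%
  # th is a hypothetical helper column holding each row's threshold
  mutate(th = case_when(
    grepl("^FXS", ID) ~ 100,
    grepl("^FIN", ID) ~ 200,
    TRUE              ~ 300   # every other prefix
  )) %>%
  filter(Id.n > th) %>%
  select(-th)                 # drop the helper column again
```

On the question's data this should keep the same 15 rows as the filter above.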

This is simpler once the data is tidied, with the ID prefix in its own column. Using data.table, it looks like:

library(data.table)
setDT(df)
df[, c("x", "y") := tstrsplit(ID, "_")][, ID := NULL ]

xDT = data.table(x = unique(df$x))
xDT[, th := 300 ]    
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on=.(x), th := i.th ]   
Then a non-equi join does the filtering:

df[xDT, on=.(x, Id.n > th)]

    Id.n   x  y
 1:  300 ABC 11
 2:  300 ABC 10
 3:  300 ABC 12
 4:  300 ABC 14
 5:  300 ABC 15
 6:  300 AZM 11
 7:  300 AZM 12
 8:  300 CZX 12
 9:  300 CZX 10
10:  300 CZX 11
11:  200 FIN 13
12:  200 FIN 10
13:  300 FNM 10
14:  300 FNM 11
15:  100 FXS 11
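
In the output above, the Id.n column shows the threshold rather than the original value, because in a non-equi join the join column takes its values from the table in `i` (here `xDT`). If the original values are wanted, one option is to ask for them explicitly in `j` with the `x.` prefix. This is a sketch under some assumptions: the result name `res` is hypothetical, and the `ID` column is kept instead of being dropped so the rows are easy to recognise.

```r
library(data.table)

df <- data.table(
  ID = c("ABC_10","AZM_11","ABC_11","ABC_12",
         "ABC_13","AZM_12","ABC_14","ABC_15",
         "CZX_10","CZX_11","CZX_12","CZX_13",
         "FIN_10","FIN_11","FIN_12","FIN_13",
         "FNM_10","FNM_11","FXS_10","FXS_11"),
  Id.n = c(345,380,339,361,
           245,390,639,661,
           545,580,539,261,
           345,180,139,261,
           1045,1580,39,161)
)

# Split "ABC_10" into prefix x = "ABC" and suffix y = "10"
df[, c("x", "y") := tstrsplit(ID, "_")]

# Threshold table: default 300, overridden for FXS and FIN
xDT <- data.table(x = unique(df$x), th = 300)
xDT[.(x = c("FXS", "FIN"), th = c(100, 200)), on = .(x), th := i.th]

# x.Id.n refers to Id.n as stored in df, not the threshold from xDT
res <- df[xDT, on = .(x, Id.n > th), .(ID, Id.n = x.Id.n), nomatch = NULL]
```

The rows come back grouped in the order of `xDT`, not in the original row order of `df`.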
As for grepl here, I think something like this works:

df[(grepl("FXS",df$ID) & df$Id.n >= 100) | 
       (grepl("FIN",df$ID) & df$Id.n >= 200) | 
       (!(grepl("FXS",df$ID) | grepl("FIN", df$ID)) & df$Id.n >= 300),]
#       ID Id.n
#1  ABC_10  345
#2  AZM_11  380
#3  ABC_11  339
#4  ABC_12  361
#6  AZM_12  390
#7  ABC_14  639
#8  ABC_15  661
#9  CZX_10  545
#10 CZX_11  580
#11 CZX_12  539
#13 FIN_10  345
#16 FIN_13  261
#17 FNM_10 1045
#18 FNM_11 1580
#20 FXS_11  161