R: How to keep all instances of an ID if a specific value is found one or more times in another column
Tags: r, dplyr, multiple-columns, data-manipulation


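I have data like this (about 4 million rows):

What I want to do is this: if a particular ID has one or more instances of "F11" in the CODE column, I want to keep all rows for that ID (not only the rows where CODE is "F11"). Otherwise, I want to drop that ID from the data entirely, so that only IDs with at least one F11 remain.

In other words, this is the result I want (two entries removed):

I assumed I could use dplyr and tried the following command: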

placeholder <- mutate(Flag = ifelse(file1$icd10_code == 'F11\\.*',1,0) %>% group_by(file1$new_id) %>% mutate (max_flag = max(flag)))
If any value in a group matches the target value, you can use any() to keep all of that group's rows:

library(dplyr)

file1 %>% 
  group_by(ID) %>%
  filter(any(CODE == "F11"))

   ID      CODE  DATE     
   <fct>   <fct> <fct>    
 1 A567001 F11   1/1/2019 
 2 A567001 T67   1/1/2019 
 3 A567001 P09   1/5/2019 
 4 A567001 F11   1/7/2019 
 5 A568002 F11   1/9/2019 
 6 A568002 A56   1/9/2019 
 7 A002456 F11   1/10/2019
 8 A002456 H09   1/11/2019
 9 A021324 F11   1/11/2019
10 A021324 G65   1/10/2019
11 B125983 F11   1/9/2019 

We can use data.table. Convert to a data.table with setDT, group by ID, use %in% to get a single TRUE/FALSE per group, and subset the data.table (.SD) with it

library(data.table)
setDT(file1)[, .SD['F11' %in% CODE], ID]
Or, if it is a pattern match rather than a fixed match

# "^F11" matches any code that starts with F11, including dotted subcodes
setDT(file1)[, .SD[any(grepl("^F11", CODE))], ID]
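
To make the difference between a fixed match and a pattern match concrete, here is a minimal base R sketch. The codes vector is hypothetical (the dotted subcode "F11.20" is an assumption about what the real data might contain). It also shows why comparing with == against a regex-looking string, as in the question's attempt, matches nothing.

```r
# Hypothetical codes; "F11.20" stands in for a dotted subcode (assumption).
codes <- c("F11", "F11.20", "T67", "XF11")

exact   <- codes == "F11"            # exact string equality only
prefix  <- startsWith(codes, "F11")  # literal prefix test, no regex involved
pattern <- grepl("^F11", codes)      # regex anchored at the start of the string

# "==" never interprets a pattern: no code equals the literal text F11\.*
no_match <- codes == "F11\\.*"
```

If only whole codes should count, the == or %in% form is enough; if subcodes should count too, startsWith() or an anchored grepl() is probably the safer choice.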

Or use dplyr, with the same logic

library(dplyr)
file1 %>%
    group_by(ID) %>%
    filter('F11' %in% CODE)
# A tibble: 11 x 3
# Groups:   ID [5]
#   ID      CODE  DATE     
#   <chr>   <chr> <chr>    
# 1 A567001 F11   1/1/2019 
# 2 A567001 T67   1/1/2019 
# 3 A567001 P09   1/5/2019 
# 4 A567001 F11   1/7/2019 
# 5 A568002 F11   1/9/2019 
# 6 A568002 A56   1/9/2019 
# 7 A002456 F11   1/10/2019
# 8 A002456 H09   1/11/2019
# 9 A021324 F11   1/11/2019
#10 A021324 G65   1/10/2019
#11 B125983 F11   1/9/2019 

Or use base R

subset(file1, ave(CODE == 'F11', ID, FUN = any))
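
The flag-then-group-maximum idea from the question can also be written with ave. This is only a sketch on a small hypothetical data frame (df, flag and max_flag are illustrative names, not part of the original data): flag each row, take the maximum flag per ID, and keep the IDs whose maximum is 1.

```r
# Small hypothetical data frame standing in for file1.
df <- data.frame(
  ID   = c("A1", "A1", "B2", "C3", "C3"),
  CODE = c("F11", "T67", "C45", "H09", "F11"),
  stringsAsFactors = FALSE
)

# 1 where the code starts with F11, 0 otherwise.
df$flag <- as.integer(grepl("^F11", df$CODE))

# Per-ID maximum of the flag, repeated on every row of that ID.
df$max_flag <- ave(df$flag, df$ID, FUN = max)

# Keep every row of each ID that has at least one flagged row.
result <- df[df$max_flag == 1, c("ID", "CODE")]
```

Here result keeps both rows of A1 and C3 and drops B2, which has no F11 row.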
Or use str_detect from stringr for a pattern match

library(stringr)
file1 %>%
    group_by(ID) %>%
    filter(any(str_detect(CODE, "^F11")))

Comment: How would I do the opposite? In other words, what if I want to remove every instance of any ID for which F11 appears at least once?

Reply: Just negate it:
filter(!any(grepl("^F11", CODE)))

Data
file1 <- structure(list(ID = c("A567001", "A567001", "A567001", "A567001", 
"A568002", "A568002", "A567891", "A002456", "A002456", "A021324", 
"A021324", "B125983", "C172749"), CODE = c("F11", "T67", "P09", 
"F11", "F11", "A56", "C45", "F11", "H09", "F11", "G65", "F11", 
"H76"), DATE = c("1/1/2019", "1/1/2019", "1/5/2019", "1/7/2019", 
"1/9/2019", "1/9/2019", "1/7/2019", "1/10/2019", "1/11/2019", 
"1/11/2019", "1/10/2019", "1/9/2019", "1/8/2019")), class = "data.frame", row.names = c(NA, 
-13L))
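
As a further sketch, the same selection and its negation can be done without grouping: collect the IDs that have at least one F11 row, then test membership with %in%. The data frame df below is a small hypothetical stand-in for file1, and ids_with_f11, kept and dropped are illustrative names. Avoiding a per-group evaluation may help on a table of about 4 million rows, though that has not been measured here.

```r
# Small hypothetical data frame standing in for file1.
df <- data.frame(
  ID   = c("A1", "A1", "B2", "C3", "C3"),
  CODE = c("F11", "T67", "C45", "H09", "F11"),
  stringsAsFactors = FALSE
)

# IDs that have at least one row with CODE equal to F11.
ids_with_f11 <- unique(df$ID[df$CODE == "F11"])

# Keep every row of those IDs ...
kept <- df[df$ID %in% ids_with_f11, ]

# ... or, negated, drop every row of those IDs.
dropped <- df[!df$ID %in% ids_with_f11, ]
```

With this toy data, kept holds the rows of A1 and C3, and dropped holds only the B2 row.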