R: remove IDs where one value of a variable occurs only once

Tags: r, dataframe, dplyr

I hope I can explain this clearly.

I have a dataset like this:

dataset <- data.frame(ID = c(1,1,1,2,2,2,3,3,3), 
                      Invoice = c(1,2,3,1,2,3,1,2,3), 
                      Invoice_Date = c('09/30/2019','10/30/2019','11/30/2019',
                                       '10/31/2019','11/30/2019','12/31/2019',
                                       '7/31/2019','9/30/2019','12/31/2019'),
                      paid_unpaid = c('no','yes','yes','yes','no','no','no','yes','no'), 
                      stringsAsFactors = FALSE)
dataset$Invoice_Date <- as.Date(dataset$Invoice_Date, '%m/%d/%y')  
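I want to select the customers who have more than one unpaid invoice, that is, the IDs for which 'no' occurs more than once in the variable paid_unpaid.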

After the selection, my ideal data would contain only the rows of those customers.

You can do the following:

library(dplyr)

dataset %>%
  group_by(ID) %>%
  filter(sum(paid_unpaid == 'no') > 1)
Output:

# A tibble: 6 x 4
# Groups:   ID [2]
     ID Invoice Invoice_Date paid_unpaid
  <dbl>   <dbl> <date>       <chr>      
1     2       1 2020-10-31   yes        
2     2       2 2020-11-30   no         
3     2       3 2020-12-31   no         
4     3       1 2020-07-31   no         
5     3       2 2020-09-30   yes        
6     3       3 2020-12-31   no    
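
To see why this filter keeps IDs 2 and 3, it can help to look at the per-group count that the condition is based on. The following is a minimal sketch, assuming the same data as in the question (only the two relevant columns are rebuilt here); the names unpaid_counts and n_unpaid are made up for illustration and are not part of the answer.

```r
library(dplyr)

# Same ID / paid_unpaid values as in the question's dataset
dataset <- data.frame(ID = c(1,1,1,2,2,2,3,3,3),
                      paid_unpaid = c('no','yes','yes','yes','no','no','no','yes','no'),
                      stringsAsFactors = FALSE)

# paid_unpaid == 'no' is a logical vector; sum() counts its TRUE values,
# so n_unpaid (hypothetical name) is the number of unpaid invoices per ID.
unpaid_counts <- dataset %>%
  group_by(ID) %>%
  summarise(n_unpaid = sum(paid_unpaid == 'no'))

unpaid_counts
# ID 1 has one unpaid invoice, IDs 2 and 3 have two each,
# so only IDs 2 and 3 satisfy n_unpaid > 1.
```

Inside filter() the same sum is evaluated once per group, and the single TRUE or FALSE it yields keeps or drops every row of that ID.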
We can use subset with table from base R:

subset(dataset, ID %in% names(which(table(ID, paid_unpaid == 'no')[, 2] > 1)))
#  ID Invoice Invoice_Date paid_unpaid
#4  2       1   2020-10-31         yes
#5  2       2   2020-11-30          no
#6  2       3   2020-12-31          no
#7  3       1   2020-07-31          no
#8  3       2   2020-09-30         yes
#9  3       3   2020-12-31          no
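
A side note on the dates: the printed results show 2020 although the input strings say 2019. This seems to come from the '%m/%d/%y' format in the question's as.Date call, where %y reads a two-digit year. A minimal sketch of the difference, using one date string from the question (the variable names x, two_digit and four_digit are only for illustration):

```r
# One of the date strings from the question
x <- '09/30/2019'

# %y reads only two year digits ("20"); the trailing "19" is ignored,
# so the parsed year is 2020.
two_digit <- as.Date(x, '%m/%d/%y')

# %Y reads the full four-digit year.
four_digit <- as.Date(x, '%m/%d/%Y')

format(two_digit, '%Y')
format(four_digit, '%Y')
```

If 2019 is the intended year, using '%m/%d/%Y' in the as.Date call would probably be what is wanted; the selection of IDs is not affected either way.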