R: Remove IDs for which a value of a variable occurs only once
I hope I can explain this clearly. I have a dataset like this:
dataset <- data.frame(ID = c(1,1,1,2,2,2,3,3,3),
                      Invoice = c(1,2,3,1,2,3,1,2,3),
                      Invoice_Date = c('09/30/2019','10/30/2019','11/30/2019',
                                       '10/31/2019','11/30/2019','12/31/2019',
                                       '7/31/2019','9/30/2019','12/31/2019'),
                      paid_unpaid = c('no','yes','yes','yes','no','no','no','yes','no'),
                      stringsAsFactors = FALSE)
# %Y parses a four-digit year such as 2019
dataset$Invoice_Date <- as.Date(dataset$Invoice_Date, '%m/%d/%Y')
I want to select the customers who have more than one unpaid invoice, that is, the IDs for which 'no' appears more than once in the variable paid_unpaid.
After the selection, my ideal data would keep every row for those IDs (here, IDs 2 and 3).

You can do the following:
library(dplyr)
dataset %>%
  group_by(ID) %>%
  # keep a whole group only if it contains more than one 'no'
  filter(sum(paid_unpaid == 'no') > 1)
Output:
# A tibble: 6 x 4
# Groups:   ID [2]
     ID Invoice Invoice_Date paid_unpaid
  <dbl>   <dbl> <date>       <chr>
1     2       1 2019-10-31   yes
2     2       2 2019-11-30   no
3     2       3 2019-12-31   no
4     3       1 2019-07-31   no
5     3       2 2019-09-30   yes
6     3       3 2019-12-31   no

We can also use table with subset in base R:
subset(dataset, ID %in% names(which(table(ID, paid_unpaid == 'no')[, 2] > 1)))
#  ID Invoice Invoice_Date paid_unpaid
#4  2       1   2019-10-31         yes
#5  2       2   2019-11-30          no
#6  2       3   2019-12-31          no
#7  3       1   2019-07-31          no
#8  3       2   2019-09-30         yes
#9  3       3   2019-12-31          no
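
Another base R option, offered only as a sketch and not taken from the answers above: ave() can compute the number of 'no' values per ID and repeat that count on every row of the group, so it can be used directly as a row filter. The names n_unpaid and result are illustrative, and the data frame is rebuilt here (without Invoice_Date, which the filter does not need) only so that the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
dataset <- data.frame(ID = c(1,1,1,2,2,2,3,3,3),
                      Invoice = c(1,2,3,1,2,3,1,2,3),
                      paid_unpaid = c('no','yes','yes','yes','no','no','no','yes','no'),
                      stringsAsFactors = FALSE)

# For each row, the number of unpaid invoices within that row's ID
# (n_unpaid is an illustrative name)
n_unpaid <- ave(as.numeric(dataset$paid_unpaid == 'no'), dataset$ID, FUN = sum)

# Keep every row of the IDs that have more than one unpaid invoice
result <- dataset[n_unpaid > 1, ]
result
```

If only the unpaid rows themselves are wanted, the condition could be narrowed to n_unpaid > 1 & dataset$paid_unpaid == 'no'.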