R: validating values across two data frames
I have two data frames with about 1 million records each. For the rows in df2 where city is mum, I am trying to check whether the Uniq_ID is also present in df1, and then mutate df2 with a 1 or 0 to mark the result as true or false.
df1 <- data.frame(ID =c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
city=c("del","mum","mum","pun","bang","mum","triv","vish","mum","mum","bang","vish","mum","kol","noi","mum"))
df2 <- data.frame(Uniq_ID =c("DEV2962","KTN2252","ANA2719","H7236","DEV2692","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2831","ERV2951","KTN2542","ANA2813","ITI2210"),
city=c("del","mum","bho","pun","mum","chen","mum","vish","mum","mum","bang","mum","mum","kol","noi","mum"))
In this case we can use base R. Does this work:
> df2$ID_not_in_df1 <- ifelse(!df2$Uniq_ID %in% df1$ID & df2$city == 'mum', 1 ,0)
> df2
Uniq_ID city ID_not_in_df1
1 DEV2962 del 0
2 KTN2252 mum 0
3 ANA2719 bho 0
4 H7236 pun 0
5 DEV2692 mum 1
6 HRT2921 chen 0
7 mum 0
8 KTN2624 vish 0
9 ANA2548 mum 0
10 ITI2535 mum 0
11 DEV2732 bang 0
12 HRT2831 mum 1
13 ERV2951 mum 0
14 KTN2542 kol 0
15 ANA2813 noi 0
16 ITI2210 mum 0
>
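Since the question mentions mutating df2, the same condition can also be written with dplyr. The following is only a minimal sketch: it assumes dplyr is installed, and the small df1 and df2 below are hypothetical stand-ins for the real data, not the data frames from the question.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the real data frames
df1 <- data.frame(ID = c("A1", "B2", "C3"),
                  city = c("del", "mum", "mum"))
df2 <- data.frame(Uniq_ID = c("A1", "B2", "X9", "Y8"),
                  city = c("del", "mum", "mum", "pun"))

# Flag is 1 only when the ID is missing from df1 AND the row's city is "mum"
df2 <- df2 %>%
  mutate(ID_not_in_df1 = as.integer(!(Uniq_ID %in% df1$ID) & city == "mum"))

print(df2)
```

Only the lookup vector df1$ID is consulted here, so no other columns of df1 take part in the comparison.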
You can assign 1 to those Uniq_ID values in df2 that do not exist in df1 and that have city == 'mum':
df2$ID_not_in_df1 <- +(!df2$Uniq_ID %in% unique(df1$ID) & df2$city == 'mum')
df2
# Uniq_ID city ID_not_in_df1
#1 DEV2962 del 0
#2 KTN2252 mum 0
#3 ANA2719 bho 0
#4 H7236 pun 0
#5 DEV2692 mum 1
#6 HRT2921 chen 0
#7 mum 0
#8 KTN2624 vish 0
#9 ANA2548 mum 0
#10 ITI2535 mum 0
#11 DEV2732 bang 0
#12 HRT2831 mum 1
#13 ERV2951 mum 0
#14 KTN2542 kol 0
#15 ANA2813 noi 0
#16 ITI2210 mum 0
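To see in isolation what the leading + and the unparenthesised ! in the assignment above do, here is a small base R sketch. The vectors cond, x, y and z are made up for illustration only.

```r
# Unary plus coerces a logical vector to integer: TRUE -> 1, FALSE -> 0, NA stays NA
cond <- c(TRUE, FALSE, NA)
flag <- +cond
flag_alt <- as.integer(cond)   # more explicit spelling of the same coercion

# `!` binds more loosely than %in% and ==, but more tightly than &,
# so `!x %in% y & z` is read as `(!(x %in% y)) & z`
x <- c("a", "b", "z")
y <- c("a", "b")
z <- c(TRUE, TRUE, TRUE)
implicit <- !x %in% y & z
explicit <- (!(x %in% y)) & z

print(flag)
print(identical(implicit, explicit))
```

One thing to watch for: if city contains NA, city == 'mum' is NA for that row, so the flag can come out as NA rather than 0. Using city %in% 'mum' instead avoids this, because %in% never returns NA.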
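Follow-up from the asker: I have already tried this, but I also have a filter condition on city being mum. In my actual data I have many column variables, so a left join would by default join on all of the shared columns. What I want is to take every unique ID in df2 for the city mum, row by row, and check whether it exists in df1.
!df2$Uniq_ID %in% unique(df1$ID) & df2$city == 'mum' is a logical condition that yields TRUE/FALSE values. The unary + then converts TRUE to 1 and FALSE to 0.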
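On the concern that a join would match on every shared column: a join can be restricted to a single key with an explicit by argument. The sketch below assumes dplyr is available and uses hypothetical toy data; the extra column and the helper column in_df1 are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data with an additional shared column ("extra")
df1 <- data.frame(ID = c("A1", "B2", "C3"),
                  city = c("del", "mum", "mum"),
                  extra = 1:3)
df2 <- data.frame(Uniq_ID = c("A1", "B2", "X9", "Y8"),
                  city = c("del", "mum", "mum", "pun"),
                  extra = 4:7)

# One row per ID, so the join cannot duplicate rows of df2
lookup <- df1 %>%
  distinct(ID) %>%
  mutate(in_df1 = TRUE)

# Join on the ID key only; city and extra are ignored for matching
df2 <- df2 %>%
  left_join(lookup, by = c("Uniq_ID" = "ID")) %>%
  mutate(ID_not_in_df1 = as.integer(is.na(in_df1) & city == "mum")) %>%
  select(-in_df1)

print(df2)
```

For a plain membership check like this one, the %in% approach shown in the answers is usually simpler. A join mainly becomes worthwhile when columns from df1 need to be carried over as well.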