R: Validating values across two data frames

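I have two data frames with about 1 million records each. For the rows in df2 where city is mum, I am trying to check whether each Uniq_ID is present in df1 or not, and then mutate df2 with a 1 or 0 for true or false.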

df1 <- data.frame(ID =c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  city=c("del","mum","mum","pun","bang","mum","triv","vish","mum","mum","bang","vish","mum","kol","noi","mum"))
df2 <- data.frame(Uniq_ID =c("DEV2962","KTN2252","ANA2719","H7236","DEV2692","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2831","ERV2951","KTN2542","ANA2813","ITI2210"),
                  city=c("del","mum","bho","pun","mum","chen","mum","vish","mum","mum","bang","mum","mum","kol","noi","mum"))



In this case we can use base R. Does this work:

> df2$ID_not_in_df1 <- ifelse(!df2$Uniq_ID %in% df1$ID & df2$city == 'mum', 1 ,0)
> df2
   Uniq_ID city ID_not_in_df1
1  DEV2962  del             0
2  KTN2252  mum             0
3  ANA2719  bho             0
4    H7236  pun             0
5  DEV2692  mum             1
6  HRT2921 chen             0
7           mum             0
8  KTN2624 vish             0
9  ANA2548  mum             0
10 ITI2535  mum             0
11 DEV2732 bang             0
12 HRT2831  mum             1
13 ERV2951  mum             0
14 KTN2542  kol             0
15 ANA2813  noi             0
16 ITI2210  mum             0
> 
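Since the question is tagged dplyr and asks to mutate df2, the same check can probably be written inside mutate() as well. The following is only a sketch under stated assumptions: df1_toy and df2_toy are hypothetical toy data frames that reuse the question's column names, and as.integer() is used for the 0/1 conversion. Only the ID column of df1_toy is consulted, so no join over the remaining columns takes place.

```r
library(dplyr)

# Hypothetical toy data reusing the question's column names (not the real data)
df1_toy <- data.frame(ID   = c("DEV2962", "KTN2252", "HRT2921"),
                      city = c("del", "mum", "mum"))
df2_toy <- data.frame(Uniq_ID = c("DEV2962", "DEV2692", "HRT2831", "KTN2252"),
                      city    = c("del", "mum", "pun", "mum"))

# Flag rows whose Uniq_ID is absent from df1_toy$ID and whose city is "mum"
df2_toy <- df2_toy %>%
  mutate(ID_not_in_df1 = as.integer(!Uniq_ID %in% df1_toy$ID & city == "mum"))

df2_toy
```

Here only DEV2692 should be flagged: it is missing from df1_toy and its city is mum. HRT2831 is also missing, but its city is pun, so it stays 0.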

You can assign 1 to each Uniq_ID in df2 that is not present in df1 and has city = 'mum'.

df2$ID_not_in_df1 <- +(!df2$Uniq_ID %in% unique(df1$ID) & df2$city == 'mum')
df2

#   Uniq_ID city ID_not_in_df1
#1  DEV2962  del             0
#2  KTN2252  mum             0
#3  ANA2719  bho             0
#4    H7236  pun             0
#5  DEV2692  mum             1
#6  HRT2921 chen             0
#7           mum             0
#8  KTN2624 vish             0
#9  ANA2548  mum             0
#10 ITI2535  mum             0
#11 DEV2732 bang             0
#12 HRT2831  mum             1
#13 ERV2951  mum             0
#14 KTN2542  kol             0
#15 ANA2813  noi             0
#16 ITI2210  mum             0

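I have already tried this, but I also have a filter condition on city being mum. My actual data has many columns, so by default a left join would join on all of them. What I want is to take every Uniq_ID in df2 where city is mum, row by row, and check whether it exists in df1.

The expression !df2$Uniq_ID %in% unique(df1$ID) & df2$city == 'mum' is a logical condition that gives TRUE/FALSE values. The unary + converts TRUE to 1 and FALSE to 0.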
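To make the coercion step concrete, here is a minimal sketch on hypothetical toy vectors (ids, known and city are illustrative names, not part of the question's data). It compares the unary + shortcut with the more explicit as.integer().

```r
# Hypothetical toy vectors, for illustration only
ids   <- c("A1", "B2", "C3")
known <- c("A1", "C3")
city  <- c("mum", "mum", "del")

# Parsed as (!(ids %in% known)) & (city == "mum"), giving a logical vector
cond <- !ids %in% known & city == "mum"

flag_plus <- +cond             # unary plus coerces logical to integer
flag_int  <- as.integer(cond)  # explicit equivalent

identical(flag_plus, flag_int)
```

Both forms should give the same integer vector. as.integer() is arguably easier to read for someone who does not know the + idiom.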