
R: mutate a data frame column when duplicates are found

I have a data frame and I am trying to mutate a new column that flags the duplicates it finds (for example with 1/0). My data frame looks like this:

df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","ram.lal@xyz.com","prabal.garg@xyz.com","sanu.ali@abcd.com","kunal.singh@abcd.com","lakhan.tomar@abcd.com","praveen.thakur@abcd.com","sarman.ali@abcd.com","zuber.khan@dkl.com","giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))

df4 %>%
  mutate(`Duplicate_id` = ifelse(duplicated(df4[[!!EMP_id]]) == TRUE, "00. id duplicated", ""))
It creates an unnecessary new column called "empid", but the output should look like this:

> df4 %>% group_by(!!sym(EMP_ID)) %>%
+   mutate(`Duplicate_id` = ifelse(duplicated(!!sym(EMP_ID)),"00. id duplicated",''))
# A tibble: 16 x 3
# Groups:   emp_id [14]
   emp_id   email                   Duplicate_id     
   <fct>    <fct>                   <chr>            
 1 DEV-2962 akash.dev@abcd.com      ""               
 2 KTN_2252 rahul.singh@abcd.com    ""               
 3 ANA2719  salman.abbas@abcd.com   ""               
 4 ITI_2624 ram.lal@abcd.com        ""               
 5 DEV2698  ram.lal@xyz.com         ""               
 6 HRT2921  prabal.garg@xyz.com     ""               
 7 ""       sanu.ali@abcd.com       ""               
 8 KTN2624  kunal.singh@abcd.com    ""               
 9 DEV2698  lakhan.tomar@abcd.com   00. id duplicated
10 ITI2535  praveen.thakur@abcd.com ""               
11 DEV2698  sarman.ali@abcd.com     00. id duplicated
12 HRT2837  zuber.khan@dkl.com      ""               
13 ERV2951  giriraj.singh@dkl.com   ""               
14 KTN2542  lokesh.sharma@abcd.com  ""               
15 ANA2813  pooja.pawar@abcd.com    ""               
16 ITI2210  nikita.sharma@abcd.com  ""               
> 

Actually, you do not need dplyr::group_by here:

library(dplyr)

EMP_ID <- "emp_id"

df4 %>% 
  dplyr::mutate(`Duplicate_id` = ifelse(duplicated(!!sym(EMP_ID)),"00. id duplicated",''))

You can use group_by if you want to look for duplicates within groups of the table (for example, group by the "email" column and then look for duplicated "emp_id" values, or the other way around), as sketched below.
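
A minimal sketch of that per-group idea, assuming the goal is to flag an emp_id that repeats within the same email group (column names taken from df4 above):

library(dplyr)

# Flag an emp_id only when it repeats within its email group;
# swap the two columns to check for duplicated emails per emp_id instead.
df4 %>%
  group_by(email) %>%
  mutate(Duplicate_id = ifelse(duplicated(emp_id), "00. id duplicated", "")) %>%
  ungroup()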

I have just updated my code: the column name comes in as a user-input parameter, and I recall it later in the code. — @newmer, I have updated my answer; please check whether it works for you.
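
If the column name really does arrive as a user-supplied string, one way to package the approach from the answer is a small helper function; flag_duplicates below is only a hypothetical name used for illustration, not part of the original answer.

library(dplyr)

# Hypothetical wrapper: takes the data frame and the id column name as a string,
# and adds a Duplicate_id column using the same duplicated()/sym() pattern as above.
flag_duplicates <- function(data, id_col) {
  data %>%
    dplyr::mutate(Duplicate_id = ifelse(duplicated(!!sym(id_col)),
                                        "00. id duplicated", ""))
}

flag_duplicates(df4, "emp_id")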