R: mutate a data frame if a duplicate is found
I have a data frame, and I am trying to mutate a new column that gives 1,0 depending on whether a duplicate is found in a column. For example, I have the following data frame:
df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","ram.lal@xyz.com","prabal.garg@xyz.com","sanu.ali@abcd.com","kunal.singh@abcd.com","lakhan.tomar@abcd.com","praveen.thakur@abcd.com","sarman.ali@abcd.com","zuber.khan@dkl.com","giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))
df4 %>%
  mutate(`Duplicate_id` = ifelse(duplicated(df4[[!!EMP_ID]]) == TRUE, "00. id duplicated", ""))
It creates an unnecessary new column named "empid", but the output should look like this:
> df4 %>% group_by(!!sym(EMP_ID)) %>%
+ mutate(`Duplicate_id` = ifelse(duplicated(!!sym(EMP_ID)),"00. id duplicated",''))
# A tibble: 16 x 3
# Groups:   emp_id [14]
   emp_id   email                   Duplicate_id
   <fct>    <fct>                   <chr>
 1 DEV-2962 akash.dev@abcd.com      ""
 2 KTN_2252 rahul.singh@abcd.com    ""
 3 ANA2719  salman.abbas@abcd.com   ""
 4 ITI_2624 ram.lal@abcd.com        ""
 5 DEV2698  ram.lal@xyz.com         ""
 6 HRT2921  prabal.garg@xyz.com     ""
 7 ""       sanu.ali@abcd.com       ""
 8 KTN2624  kunal.singh@abcd.com    ""
 9 DEV2698  lakhan.tomar@abcd.com   00. id duplicated
10 ITI2535  praveen.thakur@abcd.com ""
11 DEV2698  sarman.ali@abcd.com     00. id duplicated
12 HRT2837  zuber.khan@dkl.com      ""
13 ERV2951  giriraj.singh@dkl.com   ""
14 KTN2542  lokesh.sharma@abcd.com  ""
15 ANA2813  pooja.pawar@abcd.com    ""
16 ITI2210  nikita.sharma@abcd.com  ""
>
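As a side note on the expected output: duplicated() returns TRUE only for the second and later occurrences of a value, which is why row 5 (the first DEV2698) is left blank while rows 9 and 11 are flagged. Since the question also mentions a 1/0 flag, here is a minimal base-R sketch, assuming the emp_id values from df4. It shows the 1/0 form, the text label from the expected output, and a variant using fromLast = TRUE that flags every occurrence of a repeated value. The variable names are illustrative only.

```r
# emp_id values copied from df4 in the question
emp_id <- c("DEV-2962", "KTN_2252", "ANA2719", "ITI_2624", "DEV2698", "HRT2921",
            "", "KTN2624", "DEV2698", "ITI2535", "DEV2698", "HRT2837",
            "ERV2951", "KTN2542", "ANA2813", "ITI2210")

# TRUE for the 2nd and later occurrences of a value, FALSE otherwise
dup <- duplicated(emp_id)

# 1/0 flag, as mentioned in the question
dup_flag <- as.integer(dup)

# text label, matching the expected output
dup_label <- ifelse(dup, "00. id duplicated", "")

# flag every occurrence of a repeated value, including the first one
dup_all <- as.integer(duplicated(emp_id) | duplicated(emp_id, fromLast = TRUE))

which(dup_flag == 1)  # 9 11
which(dup_all == 1)   # 5 9 11
```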
Actually, you don't need dplyr::group_by here:
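library(dplyr)
EMP_ID <- "emp_id"  # column name given as a string (e.g. user input)
df4 %>%
  # sym() turns the string into a column symbol; !! unquotes it inside mutate()
  dplyr::mutate(`Duplicate_id` = ifelse(duplicated(!!sym(EMP_ID)), "00. id duplicated", ""))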
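Because the column name arrives as a string (the asker passes it as a user input parameter), the same idea can be wrapped in a function. This sketch uses dplyr's .data pronoun (.data[[col]]) as an alternative to !!sym(). flag_duplicates and its label argument are hypothetical names introduced for illustration, and df4 is rebuilt so the block runs on its own.

```r
library(dplyr)

# df4 as defined in the question
df4 <- data.frame(
  emp_id = c("DEV-2962", "KTN_2252", "ANA2719", "ITI_2624", "DEV2698", "HRT2921",
             "", "KTN2624", "DEV2698", "ITI2535", "DEV2698", "HRT2837",
             "ERV2951", "KTN2542", "ANA2813", "ITI2210"),
  email = c("akash.dev@abcd.com", "rahul.singh@abcd.com", "salman.abbas@abcd.com",
            "ram.lal@abcd.com", "ram.lal@xyz.com", "prabal.garg@xyz.com",
            "sanu.ali@abcd.com", "kunal.singh@abcd.com", "lakhan.tomar@abcd.com",
            "praveen.thakur@abcd.com", "sarman.ali@abcd.com", "zuber.khan@dkl.com",
            "giriraj.singh@dkl.com", "lokesh.sharma@abcd.com", "pooja.pawar@abcd.com",
            "nikita.sharma@abcd.com")
)

# Hypothetical helper: `col` is the name of the column to check, given as a string
flag_duplicates <- function(df, col, label = "00. id duplicated") {
  df %>%
    # .data[[col]] looks the column up by its (string) name inside mutate()
    mutate(Duplicate_id = ifelse(duplicated(.data[[col]]), label, ""))
}

out <- flag_duplicates(df4, "emp_id")
which(out$Duplicate_id != "")  # rows 9 and 11
```

Both .data[[col]] and !!sym(col) resolve a string to a column inside mutate(); .data[[ ]] avoids explicit unquoting, which some find easier to read inside functions.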
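If you want to find duplicates within groups of the table, you can use group_by (for example, group by the "email" column and then look for duplicated "emp_id" values, or vice versa).

I just updated my code: I'm passing the column name as a user input parameter and referring to it in the code.

@newmer, I've updated my answer; please check whether this works for you.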
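To illustrate the group_by case mentioned above: once the data is grouped, duplicated() is evaluated separately within each group, so a value is flagged only if it repeats inside the same group. The small data frame here is hypothetical (in df4 every email is unique, so grouping by email would flag nothing), and GROUP_COL is an assumed name for a second user-supplied column.

```r
library(dplyr)

# Hypothetical example data: emp_id "A1" appears under two different emails
demo_df <- data.frame(
  emp_id = c("A1", "A1", "A1", "B2"),
  email  = c("x@abcd.com", "x@abcd.com", "y@abcd.com", "x@abcd.com")
)

GROUP_COL <- "email"   # column to group by (string, e.g. user input)
EMP_ID    <- "emp_id"  # column to check for duplicates

res <- demo_df %>%
  group_by(!!sym(GROUP_COL)) %>%
  # duplicated() runs separately inside each email group
  mutate(Duplicate_id = ifelse(duplicated(!!sym(EMP_ID)), "00. id duplicated", "")) %>%
  ungroup()

which(res$Duplicate_id != "")   # grouped by email: row 2 only
which(duplicated(demo_df$emp_id))  # ungrouped: rows 2 and 3
```

Row 3 is flagged only in the ungrouped check, because it is the first "A1" within the y@abcd.com group.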