R: Check whether a name is the same as the name in the email
I want to validate a name against the name in an email address. I am trying the solutions below, but they do not work for me. The goal is to check whether the name is exactly the same as the name in the email. The parts of the name can be separated by a space, comma, or dot, which is why I use a separator.
df <- data.frame(name = c("Nic Hawk","tt dy","anz kpw p","timm ral","Karen Mulc","lew wey","sun mark"),
email = c("Nic.Hawk@tttt.com", "tt.dy@aquan@tttt.com", "anz.kpw.p@tttt.com", "frez.tal@tttt.com", "Karen.Mulc@tttt.com", "lew.wey@tttt.com", "wall.kit@tttt.com"))
Name= "name"
Email="email"
separator = " "
df <- df %>%
  mutate(Name_match = map2_int(str_extract_all(Name, "\\w+"),
                               str_extract_all(str_remove(Email, "\\@.*"), "\\w+"),
                               ~ +(!all(str_detect(.y, str_c(.x, collapse=" "))))))
df <- df %>%
  separate(Name,
           into = c("last_name", "first_name"),
           sep = separator,
           remove = FALSE) %>%
  mutate(first_name = tolower(first_name),
         last_name = tolower(last_name)) %>%
  mutate(name_email_match = 0L*str_detect(Email,
                                          paste0("^", first_name, separator, last_name,
                                                 "@\\w+\\.com$"))) %>%
  select(-c(first_name, last_name))
The output should be a mutated column of 1s and 0s (1 means TRUE, a match; 0 means FALSE, no match).
Try this:
library(dplyr)
library(stringr)
Name <- "name"
Email <- "email"
separator <- " "
df %>%
  # everything to lower
  mutate(across(all_of(c(Name, Email)), tolower)) %>%
  # extract interesting part from email
  mutate(email_name = str_extract(!!sym(Email), "([a-z.]+)(?=@.+)")) %>%
  # replace . with separator
  mutate(email_name = str_replace_all(email_name, "\\.", separator)) %>%
  # compare
  mutate(name_email_match = +(!!sym(Name) == email_name))
#>         name                email email_name name_email_match
#> 1   nic hawk    nic.hawk@tttt.com   nic hawk                1
#> 2      tt dy tt.dy@aquan@tttt.com      tt dy                1
#> 3  anz kpw p   anz.kpw.p@tttt.com  anz kpw p                1
#> 4   timm ral    frez.tal@tttt.com   frez tal                0
#> 5 karen mulc  karen.mulc@tttt.com karen mulc                1
#> 6    lew wey     lew.wey@tttt.com    lew wey                1
#> 7   sun mark    wall.kit@tttt.com   wall kit                0
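The question mentions that the name parts may be separated by a space, comma, or dot, while the code above handles a single separator. A minimal sketch of one way to cover all three is to normalise both sides before comparing. The helper normalise_name and the sample vectors below are hypothetical and not part of the original post.

```r
library(stringr)

# Hypothetical helper (an assumption, not from the original post):
# lower-case the input and collapse any run of spaces, commas or dots
# into a single space, then trim the ends.
normalise_name <- function(x) {
  str_squish(str_replace_all(str_to_lower(x), "[\\s,.]+", " "))
}

# Made-up sample values for illustration
names_in  <- c("Nic Hawk", "anz,kpw.p", "timm ral")
emails_in <- c("Nic.Hawk@tttt.com", "anz.kpw.p@tttt.com", "frez.tal@tttt.com")

# Local part = everything before the first "@"
local_part <- str_remove(emails_in, "@.*")

# 1 when the normalised name equals the normalised local part, else 0
match_flag <- +(normalise_name(names_in) == normalise_name(local_part))
match_flag
#> [1] 1 1 0
```

Because both sides go through the same normalisation, "anz,kpw.p" and "anz.kpw.p" compare equal without having to pick one separator up front.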
Does this work:
library(dplyr)
library(stringr)
df %>% mutate(name1 = str_remove_all(name, '\\s'), email1 = str_remove(str_remove_all(str_extract(email, '.*(?=@.*)'), '[\\.\\s]' ), '@.*')) %>%
mutate(op = +(str_detect(name1, email1))) %>% select(-c(name1, email1))
        name                email op
1   Nic Hawk    Nic.Hawk@tttt.com  1
2      tt dy tt.dy@aquan@tttt.com  1
3  anz kpw p   anz.kpw.p@tttt.com  1
4   timm ral    frez.tal@tttt.com  0
5 Karen Mulc  Karen.Mulc@tttt.com  1
6    lew wey     lew.wey@tttt.com  1
7   sun mark    wall.kit@tttt.com  0
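I remember seeing a similar question from you already... I don't think you are quite clear on how to use variable names in dplyr. You don't need to create the variables Name and Email. Just writing name and email is enough, because they are right there in your dplyr statement :-) [Even if this doesn't solve your problem, it will improve the quality of your code]

Actually, sometimes the data has different column names for the name and the email, so I supply the name and email columns as input parameters based on the names in the data.

Then you are using them wrong. You need to write it like this: !!sym(Name). My suggestion is to rename the columns at the start, so you don't have to write !!sym every time.

OK, I will update, but my code doesn't work: name = "name"; name = !!sym(name), is it like this...?

If the name or email column is blank, how can I ignore NA and blank cells?

You can filter them out, for example (check out ?dplyr::filter).

If I filter them out, it will also affect the original data. I don't want that right now.

If there are NAs in the data, you will simply get some NAs in name_email_match. You can fill them with zeros, for example using %>% tidyr::replace_na(list(name_email_match = 0)).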
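The two suggestions from the comments (rename the string-named columns once at the start, and fill NA results with zeros instead of filtering rows out) can be combined roughly as follows. This is only a sketch under stated assumptions: the data frame df_na, its column names full_name and mail, and the simplified regular expression are made up for illustration and are not from the original post.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical sample data with a missing name and a blank email
df_na <- data.frame(
  full_name = c("Nic Hawk", NA, "timm ral", "lew wey"),
  mail      = c("Nic.Hawk@tttt.com", "tt.dy@tttt.com", "frez.tal@tttt.com", ""),
  stringsAsFactors = FALSE
)

# Column names supplied as strings, as in the question
Name      <- "full_name"
Email     <- "mail"
separator <- " "

result <- df_na %>%
  # rename once, so the later steps can use plain `name` and `email`
  rename(name = all_of(Name), email = all_of(Email)) %>%
  mutate(
    # part before the first "@", lower-cased; NA when the email is blank
    email_name = str_extract(str_to_lower(email), "^[a-z.]+(?=@)"),
    email_name = str_replace_all(email_name, fixed("."), separator),
    # comparisons involving NA give NA rather than 0
    name_email_match = +(str_to_lower(name) == email_name)
  ) %>%
  # keep every row, but turn NA results into 0
  replace_na(list(name_email_match = 0L)) %>%
  select(-email_name)

result$name_email_match
#> [1] 1 0 0 0
```

No rows are dropped, so the original data keeps its shape; rows with a missing name or blank email simply end up with 0 in name_email_match.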