R: detect similar consecutive patterns in two columns
The third column, which compares the two columns, should contain "Yes", because "Sun", "Green" and "sky" indicate a match. How can we detect this (at most three consecutive letters)?
x1 = c("Sunwood", "Greengrass", "bluesky")
x2 = c("Sun wood", "green", "sky Pl")
testframe = data.frame(Address1 = x1, Address2 = x2)
Here is one possibility with the tidyverse. It checks whether the first three characters of "Address1" match the first three characters of "Address2", ignoring case. If they do, it returns "Yes", otherwise "No":
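library(dplyr)    # %>%, mutate(), mutate_if()
library(stringr)  # str_detect(), str_extract(), fixed()
testframe %>%
  mutate_if(is.factor, as.character) %>%  # data.frame() creates factors by default before R 4.0
  mutate(Res = ifelse(str_detect(str_extract(Address1, "^.{3}"),
                                 fixed(str_extract(Address2, "^.{3}"), ignore_case = TRUE)), "Yes", "No"))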
    Address1 Address2 Res
1    Sunwood Sun wood Yes
2 Greengrass    green Yes
3    bluesky   sky Pl  No
Or lower the case manually with tolower():
testframe %>%
  mutate_if(is.factor, as.character) %>%
  mutate(Res = ifelse(str_detect(tolower(str_extract(Address1, "^.{3}")),
                                 tolower(str_extract(Address2, "^.{3}"))), "Yes", "No"))
The same, but simplified based on @PoGibas's idea:
testframe %>%
  mutate_if(is.factor, as.character) %>%
  mutate(Res = ifelse(tolower(str_extract(Address1, "^.{3}")) == tolower(str_extract(Address2, "^.{3}")), "Yes", "No"))
Or with base R only:
testframe$Address1 <- as.character(testframe$Address1)
testframe$Address2 <- as.character(testframe$Address2)
# sub() keeps only the first three characters of each string
testframe$Res <- ifelse(tolower(sub("^(.{3}).*", "\\1", testframe$Address1)) %in%
                        tolower(sub("^(.{3}).*", "\\1", testframe$Address2)), "Yes", "No")
    Address1 Address2 Res
1    Sunwood Sun wood Yes
2 Greengrass    green Yes
3    bluesky   sky Pl  No
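The base R variants on this page compare prefixes with %in%, while the tidyverse variants use == or str_detect(). These are not equivalent: %in% asks whether a prefix occurs anywhere in the other column, whereas == compares row by row. On the question's data both give the same result, but they can diverge. A small sketch with made-up vectors (a and b are illustrative and not from the question):

```r
# Illustrative data, chosen so that matching prefixes sit in different rows
a <- c("Sunwood", "Greengrass", "bluesky")
b <- c("blue car", "green", "Sun wood")

pa <- tolower(substring(a, 1, 3))  # "sun" "gre" "blu"
pb <- tolower(substring(b, 1, 3))  # "blu" "gre" "sun"

# Row-wise comparison: only row 2 has the same prefix in both columns
rowwise <- ifelse(pa == pb, "Yes", "No")

# Set membership: every prefix of a occurs somewhere in b
anywhere <- ifelse(pa %in% pb, "Yes", "No")

print(data.frame(a, b, rowwise, anywhere))
```

If the intent is to compare Address1 and Address2 within the same row, == is probably the safer choice.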
You can take a substring of testframe$Address1 and use ifelse:
ifelse(tolower(substring(testframe$Address1, 1, 3)) == tolower(substring(testframe$Address2, 1, 3)), "yes", "no")
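The substring idea can be wrapped in a small function so the prefix length is not hard-coded. This is only a sketch: prefix_match and its parameter n are hypothetical names introduced here, not part of any answer.

```r
# Hypothetical helper: row-wise, case-insensitive comparison of the first n characters
prefix_match <- function(a, b, n = 3) {
  pa <- tolower(substring(as.character(a), 1, n))  # as.character() also handles factor columns
  pb <- tolower(substring(as.character(b), 1, n))
  ifelse(pa == pb, "Yes", "No")
}

x1 <- c("Sunwood", "Greengrass", "bluesky")
x2 <- c("Sun wood", "green", "sky Pl")
testframe <- data.frame(Address1 = x1, Address2 = x2, stringsAsFactors = FALSE)

testframe$Res <- prefix_match(testframe$Address1, testframe$Address2)
print(testframe)
```

With the default n = 3 this gives the same Yes, Yes, No result as the answers here. A larger n makes the check stricter.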
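Similar question, but the answer was not upvoted/accepted:
# Note: %in% tests membership in the whole Address2 prefix vector, not row by row;
# use == instead for a strict row-wise comparison.
testframe$Res <- ifelse(tolower(substring(testframe$Address1, 1, 3)) %in%
                        tolower(substring(testframe$Address2, 1, 3)), "Yes", "No")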
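All of the answers here compare only the first three characters, which is why "bluesky" / "sky Pl" comes out as "No" even though the question lists "sky" as a match. If the goal is to find a shared run of three consecutive characters anywhere in the two strings, one possible approach is to collect every 3-character window of one string and look each one up in the other. The sketch below assumes that reading of the question; share_ngram is a hypothetical helper name and not from any answer.

```r
# Hypothetical helper: TRUE if a and b share any run of n consecutive
# characters, ignoring case. Works on single strings.
share_ngram <- function(a, b, n = 3) {
  a <- tolower(a)
  b <- tolower(b)
  if (nchar(a) < n || nchar(b) < n) return(FALSE)
  starts <- seq_len(nchar(a) - n + 1)
  grams <- unique(substring(a, starts, starts + n - 1))  # all n-character windows of a
  any(vapply(grams, function(g) grepl(g, b, fixed = TRUE), logical(1)))
}

x1 <- c("Sunwood", "Greengrass", "bluesky")
x2 <- c("Sun wood", "green", "sky Pl")
testframe <- data.frame(Address1 = x1, Address2 = x2, stringsAsFactors = FALSE)

# Apply the helper row by row
hit <- mapply(share_ngram, testframe$Address1, testframe$Address2, USE.NAMES = FALSE)
testframe$Res <- ifelse(hit, "Yes", "No")
print(testframe)
```

Under this looser rule all three rows are "Yes", because "sky" is found inside "bluesky". Whitespace counts as a character here, so a window such as "n w" is treated like any other.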