R: I need to do soft matching within a string


I need to do soft matching of a given input string against one column of a data frame, for example:

col <- c("John Collingson","J Collingson","Dummy Name1","Dummy Name2")

inputText <- "J Collingson"
#Vice-Versa
inputText <- "John Collingson"

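It seems agrep is the function you are looking for. It performs approximate string matching (fuzzy matching): it returns the elements that match the input pattern to within a maximum distance, measured as the generalized Levenshtein edit distance. For more details, see ?agrep.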

agrep("J Collingson", col, value = TRUE)
[1] "John Collingson" "J Collingson"  
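The call above only covers one direction of the question. As a minimal sketch (not part of the original answer), the reverse lookup can be tried by raising agrep's max.distance argument: turning "John Collingson" into "J Collingson" takes three single-character edits, which is more than the default budget of 10% of the pattern length is likely to allow. The variable names hits_default, hits_reverse and hits_nocase are made up for illustration.

```r
col <- c("John Collingson", "J Collingson", "Dummy Name1", "Dummy Name2")

# Default budget: max.distance = 0.1, a fraction of the pattern length.
hits_default <- agrep("J Collingson", col, value = TRUE)

# Reverse direction: dropping "ohn" costs three edits, so allow up to 3.
# An integer max.distance is read as an absolute number of edits.
hits_reverse <- agrep("John Collingson", col, max.distance = 3, value = TRUE)

# ignore.case = TRUE makes the comparison case-insensitive.
hits_nocase <- agrep("j collingson", col, ignore.case = TRUE, value = TRUE)

print(hits_default)
print(hits_reverse)
print(hits_nocase)
```

A larger max.distance also lets in more false positives, so it is worth checking the threshold against a sample of real data before relying on it.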

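agrep is definitely a quick and easy base R solution if you only have a little data. If this is just a toy example of a larger data frame, you may be interested in a more durable tool. Over the past month, after learning about the Levenshtein distance mentioned by @PaulHiemstra (also in a different language), I found the RecordLinkage package. Its vignettes left me wanting more examples of "soft" or "fuzzy" matching, especially across more than one field, but the basic answer to your question might be: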

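library(RecordLinkage)
# Candidate names to search, as a one-column data frame
col <- data.frame(names1 = c("John Collingson","J Collingson","Dummy Name1","Dummy Name2"))
# The input string, as a data frame of the same shape
inputText <- data.frame(names2 = c("J Collingson"))
# Compare each input/candidate pair with a string comparator
g1 <- compare.linkage(inputText, col, strcmp = TRUE)
# Compute a match weight for every pair
g2 <- epiWeights(g1)
# Show the pairs whose weight is at least 0.6
getPairs(g2, min.weight=0.6)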
# id          names2 Weight
# 1  1    J Collingson       
# 2  2    J Collingson  1.000
# 3                          
# 4  1    J Collingson       
# 5  1 John Collingson  0.815

# Same steps with a misspelled input
inputText2 <- data.frame(names2 = c("Jon Collinson"))
g1 <- compare.linkage(inputText2, col, strcmp = TRUE)
g2 <- epiWeights(g1)
getPairs(g2, min.weight=0.6)
# id          names2    Weight
# 1  1   Jon Collinson          
# 2  1 John Collingson 0.9644444
# 3                             
# 4  1   Jon Collinson          
# 5  2    J Collingson 0.7924825
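For a per-candidate score without an extra package, one option is base R's adist, which computes the generalized Levenshtein distance directly. The following is only a sketch under stated assumptions: soft_match and its max_ratio cutoff of 0.3 are hypothetical names and values chosen for illustration, and the ratio is a simple normalised edit distance, not the weight that RecordLinkage reports.

```r
candidates <- c("John Collingson", "J Collingson", "Dummy Name1", "Dummy Name2")

# Hypothetical helper: rank candidates by edit distance to the input,
# normalised by the length of the longer of the two strings.
soft_match <- function(input, candidates, max_ratio = 0.3) {
  d <- as.vector(adist(input, candidates))          # edit distance per candidate
  ratio <- d / pmax(nchar(input), nchar(candidates)) # 0 = identical, 1 = unrelated
  keep <- ratio <= max_ratio
  out <- data.frame(candidate = candidates[keep],
                    distance = d[keep],
                    ratio = ratio[keep],
                    stringsAsFactors = FALSE)
  out[order(out$ratio), ]                            # best match first
}

print(soft_match("J Collingson", candidates))
print(soft_match("John Collingson", candidates))
print(soft_match("Jon Collinson", candidates))
```

Because the edit distance is symmetric, this treats "J Collingson" versus "John Collingson" the same way in both directions. The cutoff is a judgment call and should be tuned on real data.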