Finding similarity between addresses within each id in R/Python


The data is as follows:

id <- c(1,1,2,1,3,2)
address <- c("ABC Ret1","ABC","NY AB1","XYZ","DEL1","NY AB")
similar_address <- data.frame(id, address, stringsAsFactors = FALSE)  # keep addresses as character (factors were the default before R 4.0)

Using the function sim.strings from the qlcMatrix package:

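library(qlcMatrix)  # provides sim.strings()

# For one id's addresses: count those with at least one *other* address
# scoring >= threshold, and those with none
get_count_of_similar_strings <- function(x, threshold = 0.5) {
  sim <- as.matrix(sim.strings(as.character(x)))  # pairwise similarity matrix
  partners <- rowSums(sim >= threshold) - 1        # -1: each string matches itself
  issim <- sum(partners > 0)
  c(similar = issim, not_similar = length(x) - issim)
}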
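To give a rough idea of the kind of score sim.strings produces, here is a Python sketch of cosine similarity over character bigram counts. It assumes sim.strings works on character bigrams with boundary markers; qlcMatrix's actual weighting may differ, so check its documentation before relying on exact values. The helper names `bigrams`, `bigram_cosine` and `similarity_matrix` and the `#` padding are hypothetical choices of this sketch, not part of qlcMatrix.

```python
from collections import Counter
from math import sqrt


def bigrams(s: str, boundary: bool = True) -> Counter:
    """Count character bigrams; '#' padding marks the start and end (assumed)."""
    if boundary:
        s = "#" + s + "#"
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def bigram_cosine(a: str, b: str) -> float:
    """Cosine similarity between the bigram-count vectors of a and b."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(strings: list[str]) -> list[list[float]]:
    """All pairwise similarities, a square matrix with 1.0 on the diagonal."""
    return [[bigram_cosine(a, b) for b in strings] for a in strings]


# "#ABC Ret1#" has 9 bigrams, "#ABC#" has 4, and they share 3: 3 / (3 * 2)
print(bigram_cosine("ABC Ret1", "ABC"))  # 0.5
```

Under this sketch the pair "ABC Ret1" / "ABC" lands exactly on 0.5, so for this data it matters that the comparison against the threshold is inclusive (`>=`).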

out <- by(similar_address$address,
          similar_address$id,
          get_count_of_similar_strings)

# by() orders groups by sorted id, so take the ids from its names
data.frame(id = as.numeric(names(out)), t(sapply(out, identity)))
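For the Python half of the question, a per-id tally can be sketched with only the standard library. This sketch swaps in `difflib.SequenceMatcher.ratio()` as the similarity measure instead of sim.strings, so the scores differ from the R answer (for example, "ABC Ret1" vs "ABC" scores 6/11, about 0.55). The function name `count_similar_by_id` and the 0.5 default threshold are illustrative assumptions. An address counts as similar when at least one other address under the same id reaches the threshold.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def count_similar_by_id(ids, addresses, threshold=0.5):
    """Return {id: (similar, not_similar)} for the addresses under each id.

    An address is "similar" when at least one *other* address with the same
    id has a SequenceMatcher ratio >= threshold (hypothetical criterion).
    """
    groups = defaultdict(list)
    for i, addr in zip(ids, addresses):
        groups[i].append(addr)

    result = {}
    for i in sorted(groups):
        addrs = groups[i]
        similar = sum(
            any(
                SequenceMatcher(None, a, b).ratio() >= threshold
                for k, b in enumerate(addrs)
                if k != j  # skip comparing an address with itself
            )
            for j, a in enumerate(addrs)
        )
        result[i] = (similar, len(addrs) - similar)
    return result


ids = [1, 1, 2, 1, 3, 2]
addresses = ["ABC Ret1", "ABC", "NY AB1", "XYZ", "DEL1", "NY AB"]
print(count_similar_by_id(ids, addresses))
# {1: (2, 1), 2: (2, 0), 3: (0, 1)}
```

Comparing every pair within a group is quadratic in the group size, which is fine for a handful of addresses per id but worth revisiting for large groups.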

This post should help. JanLauGe: library(qlcMatrix) ... answer for id