Finding similarity between id columns in r/python
The data is as follows:
id <- c(1,1,2,1,3,2)
address <- c("ABC Ret1","ABC","NY AB1","XYZ","DEL1","NY AB")
similar_address <- data.frame(id, address, stringsAsFactors = FALSE)
Use the function sim.strings from the qlcMatrix package:
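library(qlcMatrix)

# For one id's addresses: count similar pairs (similarity >= 0.5) and the remainder
get_count_of_similar_strings <- function(x) {
  sim <- sim.strings(x)                    # symmetric similarity matrix, diagonal = 1
  issim <- sum(sim >= 0.5) - length(x)     # drop the diagonal; counts ordered pairs
  isnotsim <- length(x) - issim
  c(similar = issim, not_similar = isnotsim)
}

out <- by(similar_address$address,
          similar_address$id,
          get_count_of_similar_strings,
          simplify = TRUE)

# names(out) follows by()'s (sorted) group order, unlike unique(id)
data.frame(id = as.numeric(names(out)), t(sapply(out, I)))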
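For the Python side of the question, here is a minimal standard-library sketch of the same idea. It groups addresses by id, scores each pair of addresses in a group, and counts ordered pairs at or above a 0.5 threshold, as the R function does. It assumes that sim.strings behaves roughly like a cosine similarity over character bigram counts with the string ends padded. The `#` padding character and the helper names `bigrams`, `cosine`, `count_similar` and `similarity_by_id` are hypothetical. The sketch is not claimed to reproduce qlcMatrix's scores exactly.

```python
from collections import Counter, defaultdict
from math import sqrt


def bigrams(s, boundary=True):
    """Character bigram counts; '#' pads both ends (an assumption, not qlcMatrix's code)."""
    if boundary:
        s = "#" + s + "#"
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two bigram count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def count_similar(addresses, threshold=0.5):
    """Hypothetical analogue of get_count_of_similar_strings: (similar ordered pairs, n - that)."""
    grams = [bigrams(a) for a in addresses]
    n = len(addresses)
    issim = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and cosine(grams[i], grams[j]) >= threshold
    )
    return issim, n - issim


def similarity_by_id(ids, addresses, threshold=0.5):
    """Group addresses by id, then count similar / not similar within each group."""
    groups = defaultdict(list)
    for i, a in zip(ids, addresses):
        groups[i].append(a)
    return {i: count_similar(groups[i], threshold) for i in sorted(groups)}


ids = [1, 1, 2, 1, 3, 2]
addresses = ["ABC Ret1", "ABC", "NY AB1", "XYZ", "DEL1", "NY AB"]
print(similarity_by_id(ids, addresses))
# {1: (2, 1), 2: (2, 0), 3: (0, 1)}
```

Under this bigram scheme, "ABC Ret1" and "ABC" score exactly 0.5: they share 3 bigrams, and their vector norms are 3 and 2. That pair therefore sits right at the threshold. The similar count covers both (i, j) and (j, i). It equals the number of similar addresses only when each similar address has exactly one partner. With three mutually similar addresses it would be 6, and the "not similar" count would go negative. Counting addresses with at least one similar partner avoids that.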
This post should help. (JanLauGe)