How to vectorize the longest common substring across data.table columns in R


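How can I create a function that quickly counts the characters in the longest common substring, or returns the longest common substring itself, across two or more columns of a large data.table in R?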

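I adapted the answer to this question: but ran into three problems. 1) When I use sapply to create a new result column, applying the space handling and other string operations to the vectors fails. 2) The approach does not extend to more than 2 columns. 3) The given answer does not include spaces in potential matches, and I want it to. The function is also slow, and I want to apply it to big data.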

Create sample data:

sampdata <- data.frame(
  str1=c("Doug Olivas", "GRANT MANAGEMENT LLC", "LUNA VAN DERESH", "wendy t marzardo", "AMIN NYGUEN COMPANY LLC", "GERARDO CONTRARAS", "miguel martinez","albert marks porter"),
  str2=c("doug olivas", "miguel grant", "LUNA VAN DERESH MANAGEMENT LLC", "marzardo", "amin nyguen llc", "gerardo contraras", "miggy martinez","albert"),
  str3=c("Martin Olivas", "GRANT PROPERTIES", "luna company", "wendy marzardo", "the company of amin nyguen llc", "gerardo c", "miguel t martinez","")
  )

# Option type="nchar": return the number of characters in the longest common substring (INCLUDING SPACES, IGNORING CASE)
sampdata$desired_LCSnchar <- lcsfoo(sampdata$str1,sampdata$str2,sampdata$str3,type="nchar")

# Option type="str": return the longest common substring between the columns itself (INCLUDING SPACES, IGNORING CASE)
sampdata$desired_LCSstr <- lcsfoo(sampdata$str1,sampdata$str2,sampdata$str3,type="str")

sampdata$str1str2_LCSnchar <- lcsfoo(sampdata$str1,sampdata$str2,type="nchar")
sampdata$str1str2_LCSstr <- lcsfoo(sampdata$str1,sampdata$str2,type="str")
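lcsfoo() is not defined in the question. Below is a minimal base-R sketch of what it could look like. It assumes character columns and takes "longest common substring" to mean the longest run of consecutive characters shared by every column in a row. lcs_multi() is a hypothetical helper name, not a package function. The sketch lower-cases the inputs and treats spaces like any other character. It then tries substrings of the shortest string from longest to shortest, returning the first one that grepl(fixed = TRUE) finds in all the other strings. Rows are processed with mapply(), which accepts any number of column vectors, instead of sapply().

```r
# Minimal base-R sketch; lcs_multi() and this lcsfoo() are hypothetical helpers.
# Assumes character input. Case is ignored; spaces are ordinary characters.

# Longest substring contained in every element of `s`.
lcs_multi <- function(s) {
  s <- unname(tolower(as.character(s)))
  if (anyNA(s) || any(!nzchar(s))) return("")  # NA or "" shares nothing
  k <- which.min(nchar(s))
  ref <- s[k]      # a common substring must fit inside the shortest string
  others <- s[-k]
  n <- nchar(ref)
  for (len in n:1) {             # longest candidates first: first hit wins
    for (start in 1:(n - len + 1)) {
      cand <- substr(ref, start, start + len - 1)
      if (all(grepl(cand, others, fixed = TRUE))) return(cand)
    }
  }
  ""
}

# Row-wise wrapper over any number of equal-length column vectors.
lcsfoo <- function(..., type = c("nchar", "str")) {
  type <- match.arg(type)
  cols <- lapply(list(...), as.character)
  # mapply() hands the i-th element of every column to the row function
  res <- unlist(do.call(mapply, c(list(FUN = function(...) lcs_multi(c(...)),
                                       USE.NAMES = FALSE), cols)),
                use.names = FALSE)
  if (type == "nchar") nchar(res) else res
}
```

Ties go to the leftmost candidate in the shortest string. On the question's sample data, this reproduces the expected str1/str2 values listed in the question. A row whose str3 is "" gets "" and 0. The search costs roughly O(n^2) fixed-string lookups per row, which is fine for short name fields but not for long text.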


sampdata
It is not clear what you are asking. Where is lcsfoo() defined?
For example, a combined non-working function:
sampdata$str1str2_LCSnchar <- lcsfoo(sampdata$str1,sampdata$str2,type="nchar")
sampdata$str1str2_LCSstr <- lcsfoo(sampdata$str1,sampdata$str2,type="str")

sampdata$str1str2_LCSstr<- c("doug olivas","grant","luna van deresh","marzardo","amin nyguen ","gerardo contraras"," martinez","albert")
sampdata$str1str2_LCSnchar <- c(11,5,15,8,12,17,9,6)
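For exactly two columns, the classic dynamic-programming formulation of longest common substring is an alternative. This sketch uses the hypothetical helper names lcs2() and lcs2_cols(). It keeps a single DP row and updates it with one vectorized comparison per character of the first string, so the only explicit R loop runs over the characters of `a`. The cost is O(nchar(a) * nchar(b)) per pair. As above, case is ignored and spaces count as characters.

```r
# Two-string longest common substring by dynamic programming (hypothetical lcs2).
lcs2 <- function(a, b) {
  a <- tolower(a)
  b <- tolower(b)
  if (is.na(a) || is.na(b) || !nzchar(a) || !nzchar(b)) return("")
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # prev[j + 1]: length of the common suffix of x[1..i-1] and y[1..j]
  prev <- integer(length(y) + 1L)
  best_len <- 0L
  best_end <- 0L
  for (i in seq_along(x)) {
    hit <- x[i] == y              # compare one char of a with every char of b
    cur <- c(0L, ifelse(hit, prev[-length(prev)] + 1L, 0L))
    if (max(cur) > best_len) {    # strict '>' keeps the first longest match
      best_len <- max(cur)
      best_end <- i
    }
    prev <- cur
  }
  substr(a, best_end - best_len + 1L, best_end)  # "" when nothing matched
}

# Apply row-wise to two column vectors.
lcs2_cols <- function(col1, col2) {
  mapply(lcs2, as.character(col1), as.character(col2), USE.NAMES = FALSE)
}
```

On the question's str1/str2 sample, this returns the same strings as the desired str1str2_LCSstr vector, and nchar() of the result gives the desired str1str2_LCSnchar counts.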

library(data.table)
### Create big sample data by stacking 1000 copies of sampdata, then apply on the huge DT
samplist <- lapply(c(1:1000),FUN=function(x){sampdata})
bigsampdata <- rbindlist(samplist)

DESIRED FUNCTION APPLIED TO BIG DATA:
bigsampdata$desired_LCSnchar <- lcsfoo(bigsampdata$str1,bigsampdata$str2,bigsampdata$str3,type="nchar")
bigsampdata$desired_LCSstr <- lcsfoo(bigsampdata$str1,bigsampdata$str2,bigsampdata$str3,type="str")
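Every row of bigsampdata is one of only eight distinct (str1, str2, str3) combinations. That suggests a cheap speed-up, which is a general data.table technique rather than anything from the question: compute the LCS once per distinct combination with unique(), then copy the results back to all matching rows with an update join (X[i, on = ..., col := i.col]). add_lcs_cols() and lcs_multi() are hypothetical helper names. The sketch assumes character (not factor) columns. How much it helps depends on how many duplicate rows the real data has; with mostly unique rows, the per-row substring search still dominates.

```r
library(data.table)

# Compact shortest-string search (hypothetical helper); ignores case, keeps spaces.
lcs_multi <- function(s) {
  s <- unname(tolower(as.character(s)))
  if (anyNA(s) || any(!nzchar(s))) return("")
  k <- which.min(nchar(s))
  ref <- s[k]
  n <- nchar(ref)
  for (len in n:1) {
    for (start in 1:(n - len + 1)) {
      cand <- substr(ref, start, start + len - 1)
      if (all(grepl(cand, s[-k], fixed = TRUE))) return(cand)
    }
  }
  ""
}

# Hypothetical helper: one LCS computation per distinct combination of `cols`,
# results copied back to every matching row of DT (by reference).
add_lcs_cols <- function(DT, cols, prefix = "LCS") {
  keys <- unique(DT[, cols, with = FALSE])          # distinct input combinations
  row_lcs <- function(...) lcs_multi(c(...))        # one row's values -> LCS
  res <- unlist(do.call(mapply, c(list(FUN = row_lcs, USE.NAMES = FALSE), keys)),
                use.names = FALSE)
  keys[, `:=`(lcs_str = res, lcs_nchar = nchar(res))]
  out <- paste0(prefix, c("str", "nchar"))
  DT[keys, on = cols, (out) := list(i.lcs_str, i.lcs_nchar)]  # update join
  DT[]
}
```

For the question's data, bigsampdata <- add_lcs_cols(bigsampdata, c("str1", "str2", "str3"), prefix = "desired_LCS") would add desired_LCSstr and desired_LCSnchar columns. Passing c("str1", "str2") gives the two-column variant.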