How to vectorize longest common substring across data.table columns in R

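How can I create a function that quickly computes the number of characters in the longest common substring, or returns the longest common substring itself, between two or more columns of a large data.table in R?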

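I adapted the answer to this question, but there are problems: 1.) spaces and other string functionality fail when the function is applied across vectors with sapply to create a new result column, 2.) it is hard to apply across more than 2 columns, and 3.) the given answer does not include spaces in potential matches, which I would like to do. The function is also very slow, and I want to apply it to big data.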

Create sample data:

sampdata <- data.frame(
  str1=c("Doug Olivas", "GRANT MANAGEMENT LLC", "LUNA VAN DERESH", "wendy t marzardo", "AMIN NYGUEN COMPANY LLC", "GERARDO CONTRARAS", "miguel martinez","albert marks porter"),
  str2=c("doug olivas", "miguel grant", "LUNA VAN DERESH MANAGEMENT LLC", "marzardo", "amin nyguen llc", "gerardo contraras", "miggy martinez","albert"),
  str3=c("Martin Olivas", "GRANT PROPERTIES", "luna company", "wendy marzardo", "the company of amin nyguen llc", "gerardo c", "miguel t martinez","")
  )

#option type="nchar" to return number of characters INCLUDING SPACES, IGNORING CASE in max common substring
sampdata$desired_LCSnchar <- lcsfoo(sampdata$str1,sampdata$str2,sampdata$str3,type="nchar")

#option type="str" to return the string INCLUDING SPACES, IGNORING CASE of the longest common substring between the columns
sampdata$desired_LCSstr <- lcsfoo(sampdata$str1,sampdata$str2,sampdata$str3,type="str")

sampdata$str1str2_LCSnchar <- lcsfoo(sampdata$str1,sampdata$str2,type="nchar")
sampdata$str1str2_LCSstr <- lcsfoo(sampdata$str1,sampdata$str2,type="str")

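sampdata

It is unclear what you are asking. Where is lcsfoo() defined?

lcsfoo() is a made-up, non-working function that stands in for the one I want. For example: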
sampdata$str1str2_LCSnchar <- lcsfoo(sampdata$str1,sampdata$str2,type="nchar")
sampdata$str1str2_LCSstr <- lcsfoo(sampdata$str1,sampdata$str2,type="str")


# Expected results for the two-column case (str1 vs str2), spaces included, case ignored
sampdata$str1str2_LCSstr <- c("doug olivas","grant","luna van deresh","marzardo","amin nyguen ","gerardo contraras"," martinez","albert")
sampdata$str1str2_LCSnchar <- c(11,5,15,8,12,17,9,6)


library(data.table)
###Create sample big data from previous sampledata and apply on huge DT
samplist <- lapply(c(1:1000),FUN=function(x){sampdata})
bigsampdata <- rbindlist(samplist)

Desired function applied to the big data:
bigsampdata$desired_LCSnchar <- lcsfoo(bigsampdata$str1,bigsampdata$str2,bigsampdata$str3,type="nchar")
bigsampdata$desired_LCSstr <- lcsfoo(bigsampdata$str1,bigsampdata$str2,bigsampdata$str3,type="str")
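One possible way to fill in the placeholder lcsfoo() is sketched below, in base R only. This is a minimal sketch under stated assumptions, not a tuned implementation: the helper name lcs_one is hypothetical, ties between equally long substrings go to the first one found in the shortest input, and a row containing an empty string yields "" (length 0). It lowercases the inputs, takes the shortest string in each row as the reference, and tries its substrings from longest to shortest, using grepl(fixed = TRUE) so that spaces and regex metacharacters are matched literally.

```r
# Hypothetical helper: longest common substring of one set of strings
# (case-insensitive, spaces included).
lcs_one <- function(strs) {
  strs <- tolower(strs)                      # ignore case
  if (anyNA(strs)) return(NA_character_)
  lens <- nchar(strs)
  n <- min(lens)
  if (n == 0L) return("")                    # an empty input shares nothing
  ref <- strs[which.min(lens)]               # the shortest string bounds the answer
  for (len in n:1) {                         # longest candidates first
    starts <- seq_len(n - len + 1L)
    cands <- unique(substring(ref, starts, starts + len - 1L))
    for (cand in cands) {
      # fixed = TRUE gives a literal match, so spaces are treated like any character
      if (all(grepl(cand, strs, fixed = TRUE))) return(cand)
    }
  }
  ""                                         # no character in common
}

# Sketch of the lcsfoo() interface used in the question:
# any number of equal-length columns, type = "str" or "nchar".
lcsfoo <- function(..., type = c("str", "nchar")) {
  type <- match.arg(type)
  cols <- lapply(list(...), as.character)    # also handles factor columns
  m <- do.call(cbind, cols)                  # one row per record
  res <- vapply(seq_len(nrow(m)), function(i) lcs_one(m[i, ]), character(1))
  if (type == "nchar") nchar(res) else res
}

a <- c("Doug Olivas", "miguel martinez")
b <- c("doug olivas", "miggy martinez")
lcsfoo(a, b, type = "str")    # "doug olivas" " martinez"
lcsfoo(a, b, type = "nchar")  # 11 9
```

On a data.table, the same call can be assigned by reference, for example bigsampdata[, desired_LCSstr := lcsfoo(str1, str2, str3, type = "str")]. The row loop is still plain R, so on very large tables it would likely pay to compute the result once per distinct combination of inputs and join it back, since repeated rows (as in bigsampdata) are otherwise recomputed.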