R warning: the condition has length > 1 and only the first element will be used (outer function)


I have the following two functions:

name_fitting <- function(term1, term2)
  {
    if (nchar(term1) <= 3)
      {
       temp <- substring(term2, 1,nchar(term1))
       return(temp==term1)
      }
    else {return(grepl(term1, term2))}
  }

name_matching <- function(name1, name2)
  {
    name1 <- gsub('[[:punct:]]+','', name1)
    name2 <- gsub('[[:punct:]]+','', name2)
    if (length(intersect(as.character(unlist(strsplit(name1, ' '))), as.character(unlist(strsplit(name2, ' '))))) > 1) {return(TRUE)}
    if (length(intersect(as.character(unlist(strsplit(name1, ' '))), as.character(unlist(strsplit(name2, ' '))))) == 1) 
        {
          non_matching <- union(setdiff(as.character(unlist(strsplit(name1, ' '))), as.character(unlist(strsplit(name2, ' ')))), setdiff(as.character(unlist(strsplit(name2, ' '))), as.character(unlist(strsplit(name1, ' ')))))
          temp <- outer(X = non_matching, Y = non_matching, FUN = 'name_fitting')
          diag(temp)<-FALSE
          return(any(temp))
        }
    else(return(FALSE))
  }

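outer() calls your function name_fitting with whole character vectors built from non_matching, which here contains three elements: [1] "MARCO" "M" "BRANDUARDI". Such a vector reaches the if call, if (nchar(term1) <= 3), whose condition must have length one, hence the warning. Wrapping the function with Vectorize(function) makes it work element by element.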
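To see what outer() actually hands to its FUN argument, a minimal sketch like the following can help. The names probe, seen_lengths and pair_lengths are made up for illustration and are not part of the original code. The sketch deliberately avoids a length > 1 condition inside if, because from R 4.2.0 onward that is an error rather than a warning.

```r
# The three leftover name parts quoted in the answer.
non_matching <- c("MARCO", "M", "BRANDUARDI")

# Hypothetical probe: records the length of term1 on every call made by outer().
seen_lengths <- integer(0)
probe <- function(term1, term2) {
  seen_lengths <<- c(seen_lengths, length(term1))
  rep(TRUE, length(term1))  # outer() expects one value per pairing
}

invisible(outer(non_matching, non_matching, FUN = probe))
print(seen_lengths)  # a single call that receives all 3 x 3 pairings at once

# Wrapping a function in Vectorize() makes each underlying call see one pair.
pair_lengths <- outer(non_matching, non_matching,
                      FUN = Vectorize(function(term1, term2) length(term1)))
print(pair_lengths)
```

So FUN is called once with two expanded vectors, not once per pair, which is why a scalar-only if inside name_fitting misbehaves unless the function is vectorized first.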

The solution is:

name_fitting <- function(term1, term2)
  {
    if (nchar(term1) <= 3)
      {
       temp <- substring(term2, 1,nchar(term1))
       return(temp==term1)
      }
    else {return(grepl(term1, term2))}
  }
name_fitting <- Vectorize(name_fitting)

name_matching <- function(name1, name2)
  {
    name1 <- trimws(gsub('[[:punct:]]+','', name1))
    name2 <- trimws(gsub('[[:punct:]]+','', name2))
    temp <- intersect(as.character(unlist(strsplit(name1, ' '))), as.character(unlist(strsplit(name2, ' '))))
    temp <- temp[temp!=c('')]
    if (length(temp) > 1) {return(TRUE)}
    if (length(intersect(as.character(unlist(strsplit(name1, ' '))), as.character(unlist(strsplit(name2, ' '))))) == 1) 
        {
          non_matching <- union(setdiff(as.character(unlist(strsplit(name1, ' '))), as.character(unlist(strsplit(name2, ' ')))), setdiff(as.character(unlist(strsplit(name2, ' '))), as.character(unlist(strsplit(name1, ' ')))))
          non_matching <- non_matching[non_matching!=c("")]
          temp <- outer(X = non_matching, Y = non_matching, FUN = 'name_fitting')
          diag(temp)<-FALSE
          return(any(temp))
        }
    else(return(FALSE))
  }

name_matching <- Vectorize(name_matching)

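I feel like I am missing something: why do you pass the same argument to name_fitting twice?

Because that way I get a matrix of name_fitting evaluated for every pair of non-matching terms. I thought outer would pass the elements of the vector non_matching one at a time, as in an array product, because that is what I was looking for.

From the documentation of outer: "FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second)."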
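As a worked example of the pairwise matrix discussed in these comments, the sketch below applies the vectorized name_fitting to the three terms mentioned in the answer. The dimnames line is an addition for readability only and is not in the original code.

```r
# Scalar comparison: prefix match for short terms, substring match otherwise.
name_fitting <- function(term1, term2) {
  if (nchar(term1) <= 3) {
    substring(term2, 1, nchar(term1)) == term1
  } else {
    grepl(term1, term2)
  }
}
# Vectorize() lets outer() call it with the two expanded vectors.
name_fitting <- Vectorize(name_fitting)

non_matching <- c("MARCO", "M", "BRANDUARDI")
temp <- outer(X = non_matching, Y = non_matching, FUN = name_fitting)

# Label rows (term1) and columns (term2); added here only for readability.
dimnames(temp) <- list(term1 = non_matching, term2 = non_matching)

# A term always fits itself, so the diagonal is ignored.
diag(temp) <- FALSE
print(temp)
print(any(temp))
```

Rows correspond to term1 and columns to term2, so the entry in row "M", column "MARCO" answers whether "M" is a prefix of "MARCO".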