Character matching and rank in R


I have a character vector:

var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
                 "clean water", "pine")

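I want to rank the elements of a list, var2, by how many phrases in var1 match them.

var2[2] is rank 1 (4 phrases in var1 match it: "pine tree", "dense forest", "pine" and "clean water").

"tall tree" "fruits"    "star"

var2[1] is rank 2 (3 phrases in var1 match it: "pine tree", "red fruits" and "green fruits").

var2[3] is rank 3, because nothing in var1 matches it.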

I tried:

indx1 <- sapply(var2, function(x) sum(grepl(var1, x)))
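
One likely reason this attempt does not give the expected counts: grepl() takes a single pattern, so when the six-element var1 is passed as the pattern, R uses only its first element and raises a warning. The sketch below illustrates this with a hypothetical vector x (not part of the question) and shows one way to test all words at once by collapsing them into a single alternation.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")
x <- c("tall tree", "pine tree", "water")  # hypothetical strings to search

# A vector passed as `pattern` is truncated to var1[1] ("pine tree");
# suppressWarnings() hides the warning R raises about the truncation.
res_vector_pattern <- suppressWarnings(grepl(var1, x))

# Equivalent call that makes the truncation explicit
res_first_only <- grepl(var1[1], x)

# Collapse every distinct word of var1 into one "a|b|c" pattern instead
pat <- paste(unique(unlist(strsplit(var1, " "))), collapse = "|")
res_all_words <- grepl(pat, x)
```

Here res_vector_pattern and res_first_only are the same, which is why the original attempt only ever looks for "pine tree".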

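We can loop over var2 (sapply(var2, ...)), split each string at whitespace (strsplit(x, ' ')), and grep for the resulting words in var1. We then check whether there is any match, sum the logical vector and rank it. The result can be used to reorder the elements of var2.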
 indx <- rank(-sapply(var2, function(x) sum(sapply(strsplit(x, ' '),
              function(y) any(grepl(paste(y,collapse='|'), var1))))),
                 ties.method='first')
 indx
 #[1] 2 1 3
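
The one-liner above can be unpacked into named steps, which makes the intermediate counts visible. This is only a sketch: the definition of var2 is not shown in the thread, so it is reconstructed here from the printed output, and count_hits and hits are hypothetical names.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")

# Assumption: var2 reconstructed from the output printed in the thread
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))

# For one list element, count how many of its strings share a word with var1
count_hits <- function(x) {
  sum(vapply(strsplit(x, " "), function(words) {
    # "tall tree" becomes the pattern "tall|tree"
    any(grepl(paste(words, collapse = "|"), var1))
  }, logical(1)))
}

hits <- vapply(var2, count_hits, integer(1))

# Negate so that the largest count receives rank 1
indx <- rank(-hits, ties.method = "first")
```

With these inputs hits is 2, 5 and 0, which yields the ranks 2 1 3 shown above.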


var2[indx]
#[[1]]
#[1] "tree tall"    "pine tree"    "tree pine"    "black forest" "water"       

#[[2]]
#[1] "tall tree" "fruits"    "star"     

#[[3]]
#[1] "apple"  "orange" "grapes"
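
A caveat on reordering: indexing with the rank vector works here because the permutation 2 1 3 is its own inverse. In general, rank() gives the position of each element, while order() gives the element for each position, so order() is the safer choice for reordering. A small sketch with hypothetical counts and items:

```r
hits  <- c(1, 3, 2)          # hypothetical match counts
items <- list("a", "b", "c") # hypothetical list to reorder

by_rank  <- rank(-hits, ties.method = "first")  # rank of each item
by_order <- order(-hits)                        # item index for each position

sorted_by_order <- unlist(items[by_order])  # highest count first
sorted_by_rank  <- unlist(items[by_rank])   # not sorted by count
```

Here by_rank is 3 1 2 and by_order is 2 3 1; only items[by_order] puts "b" (3 hits) first.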

The following code works:
idx <- rank(-sapply(var2, 
         function(x) sum(unlist(sapply(strsplit(var1,split=' '), 
           function(y) any(unlist(sapply(y,
             function(z) grepl(z,x))>0))>0)))),
  ties.method='random')
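
Read from the inside out, this version counts, for each element of var2, how many phrases of var1 have at least one word that occurs in that element. A sketch of the same logic with named steps follows; var2 is again reconstructed from the printed output (an assumption), and count_phrase_hits and counts are hypothetical names.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")

# Assumption: var2 reconstructed from the output printed in the thread
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))

# Count the phrases whose words appear in at least one string of x
count_phrase_hits <- function(x, phrases) {
  words_per_phrase <- strsplit(phrases, " ")
  sum(vapply(words_per_phrase, function(words) {
    any(vapply(words, function(w) any(grepl(w, x)), logical(1)))
  }, logical(1)))
}

counts <- vapply(var2, count_phrase_hits, integer(1), phrases = var1)
idx <- rank(-counts, ties.method = "random")
```

The counts come out as 3, 4 and 0, matching the 3 and 4 matching phrases described in the question. Because there are no ties in this example, ties.method = "random" has no effect on the result.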

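I checked the second example. The first element has 3 matches (for "environmental") and the second element also has 3 matches (for "status"), so it is again a tie.

I edited var11; now there are only two "status" entries.

Thanks to bluefeet for moving our conversation to chat.

Meanwhile, the problem with Filter still persists; Filter needs to be replaced.

@john try var2[indx][sapply(var2[indx], function(x) any(grepl(pat, x)))]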
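
Since the comments turn on ties, it may help to see how the ties.method argument of rank() resolves them. The counts below are hypothetical and chosen only to produce a tie between the first two elements.

```r
counts <- c(3, 3, 0)  # hypothetical match counts with a tie

r_first   <- rank(-counts, ties.method = "first")    # earlier element wins
r_min     <- rank(-counts, ties.method = "min")      # tied elements share the lower rank
r_average <- rank(-counts, ties.method = "average")  # tied elements share the mean rank

set.seed(1)  # "random" breaks ties arbitrarily; a seed makes a run repeatable
r_random  <- rank(-counts, ties.method = "random")
```

r_first is 1 2 3, r_min is 1 1 3 and r_average is 1.5 1.5 3. With "random" the first two ranks are 1 and 2 in either order, so repeated runs without a seed can reorder tied elements differently.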

Data for the second example:

var11 <- c("nature", "environmental", "ringing", "valley", "status", "climate",
           "forge", "environmental", "common",
           "birdwatch", "big", "link",
           "day", "pintail", "morning",
           "big garden", "birdwatch deadline", "deadline february",
           "mu condition", "garden birdwatch", "status",
           "chorus walk", "dawn choru", "walk sunday",
           "climate lobby", "lobby parliament", "u status",
           "sandwell valley", "my status of", "environmental lake")

var22 <- list(c("environmental condition"), c("condition", "status"),
              c("water", "ocean water"))

indx <- rank(-sapply(var22, function(x) sum(sapply(strsplit(x, ' '), 
        function(y) sum(sapply(strsplit(var11, ' '), 
          function(z) any(grepl(paste(y, collapse="|"), z))))))),
             ties.method='random')
indx
#[1] 1 2

pat <- paste(unique(unlist(strsplit(var1, ' '))), collapse="|")
Filter(function(x) any(grepl(pat, x)), var2[indx])
#[[1]]
#[1] "tree tall"    "pine tree"    "tree pine"    "black forest" "water"       

#[[2]]
#[1] "tall tree" "fruits"    "star"     
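
Filter() keeps the list elements for which the predicate returns TRUE, so the call above drops every element with no match at all. The logical-indexing form suggested in the comments should give the same result. A sketch comparing the two, with var2 reconstructed from the printed output (an assumption) and the ranks taken from the output shown earlier:

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")

# Assumption: var2 reconstructed from the output printed in the thread
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))
indx <- c(2, 1, 3)  # ranks as printed earlier in the thread

# One alternation built from every distinct word in var1
pat <- paste(unique(unlist(strsplit(var1, " "))), collapse = "|")
has_match <- function(x) any(grepl(pat, x))

kept_filter <- Filter(has_match, var2[indx])
kept_index  <- var2[indx][vapply(var2[indx], has_match, logical(1))]
```

Both forms keep two elements and drop the "apple", "orange", "grapes" element. Note that pat matches substrings, so a word such as "red" would also match inside a longer word; wrapping each word in \\b word boundaries is one way to tighten this if needed.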
