R character matching and ranking
I have a character vector:
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
"clean water", "pine")
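I also have a list:
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))
I want to match the words in var1 against the words in var2 and rank the elements of var2 by the number of matches:
var2[2] is rank 1 (4 phrases in var1 match it: "pine tree", "dense forest", "pine" and "clean water")
var2[1] is rank 2 (3 phrases in var1 match it: "pine tree", "red fruits" and "green fruits")
var2[3] is rank 3 (nothing in var1 matches it)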
I tried:
indx1 <- sapply(var2, function(x) sum(grepl(var1, x)))
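A likely reason this attempt falls short: as far as I know, grepl() expects a single pattern, so passing the whole of var1 uses only its first element (with a warning). Collapsing var1 into one alternation does not help either, because the match has to happen at the word level. The sketch below illustrates this for the first element of var2; the names whole_phrase_pat and word_pat are only illustrative.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")
x <- c("tall tree", "fruits", "star")   # first element of var2

# Alternation over whole phrases: "tall tree" does not contain "pine tree",
# and "fruits" does not contain "red fruits", so nothing matches.
whole_phrase_pat <- paste(var1, collapse = "|")
whole_phrase_hits <- grepl(whole_phrase_pat, x)
whole_phrase_hits
# [1] FALSE FALSE FALSE

# Alternation over the individual words of var1 finds the shared words.
word_pat <- paste(unique(unlist(strsplit(var1, " "))), collapse = "|")
word_hits <- grepl(word_pat, x)
word_hits
# [1]  TRUE  TRUE FALSE
```

This is why the approaches in the answers split the strings on whitespace before matching.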
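We can loop over 'var2' (sapply(var2, ...)), split each string on whitespace (strsplit(x, ' ')), and use the resulting words as a grepl pattern against 'var1'. For each string we check whether there is any match, sum the resulting logical vector, and rank the sums. The result can be used to reorder the elements of 'var2'.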
indx <- rank(-sapply(var2, function(x) sum(sapply(strsplit(x, ' '),
             function(y) any(grepl(paste(y, collapse='|'), var1))))),
             ties.method='first')
indx
#[1] 2 1 3
var2[indx]
#[[1]]
#[1] "tree tall" "pine tree" "tree pine" "black forest" "water"
#[[2]]
#[1] "tall tree" "fruits" "star"
#[[3]]
#[1] "apple" "orange" "grapes"
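To make the one-liner easier to follow, here is a sketch that breaks it into named steps. count_matches is a hypothetical helper name, and var2 is assumed to be the list implied by the printed output above. The sketch reorders with order() rather than rank(): rank() returns each element's position, while order() returns the permutation that sorts the elements. The two coincide for 2 1 3, but not in general.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")
# Assumption: var2 as implied by the printed output
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))

# Hypothetical helper: how many strings in x share at least one word
# with any phrase in `phrases`
count_matches <- function(x, phrases) {
  # one "word1|word2" pattern per string in x
  patterns <- vapply(strsplit(x, " ", fixed = TRUE),
                     paste, character(1), collapse = "|")
  sum(vapply(patterns, function(p) any(grepl(p, phrases)), logical(1)))
}

counts <- vapply(var2, count_matches, integer(1), phrases = var1)
counts
# [1] 2 5 0

indx <- rank(-counts, ties.method = "first")   # position of each element
reordered <- var2[order(-counts)]              # best-matching element first

# A case where rank() and order() differ
other <- c(0, 5, 2)
rank(-other, ties.method = "first")
# [1] 3 1 2
order(-other)
# [1] 2 3 1
```

Note that this counts matching strings inside each var2 element (2, 5, 0), not matching phrases in var1 (3, 4, 0) as described in the question; both give the same ranking for this data.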
The following code also works:
idx <- rank(-sapply(var2,
            function(x) sum(unlist(sapply(strsplit(var1, split=' '),
                function(y) any(unlist(sapply(y,
                    function(z) grepl(z, x))>0))>0)))),
            ties.method='random')
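The nested call above counts, for each element of var2, how many phrases in var1 have at least one word that appears in that element. A more readable sketch of the same idea follows. count_phrases is a hypothetical helper name, var2 is assumed to be the list from the question, and fixed = TRUE is my own choice to treat the words as literal text rather than regular expressions. It also uses ties.method = 'first' so that repeated runs give the same ranks; with 'random', tied counts can be ranked differently on each run.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")
# Assumption: var2 as given in the question
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))

# Hypothetical helper: number of phrases with at least one word found in x
count_phrases <- function(x, phrases) {
  sum(vapply(strsplit(phrases, " ", fixed = TRUE), function(words) {
    any(vapply(words, function(w) any(grepl(w, x, fixed = TRUE)), logical(1)))
  }, logical(1)))
}

phrase_counts <- vapply(var2, count_phrases, integer(1), phrases = var1)
phrase_counts
# [1] 3 4 0

ranks <- rank(-phrase_counts, ties.method = "first")
ranks
# [1] 2 1 3
```

The counts 3, 4 and 0 agree with the numbers stated in the question.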
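I checked the second example. The first element has 3 matches (for "environmental"), and the second element also has 3 matches (for "status"), so this is a tie again.
I edited var11; "status" now appears only twice.
Meanwhile, the Filter problem remains, and Filter needs to be replaced.
@john try var2[indx][sapply(var2[indx], function(x) any(grepl(pat, x)))]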
The second example, with the edited var11:
var11 <- c("nature" , "environmental", "ringing", "valley" , "status" , "climate" ,
"forge" , "environmental" , "common" ,
"birdwatch", "big" , "link" ,
"day" , "pintail" , "morning" ,
"big garden" , "birdwatch deadline", "deadline february" ,
"mu condition" , "garden birdwatch" , "status" ,
"chorus walk" , "dawn choru" , "walk sunday",
"climate lobby" , "lobby parliament" , "u status" ,
"sandwell valley" , "my status of" , "environmental lake")
var22 <- list(c("environmental condition"), c("condition", "status"), c("water", "ocean water"))
indx <- rank(-sapply(var22, function(x) sum(sapply(strsplit(x, ' '),
function(y) sum(sapply(strsplit(var11, ' '),
function(z) any(grepl(paste(y, collapse="|"), z))))))),
ties.method='random')
indx
#[1] 1 2
pat <- paste(unique(unlist(strsplit(var1, ' '))), collapse="|")
Filter(function(x) any(grepl(pat, x)), var2[indx])
#[[1]]
#[1] "tree tall" "pine tree" "tree pine" "black forest" "water"
#[[2]]
#[1] "tall tree" "fruits" "star"
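If Filter is not wanted, plain logical indexing should give the same result. The sketch below assumes var1 and var2 as in the question and applies the filter to var2 in its original order; the same indexing works on a reordered list. The names keep and kept are illustrative.

```r
var1 <- c("pine tree", "dense forest", "red fruits", "green fruits",
          "clean water", "pine")
# Assumption: var2 as given in the question
var2 <- list(c("tall tree", "fruits", "star"),
             c("tree tall", "pine tree", "tree pine", "black forest", "water"),
             c("apple", "orange", "grapes"))

# One alternation pattern built from every distinct word in var1
pat <- paste(unique(unlist(strsplit(var1, " "))), collapse = "|")

# TRUE for elements that contain at least one of those words
keep <- vapply(var2, function(x) any(grepl(pat, x)), logical(1))
keep
# [1]  TRUE  TRUE FALSE

kept <- var2[keep]

# Same elements as the Filter() call
identical(kept, Filter(function(x) any(grepl(pat, x)), var2))
# [1] TRUE
```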