Regex: extracting a list of sub-words from a data frame without a corpus

regex r dataframe

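I want to filter a data frame on a list of sub-words. I know this can be done by building a corpus, but I would rather not do that extra work. I first tried match and grep. The problem is that match only finds exact matches, and grep does not accept more than one word as its pattern.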

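    # replace = TRUE is required to draw 10 values from 4 candidates
    a = sample(c("clint", "offshor", "V1fax", "12mobile"), 10, replace = TRUE)
    z = data.frame(a)
    z
    #           a
    # 1     V1fax
    # 2     V1fax
    # 3  12mobile
    # 4  12mobile
    # 5     V1fax
    # 6     clint
    # 7   offshor
    # 8     clint
    # 9     clint
    # 10 12mobile

    # match() compares whole strings only, so nothing is filtered out here
    d = z[is.na(match(tolower(z[, 1]), c("fax", "mobile", "except", "talwade"))), ]

    # grep() uses only the first pattern ("fax")
    grep(c("fax", "mobile", "except", "talwade"), tolower(z[, 1]))
    # [1] 1 2 5
    # Warning message:
    # In grep(c("fax", "mobile", "except", "talwade" :
    #   argument 'pattern' has length > 1 and only the first element will be used

The desired output is:

    z
    #         a
    # 1   clint
    # 2 offshor
    # 3   clint
    # 4   clint

Any efficient way to filter on a list of sub-words like this would be welcome. Thanks.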
You can do this with grep. You just need the regular expression "or" operator, i.e. |:

    grep(paste(c("fax", "mobile", "except", "talwade"), collapse = "|"), tolower(z[, 1]))
    # [1]  1  2  3  4  5 10

    # The pattern...
    paste(c("fax", "mobile", "except", "talwade"), collapse = "|")
    # [1] "fax|mobile|except|talwade"

This will be a bit slower than Simon's solution, but it gives you access to more data for analysis. You can use sapply to return a matrix of matches:

    patterns <- c("fax", "mobile", "except", "talwade")
    match.mat <- sapply(patterns, grepl, z$a)
    rownames(match.mat) <- z$a
    #            fax mobile except talwade
    # V1fax     TRUE  FALSE  FALSE   FALSE
    # V1fax     TRUE  FALSE  FALSE   FALSE
    # 12mobile FALSE   TRUE  FALSE   FALSE
    # 12mobile FALSE   TRUE  FALSE   FALSE
    # V1fax     TRUE  FALSE  FALSE   FALSE
    # clint    FALSE  FALSE  FALSE   FALSE
    # offshor  FALSE  FALSE  FALSE   FALSE
    # clint    FALSE  FALSE  FALSE   FALSE
    # clint    FALSE  FALSE  FALSE   FALSE
    # 12mobile FALSE   TRUE  FALSE   FALSE

Which rows matched at least one pattern:

    which(rowSums(match.mat) > 0)
    #    V1fax    V1fax 12mobile 12mobile    V1fax 12mobile
    #        1        2        3        4        5       10

For a particular word, which patterns matched, and vice versa:

    which(match.mat["12mobile", ])
    which(match.mat[, "fax"])
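Neither answer shows the last step, which is dropping the matching rows to get the output the question asks for. A minimal sketch of that step follows, assuming the data are the ten values printed in the question (typed in here rather than sampled, so the result is reproducible). The names `pattern`, `hit` and `d` are illustrative only.

```r
# Fixed copy of the data shown in the question (an assumption: no random sampling)
z <- data.frame(
  a = c("V1fax", "V1fax", "12mobile", "12mobile", "V1fax",
        "clint", "offshor", "clint", "clint", "12mobile"),
  stringsAsFactors = FALSE
)

patterns <- c("fax", "mobile", "except", "talwade")

# Join the sub-words into one alternation: "fax|mobile|except|talwade"
pattern <- paste(patterns, collapse = "|")

# TRUE for every row whose lower-cased value contains any sub-word
hit <- grepl(pattern, tolower(z$a))

# Keep the rows that do NOT match; drop = FALSE keeps the result a data frame
d <- z[!hit, , drop = FALSE]
rownames(d) <- NULL  # renumber the rows from 1
d
#         a
# 1   clint
# 2 offshor
# 3   clint
# 4   clint
```

The sub-words are interpreted as regular expressions here, so any that contain metacharacters such as `.` or `(` would need escaping before being pasted together.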