If-else statement to subset a data frame if a value is found in a variable in R
I have a data frame of three-word and two-word phrases, along with a count of how often each phrase was found in a text. Here is some dummy data:
trig <- c("took my dog", "took my cat", "took my hat", "ate my dinner", "ate my lunch")
trig_count <- c(3, 2, 1, 3, 1)
big <- c("took my", "took my", "took my", "ate my", "ate my")
big_count <- c(6,6,6,4,4)
df <- data.frame(trig, trig_count, big, big_count)
df$trig <- as.character(df$trig)
df$big <- as.character(df$big)
trig trig_count big big_count
1 took my dog 3 took my 6
2 took my cat 2 took my 6
3 took my hat 1 took my 6
4 ate my dinner 3 ate my 4
5 ate my lunch 1 ate my 4
But for phrases that do match, it does not work. For example:
match_test("looked for")
match_test("took my")
returns
"no match"
"took my dog" "took my cat" "took my hat"
What I am looking for is:
trig trig_count big big_count
1 took my dog 3 took my 6
2 took my cat 2 took my 6
3 took my hat 1 took my 6
Is it something about % that I don't understand, or is it something else? Any guidance would be much appreciated.
We can use str_detect:
library(stringr)
library(dplyr)
df %>%
filter(str_detect(big, "took my"))
# trig trig_count big big_count
#1 took my dog 3 took my 6
#2 took my cat 2 took my 6
#3 took my hat 1 took my 6
You don't need ifelse; you can simply subset the original df, as @Ronak Shah suggested:
df[grep("took my", df$big), ]
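One likely reason an ifelse()-based version misbehaves is that ifelse() is vectorized: its result always has the same length as its test, so a length-one test can never produce a multi-row subset. The question's original function is not shown here, so the sketch below only illustrates that behavior on hypothetical vectors modeled on the dummy data. It is not a reconstruction of the asker's code.

```r
# Hypothetical vectors mirroring the `big` and `trig` columns from the question
big  <- c("took my", "took my", "took my", "ate my", "ate my")
trig <- c("took my dog", "took my cat", "took my hat",
          "ate my dinner", "ate my lunch")

x <- "took my"

# ifelse() returns a result shaped like its test. The test here has length 1,
# so only the first element of the "yes" argument is kept.
res_ifelse <- ifelse(x %in% big, trig[big == x], "no match")

# A plain if / else evaluates one condition and can return an object of any size.
res_if <- if (x %in% big) trig[big == x] else "no match"

length(res_ifelse)  # 1
length(res_if)      # 3
```

For a single yes-or-no decision about what to return, if / else is usually the better fit; ifelse() is meant for element-wise choices over a vector.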
If you want to turn this into a function that still reports when there is no match, you can do the following:
match_test <- function(match_string) {
  subset_df <- df[grep(match_string, df$big), ]
  if (nrow(subset_df) < 1) {
    warning("no match")
  } else {
    subset_df
  }
}
match_test("took my")
# trig trig_count big big_count
# 1 took my dog 3 took my 6
# 2 took my cat 2 took my 6
# 3 took my hat 1 took my 6
We could also try this:
library(stringr)
match_test <- function(x) {
  res <- df[which(!is.na(str_match(df$big, x))), ]
  if (nrow(res) == 0) return('no match')
  return(res)
}
match_test("looked for")
#[1] "no match"
match_test("took my")
# trig trig_count big big_count
#1 took my dog 3 took my 6
#2 took my cat 2 took my 6
#3 took my hat 1 took my 6
match_test("ate my")
# trig trig_count big big_count
#4 ate my dinner 3 ate my 4
#5 ate my lunch 1 ate my 4
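Maybe this: df[grep("took my", df$big), ]
Thanks for the quick response, Ronak. I would like to use grep() instead, but I can't work out how to use it programmatically inside the function, i.e. grep(x, df$big) does not work because quotes are needed. Any ideas?
It will work. Try match_test …
Possible duplicate of … or …
Thanks everyone for the input. With your help I have the function doing what I need, but I would still like to understand why my code did not work, if anyone has any ideas...
I do need the result inside a function that returns the string "no match" (rather than a warning) when there is no match; otherwise I would just do df[df$big == x, ], although grep works too. Thank you, Phil.
In that case, replace warning() with return().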
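Pulling the comments together, a minimal sketch of a function that returns the string "no match" instead of a warning might look like this. The function name match_or_msg, the data argument, and the use of grepl() with fixed = TRUE (so the search string is matched literally instead of as a regular expression) are assumptions made for illustration and are not part of the original answers.

```r
# Dummy data from the question
df <- data.frame(
  trig = c("took my dog", "took my cat", "took my hat",
           "ate my dinner", "ate my lunch"),
  trig_count = c(3, 2, 1, 3, 1),
  big = c("took my", "took my", "took my", "ate my", "ate my"),
  big_count = c(6, 6, 6, 4, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: subset `data` to rows whose `big` column contains
# `match_string`, or return the string "no match" if no row qualifies.
match_or_msg <- function(match_string, data = df) {
  # A character variable can be passed straight to grepl(); no extra quoting
  # is needed. fixed = TRUE treats the pattern as a literal string.
  hits <- data[grepl(match_string, data$big, fixed = TRUE), ]
  if (nrow(hits) == 0) {
    return("no match")
  }
  hits
}

match_or_msg("looked for")
match_or_msg("took my")
```

Passing the data frame as an argument keeps the function from depending on a global df, and an exact comparison such as data$big == match_string could replace grepl() if partial matches are not wanted.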