R: remove the first n words and count
I have a data frame with a text column. I need to ignore or remove the first two words and then count the strings in that column.
b <- data.frame(text = c("hello sunitha what can I do for you?",
                         "hi john what can I do for you?"))
You can use gsub to remove the first two words and then tapply to count the remaining strings, i.e.
i1 <- gsub("^\\w*\\s*\\w*\\s*", "", b$text)
tapply(i1, i1, length)
#what can I do for you?
# 2
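The same idea can be generalized from two words to the first n words. The following is a minimal sketch, not part of the original answers: `remove_first_n` is a hypothetical helper name, and it assumes that a "word" is any run of non-whitespace characters (`\\S+`), which is slightly broader than the `\\w*` used above. If a string has fewer than n words, the whole string is removed.

```r
# Hypothetical helper (not from the original answers): drop the first n
# whitespace-separated words from each string.
remove_first_n <- function(x, n) {
  # Build a pattern such as "^\\s*(\\S+\\s*){2}" for n = 2
  pattern <- sprintf("^\\s*(\\S+\\s*){%d}", n)
  sub(pattern, "", as.character(x))
}

b <- data.frame(text = c("hello sunitha what can I do for you?",
                         "hi john what can I do for you?"),
                stringsAsFactors = FALSE)

remaining <- remove_first_n(b$text, 2)
counts <- table(remaining)  # how many rows share each remaining string
print(counts)
```

Both rows reduce to "what can I do for you?", so `counts` holds a single entry with the value 2. `table()` is used here in place of `tapply(i1, i1, length)`; for this purpose the two give the same counts.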
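The first 2 or 3 words? And what is the second number?
After removing the first two words, the remaining strings are identical, so when I count them the answer should be 2.
When you remove the first two words, what is left is "what can I do for you?", and the count there is 6 (i.e., I don't understand your question).
If you already know that you removed two words, can you share the code? And why do you still need to count?
Suppose I want to remove a range of words, from the 2nd to the Nth (e.g. 2:4). How would I do that? I have text like that as well.
First of all, you should have mentioned that in your original question, because now the whole approach needs to change:
i1 <- sapply(strsplit(as.character(b$text), ' '), function(i) paste(i[-c(2:4)], collapse = ' '))
tapply(i1, i1, length)
#hello I do for you?    hi I do for you?
#                  1                   1
Are you looking for something like this? Note stringsAsFactors = FALSE; otherwise your text will be of type factor and harder to work with.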
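library(magrittr)  # provides the %>% pipe used below
b <- data.frame(text = c("hello sunitha what can I do for you?", "hi john what can I do for you?"), stringsAsFactors = FALSE)
# Drop the first two words of each string, then rejoin the rest
b$processed <- sapply(b$text, function(x) (strsplit(x, " ")[[1]] %>% .[-c(1:2)]) %>% paste0(., collapse = " "), USE.NAMES = FALSE)
# Count the words that remain in each processed string
b$count <- sapply(b$processed, function(x) length(strsplit(x, " ")[[1]]), USE.NAMES = FALSE)
> b
                                  text              processed count
1 hello sunitha what can I do for you? what can I do for you?     6
2       hi john what can I do for you? what can I do for you?     6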
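For the word-range case raised in the comments (for example, dropping words 2 to 4), the two kinds of count discussed in this thread can be computed side by side. This is only a sketch under stated assumptions: `remove_word_range` is a hypothetical helper name that does not appear in the original answers, and words are assumed to be separated by runs of whitespace.

```r
# Hypothetical helper (not from the original answers): drop the words at
# positions from..to in each string, splitting on runs of whitespace.
remove_word_range <- function(x, from, to) {
  words <- strsplit(trimws(as.character(x)), "\\s+")
  vapply(words, function(w) {
    keep <- setdiff(seq_along(w), from:to)  # positions outside the range
    paste(w[keep], collapse = " ")
  }, character(1))
}

text <- c("hello sunitha what can I do for you?",
          "hi john what can I do for you?")

trimmed <- remove_word_range(text, 2, 4)
word_counts <- lengths(strsplit(trimmed, "\\s+"))  # words left per string
string_counts <- table(trimmed)                    # occurrences of each result
```

Here `trimmed` is "hello I do for you?" and "hi I do for you?", `word_counts` is 5 for both, and each string occurs once in `string_counts`. Using `setdiff` on positions rather than negative indexing keeps the helper well behaved when a string is shorter than the requested range.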