R: split lists based on word count
I have 3 lists of character vectors, for example:
list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))
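I would like to create a new list, in the same format, that contains only the entries from the lists above that have more than 3 words. The entries that are moved to the new list should be removed from the original lists.
My expected output would look like this: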
list1 = list(c("sample text", "dumdidum bla bla"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("sample text","cat dog bird"))
newlist = list(c("bla bla bla bla", "a very long text is written in here", "bla bla bla bla", "another very long text"))
Is it possible to do this?
I put your data into a list and then used lapply:
data_list <- list(
list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here")),
list2 = list(c("bla ", "blubb")),
list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird")))
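# Flatten everything first so the removed entries can be recovered afterwards
data_vec <- unname(unlist(data_list))
# Keep only the entries with at most 3 space-separated words
data_list <- lapply(data_list, function(x) {
  keep_ind <- lengths(strsplit(x[[1]], " ")) <= 3
  x[[1]][keep_ind]
})
# Whatever no longer appears in any of the lists goes into the new list
newlist <- data_vec[!data_vec %in% unlist(data_list)]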
data_list
#$list1
#[1] "sample text" "dumdidum bla bla"
#
#$list2
#[1] "bla " "blubb"
#
#$list3
#[1] "sample text" "cat dog bird"
newlist
#[1] "bla bla bla bla" "a very long text is written in here"
#[3] "bla bla bla bla" "another very long text"
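For what it's worth, here is a minimal sketch of a variant that keeps the list(c(...)) format from the question. The helper name is_long is made up for illustration, and it assumes that "words" are runs of characters separated by whitespace. It trims and splits on "\\s+" because splitting on a single " " would count empty strings as words when an entry has a leading space or double spaces.

```r
# Hypothetical helper: TRUE for strings with more than `n` whitespace-separated words
is_long <- function(x, n = 3) {
  lengths(strsplit(trimws(x), "\\s+")) > n
}

list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 <- list(c("bla ", "blubb"))
list3 <- list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))

data_list <- list(list1 = list1, list2 = list2, list3 = list3)

# Collect the long entries from every list into one vector, wrapped in a list
newlist <- list(unlist(
  lapply(data_list, function(x) x[[1]][is_long(x[[1]])]),
  use.names = FALSE
))

# Drop the long entries, keeping each element in the list(c(...)) format
data_list <- lapply(data_list, function(x) list(x[[1]][!is_long(x[[1]])]))
```

Because the same logical mask is used for both steps, this does not rely on %in% to work out which entries were removed.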
We can try using str_count from stringr:
library(stringr)
list(unlist(lapply(c(list1, list2, list3), function(x) x[str_count(x, "\\w+")>3])))
#[[1]]
#[1] "bla bla bla bla" "a very long text is written in here" "bla bla bla bla" "another very long text"
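If stringr is not available, the same kind of count can probably be done in base R. This is only a sketch: count_words is a hypothetical name, and it treats every match of "\\w+" as a word, like the str_count call above does.

```r
# Hypothetical helper: number of "\\w+" matches per string, base R only
count_words <- function(x) {
  lengths(regmatches(x, gregexpr("\\w+", x)))
}

x <- c("bla bla bla bla", "sample text", "bla ", "")
counts <- count_words(x)

# Entries with more than 3 words
long_entries <- x[counts > 3]
```

regmatches is used here rather than counting the gregexpr result directly, because gregexpr reports a single -1 for a string with no match, which would otherwise be counted as one word.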
Another option, using the stringi library:
library(stringi)
v1 <- unlist(c(list1, list2, list3))
v2 <- v1[stri_count_words(v1) > 3]
v2
#[1] "bla bla bla bla" "a very long text is written in here" "bla bla bla bla" "another very long text"
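Unlisting right from the start is very nice, it makes the solution really simple.
@tobiasegli_te Thanks. I think that since the OP does not want to distinguish which list the words came from, unlisting seemed an easy way to get to the answer.
Works great. But I still need to remove the entries from the old lists.
I changed the solution accordingly; you now get both the old lists and the new list.
Thanks, but I need it the other way around. So list1, list2, list3 should only contain the strings with at most 3 words.
I see, I missed that part of your question. I have changed my solution again accordingly.
To remove the entries in v2 from the original lists:
lapply(c(list1, list2, list3), function(i) setdiff(i, v2))
which gives,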
#[[1]]
#[1] "sample text" "dumdidum bla bla"
#
#[[2]]
#[1] "bla " "blubb"
#
#[[3]]
#[1] "sample text" "cat dog bird"
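One thing to keep in mind with the setdiff approach: setdiff treats its arguments as sets, so duplicated short entries within a list are collapsed as well. That does not matter for the sample data, but a small made-up example (the vector x below is not from the question) shows the difference from a logical filter with %in%:

```r
x  <- c("sample text", "sample text", "a very long text here", "cat dog bird")
v2 <- "a very long text here"

# Drops the long entry, but also collapses the duplicated "sample text"
via_setdiff <- setdiff(x, v2)

# Drops the long entry and keeps both copies of "sample text"
via_in <- x[!x %in% v2]
```

If duplicates within a list need to be preserved, the %in% form is probably the safer choice.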