R: Split list based on word count

I have 3 lists of character vectors, for example:

list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))
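I want to create a new list, in the same format, that contains only the entries from the lists above with more than 3 words. Entries that are moved into the new list should be removed from the original lists. My expected output looks like this: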

list1 = list(c("sample text", "dumdidum bla bla"))
list2 = list(c("bla ", "blubb"))
list3 = list(c("sample text","cat dog bird"))

newlist = list(c("bla bla bla bla", "a very long text is written in here", "bla bla bla bla", "another very long text"))

Is it possible to do this?

I put your data into one list and then used lapply:

data_list <- list(
    list1 = list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here")),
    list2 = list(c("bla ", "blubb")),
    list3 = list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird")))

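# keep a flat, unnamed copy of every entry
data_vec <- unname(unlist(data_list))

# in each list, keep only the entries with at most 3 space-separated words
data_list <- lapply(data_list, function(x) {
    keep_ind <- lengths(strsplit(x[[1]], " ")) <= 3
    x[[1]][keep_ind]
})

# every entry that was dropped goes into the new list
newlist <- data_vec[!data_vec %in% unlist(data_list)]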

data_list
#$list1
#[1] "sample text"      "dumdidum bla bla"
#
#$list2
#[1] "bla "  "blubb"
#
#$list3
#[1] "sample text"  "cat dog bird"

newlist
#[1] "bla bla bla bla"                     "a very long text is written in here"
#[3] "bla bla bla bla"                     "another very long text"  
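The answers here return plain character vectors, while the question asks for results in the same list(c(...)) format. Below is a minimal base-R sketch that does both halves in one pass and keeps that format. It assumes each input list holds exactly one character vector, as in the question; split_by_word_count is a hypothetical helper name, and it counts words by splitting on runs of whitespace after trimws(), so repeated or trailing spaces do not create empty "words".

```r
# Hypothetical helper: for each input list (assumed to wrap exactly one
# character vector), keep entries with at most `max_words` words and pool
# the longer ones into a single new list, preserving the list(c(...)) shape.
split_by_word_count <- function(lists, max_words = 3) {
  parts <- lapply(lists, function(l) {
    x <- l[[1]]
    # word count: split on runs of whitespace after trimming both ends
    n <- lengths(strsplit(trimws(x), "\\s+"))
    list(keep = x[n <= max_words], moved = x[n > max_words])
  })
  list(
    keep    = lapply(parts, function(p) list(p$keep)),
    newlist = list(unlist(lapply(parts, `[[`, "moved"), use.names = FALSE))
  )
}

list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 <- list(c("bla ", "blubb"))
list3 <- list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))

res <- split_by_word_count(list(list1 = list1, list2 = list2, list3 = list3))
list1   <- res$keep$list1
list2   <- res$keep$list2
list3   <- res$keep$list3
newlist <- res$newlist
```

Because the kept entries are taken from the untrimmed input, "bla " keeps its trailing space, matching the expected output in the question.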

We can try str_count from the stringr package:

library(stringr)
list(unlist(lapply(c(list1, list2, list3), function(x) x[str_count(x, "\\w+")>3])))
#[[1]]
#[1] "bla bla bla bla"                     "a very long text is written in here" "bla bla bla bla"                     "another very long text"  
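The str_count answer only builds the new list. As a sketch of the complementary step, here is a base-R version (no stringr needed) that counts words as runs of \w characters, which should agree with str_count(x, "\\w+") on plain ASCII text like this, and also returns what stays in each list. count_words, all_vecs and kept are hypothetical names. regmatches() is combined with gregexpr() because lengths(gregexpr(...)) on its own would report 1 for a string with no match.

```r
list1 <- list(c("bla bla bla bla", "sample text", "dumdidum bla bla", "a very long text is written in here"))
list2 <- list(c("bla ", "blubb"))
list3 <- list(c("bla bla bla bla", "sample text", "another very long text", "cat dog bird"))

# Hypothetical base-R stand-in for stringr::str_count(x, "\\w+").
# regmatches() yields character(0) for a string with no match, so lengths()
# reports 0 there; lengths(gregexpr(...)) alone would report 1 (the -1 marker).
count_words <- function(x) {
  lengths(regmatches(x, gregexpr("\\w+", x, perl = TRUE)))
}

all_vecs <- c(list1, list2, list3)  # one unnamed list of three character vectors

# Entries with more than 3 words, pooled into a single new list
newlist <- list(unlist(lapply(all_vecs, function(x) x[count_words(x) > 3])))

# The complement: the entries that stay in each original list
kept <- lapply(all_vecs, function(x) x[count_words(x) <= 3])
```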

Another option, using the stringi library:

library(stringi)

v1 <- unlist(c(list1, list2, list3))
v2 <- v1[stri_count_words(v1) > 3]
v2

#[1] "bla bla bla bla" "a very long text is written in here" "bla bla bla bla"  "another very long text" 
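Then, to remove these entries from the original lists, take the set difference with v2:

lapply(c(list1, list2, list3), function(i) setdiff(i, v2))

This gives:

[[1]]
[1] "sample text"      "dumdidum bla bla"

[[2]]
[1] "bla "  "blubb"

[[3]]
[1] "sample text"  "cat dog bird"

Comments:

Unlisting from the start is very nice; it makes the solution very simple.

@tobiasegli_te Thanks. Since the OP doesn't need to know which list each entry came from, unlisting seemed an easy way to get the answer.

Works very well. But I still need to remove the entries from the old lists.

I changed the solution accordingly; you now get both the old lists and the new list.

Thanks, but I need the opposite lists: list1, list2 and list3 should only contain the strings with at most 3 words.

I see, I missed that part of your question. I changed my solution accordingly again.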
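One caveat about the setdiff() step in the stringi answer: setdiff() treats its inputs as sets, so it also collapses duplicates. That is harmless for the sample data, where no list repeats an entry, but it would silently drop repeated entries in general. A small illustration with made-up data (x and long_entries are hypothetical):

```r
# setdiff() treats its arguments as sets, so repeated entries in the first
# argument are collapsed to one; logical indexing with %in% keeps every copy.
x <- c("sample text", "sample text", "cat dog bird", "another very long text")
long_entries <- c("another very long text")

via_setdiff <- setdiff(x, long_entries)
# c("sample text", "cat dog bird")

via_index <- x[!x %in% long_entries]
# c("sample text", "sample text", "cat dog bird")
```

If duplicates must survive, the same idea applied to the question would be lapply(c(list1, list2, list3), function(i) i[!i %in% v2]).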