R: iteratively concatenate the first/last n words of a string
Suppose I have the following data.frame:
df <- data.frame(string=c("word1 word2 word3 word4", "word1 word2", "word1"), stringsAsFactors = FALSE)
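Element names are not required at all, but they make the result easier to follow.
Not needed: pasting middle elements, such as word2 word3.
I currently use strsplit(df$string, " ") as a first step to prepare the desired list, and I can then get what I want with a double loop, but that is far from efficient.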
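For reference, here is a minimal sketch of the strsplit-plus-double-loop baseline the question describes. The target output is an assumption inferred from the final code and comments further down: for each string, the prefixes of 1 to n words, followed by the suffixes starting at word 2 through word n. The names `words`, `out` and `res` are illustrative only.

```r
df <- data.frame(string = c("word1 word2 word3 word4", "word1 word2", "word1"),
                 stringsAsFactors = FALSE)

words <- strsplit(df$string, " ")   # first step: one character vector per string
out <- vector("list", length(words))

for (i in seq_along(words)) {       # outer loop: one string at a time
  x <- words[[i]]
  n <- length(x)
  res <- character(0)
  for (k in seq_len(n)) {           # left side: the first k words
    res <- c(res, paste(x[1:k], collapse = " "))
  }
  if (n > 1) {
    for (k in 2:n) {                # right side: words k..n (k = 1 would repeat the full string)
      res <- c(res, paste(x[k:n], collapse = " "))
    }
  }
  out[[i]] <- res
}
```

Growing `res` with `c()` inside the loop copies the vector on every iteration, which is part of why this pattern scales poorly.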
Base R / data.table approaches are preferred, but an efficient tidyverse solution is perfectly fine too.
A dplyr, tidyr and purrr option could be:
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)  # rowid_to_column()
df %>%
rowid_to_column() %>%
separate_rows(string, sep = " ") %>%
group_by(rowid) %>%
transmute(concatenated = accumulate(string, ~ paste(.x, .y)),
concatenated_rev = accumulate(rev(string), ~ paste(.x, .y)))
rowid concatenated concatenated_rev
<int> <chr> <chr>
1 1 word1 word4
2 1 word1 word2 word4 word3
3 1 word1 word2 word3 word4 word3 word2
4 1 word1 word2 word3 word4 word4 word3 word2 word1
5 2 word1 word2
6 2 word1 word2 word2 word1
7 3 word1 word1
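The same accumulation idea is available in base R as `Reduce(..., accumulate = TRUE)`, so a sketch without tidyverse dependencies is possible. Unlike `concatenated_rev` above, setting `right = TRUE` builds the right-hand strings in their original word order (`word2 word3 word4` rather than `word4 word3 word2`), which, according to the comments at the end of the thread, is what the question asks for. The helpers `paste2` and `left_right` are hypothetical names.

```r
df <- data.frame(string = c("word1 word2 word3 word4", "word1 word2", "word1"),
                 stringsAsFactors = FALSE)

paste2 <- function(a, b) paste(a, b)

left_right <- function(x) {
  # Left-to-right accumulation: "w1", "w1 w2", ..., "w1 ... wn"
  left <- Reduce(paste2, x, accumulate = TRUE)
  # Right-to-left accumulation: "w1 ... wn", "w2 ... wn", ..., "wn";
  # drop the first element because it repeats the full string already in `left`
  right <- Reduce(paste2, x, accumulate = TRUE, right = TRUE)[-1]
  c(left, right)
}

res <- lapply(strsplit(df$string, " "), left_right)
```

For a one-word string `Reduce()` returns that word unchanged and `[-1]` then yields `character(0)`, so the result stays a character vector without a special case for length 1.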
Or with additional left/right labels:
df %>%
rowid_to_column() %>%
separate_rows(string, sep = " ") %>%
group_by(rowid) %>%
transmute(left = paste0("left", 1:n()),
concatenated = accumulate(string, ~ paste(.x, .y)),
right = paste0("right", 1:n()),
concatenated_rev = accumulate(rev(string), ~ paste(.x, .y)))
rowid left concatenated right concatenated_rev
<int> <chr> <chr> <chr> <chr>
1 1 left1 word1 right1 word4
2 1 left2 word1 word2 right2 word4 word3
3 1 left3 word1 word2 word3 right3 word4 word3 word2
4 1 left4 word1 word2 word3 word4 right4 word4 word3 word2 word1
5 2 left1 word1 right1 word2
6 2 left2 word1 word2 right2 word2 word1
7 3 left1 word1 right1 word1
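A base R sketch that produces a similar long table with left/right labels, again using `Reduce(accumulate = TRUE)`. Here `rightk` is taken to mean the last k words in their original order; that labelling, the column name `concatenated_right` and the helper `paste2` are assumptions for illustration, not part of the answer above.

```r
df <- data.frame(string = c("word1 word2 word3 word4", "word1 word2", "word1"),
                 stringsAsFactors = FALSE)

paste2 <- function(a, b) paste(a, b)
words <- strsplit(df$string, " ")

long <- do.call(rbind, lapply(seq_along(words), function(i) {
  x <- words[[i]]
  k <- seq_along(x)
  data.frame(
    rowid = i,
    left = paste0("left", k),
    concatenated = Reduce(paste2, x, accumulate = TRUE),  # first k words
    right = paste0("right", k),
    # Right-to-left accumulation gives "w1 ... wn", ..., "wn"; rev() puts the
    # shortest suffix first so that row k holds the last k words
    concatenated_right = rev(Reduce(paste2, x, accumulate = TRUE, right = TRUE)),
    stringsAsFactors = FALSE
  )
}))
long
```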
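Base R version:
We can write a function that pastes the words together incrementally, adding one word each time:
# For y = 1..length(x), paste the first y words of x into one string
paste_words <- function(x) {
  sapply(seq_along(x), function(y) paste0(x[1:y], collapse = " "))
}
# For each split string: prefixes of x, followed by prefixes of rev(x)
lapply(strsplit(df$string, " "), function(x) c(paste_words(x), paste_words(rev(x))))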
#[[1]]
#[1] "word1" "word1 word2" "word1 word2 word3" "word1 word2 word3 word4"
#[5] "word4" "word4 word3" "word4 word3 word2" "word4 word3 word2 word1"
#[[2]]
#[1] "word1" "word1 word2" "word2" "word2 word1"
#[[3]]
#[1] "word1" "word1"
You may want to wrap the result in unique() to avoid repeated entries, such as the duplicated word in the last element.
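A minimal sketch of that `unique()` wrapping, reusing `paste_words` from this answer. For a single-word string the prefix and the reversed prefix are both `"word1"`, so `unique()` collapses them into one element; it only removes exact duplicates.

```r
df <- data.frame(string = c("word1 word2 word3 word4", "word1 word2", "word1"),
                 stringsAsFactors = FALSE)

paste_words <- function(x) {
  sapply(seq_along(x), function(y) paste0(x[1:y], collapse = " "))
}

res <- lapply(strsplit(df$string, " "),
              function(x) unique(c(paste_words(x), paste_words(rev(x)))))
res[[3]]
# [1] "word1"
```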
Thanks to Ronak's approach, I ended up with the following code. It is more elegant and works better than my loop.
# Left side: the first 1, 2, ..., n words
paste_words_left <- function(x) {
  sapply(seq_along(x), function(y) paste0(x[1:y], collapse = " "))
}
# Right side: words 2..n, 3..n, ..., n (the full string is already on the left side)
paste_words_right <- function(x) {
  sapply(seq_along(x)[-1], function(y) paste0(x[y:length(x)], collapse = " "))
}
## Faulty for single-word strings (the result becomes a list):
## lapply(strsplit(df$string, " "), function(x) c(paste_words_left(x), paste_words_right(x)))
lapply(strsplit(df$string, " "), function(x){
  if (length(x)==1) x else c(paste_words_left(x), paste_words_right(x))})
You need all left_i with i from 1 to n, and right_i with i from 1 to n-1?
Correct for all left_i with i from 1 to n; right_i with i up to n-1 as well.
Hi, this is not quite what we want: x and rev(x) are not treated the same way. For the right side/last words we need, for example, "word2 word3 word4", "word3 word4", "word4". But your function works fine for me, and I used it as a template for my proposal. See my post.
Hi tmfmnk, same comment as the one I posted on Ronak's solution. Sorry if my question was not clear enough.
After the update, the output for length-1 vectors was incorrect: concatenating a string with an empty list produces a list. I put the faulty line in a comment and updated the code.
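The length-1 problem described in the last comment can be reproduced directly: `sapply()` over a zero-length index returns `list()`, and `c()` with a list turns the whole result into a list. One alternative, sketched here under the same left/right definitions as the code above, is `vapply()` with a `character(1)` template. It returns `character(0)` for empty input, so the `if (length(x)==1)` guard is no longer needed.

```r
df <- data.frame(string = c("word1 word2 word3 word4", "word1 word2", "word1"),
                 stringsAsFactors = FALSE)

# Reproduce the pitfall: for one word, seq_along(x)[-1] is integer(0)
x <- "word1"
empty <- sapply(seq_along(x)[-1], function(y) paste0(x[y:length(x)], collapse = " "))
print(class(empty))        # "list"
print(class(c(x, empty)))  # "list"

# vapply() always returns a character vector, possibly of length 0
paste_words_left <- function(x) {
  vapply(seq_along(x), function(y) paste(x[1:y], collapse = " "), character(1))
}
paste_words_right <- function(x) {
  vapply(seq_along(x)[-1], function(y) paste(x[y:length(x)], collapse = " "), character(1))
}

res <- lapply(strsplit(df$string, " "),
              function(x) c(paste_words_left(x), paste_words_right(x)))
```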