R lead or lag function to get multiple values, not just the nth value

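I have a tibble with a column of words, one word per row. I want to create a new variable from a function that searches for a keyword and, if it finds the keyword, creates a string made up of the keyword plus or minus 3 words (the 3 words before it and the 3 words after it).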

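The code below is close, but instead of grabbing all three words before and after my keyword, it grabs only the single word 3 positions before/after.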

library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times", 
                       "it", "was", "the", "worst", "of", "times"))
df <- df %>% mutate(chunks = ifelse(words=="times", 
                                    paste(lag(words, 3), 
                                          words, 
                                          lead(words, 3), sep = " "),
                                    NA))
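Why this grabs only one word on each side: dplyr's lag(x, n) returns the single value n positions earlier (and lead(x, n) the single value n positions later), not a window of n values. A small illustration on a toy vector x (a name chosen just for this example):

```r
library(dplyr)

x <- c("a", "b", "c", "d", "e")

# lag(x, 3) shifts the whole vector by 3: element i becomes x[i - 3]
lag(x, 3)   # → NA NA NA "a" "b"

# lead(x, 3) shifts the other way: element i becomes x[i + 3]
lead(x, 3)  # → "d" "e" NA NA NA
```

So collecting all three neighbours needs lag(x, 1), lag(x, 2) and lag(x, 3) (and likewise for lead), or an index window around each match.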

One option is sapply:
library(dplyr)

df %>%
  mutate(
    chunks = ifelse(
      words == "times",
      sapply(
        1:nrow(.),
        function(x) paste(words[pmax(1, x - 3):pmin(x + 3, nrow(.))], collapse = " ")
        ),
      NA
      )
  )

Output:
# A tibble: 12 x 2
   words chunks                      
   <chr> <chr>                       
 1 it    NA                          
 2 was   NA                          
 3 the   NA                          
 4 best  NA                          
 5 of    NA                          
 6 times the best of times it was the
 7 it    NA                          
 8 was   NA                          
 9 the   NA                          
10 worst NA                          
11 of    NA                          
12 times the worst of times   
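A variant of the same idea, sketched under the assumption that you want the window width and the keyword as parameters (k and keyword are illustrative names, not part of the original answer). It uses seq_along(words) instead of the . pronoun, so it does not rely on the magrittr placeholder inside mutate(), and vapply() to guarantee a character result:

```r
library(dplyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

k <- 3              # hypothetical: number of words to keep on each side
keyword <- "times"  # hypothetical: the word to search for

res <- df %>%
  mutate(chunks = ifelse(
    words == keyword,
    # for every row i, paste words[i - k .. i + k], clipped to the vector bounds
    vapply(seq_along(words), function(i) {
      idx <- max(1, i - k):min(length(words), i + k)
      paste(words[idx], collapse = " ")
    }, character(1)),
    NA_character_
  ))

res$chunks[c(6, 12)]  # → "the best of times it was the" "the worst of times"
```

Like the sapply version, this builds a window for every row and lets ifelse() discard the non-matches; on large data you may prefer to loop only over which(words == keyword).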

While it is not an explicit lead or lag function, sapply can often be used for this purpose as well.
Similar to @arg0naut, but without dplyr:
r  = 1:nrow(df)
w  = which(df$words == "times")
wm = lapply(w, function(wi) intersect(r, seq(wi-3L, wi+3L)))

df$chunks <- NA_character_
df$chunks[w] <- tapply(df$words[unlist(wm)], rep(w, lengths(wm)), FUN = paste, collapse=" ")

# A tibble: 12 x 2
   words chunks                      
   <chr> <chr>                       
 1 it    <NA>                        
 2 was   <NA>                        
 3 the   <NA>                        
 4 best  <NA>                        
 5 of    <NA>                        
 6 times the best of times it was the
 7 it    <NA>                        
 8 was   <NA>                        
 9 the   <NA>                        
10 worst <NA>                        
11 of    <NA>                        
12 times the worst of times      
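To make the base R version easier to follow, here is what its intermediate objects contain for this data (the same steps, run on a plain character vector and inspected one by one): w holds the matching row numbers, and each element of wm is the window of row indices around one match, clipped to the valid rows by intersect().

```r
words <- c("it", "was", "the", "best", "of", "times",
           "it", "was", "the", "worst", "of", "times")

r  <- seq_along(words)                 # valid positions 1..12
w  <- which(words == "times")          # matching positions: 6 and 12
wm <- lapply(w, function(wi) intersect(r, seq(wi - 3L, wi + 3L)))
# wm[[1]] is 3:9; wm[[2]] is 9:12 (13..15 fall outside r and are dropped)

# rep(w, lengths(wm)) labels every index with the match it belongs to,
# so tapply() pastes each window's words together separately
grp    <- rep(w, lengths(wm))
chunks <- tapply(words[unlist(wm)], grp, FUN = paste, collapse = " ")
chunks
```

tapply() returns its groups in sorted order of the labels, which matches w because which() returns increasing positions; that is why the result can be assigned straight into df$chunks[w].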

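data.table::shift accepts a vector for its n (lag) argument and returns a list, so you can pass such a vector and then use do.call(paste, ...) to paste the list elements together. However, unless you are using data.table version >= 1.12, I don't think it will let you mix positive and negative n values (as done below).

Using data.table: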
library(data.table)
setDT(df)

df[, chunks := trimws(ifelse(words != "times", NA, do.call(paste, shift(words, 3:-3, ''))))]

#     words                       chunks
#  1:    it                         <NA>
#  2:   was                         <NA>
#  3:   the                         <NA>
#  4:  best                         <NA>
#  5:    of                         <NA>
#  6: times the best of times it was the
#  7:    it                         <NA>
#  8:   was                         <NA>
#  9:   the                         <NA>
# 10: worst                         <NA>
# 11:    of                         <NA>
# 12: times           the worst of times
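To see what do.call(paste, ...) receives, here is data.table::shift() with a vector n on a short toy vector. Each offset produces one list element; with the default type = "lag", a positive n looks backwards and a negative n looks forwards (the mixed-sign case that may need a recent data.table). The fill = "" argument replaces the NA padding so that paste() does not insert the string "NA":

```r
library(data.table)

x <- c("a", "b", "c", "d")

# one list element per offset: n = 1 is a lag, n = -1 a lead
s <- shift(x, c(1L, -1L), fill = "")
s[[1]]  # → ""  "a" "b" "c"
s[[2]]  # → "b" "c" "d" ""

# paste the list elements element-wise, then trim the padding spaces
trimws(do.call(paste, s))  # → "b" "a c" "b d" "c"
```

With n = 3:-3 the list has seven elements (three lags, the unshifted vector for n = 0, three leads), so the same do.call(paste, ...) yields the seven-word window for every row.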

Here is another option using lag and lead:

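library(dplyr)
library(tidyr)  # for unite()

# builds named expression strings such as "lag (.,  3 , default = '')"
laglead_f <- function(what, range)
    setNames(paste(what, "(., ", range, ", default = '')"), paste(what, range))

# note: funs_() is deprecated in current dplyr and may warn or fail there
df %>%
    mutate_at(vars(words), funs_(c(laglead_f("lag", 3:0), laglead_f("lead", 1:3)))) %>%
    unite(chunks, -words, sep = " ") %>%
    mutate(chunks = ifelse(words == "times", trimws(chunks), NA))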
## A tibble: 12 x 2
#   words chunks
#   <chr> <chr>
# 1 it    NA
# 2 was   NA
# 3 the   NA
# 4 best  NA
# 5 of    NA
# 6 times the best of times it was the
# 7 it    NA
# 8 was   NA
# 9 the   NA
#10 worst NA
#11 of    NA
#12 times the worst of times

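The idea is to store the values of the three lagged and three leading vectors in new columns using mutate_at and named functions, unite those columns, and then keep the result only for the entries that match your condition, words == "times".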
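A sketch of the same idea written with plain mutate() columns instead of funs_()/mutate_at(), assuming dplyr and tidyr are installed. Each lagged or leading copy gets its own explicitly named column (lag3, ..., key, ..., lead3 are illustrative names), unite() pastes them left to right, and ifelse() keeps the result only on the keyword rows:

```r
library(dplyr)
library(tidyr)

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

res <- df %>%
  mutate(
    lag3  = lag(words, 3, default = ""),
    lag2  = lag(words, 2, default = ""),
    lag1  = lag(words, 1, default = ""),
    key   = words,                       # copy, so unite() does not consume words
    lead1 = lead(words, 1, default = ""),
    lead2 = lead(words, 2, default = ""),
    lead3 = lead(words, 3, default = "")
  ) %>%
  # paste lag3 ... lead3 into one string; these seven columns are then dropped
  unite(chunks, lag3:lead3, sep = " ") %>%
  # keep the window only on keyword rows; trimws() drops the edge padding
  mutate(chunks = ifelse(words == "times", trimws(chunks), NA_character_))
```

default = "" plays the same role as in laglead_f: missing neighbours at the start or end become empty strings, which only add spaces at the ends of the pasted string.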

The same data.table::shift approach also works inside a dplyr pipeline:

library(dplyr)

df %>% 
  mutate(chunks = do.call(paste, data.table::shift(words, 3:-3, fill = '')),
         chunks = trimws(ifelse(words != "times", NA, chunks)))

# # A tibble: 12 x 2
#    words chunks                      
#    <chr> <chr>                       
#  1 it    NA                          
#  2 was   NA                          
#  3 the   NA                          
#  4 best  NA                          
#  5 of    NA                          
#  6 times the best of times it was the
#  7 it    NA                          
#  8 was   NA                          
#  9 the   NA                          
# 10 worst NA                          
# 11 of    NA                          
# 12 times the worst of times         
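If you would rather not depend on data.table, the same "paste a window of shifted copies" pattern can be sketched with dplyr's own lag() and lead(). window_paste() below is a hypothetical helper, not part of either package; before and after set how many words to keep on each side:

```r
library(dplyr)

# hypothetical helper: for every i, paste x[i - before] ... x[i] ... x[i + after]
window_paste <- function(x, before = 3, after = 3) {
  lags  <- lapply(rev(seq_len(before)), function(k) lag(x, k, default = ""))
  leads <- lapply(seq_len(after), function(k) lead(x, k, default = ""))
  # do.call(paste, ...) pastes the shifted copies element-wise with sep = " ";
  # empty padding only occurs at the ends of a window, so trimws() removes it
  trimws(do.call(paste, c(lags, list(x), leads)))
}

df <- tibble(words = c("it", "was", "the", "best", "of", "times",
                       "it", "was", "the", "worst", "of", "times"))

res <- df %>%
  mutate(chunks = ifelse(words == "times", window_paste(words), NA_character_))
```

This assumes the words themselves are non-empty strings; an empty word in the middle of a window would leave a double space that trimws() does not remove.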
