R: Concatenating the current row and the next row in a new column


Suppose we have this data frame in R:

df <- data.frame(id = c(rep(1,5), rep(2, 3), rep(3, 4), rep(4, 2)), brand = c("A", "B", "A", "D", "Closed", "B", "C", "D", "D", "A", "B", "Closed", "C", "Closed"))

> df
#   id  brand
#1   1      A
#2   1      B
#3   1      A
#4   1      D
#5   1 Closed
#6   2      B
#7   2      C
#8   2      D
#9   3      D
#10  3      A
#11  3      B
#12  3 Closed
#13  4      C
#14  4 Closed
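I want to create a new variable that shows the change in the brand column from the current row to the next row, but only within each id number.

I create the new column: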

df$brand_chg <- ""
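
The loop I then run fills in this column and produces:

> df
#   id  brand brand_chg
#1   1      A      A->B
#2   1      B      B->A
#3   1      A      A->D
#4   1      D D->Closed
#5   1 Closed
#6   2      B      B->C
#7   2      C      C->D
#8   2      D
#9   3      D      D->A
#10  3      A      A->B
#11  3      B B->Closed
#12  3 Closed
#13  4      C C->Closed
#14  4 Closed
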
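However, on a table with 287k rows, this loop takes at least 10 minutes to run. Does anyone know a faster way to do this concatenation?

Thank you, I appreciate your insight.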

Using the dplyr package:

library(dplyr)

df %>% group_by(id) %>% 
    mutate(brand_chg = ifelse(seq_along(brand) == n(), 
                              "", 
                              paste(brand, lead(brand), sep = "->")))

Another dplyr option, only slightly different and no better! It uses is.na instead of the comparison with n().
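The code for this variant did not survive in the thread, so the following is only a sketch of what an is.na-based version might look like, not the answerer's actual code. It relies on lead(brand) being NA on the last row of each group. The helper column nxt and the stringsAsFactors = FALSE setting are assumptions made for this illustration.

```r
library(dplyr)

# Same example data as in the question; stringsAsFactors = FALSE is an
# assumption made here so that brand is a plain character column.
df <- data.frame(id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
                 brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
                           "D", "A", "B", "Closed", "C", "Closed"),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(id) %>%
  # nxt (hypothetical helper column) is NA on the last row of each group
  mutate(nxt = lead(brand),
         brand_chg = ifelse(is.na(nxt), "", paste(brand, nxt, sep = "->"))) %>%
  ungroup() %>%
  select(-nxt)
```

Note that this sketch would also leave brand_chg empty wherever the next brand is itself missing, which the n()-based version does not do.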


Here is an option using data.table:

library(data.table)
setDT(df)[, brand_chg := paste(brand, shift(brand, type = "lead"), sep="->"), id]
df[df[, .I[.N] , id]$V1, brand_chg := ""]
df
#    id  brand brand_chg
# 1:  1      A      A->B
# 2:  1      B      B->A
# 3:  1      A      A->D
# 4:  1      D D->Closed
# 5:  1 Closed          
# 6:  2      B      B->C
# 7:  2      C      C->D
# 8:  2      D          
# 9:  3      D      D->A
#10:  3      A      A->B
#11:  3      B B->Closed
#12:  3 Closed          
#13:  4      C C->Closed
#14:  4 Closed          

Or a more compact option:

setDT(df)[, brand_chg := c(paste(brand[-.N], brand[-1], sep="->"), ""), id]
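To see why the compact version works, it may help to look at a single group in plain base R. This is a minimal sketch on the brands of id 1; the variable names x, n, current and nxt are made up for the illustration, and n stands in for what .N is inside a data.table group.

```r
x <- c("A", "B", "A", "D", "Closed")   # the brands of id 1
n <- length(x)                          # plays the role of .N within a group

current <- x[-n]   # everything except the last element
nxt     <- x[-1]   # everything except the first element

# Pair each element with its successor, then pad the last row with ""
chg <- c(paste(current, nxt, sep = "->"), "")
```

Dropping the last element and dropping the first element gives two vectors of length n - 1 that line up as (current, next) pairs, so a single vectorised paste() replaces the row-by-row loop.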

with(df, ave(brand, id, FUN = function(x) c(paste(head(x, -1), tail(x, -1), sep = '->'), ''))) (not tested on 287k rows)
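A self-contained sketch of this base R approach, under the assumption that brand is a character column (forced here with stringsAsFactors = FALSE). If brand is a factor, which was the data.frame() default before R 4.0, ave() writes the new strings back into that factor, which can produce warnings and NA values; that may be related to the problem reported in the next comment, though the thread does not confirm it.

```r
# Example data from the question, with brand kept as character (assumption)
df <- data.frame(id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
                 brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
                           "D", "A", "B", "Closed", "C", "Closed"),
                 stringsAsFactors = FALSE)

# Within each id, paste every brand to the following one and
# leave the last row of the group empty
df$brand_chg <- with(df, ave(brand, id, FUN = function(x) {
  c(paste(head(x, -1), tail(x, -1), sep = "->"), "")
}))
```

For an existing factor column, wrapping it as as.character(brand) inside ave() should have the same effect.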
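I get an error in with(), but when I remove the ave() function it gives a correctly concatenated list. Thank you very much. I'll have to look into what exactly is going on here.

Thank you! This works, and it keeps the data frame form. Would you mind explaining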
seq_along(brand) == n()?

seq_along(brand) == n() returns TRUE for the last row of each group: seq_along is like a row index within the group, and n() is the number of rows in each group.
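The explanation above can be made visible by writing the pieces into their own columns. This is a small sketch on a made-up five-row data frame; the column names idx, size and is_last are hypothetical and only serve the illustration.

```r
library(dplyr)

# Small made-up example: id 1 has three rows, id 2 has two
df <- data.frame(id = c(1, 1, 1, 2, 2),
                 brand = c("A", "B", "Closed", "C", "D"),
                 stringsAsFactors = FALSE)

flags <- df %>%
  group_by(id) %>%
  mutate(idx = seq_along(brand),   # position of the row within its group
         size = n(),               # number of rows in the group
         is_last = idx == size) %>% # TRUE only on the group's last row
  ungroup()
```

In the answer's ifelse(), the rows where is_last would be TRUE receive "" and all other rows receive the pasted "current->next" string.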