R: concatenate the current row and the subsequent row in a new column
Suppose we have this data frame in R:
df <- data.frame(id = c(rep(1,5), rep(2, 3), rep(3, 4), rep(4, 2)), brand = c("A", "B", "A", "D", "Closed", "B", "C", "D", "D", "A", "B", "Closed", "C", "Closed"))
> df
# id brand
#1 1 A
#2 1 B
#3 1 A
#4 1 D
#5 1 Closed
#6 2 B
#7 2 C
#8 2 D
#9 3 D
#10 3 A
#11 3 B
#12 3 Closed
#13 4 C
#14 4 Closed
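I want to create a new variable that shows the change in the brand column from the current row to the next row, but only within each id.
I create the new column: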
df$brand_chg <- ""
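A row-by-row loop then fills in df$brand_chg, so that the result looks like this:
# id brand brand_chg
#1 1 A A->B
#2 1 B B->A
#3 1 A A->D
#4 1 D D->Closed
#5 1 Closed
#6 2 B B->C
#7 2 C C->D
#8 2 D
#9 3 D D->A
#10 3 A A->B
#11 3 B B->Closed
#12 3 Closed
#13 4 C C->Closed
#14 4 Closed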
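The loop itself is not shown in the question. As a rough illustration only, a row-by-row loop that produces the output above might look like the sketch below. The loop body is an assumption, not the asker's actual code; `stringsAsFactors = FALSE` is added so that `brand` is a character column on any R version.

```r
df <- data.frame(
  id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
  brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
            "D", "A", "B", "Closed", "C", "Closed"),
  stringsAsFactors = FALSE
)
df$brand_chg <- ""

# Hypothetical reconstruction: compare each row with the row after it
# and only concatenate when both rows belong to the same id.
for (i in seq_len(nrow(df) - 1)) {
  if (df$id[i] == df$id[i + 1]) {
    df$brand_chg[i] <- paste(df$brand[i], df$brand[i + 1], sep = "->")
  }
}
```

Assigning into a data frame one element at a time like this is typically slow in R, which would be consistent with the run time reported for the 287k-row table.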
However, on a table with 287k rows, this loop takes at least 10 minutes to run. Does anyone know a faster way to do this kind of concatenation?
Thank you, I appreciate your insights.
Using the dplyr package:
library(dplyr)
df %>% group_by(id) %>%
mutate(brand_chg = ifelse(seq_along(brand) == n(),
"",
paste(brand, lead(brand), sep = "->")))
Another dplyr version, only slightly different and no better: use is.na instead of the comparison with n().
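The code for the is.na variant is not shown, so the following is only a sketch of what it might look like. It relies on `lead()` returning `NA` for the last row of each group, and it assumes `brand` itself contains no missing values (otherwise the `is.na` test would also blank out rows that precede an `NA`). The name `res` is chosen here for illustration.

```r
library(dplyr)

df <- data.frame(
  id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
  brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
            "D", "A", "B", "Closed", "C", "Closed"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  # lead(brand) is NA on the last row of each id, so that row gets ""
  mutate(brand_chg = ifelse(is.na(lead(brand)),
                            "",
                            paste(brand, lead(brand), sep = "->"))) %>%
  ungroup()
```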
Here is an option using data.table:
library(data.table)
setDT(df)[, brand_chg := paste(brand, shift(brand, type = "lead"), sep="->"), id]
df[df[, .I[.N] , id]$V1, brand_chg := ""]
df
# id brand brand_chg
# 1: 1 A A->B
# 2: 1 B B->A
# 3: 1 A A->D
# 4: 1 D D->Closed
# 5: 1 Closed
# 6: 2 B B->C
# 7: 2 C C->D
# 8: 2 D
# 9: 3 D D->A
#10: 3 A A->B
#11: 3 B B->Closed
#12: 3 Closed
#13: 4 C C->Closed
#14: 4 Closed
Or a more compact option:
setDT(df)[, brand_chg := c(paste(brand[-.N], brand[-1], sep="->"), ""), id]
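Use
with(df, ave(brand, id, FUN = function(x) c(paste(head(x, -1), tail(x, -1), sep = '->'), '')))
Not tested on 287k rows.
I get an error with with(), but when I remove the ave() function it gives a correctly concatenated list.
Thank you very much. I'll have to look into exactly how this works.
Thanks! This works, and it keeps the data frame form. Would you mind explaining seq_along(brand) == n()?
seq_along(brand) == n() returns TRUE for the last row of each group: seq_along acts like a row index within the group, and n() is the number of rows in the group.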
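To make that explanation concrete, here is a small base R sketch for a single group (the rows with id 2). Outside of dplyr there is no `n()`, so `length(brand)` stands in for it, and `c(brand[-1], NA)` imitates what `lead(brand)` returns. The variable names are chosen for illustration.

```r
brand <- c("B", "C", "D")            # brands of one group (id 2)

idx <- seq_along(brand)              # position of each row within the group
is_last <- idx == length(brand)      # inside mutate(), n() plays the role of length(brand)

nxt <- c(brand[-1], NA)              # next row's brand; NA after the last row
out <- ifelse(is_last, "", paste(brand, nxt, sep = "->"))
```

Only the last position satisfies the comparison, so `ifelse` returns the empty string there and the pasted pair everywhere else.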
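For completeness, the `ave()` suggestion from the comments can be written out as a runnable base R sketch. The `as.character()` call is an addition made here, on the assumption that the reported error came from `brand` being a factor (the default for strings in `data.frame()` before R 4.0), since `ave()` cannot write new values such as "A->B" into an existing factor.

```r
df <- data.frame(
  id = c(rep(1, 5), rep(2, 3), rep(3, 4), rep(4, 2)),
  brand = c("A", "B", "A", "D", "Closed", "B", "C", "D",
            "D", "A", "B", "Closed", "C", "Closed")
)

# Within each id, pair every element with its successor;
# the last row of each id gets an empty string.
df$brand_chg <- with(df, ave(as.character(brand), id, FUN = function(x) {
  c(paste(head(x, -1), tail(x, -1), sep = "->"), "")
}))
```

Like the compact data.table version, this builds each group's result in one vectorised step instead of looping over rows.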