Keep only the last event name for each user (R)


I have a table (input):

I need to add a lastProgression column, which depends on timestamp.

Expected output:

  user_id  timestamp  progression  lastProgression
1 Rob      22223333   Level1       Level3
2 Mike     33334444   Level2       Level7
3 Rob      55558888   Level3       Level3
4 Mike     44447777   Level7       Level7

Using ave from base R, we can select the last observation (tail(x, 1)) after grouping by 'user_id', assuming the rows are already ordered by 'timestamp':

df1$lastProgression <- with(df1, ave(progression, user_id, FUN= function(x) tail(x,1)))
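As a minimal reproducible sketch of the ave approach: the input table from the question is not available, so the `df1` below is an assumption rebuilt from the expected output. It sorts by timestamp first, so that `tail(x, 1)` really does return the most recent entry for each user.

```r
# Assumed data, rebuilt from the expected output (not the asker's actual input)
df1 <- data.frame(
  user_id     = c("Rob", "Mike", "Rob", "Mike"),
  timestamp   = c(22223333, 33334444, 55558888, 44447777),
  progression = c("Level1", "Level2", "Level3", "Level7"),
  stringsAsFactors = FALSE
)

# Sort by timestamp so the last row within each user is the most recent one
df1 <- df1[order(df1$timestamp), ]

# ave() applies FUN per group and recycles the single result to every row of that group
df1$lastProgression <- with(df1, ave(progression, user_id,
                                     FUN = function(x) tail(x, 1)))
df1
```

Without the sorting step, `tail(x, 1)` would return whatever row happens to come last for each user, which is only correct when the data is already in timestamp order.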

Or use the modified option with which.max (from @docendo discimus's comments).


No external package is probably needed here, but I would use

library(data.table); setDT(df)[unique(df[order(-timestamp)], by = "user_id"), lastProgression := i.progression, on = "user_id"]

for efficiency, or

setDT(df)[, lastProgression := progression[which.max(timestamp)], by = user_id]


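I understand the question as depending on the timestamp, i.e. it is not necessarily the last entry (but I may be wrong).

@docendodiscimus I updated with a dplyr option in which the data is ordered so that the first value is selected.

Since timestamp is a numeric/integer variable, I guess you could also use group_by(df1, user_id) %>% mutate(lastProgression = progression[which.max(timestamp)])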
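library(dplyr)

# Option 1: sort by descending timestamp, then take the first progression per user
df1 %>% 
  group_by(user_id) %>%
  arrange(desc(timestamp)) %>% 
  mutate(lastProgression = first(progression))

# Option 2: pick the progression at the maximum timestamp; row order is left unchanged
df1 %>%
  group_by(user_id) %>%
  mutate(lastProgression = progression[which.max(timestamp)])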
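For comparison, the same which.max idea can be sketched in base R without relying on row order. This is only an illustration: the data frame is made up, with rows deliberately out of timestamp order, and `idx` is a helper name introduced here.

```r
# Assumed example data, deliberately NOT sorted by timestamp
df1 <- data.frame(
  user_id     = c("Rob", "Rob", "Mike", "Mike"),
  timestamp   = c(55558888, 22223333, 44447777, 33334444),
  progression = c("Level3", "Level1", "Level7", "Level2"),
  stringsAsFactors = FALSE
)

# For each user, find the row index holding that user's maximum timestamp
idx <- ave(seq_len(nrow(df1)), df1$user_id,
           FUN = function(i) i[which.max(df1$timestamp[i])])

# Look up the progression at that row for every row of the same user
df1$lastProgression <- df1$progression[idx]
df1
```

Because the lookup is driven by the timestamp values rather than by position, the result does not change if the rows are shuffled.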