R: Create a column containing each person's baseline time


I have a dataset like this:

df=data.frame(subject= c(rep(1, 3), rep(2, 2),rep(3,4)), visit=c(1:3,1:2,1:4),time=c('2003-03-07 6:34','2003-03-07 7:33','2003-03-07 8:15','2003-03-15 6:42','2003-03-15 7:42','2003-03-16 6:20','2003-03-16 6:40','2003-03-16 7:38','2003-03-16 8:42')) 

  subject visit            time
1       1     1 2003-03-07 6:34
2       1     2 2003-03-07 7:33
3       1     3 2003-03-07 8:15
4       2     1 2003-03-15 6:42
5       2     2 2003-03-15 7:42
6       3     1 2003-03-16 6:20
7       3     2 2003-03-16 6:40
8       3     3 2003-03-16 7:38
9       3     4 2003-03-16 8:42
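
I would like to create a column that holds each person's baseline time (the time of their first visit) on every visit row. The expected output should look like this: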

df1=data.frame(subject= c(rep(1, 3), rep(2, 2),rep(3,4)), visit=c(1:3,1:2,1:4),time=c('2003-03-07 6:34','2003-03-07 6:34','2003-03-07 6:34','2003-03-15 6:42','2003-03-15 6:42','2003-03-16 6:20','2003-03-16 6:20','2003-03-16 6:20','2003-03-16 6:20')) 

  subject visit            time
1       1     1 2003-03-07 6:34
2       1     2 2003-03-07 6:34
3       1     3 2003-03-07 6:34
4       2     1 2003-03-15 6:42
5       2     2 2003-03-15 6:42
6       3     1 2003-03-16 6:20
7       3     2 2003-03-16 6:20
8       3     3 2003-03-16 6:20
9       3     4 2003-03-16 6:20

Does anyone know how to achieve this?

Option 1 (assumes sort order):
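
A minimal sketch of what this option could look like, assuming base R's ave and assuming each subject's rows are already ordered so that the first row is the baseline visit. The column name baseline is chosen here for illustration only.

```r
# Rebuild the example data from the question (strings kept as character).
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# Within each subject, take the first time value and spread it over the group.
# This relies on the rows being sorted by visit within subject.
df$baseline <- ave(df$time, df$subject, FUN = function(x) x[1])
```

If the rows are not guaranteed to be sorted, this picks whichever row happens to come first for each subject.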

Option 2 (a more robust solution that determines which visit is the first):
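
One possible way to do this, sketched under the assumption that the baseline is the row with the smallest visit number for each subject. The rows are deliberately shuffled to show that the result does not depend on row order. The names d, idx and baseline are illustrative.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# Shuffle the rows so the first row per subject is no longer the first visit.
d <- df[c(3, 1, 2, 5, 4, 9, 7, 6, 8), ]

# For each subject, find the row position whose visit number is smallest.
idx <- ave(seq_len(nrow(d)), d$subject,
           FUN = function(i) i[which.min(d$visit[i])])

# Look up the time stored at that position.
d$baseline <- d$time[idx]
```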

Option 3 (convert to POSIXct and use min):
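
A sketch of this idea, assuming the baseline is simply the earliest timestamp for each subject. The explicit format and the UTC time zone are assumptions made here to keep the parsing unambiguous. The column name baseline is illustrative.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# Parse the strings into real date-times so that min() compares
# chronologically rather than alphabetically.
df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Earliest time per subject, repeated on every row of that subject.
df$baseline <- ave(df$time, df$subject, FUN = min)
```

Because this compares actual times, it does not depend on the row order or on the visit numbering.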

Option 4 (probably the fastest/easiest):
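
A short one-liner in this spirit, assuming base R's match is what is meant (that is a guess, not something stated above). match returns the position of the first occurrence of each subject, so this also assumes the first row per subject is the baseline visit.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# match(x, x) gives, for every row, the index of the first row
# with the same subject; index time with it to copy that value.
df$baseline <- df$time[match(df$subject, df$subject)]
```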


You can use dplyr for this:

require(dplyr)

df %>%
  group_by(subject) %>%
  summarize(time2 = time[1]) %>%
  left_join(df, by = "subject")

Here is the resulting data frame:

  subject           time2 visit            time
1       1 2003-03-07 6:34     1 2003-03-07 6:34
2       1 2003-03-07 6:34     2 2003-03-07 7:33
3       1 2003-03-07 6:34     3 2003-03-07 8:15
4       2 2003-03-15 6:42     1 2003-03-15 6:42
5       2 2003-03-15 6:42     2 2003-03-15 7:42
6       3 2003-03-16 6:20     1 2003-03-16 6:20
7       3 2003-03-16 6:20     2 2003-03-16 6:40
8       3 2003-03-16 6:20     3 2003-03-16 7:38
9       3 2003-03-16 6:20     4 2003-03-16 8:42

data.table approach:

library(data.table)
setDT(df)[, time2 := min(as.POSIXct(time)), by = subject]

dplyr approach:

library(dplyr)
df %>%
  group_by(subject) %>%
  mutate(time = min(as.POSIXct(time)))

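Thanks everyone for sharing these useful solutions!

Just a small tweak: if there is only one grouping factor in ave, the list(...) inside with is not needed.

If there is only one grouping factor in ave, list(...) is not needed. With more than one, it should be kept and the grouping factors should be included in the list(...) call, right?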
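
To make the point about ave and list(...) concrete, here is a small sketch. The day column is a hypothetical second grouping factor derived from the timestamp, added only for illustration. As far as base R's ave is concerned, grouping factors can be passed as separate arguments or wrapped in a single list, and both forms should give the same result.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)
first <- function(x) x[1]

# One grouping factor: with and without list(...).
a <- with(df, ave(time, subject, FUN = first))
b <- with(df, ave(time, list(subject), FUN = first))

# Two grouping factors (day is hypothetical): separate arguments vs. one list.
df$day <- substr(df$time, 1, 10)
c1 <- with(df, ave(time, subject, day, FUN = first))
c2 <- with(df, ave(time, list(subject, day), FUN = first))
```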