R: create a column containing each subject's baseline time
I have a dataset like this:
df=data.frame(subject= c(rep(1, 3), rep(2, 2),rep(3,4)), visit=c(1:3,1:2,1:4),time=c('2003-03-07 6:34','2003-03-07 7:33','2003-03-07 8:15','2003-03-15 6:42','2003-03-15 7:42','2003-03-16 6:20','2003-03-16 6:40','2003-03-16 7:38','2003-03-16 8:42'))
subject visit time
1 1 1 2003-03-07 6:34
2 1 2 2003-03-07 7:33
3 1 3 2003-03-07 8:15
4 2 1 2003-03-15 6:42
5 2 2 2003-03-15 7:42
6 3 1 2003-03-16 6:20
7 3 2 2003-03-16 6:40
8 3 3 2003-03-16 7:38
9 3 4 2003-03-16 8:42
I would like to create a column that holds each subject's baseline time at every visit. The expected output should look like this:
df1=data.frame(subject= c(rep(1, 3), rep(2, 2),rep(3,4)), visit=c(1:3,1:2,1:4),time=c('2003-03-07 6:34','2003-03-07 6:34','2003-03-07 6:34','2003-03-15 6:42','2003-03-15 6:42','2003-03-16 6:20','2003-03-16 6:20','2003-03-16 6:20','2003-03-16 6:20'))
subject visit time
1 1 1 2003-03-07 6:34
2 1 2 2003-03-07 6:34
3 1 3 2003-03-07 6:34
4 2 1 2003-03-15 6:42
5 2 2 2003-03-15 6:42
6 3 1 2003-03-16 6:20
7 3 2 2003-03-16 6:20
8 3 3 2003-03-16 6:20
9 3 4 2003-03-16 6:20
Does anyone know how to achieve this?

Option 1 (assumes the rows are already sorted by time within each subject):
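A minimal sketch of this option, assuming base R's ave() and a hypothetical new column named baseline. It is an illustration under those assumptions, not necessarily the answer's original code. It takes the first time value in each subject group, so it is only correct when the rows are already in visit order.

```r
# Rebuild the example data from the question (strings kept as character).
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# "baseline" is a hypothetical column name.
# Within each subject, repeat the first time value across all rows.
df$baseline <- ave(df$time, df$subject, FUN = function(x) x[1])
```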
Option 2 (a more robust solution that determines which visit is the first one):
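One way this could be sketched, assuming the visit column identifies the first visit and using a hypothetical baseline column; the row order is deliberately shuffled to show that the result does not depend on sorting. This is an illustration, not necessarily the answer's original code.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# Shuffle the rows with a fixed permutation to mimic unsorted input.
shuffled <- df[c(9, 2, 5, 1, 7, 3, 8, 4, 6), ]

# For each subject, find the row position holding the smallest visit number.
rows <- seq_len(nrow(shuffled))
first_row <- ave(rows, shuffled$subject,
                 FUN = function(i) i[which.min(shuffled$visit[i])])

# "baseline" is a hypothetical column name: the time recorded at that row.
shuffled$baseline <- shuffled$time[first_row]
```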
Option 3 (convert to POSIXct and use min):
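A minimal sketch of this option, assuming an explicit format string, the UTC time zone, and a hypothetical baseline column (none of these come from the original answer). Once the strings are real date-times, min() per subject gives the earliest time regardless of row order.

```r
df <- data.frame(
  subject = c(rep(1, 3), rep(2, 2), rep(3, 4)),
  visit   = c(1:3, 1:2, 1:4),
  time    = c('2003-03-07 6:34', '2003-03-07 7:33', '2003-03-07 8:15',
              '2003-03-15 6:42', '2003-03-15 7:42', '2003-03-16 6:20',
              '2003-03-16 6:40', '2003-03-16 7:38', '2003-03-16 8:42'),
  stringsAsFactors = FALSE
)

# Parse the strings as date-times; the format and tz are assumptions.
df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M", tz = "UTC")

# "baseline" is a hypothetical column name: earliest time per subject.
df$baseline <- ave(df$time, df$subject, FUN = min)
```

Comparing parsed date-times rather than strings avoids ordering surprises when hours are written without a leading zero.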
Option 4 (probably the fastest/easiest):
You can use dplyr for this:
library(dplyr)
# Take the first time per subject, then join it back onto every row.
df %>%
  group_by(subject) %>%
  summarize(time2 = time[1]) %>%
  left_join(df, by = "subject")
Here is the resulting data frame:
subject time2 visit time
1 1 2003-03-07 6:34 1 2003-03-07 6:34
2 1 2003-03-07 6:34 2 2003-03-07 7:33
3 1 2003-03-07 6:34 3 2003-03-07 8:15
4 2 2003-03-15 6:42 1 2003-03-15 6:42
5 2 2003-03-15 6:42 2 2003-03-15 7:42
6 3 2003-03-16 6:20 1 2003-03-16 6:20
7 3 2003-03-16 6:20 2 2003-03-16 6:40
8 3 2003-03-16 6:20 3 2003-03-16 7:38
9 3 2003-03-16 6:20 4 2003-03-16 8:42
A data.table approach:
library(data.table)
# Add time2 by reference: the earliest parsed time within each subject.
setDT(df)[, time2 := min(as.POSIXct(time)), by = subject]
A dplyr approach:
library(dplyr)
# Overwrite time with the earliest parsed time within each subject.
df %>%
  group_by(subject) %>%
  mutate(time = min(as.POSIXct(time)))
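Thanks, everyone, for sharing these useful solutions! Just one small tweak: if there is only one grouping factor in ave, you do not need the list(...).
So when there is more than one grouping factor it should be kept, and the grouping factors should go inside the list(...) call, right?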