Behaviour of tcut in pyears when using start/stop times instead of follow-up time [R, survival]

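I am trying to use pyears to estimate incidence rates in a cohort, where one of the covariates I am interested in is age at the time of the event (as opposed to age at enrolment, i.e. at entry into the cohort). Age at the time of the event is of course time-dependent. The correct way to handle this appears to be to apply tcut to the age at enrolment, as shown in the pyears help. However, this only seems to work when the "start time" is always zero (or, equivalently, when the Surv object is given follow-up time rather than start/end times). For my scenario it is important to use actual start/end times, because I also want to use other time-varying covariates such as calendar year.

Here is an example that illustrates the problem: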

library(tidyverse)
library(survival)

# encode actual start/end dates
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))

# encode time elapsed from origin of zero
s2 <- tibble(stime = 0,
             etime = stime + 365.25,
             futime = etime - stime,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             age.enr = floor(runif(10, 15, 64.999)))

# these ought to give the same results, but don't (the second one appears to be right)
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s1, scale=1)$pyears

# test it with a dataset where start time is always zero - works
pyears(Surv(stime, etime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears
pyears(Surv(futime, outcome) ~ tcut(age.enr, c(0, 24, 34, 44, 54, 64), scale=365.25), data=s2, scale=1)$pyears

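The first example fails when start/end times are supplied but works when elapsed time is supplied, whereas the second example works with either start/end times or elapsed time (because the start time is artificially set to zero).

I realise that using elapsed time is a workaround, but shouldn't pyears + tcut behave the same way regardless of how the intervals are encoded? Am I misunderstanding what tcut is supposed to do?

Thanks,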

Peter
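
To see how the two encodings in the question differ before they ever reach pyears, it can help to inspect the Surv objects directly. The sketch below uses hypothetical toy values (not the question's data). It relies on the behaviour described in the survival documentation, where the default type is "counting" when a second time argument is present and "right" otherwise.

```r
library(survival)

# Hypothetical toy values in days since 1970-01-01 (assumed for illustration)
stime   <- c(10958, 10959, 10960)
etime   <- stime + 365.25
outcome <- c(1, 0, 0)

s_counting <- Surv(stime, etime, outcome)   # start/stop encoding
s_right    <- Surv(etime - stime, outcome)  # elapsed follow-up time

# The "type" attribute records how Surv interpreted its arguments
attr(s_counting, "type")
attr(s_right, "type")

# The start/stop form carries three columns (start, stop, status),
# the elapsed-time form only two (time, status)
ncol(s_counting)
ncol(s_right)
```

Comparing the two objects this way shows that the start/stop form keeps the absolute start time, which the elapsed-time form discards.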

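My goal is to tabulate age correctly, which requires specifying the age at the start of the interval rather than the age at some earlier enrolment date, as follows: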

# another example, using DOB which is truly constant
set.seed(1234)
s1 <- tibble(stime = as.numeric(as.Date("2000-01-01")) + 1:10,
             etime = stime + 3652.50,
             outcome = c(1,1,1,0,0,0,0,0,0,0),
             dob = round(runif(10, as.Date("1930-01-01"), 
                               as.Date("1985-01-01"))),
             age.enr = floor((stime - dob)/365.25),
             age.end = floor((etime - dob)/365.25),
             sobj = Surv(etime - stime, outcome)) # just for convenience
summary(s1)
s1 %>% mutate_at(vars(stime, etime, dob), ~as.Date(.x, origin="1970-01-01"))

s1$enrd <- s1$stime - 365.25*3               # simulate an enrolment date 3 years prior to this interval
s1$age.int <- s1$age.enr                     # actually, this is the age at beginning of interval, not enrolment
s1$age.enr <- floor((s1$enrd - s1$dob)/365.25)

pyears(sobj ~ tcut(age.enr, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # incorrect
pyears(sobj ~ tcut(age.int, c(0, 25, 35, 45, 55, 65,999), scale=365.25), data=s1)$pyears # correct

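I don't think this has anything to do with tcut. The problem is with the Surv model. When you pass in a start time and an end time, Surv assumes that you are using interval censoring rather than right censoring. What puzzles me is why you need the actual start and end times in the model. You can keep the whole Surv object as a column in the data frame and pass whatever covariates you like to the model. That is much easier than forcing Surv to do something unusual and then trying to extract covariates from it.

In fact, besides the issue @AllanCameron pointed out, I noticed another conceptual flaw. Because of temporary out-migration, my follow-up time has gaps, so I encode the time intervals with start and end times, since the same person can be followed over several intervals on the timeline. In this example, however, tcut has no way of knowing the enrolment date, so what I really need to pass to tcut is "age at the start of the interval". I have converted my code to use this approach.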
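
A minimal sketch of the "age at the start of the interval" approach for follow-up with gaps, assuming one row per follow-up interval. The data, the column names (id, dob, futime, age.int) and the helper day() are hypothetical and only for illustration. Age is kept as an exact fraction of a year here rather than floored, which is an assumption of this sketch and not something the thread prescribes.

```r
library(survival)

# Hypothetical helper: convert a date string to days since 1970-01-01
day <- function(x) as.numeric(as.Date(x))

# Hypothetical data: person 1 is followed in two intervals separated by a gap
# (temporary out-migration), person 2 in a single interval
d <- data.frame(
  id      = c(1, 1, 2),
  dob     = day(c("1970-06-15", "1970-06-15", "1950-03-01")),
  stime   = day(c("2000-01-01", "2005-01-01", "2000-01-01")),
  etime   = day(c("2003-01-01", "2010-01-01", "2010-01-01")),
  outcome = c(0, 1, 0)
)

# Elapsed follow-up time within each interval, in days
d$futime <- d$etime - d$stime

# Age in years at the start of each interval, not at enrolment
d$age.int <- (d$stime - d$dob) / 365.25

# tcut multiplies both the ages and the breaks by `scale`, so they are put on
# the same day scale as futime; pyears then reports results in years
fit <- pyears(Surv(futime, outcome) ~
                tcut(age.int, c(0, 25, 35, 45, 55, 65, 999), scale = 365.25),
              data = d, scale = 365.25)

fit$pyears  # person-years by attained age band
fit$event   # events by attained age band
```

Because each row starts its own clock at zero, the gap in person 1's follow-up contributes no person-time, and the age bands are driven by the age at which each interval begins.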