
Reshaping a wide dataset into interval format in R


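I am working with a "wide" dataset and want to use a specific package (msSurv, for nonparametric multistate models) that requires the data in interval form.

My current dataset has one row per individual: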

dat <- read.table(text = "

   id    cohort   t0    s1     t1     s2      t2     s3    t3
    1      2      0      1     50      2      70     4     100
    2      1      0      2     15      3      100    0     0   

", header=TRUE)

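I hope the example is clear enough; if not, please let me know and I will try to clarify further.

How would you automate this reshaping? Note that I have a fairly large number of (simulated) individuals, about 1 million.

Many thanks for your help.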

I think I understand. Does this work?

require(data.table)
dt <- data.table(dat, key=c("id", "cohort"))
dt.out <- dt[,  list(t.start=c(t0,t1,t2), t.stop=c(t1,t2,t3), 
                     start.s=c(s1,s2,s3), end.s=c(s2,s3,s3)), 
                     by = c("id", "cohort")]

#    id cohort t.start t.stop start.s end.s
# 1:  1      2       0     50       1     2
# 2:  1      2      50     70       2     4
# 3:  1      2      70    100       4     4
# 4:  2      1       0     15       2     3
# 5:  2      1      15    100       3     0
# 6:  2      1     100      0       0     0
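The grouped call above builds the intervals one id at a time, which may be slow with about a million individuals. A minimal sketch of the same stacking done column-wise is shown here, assuming the columns are named t0..tk and s1..sk as in the example. The names `k`, `pieces`, and `long` are introduced only for illustration. It reproduces the intermediate table, padding rows included.

```r
library(data.table)

dat <- read.table(text = "
id cohort t0 s1 t1 s2 t2 s3 t3
1  2      0  1  50 2  70  4  100
2  1      0  2  15 3  100 0  0
", header = TRUE)
dt <- as.data.table(dat)

k <- 3L  # number of (state, time) pairs per row; assumed to match the example

# Build one block of rows per interval position, using whole columns at once.
pieces <- lapply(seq_len(k), function(i) {
  nxt <- min(i + 1L, k)  # the last interval has no later state, so it reuses s_k
  data.table(
    id      = dt$id,
    cohort  = dt$cohort,
    t.start = dt[[paste0("t", i - 1L)]],
    t.stop  = dt[[paste0("t", i)]],
    start.s = dt[[paste0("s", i)]],
    end.s   = dt[[paste0("s", nxt)]]
  )
})

# Stack the blocks and restore the per-person ordering.
long <- rbindlist(pieces)
setorder(long, id, t.start)
print(long)
```

Changing `k` is the only adjustment needed for a different number of transitions, provided the column naming pattern holds.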


If you are on a *nix machine, I suggest taking a look at awk.


# Continuing from dt.out as built above.
# remove rows where start.s and end.s are both 0 (vectorized filter, no per-row grouping needed)
dt.out <- dt.out[start.s > 0 | end.s > 0]
# replace end.s values with corresponding start.s values where end.s == 0
# pmax(start.s, end.s) does this element-wise because end.s >= start.s ALWAYS, except for the 0 padding
dt.out[, end.s := pmax(start.s, end.s)]

dt.out
#    id cohort t.start t.stop start.s end.s
# 1:  1      2       0     50       1     2
# 2:  1      2      50     70       2     4
# 3:  1      2      70    100       4     4
# 4:  2      1       0     15       2     3
# 5:  2      1      15    100       3     3
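Before handing the result to a multistate routine, it may be worth checking that the intervals are well formed. The sketch below is one possible check, not part of the original answer: it rebuilds the final table from the printed output and tests that every interval has positive length and that each interval starts at the time and state where the previous one ended. The names `bad_length` and `gaps` and the helper columns `prev.stop` and `prev.end.s` are hypothetical.

```r
library(data.table)

# Final table, copied from the printed output above.
dt.out <- data.table(
  id      = c(1L, 1L, 1L, 2L, 2L),
  cohort  = c(2L, 2L, 2L, 1L, 1L),
  t.start = c(0L, 50L, 70L, 0L, 15L),
  t.stop  = c(50L, 70L, 100L, 15L, 100L),
  start.s = c(1L, 2L, 4L, 2L, 3L),
  end.s   = c(2L, 4L, 4L, 3L, 3L)
)

# Intervals with zero or negative length would indicate leftover padding.
bad_length <- dt.out[t.stop <= t.start]

# Within each id, compare every interval with the one before it.
dt.out[, prev.stop := shift(t.stop), by = id]
dt.out[, prev.end.s := shift(end.s), by = id]
gaps <- dt.out[!is.na(prev.stop) &
               (t.start != prev.stop | start.s != prev.end.s)]

# Drop the helper columns again.
dt.out[, c("prev.stop", "prev.end.s") := NULL]

print(nrow(bad_length))
print(nrow(gaps))
```

Both counts being zero means the table is contiguous per id; any rows returned in `bad_length` or `gaps` point at the individuals to inspect.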