R: reshaping from wide to long and vice versa (multistate/survival analysis dataset)


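I am trying to reshape the following dataset with reshape(), without much success.

The starting dataset is in "wide" format, with each id described by a single row. The dataset is intended for multistate analysis (a generalization of survival analysis).

Each person is recorded over a given overall time span. During this period the subject can experience a number of transitions between states (for simplicity, let us fix the maximum number of different states that can be visited at two). The first visited state is s1 = 1, 2, 3, 4. The person stays in that state for dur1 periods, and the same applies to the second visited state s2: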

   id    cohort    s1     dur1     s2     dur2     
     1      1        3      4       2      5       
     2      0        1      4       4      3

The long-format dataset I would like to obtain is:

id    cohort    s    
1       1       3
1       1       3
1       1       3
1       1       3
1       1       2
1       1       2
1       1       2
1       1       2
1       1       2
2       0       1
2       0       1
2       0       1
2       0       1
2       0       4
2       0       4
2       0       4

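In practice, each id has dur1 + dur2 rows, and s1 and s2 are melted into a single variable s.

How would you do this transformation? Also, how would you recover the original "wide" form of the dataset?

Many thanks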

dat <- cbind(id=c(1,2), cohort=c(1, 0), s1=c(3, 1), dur1=c(4, 4), s2=c(2, 4), dur2=c(5, 3))

dat

There may be a better way, but this should work:
df <- read.table(text = '
   id    cohort    s1     dur1     s2     dur2     
     1      1        3      4       2      5       
     2      0        1      4       4      3',
header=TRUE)

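# one row per id, one column per sampling period; 0 marks unobserved periods
n_periods <- 9
hist <- matrix(0, nrow=nrow(df), ncol=n_periods)
hist

for(i in 1:nrow(df)) {
  # s1 repeated dur1 times, then s2 repeated dur2 times, then zero padding
  hist[i,] <- c(rep(df[i,3], df[i,4]), rep(df[i,5], df[i,6]), rep(0, (n_periods - df[i,4] - df[i,6])))
}

hist

hist2 <- cbind(df[,1:2], hist)
colnames(hist2) <- c('id', 'cohort', paste('x', 1:n_periods, sep=''))

library(reshape2)

hist3 <- melt(hist2, id.vars=c('id', 'cohort'), variable.name='x', value.name='state')

# sort by id, then by sampling period
hist4 <- hist3[order(hist3$id, hist3$x),]
hist4

# drop the period column, then drop the zero padding rows
hist4 <- hist4[ , !names(hist4) %in% c("x")]
hist4 <- hist4[hist4$state != 0,]

Of course, this has to be modified if there are more than two states per id. The padding rows are dropped on state alone, so the number of cohorts does not matter. For example, I assumed 9 sampling periods, and a person could pass through a sequence of states such as:
1 3 2 4 3 4 1 1 2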
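
As a possible alternative to the padded matrix, the same long table can be sketched with rep() alone. This is only an illustration under the assumptions of the example (exactly two visited states per id, columns named s1, dur1, s2, dur2); the name long is hypothetical and not part of the original answer.

```r
df <- read.table(text = '
   id    cohort    s1     dur1     s2     dur2
     1      1        3      4       2      5
     2      0        1      4       4      3',
header = TRUE)

# number of rows each id contributes: dur1 + dur2
n <- df$dur1 + df$dur2

long <- data.frame(
  id     = rep(df$id, times = n),
  cohort = rep(df$cohort, times = n),
  # for each id: s1 repeated dur1 times, followed by s2 repeated dur2 times
  s      = unlist(Map(function(s1, d1, s2, d2) c(rep(s1, d1), rep(s2, d2)),
                      df$s1, df$dur1, df$s2, df$dur2))
)
long
```

No padding or filtering is needed here, because each id contributes exactly dur1 + dur2 rows from the start.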


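You can use reshape() for the first step, but then you need to do a little more work. Also, reshape() needs a data.frame() as its input, but your sample data is a matrix.

Here is how to proceed:

reshape() your data from wide to long: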
dat2 <- reshape(data.frame(dat), direction = "long", 
                idvar = c("id", "cohort"),
                varying = 3:ncol(dat), sep = "")
dat2
#       id cohort time s dur
# 1.1.1  1      1    1 3   4
# 2.0.1  2      0    1 1   4
# 1.1.2  1      1    2 2   5
# 2.0.2  2      0    2 4   3
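
The step that turns dat2 into the dat3 object mentioned in this thread does not appear in the text here. One plausible way to do it, offered as a sketch rather than as the answer's exact code, is to repeat each row of dat2 dur times by indexing with rep(). All columns are kept, since the reverse operation relies on time and dur.

```r
# rebuild the example objects so this block runs on its own
dat <- cbind(id = c(1, 2), cohort = c(1, 0), s1 = c(3, 1),
             dur1 = c(4, 4), s2 = c(2, 4), dur2 = c(5, 3))
dat2 <- reshape(data.frame(dat), direction = "long",
                idvar = c("id", "cohort"),
                varying = 3:ncol(dat), sep = "")

# repeat each row index dur times: one output row per period spent in state s
dat3 <- dat2[rep(seq_len(nrow(dat2)), dat2$dur), ]

# sort by id, then by visit order (time), and reset the row names
dat3 <- dat3[order(dat3$id, dat3$time), ]
rownames(dat3) <- NULL

# the three columns asked for in the question
dat3[, c("id", "cohort", "s")]
```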


You can also get rid of those funky row names by resetting rownames(dat3).


Very concise answer, thanks a lot! May I ask how to do the "reverse" operation (i.e. obtaining the initial dat matrix starting from dat3)?

@Stezzo, see my update. It basically involves using aggregate() and reshape(), but it requires that no columns are dropped when creating "dat3".

The answer is perfect, thanks again. I have posted the "reverse" question (long --> wide) for the case where we do not have the starting wide dataset needed for the aggregate() strategy above.
reshape(aggregate(cbind(s, dur) ~ ., dat3, unique), 
        direction = "wide", idvar = c("id", "cohort"))
#   id cohort s.1 dur.1 s.2 dur.2
# 1  2      0   1     4   4     3
# 2  1      1   3     4   2     5