R: aggregating and reshaping from long to wide
I asked this question earlier, but the answers I got were not what I wanted. At the time I was using Stata for this task. However, since I work with this data regularly, I would like to use R to produce what I need. I have a dataset of daily hospital admissions by age, sex, and diagnosis. I want to aggregate the data and reshape it from long to wide. How can I do this? Sample data and the desired output are shown below. The column headers are prefixes that encode sex, age, and diagnosis. Thanks.
Sample data
structure(list(diag = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("card", "cere"), class = "factor"), sex = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor"),
age = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("35-64",
"65-74"), class = "factor"), admissions = c(1L, 1L, 0L, 0L,
6L, 6L, 6L, 1L, 4L, 0L, 0L, 0L, 4L, 6L, 5L, 2L, 2L, 4L, 1L,
0L, 6L, 5L, 6L, 4L), bdate = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L), .Label = c("1987-01-01", "1987-01-02",
"1987-01-03"), class = "factor")), .Names = c("diag", "sex",
"age", "admissions", "bdate"), row.names = c(NA, -24L), class = "data.frame")
Desired output
structure(list(date = structure(1:3, .Label = c("01jan1987",
"02jan1987", "03jan1987"), class = "factor"), f3564card = c(1L,
4L, 2L), f6574card = c(1L, 0L, 4L), m3564card = c(0L, 0L, 1L),
m6574card = c(0L, 0L, 0L), f3564cere = c(6L, 4L, 6L), f6574cere = c(6L,
6L, 5L), m3564cere = c(6L, 5L, 6L), m6574cere = c(1L, 2L,
4L)), .Names = c("date", "f3564card", "f6574card", "m3564card",
"m6574card", "f3564cere", "f6574cere", "m3564cere", "m6574cere"
), class = "data.frame", row.names = c(NA, -3L))
Your data is already in a long format that reshape2 can use directly, like this:
library(reshape2)  # dcast() is provided by reshape2, not by reshape
dcast(df, bdate ~ sex + age + diag, value.var = "admissions")
# bdate Female_35-64_card Female_35-64_cere Female_65-74_card Female_65-74_cere
# 1 1987-01-01 1 6 1 6
# 2 1987-01-02 4 4 0 6
# 3 1987-01-03 2 6 4 5
# Male_35-64_card Male_35-64_cere Male_65-74_card Male_65-74_cere
# 1 0 6 0 1
# 2 0 5 0 2
# 3 1 6 0 4
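The column names produced above (for example `Female_35-64_card`) differ from the headers in the desired output (for example `f3564card`). One way to get closer is to build the column key before casting. The following is a minimal sketch, not a definitive solution: it assumes the naming rule is "first letter of sex in lower case, age band without the hyphen, then diagnosis", which is inferred from the desired output, and the `key` column and `wanted` vector are hypothetical names introduced only for illustration. The sample data is rebuilt compactly so the block runs on its own.

```r
library(reshape2)

# Sample data from the question, rebuilt compactly (same 24 rows)
df <- data.frame(
  diag = rep(rep(c("card", "cere"), each = 4), times = 3),
  sex = rep(rep(c("Female", "Male"), each = 2), times = 6),
  age = rep(c("35-64", "65-74"), times = 12),
  admissions = c(1L, 1L, 0L, 0L, 6L, 6L, 6L, 1L,
                 4L, 0L, 0L, 0L, 4L, 6L, 5L, 2L,
                 2L, 4L, 1L, 0L, 6L, 5L, 6L, 4L),
  bdate = rep(c("1987-01-01", "1987-01-02", "1987-01-03"), each = 8)
)

# Hypothetical helper column: e.g. "Female", "35-64", "card" -> "f3564card"
df$key <- paste0(
  tolower(substr(as.character(df$sex), 1, 1)),
  gsub("-", "", as.character(df$age)),
  as.character(df$diag)
)

# One row per date, one column per key
wide <- dcast(df, bdate ~ key, value.var = "admissions")

# dcast orders the new columns alphabetically; reorder to match the desired output
wanted <- c("bdate",
            "f3564card", "f6574card", "m3564card", "m6574card",
            "f3564cere", "f6574cere", "m3564cere", "m6574cere")
wide <- wide[, wanted]
print(wide)
```

The date column keeps its `1987-01-01` form here; converting it to the `01jan1987` style shown in the desired output would be a separate formatting step.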
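I don't see any aggregation in your sample output, but if you do need to aggregate, you can do it with the fun.aggregate argument of dcast.
Could you include in your question what the desired output should look like? I would use dplyr for this: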
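library(dplyr)
# %.% was dplyr's early chaining operator and has since been removed; use %>%
# Sum both outcome columns within each date/sex/age group
df %>%
  group_by(date, sex, age) %>%
  summarise(vcvd = sum(cvd),
            vacs = sum(ACS))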
Source: local data frame [111 x 5]
Groups: date, sex
date sex age vcvd vacs
1 01 Jul 91 female 35-64 0 0
2 01 Jul 91 female 65-74 0 0
3 01 Jul 91 male 35-64 1 1
4 02 Aug 91 female 35-64 0 0
5 02 Jul 91 female 65-74 1 0
6 02 Jul 91 male 65-74 0 0
7 03 Aug 91 female 65-74 0 0
8 03 Jul 91 female 35-64 0 0
9 04 Jul 91 male 35-64 1 0
10 04 Jul 91 male 65-74 0 0
.. ... ... ... ... ...
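The dplyr snippet above uses column names (`date`, `cvd`, `ACS`) that do not appear in the sample data of this question. For the sample data shown here, a comparable collapse can be done during the reshape itself through the `fun.aggregate` argument of `dcast`. The sketch below assumes, purely for illustration, that the goal is to sum admissions over the two age bands; that choice is an assumption and is not stated in the question.

```r
library(reshape2)

# Sample data from the question, rebuilt compactly (same 24 rows)
df <- data.frame(
  diag = rep(rep(c("card", "cere"), each = 4), times = 3),
  sex = rep(rep(c("Female", "Male"), each = 2), times = 6),
  age = rep(c("35-64", "65-74"), times = 12),
  admissions = c(1L, 1L, 0L, 0L, 6L, 6L, 6L, 1L,
                 4L, 0L, 0L, 0L, 4L, 6L, 5L, 2L,
                 2L, 4L, 1L, 0L, 6L, 5L, 6L, 4L),
  bdate = rep(c("1987-01-01", "1987-01-02", "1987-01-03"), each = 8)
)

# Leaving age out of the formula means each date/sex/diag cell holds two rows;
# fun.aggregate = sum tells dcast how to combine them
by_sex_diag <- dcast(df, bdate ~ sex + diag,
                     value.var = "admissions",
                     fun.aggregate = sum)
print(by_sex_diag)
```

Without `fun.aggregate`, `dcast` falls back to counting rows per cell (with a message), so naming the function explicitly is the safer habit whenever a cell can contain more than one row.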