A complicated sum in R data.table that involves looking at other columns
I have a data table in which each value of the variables v1 and v2 has an associated "type", encoded in a separate column. Here is an MWE:
X <- data.table(id = 1:5, group = c(1,1,2,2,2), v1 = c(10,12,14,16,18), type_v1 = c("t1","t2","t1","t1","t2"), v2 = c(3,NA,NA,7,8), type_v2 = c("t2", "", "", "t3","t3"))
print(X)
   id group v1 type_v1 v2 type_v2
1:  1     1 10      t1  3      t2
2:  2     1 12      t2 NA
3:  3     2 14      t1 NA
4:  4     2 16      t1  7      t3
5:  5     2 18      t2  8      t3
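There are many different "types", and not all types appear in all groups. I may need to create variables v3, v4, and so on (note how, in my example, one extra column is needed to accommodate t1, t2 and t3 in group 2).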
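My data is currently in long format, and I would rather not convert it to wide format if possible. I am interested in solutions that do not involve creating columns named "t1", "t2", etc., because "t1", "t2" and "t3" are actually very long strings.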
EDIT: added the desired output.

You can melt the data into long format:
library(data.table)
X1 <-
melt(
X,
id.vars = "group",
# melt two sets of measure columns at the same time:
# columns named "v" followed by one or more digits, and
# columns named "type_v" followed by one or more digits
measure.vars = patterns(c("^v\\d+$", "^type_v\\d+$")),
value.name = c("value", "type")
)
X1
#     group variable value type
#  1:     1        1    10   t1
#  2:     1        1    12   t2
#  3:     2        1    14   t1
#  4:     2        1    16   t1
#  5:     2        1    18   t2
#  6:     1        2     3   t2
#  7:     1        2    NA
#  8:     2        2    NA
#  9:     2        2     7   t3
# 10:     2        2     8   t3
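To make it clearer what the multi-column melt produces, here is a minimal sketch that builds the same long table by stacking each value/type pair by hand. The name X1_alt is only illustrative, and unlike melt, which returns variable as a factor, this version stores it as an integer.

```r
library(data.table)

X <- data.table(id = 1:5, group = c(1,1,2,2,2),
                v1 = c(10,12,14,16,18), type_v1 = c("t1","t2","t1","t1","t2"),
                v2 = c(3,NA,NA,7,8), type_v2 = c("t2", "", "", "t3","t3"))

# Stack the (v1, type_v1) rows on top of the (v2, type_v2) rows.
# X1_alt is a hypothetical name; "variable" records which pair a row came from.
X1_alt <- rbindlist(list(
  X[, .(group, variable = 1L, value = v1, type = type_v1)],
  X[, .(group, variable = 2L, value = v2, type = type_v2)]
))
X1_alt
```

This works for two pairs but gets tedious as v3, v4, ... are added, which is where the patterns() approach is more convenient.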
# sum the values within each group and type; rows with an empty type are dropped
tmp <- X1[type != "", .(v = sum(value)), by = .(group, type)]
tmp
#    group type  v
# 1:     1   t1 10
# 2:     1   t2 15
# 3:     2   t1 30
# 4:     2   t2 18
# 5:     2   t3 15
# cast to wide format: one (v, type) column pair per type found within a group
out <- dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))
out
#    group v_1 v_2 v_3 type_1 type_2 type_3
# 1:     1  10  15  NA     t1     t2   <NA>
# 2:     2  30  18  15     t1     t2     t3
# all column names except the grouping column
tmp2 <- setdiff(names(out), "group")
# order the names by their numeric suffix so that each v_i sits next to its type_i
idx <- order(as.numeric(gsub("\\D", "", tmp2)))
setcolorder(out, c("group", tmp2[idx]))
out
#    group v_1 type_1 v_2 type_2 v_3 type_3
# 1:     1  10     t1  15     t2  NA   <NA>
# 2:     2  30     t1  18     t2  15     t3
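Since the question mentions that further variables (v3, v4, ...) may appear, the steps above can be wrapped in a small function. This is only a sketch: sum_by_type and the extended table X2 (with made-up v3 and type_v3 columns) are hypothetical and not part of the original answer. It assumes the columns follow the v<digits> / type_v<digits> naming scheme and that missing values carry an empty type.

```r
library(data.table)

# Hypothetical helper: melt, sum by group and type, cast wide, reorder columns.
sum_by_type <- function(dt) {
  long <- melt(dt, id.vars = "group",
               measure.vars = patterns("^v\\d+$", "^type_v\\d+$"),
               value.name = c("value", "type"))
  # keep only rows that actually carry a type
  agg <- long[!is.na(type) & type != "", .(v = sum(value)), by = .(group, type)]
  wide <- dcast(agg, group ~ rowid(group), value.var = c("v", "type"))
  # put each v_i next to its type_i
  cols <- setdiff(names(wide), "group")
  setcolorder(wide, c("group", cols[order(as.numeric(gsub("\\D", "", cols)))]))
  wide[]
}

# Illustrative data only: the MWE plus an assumed third value/type pair.
X2 <- data.table(id = 1:5, group = c(1,1,2,2,2),
                 v1 = c(10,12,14,16,18), type_v1 = c("t1","t2","t1","t1","t2"),
                 v2 = c(3,NA,NA,7,8), type_v2 = c("t2", "", "", "t3","t3"),
                 v3 = c(1,NA,2,NA,NA), type_v3 = c("t1", "", "t2", "", ""))
out2 <- sum_by_type(X2)
out2
```

The number of output column pairs is determined by the largest number of distinct types in any group, not by the number of input v columns, so adding v3 here does not add a fourth pair.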