R: Handling time mismatches when using data.table
Hey, I'm trying to move from ddply to data.table. I picked it up quickly, but I still need to make a few minor adjustments. Here is a summary of what I'm trying to do, using a toy dataset. Say I have sales data for two products over several weeks:
x <- structure(list(week = c(1, 1, 2, 3, 1, 2, 2, 3, 4), product = c("a",
"a", "a", "a", "b", "b", "b", "b", "b"), sold = c(10, 15, 20,
25, 30, 35, 40, 45, 50)), .Names = c("week", "product", "sold"
), row.names = c(NA, -9L), class = c("data.table", "data.frame"
), sorted = c("product", "week"))
week product sold
1: 1 a 10
2: 1 a 15
3: 2 a 20
4: 3 a 25
5: 1 b 30
6: 2 b 35
7: 2 b 40
8: 3 b 45
9: 4 b 50
xx1
product week V1
1: a 1 25
2: a 1 25
3: a 2 20
4: a 3 25
5: b 1 30
6: b 2 75
7: b 2 75
8: b 3 45
9: b 4 50
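The problem is that I'm not sure how to remove the duplicate rows, i.e. row 2 is redundant. In addition, I'd like to include NA for the weeks in which a product had no sales, i.e. a row for product a in week 4 with the value NA.
I'm sure this is a simple problem, and I know how to do it with ddply, but I couldn't find what I wanted by searching. If someone could help me, or link me to the right page if this is a duplicate, that would be great.
Here is how to do it by cross joining the unique products with all weeks, then aggregating per group: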
> x[CJ(unique(product), 1:4), sum(sold), by=.EACHI]
product week V1
1: a 1 25
2: a 2 20
3: a 3 25
4: a 4 NA
5: b 1 30
6: b 2 75
7: b 3 45
8: b 4 50
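The join above relies on x being keyed by product and week. As a minimal sketch of the same idea for an unkeyed table, the version below passes the join columns through `on=`. The names `grid`, `res` and `total` are illustrative and not from the original post, and the week grid is taken from the weeks present in the data rather than the hard-coded 1:4.

```r
library(data.table)

# Toy data from the question, rebuilt without a key
x <- data.table(
  week    = c(1, 1, 2, 3, 1, 2, 2, 3, 4),
  product = c("a", "a", "a", "a", "b", "b", "b", "b", "b"),
  sold    = c(10, 15, 20, 25, 30, 35, 40, 45, 50)
)

# CJ() builds every product/week combination, including those with no sales
grid <- CJ(product = unique(x$product), week = sort(unique(x$week)))

# Join x onto the grid and aggregate once per grid row (by = .EACHI).
# Combinations with no matching row see sold = NA, so their sum is NA.
res <- x[grid, on = .(product, week), .(total = sum(sold)), by = .EACHI]
print(res)
```

Each product/week pair appears exactly once in `grid`, so the result has no duplicate rows, and product a gets a week 4 row with NA.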
Depending on which data.table version you are using, reshaping may be another option:
require(reshape2); require(data.table)
dt <- x  # the toy data from the question (called x there)
(dt2 <- dcast.data.table(dt, product ~ week, fun.aggregate = sum, value.var = "sold", fill = NA, drop = FALSE))
# product 1 2 3 4
# 1: a 25 20 25 NA
# 2: b 30 75 45 50
(dt3 <- melt(dt2, id.vars = "product", variable.name = "week", value.name = "sold"))
# product week sold
# 1: a 1 25
# 2: b 1 30
# 3: a 2 20
# 4: b 2 75
# 5: a 3 25
# 6: b 3 45
# 7: a 4 NA
# 8: b 4 50
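One detail of the reshaped result: by default `melt` returns the `week` column as a factor, and the rows come out ordered by week rather than by product. The sketch below, which assumes a hand-built wide table named `wide` with the same values as dt2 above (the names `wide` and `long` are illustrative), converts week back to numeric and reorders the rows to match the cross-join output.

```r
library(data.table)

# Wide table with the same values as dt2, typed in by hand for this example
wide <- data.table(
  product = c("a", "b"),
  `1` = c(25, 30),
  `2` = c(20, 75),
  `3` = c(25, 45),
  `4` = c(NA, 50)
)

long <- melt(wide, id.vars = "product",
             variable.name = "week", value.name = "sold")

# week is a factor here; go through character to recover the numbers
long[, week := as.numeric(as.character(week))]

# Order by product, then week
setorder(long, product, week)
print(long)
```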
Perfect! This is exactly what I had in mind. I didn't know about cross joins.