summarize with dplyr gives a wrong result
I have the following dataset:
structure(list(id = c(2004938L, 2107410L, 2119255L, 2129457L,
2141169L, 2172051L), date = structure(c(17725, 17732, 17733,
17734, 17734, 17736), class = "Date"), hour = c(20, 22, 18, 12,
21, 22), store_name = c("Www Cigarsindia In India S Largest And Trusted Online Cigar Store Since 1998",
"Www Cigarsindia In India S Largest And Trusted Online Cigar Store Since 1998",
"Www Cigarsindia In India S Largest And Trusted Online Cigar Store Since 1998",
"Www Cigarsindia In India S Largest And Trusted Online Cigar Store Since 1998",
"Www Cigarsindia In India S Largest And Trusted Online Cigar Store Since 1998",
"Www Cigarsindia In India S Largest And Trusted Online Cigar Store Since 1998"
), area = c("Indiranagar, EGL", "Indiranagar, EGL", "Indiranagar, EGL",
"Indiranagar, EGL", "Indiranagar, EGL", "Indiranagar, EGL"),
amount = c(900, 2400, 2700, 380, 150, 100)), row.names = c(6264L,
10841L, 11355L, 11892L, 12348L, 13570L), class = "data.frame")
Let's call it "e".
I want to summarize it as follows:
f = e %>%
dplyr::group_by(date, store_name, area) %>%
dplyr::summarize(amount = sum(amount, na.rm = TRUE), amount_after_8 = sum(amount[hour >= 20], na.rm = TRUE))
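This gives the output "f" as:
This output is wrong: row 5 of "e" has an amount of 150 and satisfies the condition hour >= 20, yet it shows up as 0 in the output dataset "f".
What am I doing wrong here?

The following works: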
e %>%
dplyr::group_by(date, store_name, area) %>%
dplyr::summarize(
amount_after_8 = sum(amount[hour >= 20], na.rm = TRUE), amount = sum(amount, na.rm = TRUE)
)
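To see what goes wrong in the original order, the same indexing can be replayed with plain vectors for the one group that has two rows (hours 12 and 21, amounts 380 and 150). This is a minimal base R sketch; the variable names are chosen here for illustration and are not part of the original code.

```r
# The two rows of "e" that share a date (rows 4 and 5)
hour   <- c(12, 21)
amount <- c(380, 150)

# Filter first, while amount still holds the per-row values
after_8_first <- sum(amount[hour >= 20], na.rm = TRUE)

# Sum first: amount is now a single value for the group
amount_summed <- sum(amount, na.rm = TRUE)

# The two-element mask c(FALSE, TRUE) indexes past the end of the
# one-element vector, which yields NA; na.rm = TRUE then drops it
after_8_second <- sum(amount_summed[hour >= 20], na.rm = TRUE)

print(c(after_8_first, amount_summed, after_8_second))
```

The last value is the 0 reported in the question: the subset is taken from the already summed vector rather than from the original column.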
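The problem is that summarize evaluates its arguments in order, so by the time it reaches amount_after_8, amount is already the summarized output. For a group with two rows, amount is then a single value while hour >= 20 still has two elements, so the subset picks up a missing value and the sum comes out as 0.
You modified amount before computing amount_after_8; try a different output name.
That worked! Thanks a lot!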
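The suggestion to use a different output name could look like the sketch below. It assumes dplyr is installed and uses a reduced two-row stand-in for "e" (called e_small here) grouped by date only; the names e_small, f_small and total_amount are illustrative, not from the original post.

```r
library(dplyr)

# Reduced stand-in for "e": the two rows that share a date
e_small <- data.frame(
  date   = as.Date(c(17734, 17734), origin = "1970-01-01"),
  hour   = c(12, 21),
  amount = c(380, 150)
)

f_small <- e_small %>%
  group_by(date) %>%
  summarize(
    # A new name leaves the original amount column untouched ...
    total_amount   = sum(amount, na.rm = TRUE),
    # ... so this still subsets the per-row values, in either order
    amount_after_8 = sum(amount[hour >= 20], na.rm = TRUE)
  )

print(f_small)
```

With a distinct name for the total, the order of the two expressions no longer matters; the column can be renamed afterwards if the name amount is needed.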