Aggregate by month and by group in R
Here is my data:
mydata=structure(list(doc_date = structure(c(7L, 9L, 4L, 10L, 2L, 5L,
8L, 1L, 3L, 6L), .Label = c("01.06.2018", "06.04.2018", "08.07.2018",
"14.03.2018", "20.04.2018", "21.09.2018", "24.01.2018", "25.05.2018",
"28.02.2018", "28.03.2018"), class = "factor"), shop_id = c(67885L,
67885L, 67885L, 67885L, 67885L, 67885L, 67885L, 67885L, 67885L,
67885L), shop_code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "02293НСК", class = "factor"), product_id = c(11622L,
11622L, 11622L, 11622L, 11622L, 11622L, 11622L, 11622L, 11622L,
11622L), product_group_id = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L), city_id = c(9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L),
fin_centre_id = c(15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L), return_count = c(2L, 3L, 1L, 1L, 1L, 1L, 3L, 1L,
3L, 2L)), .Names = c("doc_date", "shop_id", "shop_code",
"product_id", "product_group_id", "city_id", "fin_centre_id",
"return_count"), class = "data.frame", row.names = c(NA, -10L
))
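How can I aggregate the return_count column for each shop_code + product_id group,
summed per month, in pivot (wide) format?
i.e. the output should look like this: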
  jan feb march apr may jun jul aug sept oct nov dec
1   2   3     2   2   3   1   3   0    2   0   0   0
This is not a duplicate, because I need the result in pivot format.
> library(tidyverse)
> library(lubridate)  # provides dmy()
> mydata$months <- months(dmy(mydata$doc_date))
> my <- mydata %>% group_by(months) %>% summarise(re_count = sum(return_count, na.rm = TRUE))
> my
# A tibble: 8 x 2
  months    re_count
  <chr>        <int>
1 April            2
2 Februar          3
3 Januar           2
4 Juli             3
5 Juni             1
6 Mai              3
7 März             2
8 September        2
>
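The console session above sums by month only (not by shop_code + product_id), returns long rather than wide output, and names the months in the session locale. A minimal sketch of the one-row-per-group, one-column-per-month layout the question asks for, assuming dplyr >= 1.0 (for `.groups`) and tidyr >= 1.0 (for `pivot_wider()`); the small `data.frame` rebuilds only the columns used here from the question's `dput`, and `wide` is just an illustrative name:

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Compact rebuild of the question's data (only the columns used here)
mydata <- data.frame(
  doc_date = c("24.01.2018", "28.02.2018", "14.03.2018", "28.03.2018",
               "06.04.2018", "20.04.2018", "25.05.2018", "01.06.2018",
               "08.07.2018", "21.09.2018"),
  shop_code = "02293НСК",
  product_id = 11622L,
  return_count = c(2L, 3L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 2L),
  stringsAsFactors = FALSE
)

wide <- mydata %>%
  # month.abb is a built-in English constant, so the labels ignore the locale;
  # making it a factor keeps all 12 months as levels
  mutate(month = factor(month.abb[month(dmy(doc_date))], levels = month.abb)) %>%
  group_by(shop_code, product_id, month) %>%
  summarise(re_count = sum(return_count, na.rm = TRUE), .groups = "drop") %>%
  # add the months with no returns (every factor level), filled with 0
  complete(shop_code, product_id, month, fill = list(re_count = 0L)) %>%
  # one column per month
  pivot_wider(names_from = month, values_from = re_count)

wide
```

`complete()` is what brings back August and October to December: it expands over every level of the `month` factor, not just the months present in the data.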
This is the solution I came up with using the tidyverse approach. (Sorry, for whatever reason my months are in German.)

Here is a data.table approach:
Edited to include months with a count of 0 in the result.
library(data.table)
library(lubridate)
setDT(mydata)
# First make a variable storing the month
mydata[, month := lubridate::month(as.Date(doc_date, format = "%d.%m.%Y"), label = TRUE)]  # %Y = four-digit year
# Then sum return_count by the product id, group id and month. Keep only rows that are unique by month
mydata <- unique(mydata[, sum := sum(return_count), by = .(product_id, product_group_id, month)], by = "month")
# Now we need to make sure any months with 0 counts are included
all_months <- data.table(month = lubridate::month(1:12, label = TRUE) )
mydata <- merge(mydata[, .(month, sum)], all_months, by = "month", all.y = TRUE)
mydata[is.na(sum), sum := 0L]  # 0L keeps the integer type of the sum column
## output
month sum
1: Jan 2
2: Feb 3
3: Mar 2
4: Apr 2
5: May 3
6: Jun 1
7: Jul 3
8: Aug 0
9: Sep 2
10: Oct 0
11: Nov 0
12: Dec 0
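The format string in the `as.Date()` call needs `%Y` (four-digit year), not `%y` (two-digit year). With `%y`, only "20" is read from "2018" and the trailing characters are ignored, so "24.01.2018" becomes 24 January 2020; the month, and therefore the sums above, would still be right, but the year would not. A quick check in base R:

```r
# Same input string, two-digit vs four-digit year directive
d_short <- as.Date("24.01.2018", format = "%d.%m.%y")
d_full  <- as.Date("24.01.2018", format = "%d.%m.%Y")

format(d_short)  # "2020-01-24": "20" is taken as the year, "18" is ignored
format(d_full)   # "2018-01-24"
```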
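The problem is that the dates within the same month differ (and doc_date is also a factor), so first we summarise at the month level, and then we can pivot. Try this: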
library(dplyr)
library(lubridate)
library(data.table)
mydata$new_date <- dmy(mydata$doc_date) # convert the factor to Date
mydata$month <- month(mydata$new_date) # extract the month number (1-12) from the date
mydata <- mydata %>% group_by(shop_code, product_id, month) %>% summarise(return_count = sum(return_count)) %>% ungroup() # group at your required level
mydata_1 <- dcast(setDT(mydata), shop_code + product_id ~ month, fun.aggregate = sum, value.var = "return_count") # pivot up using dcast
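Because `month` is an integer here, `dcast()` only creates columns for the months that occur, named `1` to `7` and `9` for this data, so August and October to December are missing from `mydata_1`. A sketch that keeps all twelve, assuming only data.table: store the month as a factor with `month.abb` as its levels and cast with `drop = FALSE`, which makes `dcast()` use every factor level and fill empty cells with `sum()` of a zero-length vector, i.e. 0. The small table rebuilds only the columns used here from the question's `dput`; `wide` is an illustrative name.

```r
library(data.table)

# Compact rebuild of the question's data (only the columns used here)
mydata <- data.table(
  doc_date = c("24.01.2018", "28.02.2018", "14.03.2018", "28.03.2018",
               "06.04.2018", "20.04.2018", "25.05.2018", "01.06.2018",
               "08.07.2018", "21.09.2018"),
  shop_code = "02293НСК",
  product_id = 11622L,
  return_count = c(2L, 3L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 2L)
)

# Month as a factor with all 12 English abbreviations as levels;
# month.abb does not depend on the session locale
mydata[, month := factor(month.abb[month(as.Date(doc_date, format = "%d.%m.%Y"))],
                         levels = month.abb)]

# Sum per shop_code + product_id and month, one column per month.
# drop = FALSE keeps every factor level, so months with no rows still appear,
# filled with sum(integer(0)), i.e. 0
wide <- dcast(mydata, shop_code + product_id ~ month,
              value.var = "return_count", fun.aggregate = sum, drop = FALSE)
wide
```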
What have you tried?
Possible duplicate?
Can you clarify what you mean by "pivot format"?
My months are in German too. You can try Sys.setlocale("LC_TIME", "English").