Add rows to grouped data with dplyr?

My data is in data.frame format, like the following sample data:

data <- 
structure(list(Article = structure(c(1L, 1L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L
), .Label = c("10004", "10006", "10007"), class = "factor"), 
Demand = c(26L, 780L, 2L, 181L, 228L, 214L, 219L, 291L, 104L, 
72L, 155L, 237L, 182L, 148L, 52L, 227L, 2L, 355L, 2L, 432L, 
1L, 156L), Week = c("2013-W01", "2013-W01", "2013-W01", "2013-W01", 
"2013-W01", "2013-W02", "2013-W02", "2013-W02", "2013-W02", 
"2013-W02", "2013-W03", "2013-W03", "2013-W03", "2013-W03", 
"2013-W03", "2013-W04", "2013-W04", "2013-W04", "2013-W04", 
"2013-W04", "2013-W04", "2013-W04")), .Names = c("Article", 
"Demand", "Week"), class = "data.frame", row.names = c(NA, -22L))
I tried:

WeekSums %>%
  group_by(Article) %>%
  if(n()< 4) rep(rbind(c(Article,NA,NA)), 4 - n() )

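But this does not work. In my original approach I solved this by merging, for each article, a data frame of weeks 1-4 with my raw data file. That way I get 4 weeks (rows) per article, but the implementation with a for loop is very inefficient, so I am trying to do the same with dplyr (or any other more efficient package/function). Any suggestions would be much appreciated.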

Without dplyr, it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))
giving:

       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0
This can be rewritten as a magrittr or dplyr pipeline as follows:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

If a wide-form result is wanted, the trailing as.data.frame() can be omitted.
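To make the wide/long distinction concrete, here is a minimal sketch on a small toy data frame. The name `sales` and its values are illustrative assumptions, not the asker's data. It shows that xtabs() returns a Week-by-Article table with zeros for empty cells, and that as.data.frame() turns the table into long form.

```r
# Toy data (hypothetical): article 10006 has no sales in 2013-W01
sales <- data.frame(
  Article = c("10004", "10004", "10006"),
  Week    = c("2013-W01", "2013-W02", "2013-W02"),
  Demand  = c(26L, 780L, 2L),
  stringsAsFactors = FALSE
)

# Wide form: rows are weeks, columns are articles, cells are summed Demand
wide <- xtabs(Demand ~ Week + Article, sales)

# Long form: one row per Week/Article pair, with the sums in a column named Freq
long <- as.data.frame(wide)
```

Printing `wide` shows the cross-tabulation directly, which may be enough if the result is only inspected rather than processed further.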

I thought I would offer a dplyr-style solution:

  • Use expand.grid() to generate the pairwise combinations you are looking for
  • Use left_join() to join in the demand data (filling the rest with NAs)
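The two steps above can be sketched roughly as follows. This is one possible reading of the approach, not the answer's own code; the names `sales`, `all_pairs` and `full_sketch` and the toy values are assumptions made for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data (hypothetical): article 10006 has no sales in 2013-W01
sales <- data.frame(
  Article = c("10004", "10004", "10006"),
  Week    = c("2013-W01", "2013-W02", "2013-W02"),
  Demand  = c(26L, 780L, 2L),
  stringsAsFactors = FALSE
)

# Step 1: every Article/Week pair that should appear in the result
all_pairs <- expand.grid(
  Article = unique(sales$Article),
  Week    = unique(sales$Week),
  stringsAsFactors = FALSE
)

# Step 2: join the demand onto the full grid; pairs without sales get NA,
# which sum(..., na.rm = TRUE) then turns into 0
full_sketch <- all_pairs %>%
  left_join(sales, by = c("Article", "Week")) %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand, na.rm = TRUE), .groups = "drop")
```

Naming the join keys explicitly with `by` avoids relying on dplyr's automatic matching of common columns.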
Solution:

full_data ...

For this case, you can also use dcast and melt:

   library(dplyr)
   library(reshape2)
   data %>%
      dcast(Article ~ Week, value.var = "Demand", fun.aggregate = sum) %>%
      melt(id = "Article") %>%
      arrange(Article, variable)

Since dplyr is under active development, I thought I would post an update that also incorporates tidyr:

library(dplyr)
library(tidyr)

data %>%
  expand(Article, Week) %>%
  left_join(data) %>%
  group_by(Article, Week) %>%
  summarise(WeekDemand = sum(Demand, na.rm=TRUE))
which yields:

   Article     Week WeekDemand
1    10004 2013-W01       1215
2    10004 2013-W02        900
3    10004 2013-W03        774
4    10004 2013-W04       1170
5    10006 2013-W01          0
6    10006 2013-W02          0
7    10006 2013-W03          0
8    10006 2013-W04          5
9    10007 2013-W01          2
10   10007 2013-W02          0
11   10007 2013-W03          0
12   10007 2013-W04          0

With tidyr >= 0.3.1, this can now be written as:

data %>% 
  complete(Article, Week) %>%  
  group_by(Article, Week) %>% 
  summarise(Demand = sum(Demand, na.rm = TRUE))
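As a small illustration of what complete() does on its own, the sketch below uses a toy data frame (the name `sales` and its values are assumptions, not the asker's data). It also shows complete()'s `fill` argument, which may be a convenient alternative when the data already has one row per Article/Week pair and no further summing is needed.

```r
library(tidyr)

# Toy data (hypothetical): article 10006 has no sales in 2013-W01
sales <- data.frame(
  Article = c("10004", "10004", "10006"),
  Week    = c("2013-W01", "2013-W02", "2013-W02"),
  Demand  = c(26L, 780L, 2L),
  stringsAsFactors = FALSE
)

# By default, the rows added for missing pairs carry NA in Demand
completed <- complete(sales, Article, Week)

# fill= replaces those NAs with an explicit value instead
completed_zero <- complete(sales, Article, Week, fill = list(Demand = 0L))
```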

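xtabs creates an object of class "table" from the specified formula; its dimensions are the right-hand-side variables, and its cells hold the sums of the left-hand-side variable, or zero if a cell is empty. as.data.frame, applied to a table, reshapes it to long form.

Thanks for demonstrating another way to solve this problem! I must admit I like the simplicity of the xtabs solution, but this also produces the desired result (+1)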