Conditional row sums by dplyr group using select, group_by, and mutate

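Problem: I am building an aggregate market share variable for a car market in which 286 distinct models were sold, 501 cars in total. The group share depends only on car characteristics: cat (compact, midsize, large) and yr (77, 78, 79, 80, 81), plus the share s, a small double. That gives 15 groups in the market.

The closest answer I have found is mishabalyasin on community.rstudio: "Calculating rowwise totals and proportions using tidyeval?"

Applying the "select-split-combine" principle is the closest I have come to the right answer, which gives the 15 groups (15 x 3: cat, yr, s):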

library(dplyr)

df <- blp %>%
  select(cat,yr,s) %>%
  group_by(cat,yr) %>% 
  summarise(group_share = sum(s))

#in my actual data, this is what fills by group share to get what I want, but this isn't the desired pipeline-based answer
blp$group_share=0 #initializing the group_share, the 50th col
for(i in 1:501){
  for(j in 1:15){
    if((blp[i,31]==df[j,1])&&(blp[i,3]==df[j,2])){ #if(sameCat & sameYr){blpGS=dfGS}
      blp[i,50]=df[j,3]
      }
  }
}
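This works fine, but I know it can be done in one go. Hopefully the idea is clear from my description above. A simple fix might be a loop with conditions on cat and yr, which would help, but I am really trying to get better at data wrangling with dplyr, so any insight toward a pipelined answer along those lines would be great.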
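The nested loop above is effectively a lookup of each row's (cat, yr) pair in the 15-row summary table. One pipeline-style way to express that lookup, sketched here on made-up data, is a keyed join with `left_join()`. The `toy` data frame, its values, and the names `shares` and `toy_filled` are assumptions for illustration only, not the asker's data.

```r
library(dplyr)

# Toy stand-in for the real data (hypothetical values, not the 501-row dataset)
toy <- data.frame(
  cat = c("compact", "compact", "large", "large", "large"),
  yr  = c(77, 77, 77, 78, 78),
  s   = c(0.001, 0.002, 0.0005, 0.0001, 0.0002),
  stringsAsFactors = FALSE
)

# One row per (cat, yr) group, as in the summarise() call above
shares <- toy %>%
  group_by(cat, yr) %>%
  summarise(group_share = sum(s), .groups = "drop")

# Look up each row's group total by key instead of looping over i and j
toy_filled <- left_join(toy, shares, by = c("cat", "yr"))
toy_filled
```

Joining on column names avoids the hard-coded column positions (31, 3, 50) that the loop depends on.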

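Example for the site: the example below does not work with the code I provided, but it shows what my data looks like. There is a problem with the share being stored as a factor.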

#45 obs, 3 cats, 5 yrs
cat=c( "compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large")
yr=c(77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81)
s=c(.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002)

blp=as.data.frame(cbind(unlist(lapply(cat,as.character,stringsAsFactors=FALSE)),as.numeric(yr),unlist(as.numeric(s))))

names(blp)<-c("cat","yr","s")
head(blp)

#note: one example of a group share would be summing the share from
(group_share.blp.large.81.s=(blp[cat== "large" &yr==81,]))

#works thanks to akrun: applying the code I provided for what leads to the 15 groups 
df <- blp %>% 
    select(cat,yr,s) %>%
    group_by(cat,yr) %>% 
    summarise(group_share = sum(as.numeric(as.character(s)))) 
#manually filling doesn't work, but this is what I'd want if I didn't want pipelining
blp$group_share=0
for(i in 1:45){
        if( ((blp[i,1])==(df[j,1])) && (as.numeric(blp[i,2])==as.numeric(df[j,2]))){ #if(sameCat & sameYr){blpGS=dfGS}
          blp[i,4]=df[j,3];
    }
  }


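If I understand your question correctly, this should help! The only difference here is that instead of using summarise, which collapses the data to just the grouping columns and the summary column, you can use mutate to keep the original columns and add the aggregated column to them.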

# Sample input
## 45 obs, 3 cats, 5 yrs
cat <- c( "compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large","compact","midsize","large")

yr <- c(77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81,77,78,79,80,81)

s <- c(.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002,.001,.0005,.002,.0001,.0002)

# Calculation
library(dplyr) # provides %>%, group_by(), mutate(), ungroup()
blp <- 
  data.frame(cat, yr, s, stringsAsFactors = FALSE) %>% # To create dataframe
  group_by(cat, yr) %>% # Grouping by category and year
  mutate(group_share = sum(s, na.rm = TRUE)) %>% # Calculating sum share per category/year 
  ungroup()
Expected output
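One way to check the result is to list the distinct rows for a single category and compare them with the group shares quoted in the comments (large, 77 = .003 and so on). The sketch below rebuilds the same 45-row sample with `rep()` purely for brevity; that shortcut and the name `large_shares` are assumptions, not part of the answer.

```r
library(dplyr)

# Same 45 rows as the sample input, built with rep() for brevity
blp <- data.frame(
  cat = rep(c("compact", "midsize", "large"), times = 15),
  yr  = rep(c(77, 78, 79, 80, 81), times = 9),
  s   = rep(c(.001, .0005, .002, .0001, .0002), times = 9),
  stringsAsFactors = FALSE
) %>%
  group_by(cat, yr) %>%
  mutate(group_share = sum(s, na.rm = TRUE)) %>%
  ungroup()

# mutate() keeps every original row; each row carries its group's total
nrow(blp)

# One row per year for the "large" category
large_shares <- blp %>%
  filter(cat == "large") %>%
  distinct(cat, yr, group_share) %>%
  arrange(yr)
large_shares
```

Each (cat, yr) pair occurs three times in the sample, so every group share should come out as three times the corresponding s.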

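@akrun I removed the extra "," at the end of s = c(...).

In the last piece of code, you are creating a column with a data.frame as its output. Also, s is a factor column, so use summarise(group_share = sum(as.numeric(as.character(s)))).

@akrun, that replicates the summary code for the 15 groups. What would be better is to pipe this more efficiently, so that the group_share variable is created with the appropriate group sums. Any thoughts on that?

My for loop will be run on my actual dataset; I will add what the loop looks like for the example here.

For the blp data you shared, what is your expected output?

There are 15 groups. A subsample of them is the shares of the 5 large groups, which would be large, 77 = .003; large, 78 = .0015; large, 79 = .006; large, 80 = .0003; large, 81 = .0006.

Great, thanks. Given that my real problem feeds into a larger one, this gets the job done, and I can inject select in there so it works perfectly. Thanks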
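The factor issue raised in the comments can be reproduced in base R. The sketch below uses small made-up vectors (`m` and `f` are hypothetical names) to show why `cbind()` on mixed types loses the numeric column, and why `as.numeric(as.character(...))` is needed to recover values from a factor.

```r
# cbind() on a character vector and a numeric vector yields a character matrix,
# so a data frame built from it holds text (or factors in R < 4.0), not numbers
m <- cbind(c("large", "compact"), c(0.002, 0.0005))
is.character(m)

# A factor whose labels happen to be numbers
f <- factor(c("0.001", "0.0005", "0.002"))

as.numeric(f)                 # integer level codes, not the shares
as.numeric(as.character(f))   # the actual numeric values
```

Building the data frame directly with `data.frame(cat, yr, s, stringsAsFactors = FALSE)`, as the answer does, sidesteps the conversion because each column keeps its own type.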