How to perform row-wise division in R using a group_by statement?
I have the following data frame:
Year Category TotalSales AverageCount
1 2013 Beverages 102074.29 22190.06
2 2013 Condiments 55277.56 14173.73
3 2013 Confections 36415.75 12138.58
4 2013 Dairy Products 30337.39 24400.00
5 2013 Seafood 53019.98 27905.25
6 2014 Beverages 81338.06 35400.00
7 2014 Condiments 55948.82 19981.72
8 2014 Confections 44478.36 24710.00
9 2014 Dairy Products 84412.36 32466.00
10 2014 Seafood 65544.19 14565.37
I calculated the running total of TotalSales, grouped by year, as follows:
dat <- within(dat, {
  RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum)
})
How do I calculate, within each group, the ratio between consecutive elements of RunningTotal (the ratio between RunningTotal[i+1] and RunningTotal[i])?
I tried using mutate from dplyr:
require(dplyr)
dat<-mutate(dat, Ratio = lag(RunningTotal)/RunningTotal)
How can I get the desired output shown below?
Year Category TotalSales AverageCount RunningTotal Ratio
2013 Beverages 102074.29 22190.06 102074.29 1.5415424393
2013 Condiments 55277.56 14173.73 157351.85 1.2314288011
2013 Confections 36415.75 12138.58 193767.6 1.1565658552
2013 Dairy Products 30337.39 24400 224104.99 1.2365854504
2013 Seafood 53019.98 27905.25 277124.97 0.2935067887
2014 Beverages 81338.06 35400 81338.06 1.6878553533
2014 Condiments 55948.82 19981.72 137286.88 1.3239811408
2014 Confections 44478.36 24710 181765.24 1.4644032049
2014 Dairy Products 84412.36 32466 266177.6 1.2462423209
2014 Seafood 65544.19 14565.37 331721.79 0
Sample data:
dat <- structure(list(Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L,
2014L, 2014L, 2014L, 2014L), Category = structure(c(1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("Beverages", "Condiments",
"Confections", "Dairy Products", "Seafood"), class = "factor"),
TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
81338.06, 55948.82, 44478.36, 84412.36, 65544.19), AverageCount = c(22190.06,
14173.73, 12138.58, 24400, 27905.25, 35400, 19981.72, 24710,
32466, 14565.37)), .Names = c("Year", "Category", "TotalSales",
"AverageCount"), class = "data.frame", row.names = c(NA, -10L))
The dplyr way to do the first operation is:
dat <- dat %>%
  group_by(Year) %>%
  mutate(RunningTotal = cumsum(TotalSales)) %>%
  ungroup()
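If you compute the ratio on the ungrouped data and pad the end with 0, you reproduce your desired output, though I would be tempted to make the last value NA rather than 0. The ratio for Seafood in 2013 (0.2935067887) also does not make any sense, because it divides the first running total of 2014 by the last one of 2013. To get rid of that, you do not want to perform the ungrouping. For example: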
dat %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    Ratio = c(RunningTotal[-1] / RunningTotal[-n()], NA)
  )
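As a rough illustration of the grouped approach above, the same ratio can also be written with dplyr's lead(), which shifts values up by one position within each group. This is only a sketch on a small made-up data frame (the `sales` data and its numbers are assumptions, not the question's data); it shows that the last row of every year ends up as NA rather than borrowing a value from the next year.

```r
library(dplyr)

# Small made-up data frame (not the question's data) with two groups.
sales <- data.frame(
  Year = c(2013L, 2013L, 2013L, 2014L, 2014L),
  TotalSales = c(100, 50, 50, 80, 20)
)

result <- sales %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    # lead() looks one row ahead within the current group, so the last row
    # of each year has no successor and its ratio is NA.
    Ratio = lead(RunningTotal) / RunningTotal
  ) %>%
  ungroup()

print(result)
```

Within each group, lead(RunningTotal) / RunningTotal gives the same values as c(RunningTotal[-1] / RunningTotal[-n()], NA); which spelling to use is mostly a matter of taste.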
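You are getting the inverse of the result. In other words, just flip the last line to mutate(dat, Ratio = RunningTotal/lag(RunningTotal))
OK, now I get NAs in between. dat$Ratio gives NA 1.541542 1.231429 1.156566 1.236585 NA 1.687855 1.323981 1 1.464403 1.246242. How do I avoid this? If I write a function to divide two numbers, please tell me how to pass it properly using R's aggregate function. Thanks in advance.
You should get an NA because you are using lag, but on your data I only get one, and I get all the values in your desired output.
Hmm.. if I write a function called divide(x,y), how do I call it with the in() function? I get an error saying object 'FUN' of mode 'function' was not found
Or just a slight modification of the OP's code: Ratio=c((RunningTotal/lag(RunningTotal))[-1L],NA)
@Richie - yeah. Piping! It helps in so many ways! Cheers, sir.
@DavidArenburg Nice, and it works. Thank you very much, sir!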
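On the comment about calling a custom divide(x, y): the thread does not show that function or the failing call, so the following is only a sketch under assumptions. It assumes the grouping helper meant is base R's ave(), as used in the question, and it defines a hypothetical divide() for illustration. Since ave() passes one group's values at a time to FUN, the two arguments are built inside a small wrapper.

```r
# Hypothetical helper, named after the one mentioned in the comment.
divide <- function(x, y) x / y

# Made-up running totals and years (not the question's data).
running_total <- c(100, 150, 200, 80, 100)
year <- c(2013, 2013, 2013, 2014, 2014)

# ave() calls FUN once per group with that group's values `v`.
# The wrapper divides each next value by the current one and pads with NA.
# FUN has to be passed by name: unnamed arguments after the first are
# treated as grouping variables.
ratio <- ave(running_total, year,
             FUN = function(v) divide(c(v[-1], NA), v))

print(ratio)
```

The wrapper keeps divide() itself free of any grouping logic, so the same function could be reused inside a dplyr mutate() call.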
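For reference, the ungrouped version, which reproduces the desired output including the trailing 0 and the 2013 Seafood ratio, is:
dat %>%
  mutate(Ratio = c(RunningTotal[-1] / RunningTotal[-n()], 0))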