R: compute deltas between multiple variables, grouped by user ID
How do I compute the deltas between several variables, grouped by user ID, in a "long" data frame? Data format:
d1 <- data.frame(
id = rep(c(1, 2, 3, 4, 5), each = 2),
purchased = c(rep(c(T, F), 3), F, T, T, F),
product = rep(c("A", "B"), 5),
grade = c(1, 2, 1, 2, 2, 3, 7, 5, 1, 2),
rate = c(10, 12, 10, 12, 12, 14, 22, 18, 10, 12),
fee = rep(c(1, 2), 5))
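A per-id "B minus A" delta only makes sense if each id has exactly one row for product A, exactly one row for product B, and exactly one purchased row. Otherwise the subtraction inside summarise does not return a single value per group. Here is a quick check on d1. It is a sketch that uses dplyr, and the helper names checks and all_ok are only for illustration:

```r
library(dplyr)

# Sample data from the question
d1 <- data.frame(
  id = rep(c(1, 2, 3, 4, 5), each = 2),
  purchased = c(rep(c(TRUE, FALSE), 3), FALSE, TRUE, TRUE, FALSE),
  product = rep(c("A", "B"), 5),
  grade = c(1, 2, 1, 2, 2, 3, 7, 5, 1, 2),
  rate = c(10, 12, 10, 12, 12, 14, 22, 18, 10, 12),
  fee = rep(c(1, 2), 5))

# For each id, count the A rows, the B rows and the purchased rows
checks <- d1 %>%
  group_by(id) %>%
  summarise(n_A = sum(product == "A"),
            n_B = sum(product == "B"),
            n_purchased = sum(purchased))

# TRUE only if every id has exactly one of each
all_ok <- all(checks$n_A == 1 & checks$n_B == 1 & checks$n_purchased == 1)
all_ok
```

If all_ok is FALSE, inspect the checks table to find the offending ids before you compute any deltas.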
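d1

You can do this with gather/spread. First, use gather to reshape the data from "wide" to "long". Then group by "id" and "Var". Within each group, take the "product" for which the logical column "purchased" is TRUE, and compute the difference in "Val" between products "B" and "A". Finally, use spread to reshape the result from "long" back to "wide".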
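library(dplyr)
library(tidyr)
# wide -> long: one row per id, product and variable (grade, rate, fee)
gather(d1, Var, Val, grade:fee) %>%
  group_by(id, Var) %>%
  # keep the purchased product and take B minus A for each variable
  summarise(purchased = product[purchased],
            Val = Val[product == 'B'] - Val[product == 'A']) %>%
  # long -> wide: one column per variable
  spread(Var, Val)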
# id purchased fee grade rate
# <dbl> <fctr> <dbl> <dbl> <dbl>
#1 1 A 1 1 2
#2 2 A 1 1 2
#3 3 A 1 1 2
#4 4 B 1 -2 -4
#5 5 A 1 1 2
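In current versions of tidyr, gather and spread are superseded by pivot_longer and pivot_wider. The sketch below applies the same reshape, difference and reshape steps with the newer functions. It assumes tidyr 1.0 or later and dplyr 1.0 or later (for summarise's .groups argument). The result name d2 is only for illustration:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
d1 <- data.frame(
  id = rep(c(1, 2, 3, 4, 5), each = 2),
  purchased = c(rep(c(TRUE, FALSE), 3), FALSE, TRUE, TRUE, FALSE),
  product = rep(c("A", "B"), 5),
  grade = c(1, 2, 1, 2, 2, 3, 7, 5, 1, 2),
  rate = c(10, 12, 10, 12, 12, 14, 22, 18, 10, 12),
  fee = rep(c(1, 2), 5))

d2 <- d1 %>%
  # wide -> long: one row per id, product and variable
  pivot_longer(grade:fee, names_to = "Var", values_to = "Val") %>%
  group_by(id, Var) %>%
  # purchased product, plus B minus A for each variable; drop grouping afterwards
  summarise(purchased = product[purchased],
            Val = Val[product == "B"] - Val[product == "A"],
            .groups = "drop") %>%
  # long -> wide: one column per variable
  pivot_wider(names_from = Var, values_from = Val)

d2
```

The values match the gather/spread output above. One difference is that pivot_wider orders the new columns by their first appearance in the data, not alphabetically.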
The expected result given in the question (d3):
# id purchased dGrade dRate dFee
#1 1 A 1 2 1
#2 2 A 1 2 1
#3 3 A 1 2 1
#4 4 B -2 -4 1
#5 5 A 1 2 1
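To get exactly the shape of d3 (columns dGrade, dRate, dFee) without reshaping, you can also use a single grouped summarise that names each difference explicitly. This is a sketch with dplyr. It assumes that each id has exactly one A row, one B row and one purchased row. The trade-off is that every variable has to be spelled out, so the reshape approach scales better when there are many variables:

```r
library(dplyr)

# Sample data from the question
d1 <- data.frame(
  id = rep(c(1, 2, 3, 4, 5), each = 2),
  purchased = c(rep(c(TRUE, FALSE), 3), FALSE, TRUE, TRUE, FALSE),
  product = rep(c("A", "B"), 5),
  grade = c(1, 2, 1, 2, 2, 3, 7, 5, 1, 2),
  rate = c(10, 12, 10, 12, 12, 14, 22, 18, 10, 12),
  fee = rep(c(1, 2), 5))

d3 <- d1 %>%
  group_by(id) %>%
  summarise(
    # the product this user actually bought
    purchased = product[purchased],
    # B minus A for each numeric variable, named as in the expected output
    dGrade = grade[product == "B"] - grade[product == "A"],
    dRate  = rate[product == "B"]  - rate[product == "A"],
    dFee   = fee[product == "B"]   - fee[product == "A"]
  )

d3
```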