R: sum by group name and recalculate a column
I am using R to pull some data from the Google Analytics API. In this particular scenario I get affinity interest information for users, broken down by gender and age group. The data I get is structured like this:
gender ageGroup interest sessions
male 18-24 Autos 4
male 18-24 Autos/Luxury 1
male 18-24 Autos/Vans 1
male 25-34 Autos 8
male 25-34 Autos/Luxury 2
male 25-34 Autos/Vans 2
male 25-34 Autos/Compacts 1
...
female 65+ Fashion 20
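The problem with this structure, however, is that Autos, as a main interest, also includes the sessions of its subcategories, so if I use this data in a pivot table I get wrong numbers.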
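To make the double counting concrete, here is a small base R sketch. It assumes, as described above, that the main-category row already contains the sessions of its subcategories; the `raw` data frame and the variable names are made up for illustration.

```r
# Hypothetical sample: one gender/age group taken from the table above
raw <- data.frame(
  gender   = "male",
  ageGroup = "18-24",
  interest = c("Autos", "Autos/Luxury", "Autos/Vans"),
  sessions = c(4L, 1L, 1L)
)

# What a pivot table would report if it simply summed every row
naive_total <- sum(raw$sessions)

# What the main-category row itself reports (assumed to include the subcategories)
main_total <- raw$sessions[raw$interest == "Autos"]

naive_total  # 6
main_total   # 4
```

Under that assumption the naive sum overstates the group by exactly the subcategory sessions (2 here).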
So I added the subcategory "Generalists" to each main category, as a subcategory of its own, and split the column in two:
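library(splitstackshape)  # provides cSplit()

for (i2 in seq_len(nrow(ga.genderAgeAffinityTable))) {
  # main categories contain no "/", so grep() returns integer(0)
  chrFound <- grep("[/]", ga.genderAgeAffinityTable$interest[i2])
  if (length(chrFound) < 1) {
    ga.genderAgeAffinityTable$interest[i2] <-
      sprintf("%s/Generalists", ga.genderAgeAffinityTable$interest[i2])
  }
}
# split "interest" once, after the loop; splitting inside the loop would
# drop the "interest" column after the first iteration
# (cSplit() names the new columns interest_1 and interest_2; rename as needed)
ga.genderAgeAffinityTable <- as.data.frame(
  cSplit(ga.genderAgeAffinityTable, "interest", sep = "/"))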
View(ga.genderAgeAffinityTable)
gender ageGroup interest subcategory sessions
male 18-24 Autos Generalists 4
male 18-24 Autos Luxury 1
male 18-24 Autos Vans 1
male 25-34 Autos Generalists 8
male 25-34 Autos Luxury 2
male 25-34 Autos Vans 2
male 25-34 Autos Compacts 1
...
female 65+ Fashion Generalists 20
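The same "tag main categories, then split" step can be sketched without a row-by-row loop. This is only one possible approach, using data.table's `tstrsplit()`; the `ga` table is a made-up sample, and the sketch assumes every interest has at most one "/" (deeper category paths would need extra handling).

```r
library(data.table)

# Hypothetical sample mirroring the layout in the question
ga <- data.table(
  gender   = "male",
  ageGroup = c("18-24", "18-24", "18-24"),
  interest = c("Autos", "Autos/Luxury", "Autos/Vans"),
  sessions = c(4L, 1L, 1L)
)

# Rows without "/" are main categories: tag them as "<main>/Generalists"
ga[!grepl("/", interest, fixed = TRUE),
   interest := paste0(interest, "/Generalists")]

# Split on "/" into the main interest and its subcategory (assumes one "/")
ga[, c("interest", "subcategory") := tstrsplit(interest, "/", fixed = TRUE)]
setcolorder(ga, c("gender", "ageGroup", "interest", "subcategory", "sessions"))

print(ga)
```

Both steps operate on whole columns at once, so the table is split a single time rather than once per row.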
Have you looked at the data.table package? It has excellent summarizing features that may help you here, e.g. grouped sums computed without loops.
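Solution
It is built up in stages to help explain it and to show off the package's power. The first stage computes the sum of everything except the Generalists rows.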
Finally, if you want to get rid of the temporary mysum column, the syntax is:
genadjsum3[, mysum := NULL]
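A point worth knowing about this syntax is that `:=` modifies a data.table in place rather than returning a modified copy. The sketch below illustrates that with a made-up table (`dt`, `alias`, and `snapshot` are hypothetical names); `copy()` is the data.table function for taking an independent copy first.

```r
library(data.table)

dt <- data.table(id = 1:3, tmp = c(10, 20, 30))

alias    <- dt        # no copy: both names refer to the same table
snapshot <- copy(dt)  # an independent copy

alias[, tmp := NULL]  # removes the column in place

names(dt)        # "id"        (changed through the alias)
names(snapshot)  # "id" "tmp"  (unaffected)
```

This is why an assignment such as `notgensum <- DT[..., mysum := ...]` also adds the column to `DT` itself.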
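You'll love not needing loops! I think plenty of people here would be happy to help you, but your example isn't quite a minimal reproducible example.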
library(data.table)
DT <- data.table(gender = c("male", "male", "male", "male", "male","male", "male"),
ageGroup = c("18-24", "18-24", "18-24", "25-34","25-34", "25-34", "25-34"),
interest = c("Autos", "Autos", "Autos","Autos", "Autos", "Autos", "Autos"),
subcategory = c("Generalists","Luxury", "Vans", "Generalists", "Luxury", "Vans", "Compacts"),
sessions = c(4L, 1L, 1L, 8L, 2L, 2L, 1L) )
# Stage 1: per group, sum the sessions of every row except Generalists.
# := adds mysum to DT by reference; notgensum refers to the same table.
notgensum <- DT[subcategory != "Generalists", mysum := sum(sessions),
                by = .(gender, ageGroup, interest)]
# gender ageGroup interest subcategory sessions mysum
#1: male 18-24 Autos Generalists 4 NA
#2: male 18-24 Autos Luxury 1 2
#3: male 18-24 Autos Vans 1 2
#4: male 25-34 Autos Generalists 8 NA
#5: male 25-34 Autos Luxury 2 5
#6: male 25-34 Autos Vans 2 5
#7: male 25-34 Autos Compacts 1 5
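# Stage 2: subtract the group's subcategory total from each row's sessions.
# mean(mysum, na.rm = TRUE) recovers the group total, skipping the NA on the Generalists row.
genadjsum2 <- notgensum[, myadjsessions := (sessions - mean(mysum, na.rm = TRUE)),
                        by = .(gender, ageGroup, interest)]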
# gender ageGroup interest subcategory sessions mysum myadjsessions
#1: male 18-24 Autos Generalists 4 NA 2
#2: male 18-24 Autos Luxury 1 2 -1
#3: male 18-24 Autos Vans 1 2 -1
#4: male 25-34 Autos Generalists 8 NA 3
#5: male 25-34 Autos Luxury 2 5 -3
#6: male 25-34 Autos Vans 2 5 -3
#7: male 25-34 Autos Compacts 1 5 -4
# Stage 3: the same calculation, chained with a filter that keeps only the Generalists rows
genadjsum3 <- notgensum[,
                        myadjsessions := (sessions - mean(mysum, na.rm = TRUE)),
                        by = .(gender, ageGroup, interest)][subcategory == "Generalists"]
# gender ageGroup interest subcategory sessions mysum myadjsessions
#1: male 18-24 Autos Generalists 4 NA 2
#2: male 25-34 Autos Generalists 8 NA 3
genadjsum3[, mysum := NULL]
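If the goal is a table that can go straight into a pivot, the stages above could also be collapsed so that only the Generalists rows are rewritten and the subcategory rows are left as they are. This is a sketch of one possible variant, not the answer's own code; `adjusted` and `sub_total` are hypothetical names, and it assumes each group has exactly one Generalists row.

```r
library(data.table)

DT <- data.table(
  gender      = "male",
  ageGroup    = c("18-24", "18-24", "18-24", "25-34", "25-34", "25-34", "25-34"),
  interest    = "Autos",
  subcategory = c("Generalists", "Luxury", "Vans",
                  "Generalists", "Luxury", "Vans", "Compacts"),
  sessions    = c(4L, 1L, 1L, 8L, 2L, 2L, 1L)
)

adjusted <- copy(DT)  # work on a copy so DT itself stays untouched

# Per group, total the sessions of all non-Generalists rows
adjusted[, sub_total := sum(sessions[subcategory != "Generalists"]),
         by = .(gender, ageGroup, interest)]

# Subtract that total from the Generalists row only
adjusted[subcategory == "Generalists", sessions := sessions - sub_total]

# Drop the helper column
adjusted[, sub_total := NULL]

print(adjusted)
```

With the sample data the Generalists rows become 2 and 3, matching `myadjsessions` above, and a group with no subcategories keeps its original value because the sum of an empty selection is 0.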