R - Sum by group name and recalculate a column

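I use R to pull some data from the Google Analytics API. In this particular scenario, I get affinity interest information about users, broken down by gender and age group. The data I get has a structure similar to the following: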

gender ageGroup interest        sessions
male   18-24    Autos           4
male   18-24    Autos/Luxury    1
male   18-24    Autos/Vans      1
male   25-34    Autos           8
male   25-34    Autos/Luxury    2
male   25-34    Autos/Vans      2
male   25-34    Autos/Compacts  1
...
female 65+      Fashion         20
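The problem with this structure, however, is that Autos, as a main interest, also includes the sessions of its subcategories, so if I use this data in a pivot table I get incorrect information.

So I added a "Generalists" subcategory to each main category, as its own subcategory, and split the column in two: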

library(splitstackshape)  # provides cSplit()

for (i2 in seq_len(nrow(ga.genderAgeAffinityTable))) {
  # Main categories contain no "/", so grep() returns integer(0)
  chrFound <- grep("[/]", ga.genderAgeAffinityTable$interest[i2])

  if (length(chrFound) < 1) {
    ga.genderAgeAffinityTable$interest[i2] <-
      sprintf("%s/Generalists", ga.genderAgeAffinityTable$interest[i2])
  }
}

# Split once, after the loop. cSplit() names the new columns interest_1 and interest_2.
ga.genderAgeAffinityTable <-
  as.data.frame(cSplit(ga.genderAgeAffinityTable, "interest", sep = "/"))

View(ga.genderAgeAffinityTable)

            gender ageGroup interest        subcategory        sessions
            male   18-24    Autos           Generalists        4
            male   18-24    Autos           Luxury             1
            male   18-24    Autos           Vans               1
            male   25-34    Autos           Generalists        8
            male   25-34    Autos           Luxury             2
            male   25-34    Autos           Vans               2
            male   25-34    Autos           Compacts           1
            ...
            female 65+      Fashion         Generalists        20
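
As a side note, the row-by-row loop above is not strictly needed. The following is a minimal base-R sketch of a vectorized equivalent, assuming every interest value contains at most one "/". The `ga` data frame and the names `is_main` and `parts` are made up for illustration, with values copied from the first sample rows; none of this comes from the original code.

```r
# Toy stand-in for the Google Analytics result (hypothetical name `ga`)
ga <- data.frame(
  gender   = rep("male", 3),
  ageGroup = rep("18-24", 3),
  interest = c("Autos", "Autos/Luxury", "Autos/Vans"),
  sessions = c(4L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Rows without a "/" are main categories: tag them as "Generalists"
is_main <- !grepl("/", ga$interest, fixed = TRUE)
ga$interest[is_main] <- paste0(ga$interest[is_main], "/Generalists")

# Split "Main/Sub" into two columns (assumption: exactly one "/" per value)
parts <- strsplit(ga$interest, "/", fixed = TRUE)
ga$subcategory <- vapply(parts, `[`, character(1), 2)
ga$interest    <- vapply(parts, `[`, character(1), 1)
```

Because `grepl()` and `paste0()` work on whole vectors, the data frame is modified in one step instead of once per row.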
Have you looked at the data.table package? It has amazing aggregation features that may help you.

For example:

I think a lot of people here would be happy to help you, but your example isn't quite a minimal reproducible example, so here is one:

library(data.table)

DT <- data.table(
  gender      = c("male", "male", "male", "male", "male", "male", "male"),
  ageGroup    = c("18-24", "18-24", "18-24", "25-34", "25-34", "25-34", "25-34"),
  interest    = c("Autos", "Autos", "Autos", "Autos", "Autos", "Autos", "Autos"),
  subcategory = c("Generalists", "Luxury", "Vans", "Generalists", "Luxury", "Vans", "Compacts"),
  sessions    = c(4L, 1L, 1L, 8L, 2L, 2L, 1L)
)

The solution is built in stages, to help explain it and to show how powerful data.table is. This gets the sum of everything except Generalists:

# := adds the column to DT by reference, so notgensum is the same table as DT
notgensum <- DT[subcategory != "Generalists", mysum := sum(sessions),
                by = .(gender, ageGroup, interest)]

#    gender ageGroup interest subcategory sessions mysum
# 1:   male    18-24    Autos Generalists        4    NA
# 2:   male    18-24    Autos      Luxury        1     2
# 3:   male    18-24    Autos        Vans        1     2
# 4:   male    25-34    Autos Generalists        8    NA
# 5:   male    25-34    Autos      Luxury        2     5
# 6:   male    25-34    Autos        Vans        2     5
# 7:   male    25-34    Autos    Compacts        1     5

# Subtract the per-group subcategory total from sessions;
# mean(..., na.rm = TRUE) skips the NA on the Generalists row
genadjsum2 <- notgensum[, myadjsessions := (sessions - mean(mysum, na.rm = TRUE)),
                        by = .(gender, ageGroup, interest)]

#    gender ageGroup interest subcategory sessions mysum myadjsessions
# 1:   male    18-24    Autos Generalists        4    NA             2
# 2:   male    18-24    Autos      Luxury        1     2            -1
# 3:   male    18-24    Autos        Vans        1     2            -1
# 4:   male    25-34    Autos Generalists        8    NA             3
# 5:   male    25-34    Autos      Luxury        2     5            -3
# 6:   male    25-34    Autos        Vans        2     5            -3
# 7:   male    25-34    Autos    Compacts        1     5            -4

# Same calculation, chained with a filter that keeps only the Generalists rows
genadjsum3 <- notgensum[,
                myadjsessions := (sessions - mean(mysum, na.rm = TRUE)),
                by = .(gender, ageGroup, interest)][subcategory == "Generalists"]

#    gender ageGroup interest subcategory sessions mysum myadjsessions
# 1:   male    18-24    Autos Generalists        4    NA             2
# 2:   male    25-34    Autos Generalists        8    NA             3

Finally, if you want to get rid of the temporary mysum column, the syntax is:

genadjsum3[, mysum := NULL]

You'll love not having loops!
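
If the goal is to keep the subcategory rows untouched and only correct the Generalists rows (an assumption about what the pivot table needs, not something stated in the answer), the stages could perhaps be collapsed into one grouped assignment. This is only a sketch on the same toy data; `adjsessions` and `totals` are illustrative names.

```r
library(data.table)

# Same toy data as in the answer
DT <- data.table(
  gender      = rep("male", 7),
  ageGroup    = c("18-24", "18-24", "18-24", "25-34", "25-34", "25-34", "25-34"),
  interest    = rep("Autos", 7),
  subcategory = c("Generalists", "Luxury", "Vans",
                  "Generalists", "Luxury", "Vans", "Compacts"),
  sessions    = c(4L, 1L, 1L, 8L, 2L, 2L, 1L)
)

# Within each group, subtract the subcategory total from the Generalists row only;
# subcategory rows keep their original sessions (assumed intent, see above).
DT[, adjsessions := ifelse(
       subcategory == "Generalists",
       sessions - sum(sessions[subcategory != "Generalists"]),
       sessions),
   by = .(gender, ageGroup, interest)]

# Sanity check: total adjusted sessions per group
totals <- DT[, .(total = sum(adjsessions)), by = .(gender, ageGroup, interest)]
```

Here `totals$total` is 4 and 8, the original Autos figures for the two age groups, so summing the adjusted column no longer double counts the subcategories.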