R: Convert one column into two column headers and average the correctly grouped data

Here is the input:

X1 = c("aaa", "aaa", "aaa", "qqq", "qqq", "qqq")
X2 = c("bbb", "bbb", "bbb", "rrr", "rrr", "rrr")
X3 = c("ccc", "ccc", "ccc", "ttt", "ttt", "ttt")
X4 = c("usa", "can", "usa", "ger", "rus", "ger")
X5 = c(400, 888, 500, 300, 456, 500)
df <- data.frame(X1,X2,X3,X4,X5)
I tried aggregating twice and grouping, hoping to avoid a for loop, but I still can't get it to work.
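A minimal base R sketch of the two-step approach, assuming the goal is one row per X1/X2/X3 group with the countries in X4 spread into their own columns (the exact target layout is not shown here). aggregate() averages X5 per group, and reshape() with direction = "wide" turns the country values into column headers, with no for loop. Groups that have no row for a given country come out as NA.

```r
# Input from the question
X1 = c("aaa", "aaa", "aaa", "qqq", "qqq", "qqq")
X2 = c("bbb", "bbb", "bbb", "rrr", "rrr", "rrr")
X3 = c("ccc", "ccc", "ccc", "ttt", "ttt", "ttt")
X4 = c("usa", "can", "usa", "ger", "rus", "ger")
X5 = c(400, 888, 500, 300, 456, 500)
df <- data.frame(X1, X2, X3, X4, X5, stringsAsFactors = FALSE)

# Step 1: mean of X5 for every X1/X2/X3/X4 combination
avg <- aggregate(X5 ~ X1 + X2 + X3 + X4, data = df, FUN = mean)

# Step 2: spread the country column (X4) into one column per country;
# combinations that do not occur in the data are filled with NA
wide <- reshape(avg, idvar = c("X1", "X2", "X3"), timevar = "X4",
                direction = "wide")
wide
```

The new columns are named X5.can, X5.ger, X5.rus and X5.usa; for the aaa row, X5.usa holds (400 + 500) / 2 = 450.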

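# mean of X5 within each X1-X4 combination, repeated on every row of that group
df$averages = ave(df[,"X5"], df[c("X1", "X2", "X3", "X4")], FUN = mean)
# range() returns c(min, max) of the averages in each X1-X3 group,
# so the two columns are ordered by value, not by country
aggregate(averages~., df[c("averages", "X1", "X2", "X3")], range)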
#   X1  X2  X3 averages.1 averages.2
#1 aaa bbb ccc        450        888
#2 qqq rrr ttt        400        456
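The two building blocks can be seen on plain vectors (x and g below are simply X5 and X4). ave() returns a vector as long as its input, with each element replaced by its group's mean, which is why its result can be stored directly as a new column. range() returns c(min, max), which aggregate() keeps as the two-column matrix printed above as averages.1 and averages.2.

```r
x <- c(400, 888, 500, 300, 456, 500)              # X5
g <- c("usa", "can", "usa", "ger", "rus", "ger")  # X4

# one group mean per element, in the original row order
per_row <- ave(x, g, FUN = mean)
per_row
# [1] 450 888 450 400 456 400

# smallest and largest per-row average among the first three rows (group aaa)
lo_hi <- range(per_row[1:3])
lo_hi
# [1] 450 888
```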

I don't quite see why you want the output in that form. I would group by country:

library(dplyr)

> df %>% group_by(X4, X1, X2, X3) %>% summarise(i = sum(X5))
# A tibble: 4 x 5
# Groups:   X4, X1, X2 [?]
      X4     X1     X2     X3     i
  <fctr> <fctr> <fctr> <fctr> <dbl>
1    can    aaa    bbb    ccc   888
2    ger    qqq    rrr    ttt   800
3    rus    qqq    rrr    ttt   456
4    usa    aaa    bbb    ccc   900
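Note that summarise(i = sum(X5)) adds up X5 per country (usa: 400 + 500 = 900), whereas the title asks for averages; replacing sum() with mean() in that call would give per-country means instead. As a dependency-free cross-check of what those means should be, tapply() computes them in base R:

```r
X4 <- c("usa", "can", "usa", "ger", "rus", "ger")
X5 <- c(400, 888, 500, 300, 456, 500)

# mean of X5 for each distinct country in X4, labelled by country
country_means <- tapply(X5, X4, mean)
country_means
```

This gives can 888, ger 400, rus 456 and usa 450. Each country here maps to a single X1/X2/X3 combination, so grouping by X4 alone yields the same groups as grouping by all four columns.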

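You have 4 unique countries, yet you end up with two rows and uninformative column names... :-) How are i and j going to help you? How would you know which value belongs to which country? You group usa and ger, but paste can and rus into a new column next to the other grouping variables? Your output doesn't make sense.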