Aggregating by multiple factor variables in R - Fatal编程技术网

I have a data frame like this:

    data.frame(home=c("A","B","C","A","C"),weight=c(0.1,0.25,0.36,0.14,0.2),region=c("north","south","east","North","south"))

Home  Weight  Region
A     0.1     North
B     0.25    South
C     0.36    East
A     0.14    North
C     0.2     South

I would like to aggregate my data.frame by the two factor variables and sum the third. The result would be:

    data.frame(home=c("A","B","C"),north=c(0.24,0,0),south=c(0,0.25,0.2),east=c(0,0,0.36))

Home  North  South  East
A     0.24   0      0
B     0      0.25   0
C     0      0.2    0.36

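I am looking for a quick, simple function such as aggregate or similar, but perhaps the only solution is to build the data.frame I want by hand.

Basically there are two steps: (1) aggregate the sums; (2) convert the result into a two-way table.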

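library(dplyr)
df <-  data.frame(home=c("A","B","C","A","C"),weight=c(0.1,0.25,0.36,0.14,0.2),region=c("north","south","east","North","south"))
# Normalize case so that "north" and "North" count as one region
df$region <- Hmisc::capitalize(as.character(df$region))

# Step 1: sum weight within each home/region pair
df_sum <- df %>% group_by(home, region) %>% summarize(weight_sum = sum(weight, na.rm=TRUE))

# Step 2: cast to a two-way table; naming value.var avoids dcast guessing the value column
reshape2::dcast(df_sum, home ~ region, value.var = "weight_sum",
                fun.aggregate = function(V) sum(V, na.rm=TRUE))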
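The same two steps can also be sketched in base R, with no extra packages. This is only an illustration, not the answer's own method. It assumes that "north" and "North" are meant to be the same region, and the names `df_sum` and `wide` are chosen here for the example.

```r
# Data from the question; region labels differ only in case.
df <- data.frame(
  home   = c("A", "B", "C", "A", "C"),
  weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
  region = c("north", "south", "east", "North", "south")
)

# Assumption: spellings that differ only in case are the same region.
df$region <- tolower(df$region)

# Step 1: sum weight within each home/region pair (long format).
df_sum <- aggregate(weight ~ home + region, data = df, FUN = sum)

# Step 2: spread the sums into one column per region.
wide <- reshape(df_sum, idvar = "home", timevar = "region", direction = "wide")

# Pairs that never occur come out as NA; the question wants 0 there.
wide[is.na(wide)] <- 0

# Tidy up: sort by home and drop the "weight." prefix that reshape() adds.
wide <- wide[order(wide$home), ]
names(wide) <- sub("^weight\\.", "", names(wide))
wide
```

Because the sums are computed before reshaping, every home/region pair is unique by the time `reshape()` sees it, so repeated pairs cannot be silently dropped.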

I think this will do it, where h01 is the result you want:
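library(plyr)  # provides ddply() and .()
x00<-data.frame(home=c("A","B","C","A","C"),weight=c(0.1,0.25,0.36,0.14,0.2),
                region=c("north","South","East","North","South"),stringsAsFactors = FALSE)
# Lower-case the labels so that "North" and "north" are one region
x00$region<-tolower(x00$region)
# Sum weight for each region/home pair
x01<-ddply(x00,.(region,home),summarize,result=sum(weight))
# Start from an all-zero table, then fill in the sums cell by cell
h01<-data.frame(north=c(0,0,0),south=c(0,0,0),east=c(0,0,0),row.names = c("A","B","C"))
for (x in seq_len(nrow(x01))){
  h01[x01$home[x],x01$region[x]] <- x01$result[x]
}

# Move the row names into a Home column
h01$Home <- row.names(h01)
row.names(h01) <- NULL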
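The loop above fills a hand-built table of zeros. As a possible alternative, the sketch below gets a similar table from a single `tapply()` call in base R. It relies on the `default` argument of `tapply()`, which is available from R 3.4.0 onward. The names `m` and `h02` are made up for this example.

```r
x00 <- data.frame(
  home   = c("A", "B", "C", "A", "C"),
  weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
  region = c("north", "South", "East", "North", "South"),
  stringsAsFactors = FALSE
)
x00$region <- tolower(x00$region)

# Sum weight for every home/region cell; default = 0 fills the cells
# for which no rows exist, so no table of zeros has to be pre-built.
m <- tapply(x00$weight, list(home = x00$home, region = x00$region),
            sum, default = 0)

# Turn the matrix into a data frame with Home as an ordinary column.
h02 <- data.frame(Home = rownames(m), m, row.names = NULL)
h02
```

Unlike the loop, this does not need the set of homes and regions to be known in advance; the rows and columns come from the data.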

Data
df <-  data.frame(
    home = c("A", "B", "C", "A", "C"),
    weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
    region = c("north", "south", "east", "North", "south")
  )

reshape2
library(reshape2)
dcast(df, home ~ region, value.var = "weight", fill = 0)

Base R
# xtabs
xtabs(weight ~ home + region, data = df) 

# reshape
df_wide <-reshape(df, idvar ='home', timevar ='region', direction ='wide')
df_wide[is.na(df_wide)] <- 0

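The tidyr version does not work because of duplicate identifiers for rows, and the reshape2 version counts rows instead of summing. xtabs works perfectly.

@Aurélien I'm confused by your comment, because I get the same output from all of the methods above using the data frame provided. Either way, I'm glad it worked.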

  home east north North south
1    A 0.00   0.1  0.14  0.00
2    B 0.00   0.0  0.00  0.25
3    C 0.36   0.0  0.00  0.20
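
The output above keeps north and North as separate columns, because the labels in the sample data differ in case. Here is a minimal sketch of one way to get the layout asked for in the question. It assumes the two spellings refer to the same region, and it uses only base R; the names `tab` and `wide` are illustrative.

```r
df <- data.frame(
  home   = c("A", "B", "C", "A", "C"),
  weight = c(0.1, 0.25, 0.36, 0.14, 0.2),
  region = c("north", "south", "east", "North", "south")
)

# Assumption: "north" and "North" are the same region.
df$region <- tolower(df$region)

# xtabs() sums weight over repeated home/region pairs and puts 0 in empty cells.
tab <- xtabs(weight ~ home + region, data = df)

# as.data.frame.matrix() keeps the wide layout; plain as.data.frame()
# on a table would return one row per home/region pair instead.
wide <- as.data.frame.matrix(tab)
wide <- data.frame(home = rownames(wide), wide, row.names = NULL)
wide
```

Once the case is folded, A has two north rows, and `xtabs()` adds them together rather than counting them.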