R: How to summarize data by cluster

Suppose I have the following data:

library(data.table)
set.seed(200)
data <- data.table(income = runif(20, 1000, 8000),
                   gender = sample(0:1, 20, TRUE),
                   asset = runif(20, 10000, 80000),
                   education = sample(1:4, 20, TRUE),
                   cluster = sample(1:4, 20, TRUE))

I think my code is not efficient.

Could you give me some suggestions on how to handle this situation?

You can use a for loop over the categorical variables:

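# Collect one cluster-by-level proportion table per categorical variable
res <- list()
for(i in c('gender', 'education')){
   # margin=1 normalizes within each cluster, so every row sums to 1
   res[[i]] <- prop.table(table(cbind(data[,'cluster'], data[, ..i])), margin=1)
}

res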
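
The same per-variable loop can also be kept inside data.table grouping rather than table() and prop.table(). The following is only a sketch under the assumptions of the question: it rebuilds the sample data so it runs on its own, and cat_vars and ratios are illustrative names that do not appear in the original post.

```r
library(data.table)

# Sample data from the question, rebuilt so the sketch is self-contained
set.seed(200)
data <- data.table(income = runif(20, 1000, 8000),
                   gender = sample(0:1, 20, TRUE),
                   asset = runif(20, 10000, 80000),
                   education = sample(1:4, 20, TRUE),
                   cluster = sample(1:4, 20, TRUE))

# Hypothetical name: the categorical columns to summarize
cat_vars <- c("gender", "education")

# One long-format result per variable: count each (cluster, level) pair,
# then divide by the cluster total so ratios sum to 1 within a cluster
ratios <- lapply(setNames(cat_vars, cat_vars), function(v) {
  out <- data[, .N, by = c("cluster", v)]
  out[, ratio := N / sum(N), by = cluster]
  setorderv(out, c("cluster", v))
  out
})

ratios$gender
ratios$education
```

Each element of ratios is a data.table with one row per cluster and level, which may be easier to join or plot than the wide tables returned by prop.table().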
I would do it like this:

data[, .N, by=.(gender, cluster)][, .(gender, ratio = N/sum(N)), by=cluster]
data[, .N, by=.(education, cluster)][, .(education, ratio = N/sum(N)), by=cluster]
# Same proportion tables as the for loop, built with lapply over the categorical columns
lapply(data[,c('gender','education'), with=FALSE], function(x)
  prop.table(table(cbind(data[,'cluster', with=FALSE],x)), margin=1))
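
If the number of categorical variables grows, one possible alternative is to reshape them to long format once and compute every ratio in a single grouped pass. This is a sketch and not part of the original answers; cat_vars, long, ratios_long, and the column names variable and level are names chosen here for illustration, and it assumes all categorical columns share a type (both are integers in the sample data).

```r
library(data.table)

# Sample data from the question, rebuilt so the sketch is self-contained
set.seed(200)
data <- data.table(income = runif(20, 1000, 8000),
                   gender = sample(0:1, 20, TRUE),
                   asset = runif(20, 10000, 80000),
                   education = sample(1:4, 20, TRUE),
                   cluster = sample(1:4, 20, TRUE))

# Hypothetical name: the categorical columns to summarize
cat_vars <- c("gender", "education")

# Stack the categorical columns: one row per (observation, variable)
long <- melt(data, id.vars = "cluster", measure.vars = cat_vars,
             variable.name = "variable", value.name = "level")

# Count each level, then normalize within every (cluster, variable) group
ratios_long <- long[, .N, by = .(cluster, variable, level)]
ratios_long[, ratio := N / sum(N), by = .(cluster, variable)]
setorder(ratios_long, variable, cluster, level)

ratios_long
```

The result is a single table, so filtering by variable gives the same ratios as the two separate chained calls.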