R (or Python): create a rectangular treemap with progressive segmentation

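I'd like some ideas on how to approach this problem, which I find interesting. Suppose I have a population described by three categorical feature variables (education, gender and residence), plus some quantitative measurements on it (income and expense). An example is shown below: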

df

   income expense education gender residence
1   153      2989 NoCollege      F       Own
2   289       872   College      F      Rent
3   551        98 NoCollege      M      Rent
4   286       320   College      M      Rent
5   259       372 NoCollege      M      Rent
6   631       221 NoCollege      M       Own
7   729       105   College      M      Rent
8   582       450 NoCollege      M       Own
9   570       253   College      F      Rent
10 1380       635 NoCollege      F      Rent
11  409       425 NoCollege      M      Rent
12  569       232 NoCollege      F       Own
13  317       856   College      M      Rent
14  199       283   College      F       Own
15  624       564 NoCollege      M       Own
16 1064       504 NoCollege      M       Own
17  821       169 NoCollege      F      Rent
18  402       175   College      M       Own
19  602       285   College      M      Rent
20  433       264   College      M      Rent
21  670       985 NoCollege      F       Own
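I can compute the spending-to-income ratio (SIR) for the segments defined by the three feature variables (education, gender and residence). At the first level, with no split at all, the SIR is: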

library(dplyr)
df %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>>   count spending_ratio
   1    21           95.8
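Since the question is open to Python too, here is a minimal standard-library Python sketch of the same top-level number (an illustration, not code from the post; ROWS simply copies df). Note that SIR is a ratio of sums, total expense over total income, rather than an average of per-row ratios:

```python
# Rows of df: (income, expense, education, gender, residence)
ROWS = [
    (153, 2989, "NoCollege", "F", "Own"),
    (289, 872, "College", "F", "Rent"),
    (551, 98, "NoCollege", "M", "Rent"),
    (286, 320, "College", "M", "Rent"),
    (259, 372, "NoCollege", "M", "Rent"),
    (631, 221, "NoCollege", "M", "Own"),
    (729, 105, "College", "M", "Rent"),
    (582, 450, "NoCollege", "M", "Own"),
    (570, 253, "College", "F", "Rent"),
    (1380, 635, "NoCollege", "F", "Rent"),
    (409, 425, "NoCollege", "M", "Rent"),
    (569, 232, "NoCollege", "F", "Own"),
    (317, 856, "College", "M", "Rent"),
    (199, 283, "College", "F", "Own"),
    (624, 564, "NoCollege", "M", "Own"),
    (1064, 504, "NoCollege", "M", "Own"),
    (821, 169, "NoCollege", "F", "Rent"),
    (402, 175, "College", "M", "Own"),
    (602, 285, "College", "M", "Rent"),
    (433, 264, "College", "M", "Rent"),
    (670, 985, "NoCollege", "F", "Own"),
]


def spending_ratio(rows):
    """SIR as defined in the question: sum(expense) / sum(income) * 100."""
    total_income = sum(row[0] for row in rows)
    total_expense = sum(row[1] for row in rows)
    return total_expense / total_income * 100


count = len(ROWS)
sir = spending_ratio(ROWS)
print(count, round(sir, 1))  # 21 95.8
```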
Then I split the population into female and male groups, which gives:

df %>% group_by(gender) %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100) 
>>   gender count spending_ratio
   1      F     8          138.0
   2      M    13           67.3
We continue the process by adding education:

df %>% group_by(gender, education) %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>>   gender education count spending_ratio
   1      F   College     3          133.1
   2      F NoCollege     5          139.4
   3      M   College     6           72.4
   4      M NoCollege     7           63.9
Finally, adding residence:

df %>% group_by(gender, education, residence) %>% summarise(count=n(), spending_ratio=sum(expense)/sum(income)*100)
>>  gender education residence count spending_ratio
  1      F   College       Own     1          142.2
  2      F   College      Rent     2          131.0
  3      F NoCollege       Own     3          302.2
  4      F NoCollege      Rent     2           36.5
  5      M   College       Own     1           43.5
  6      M   College      Rent     5           77.3
  7      M NoCollege       Own     4           59.9
  8      M NoCollege      Rent     3           73.4
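Each table above is the same summary with one more grouping variable. A minimal Python sketch of that progressive segmentation (standard library only; ROWS copies df, and segment_summary is an illustrative name, not a dplyr function) builds every level in one pass, which is exactly the hierarchy a treemap needs:

```python
from collections import defaultdict

# Rows of df: (income, expense, education, gender, residence)
ROWS = [
    (153, 2989, "NoCollege", "F", "Own"), (289, 872, "College", "F", "Rent"),
    (551, 98, "NoCollege", "M", "Rent"), (286, 320, "College", "M", "Rent"),
    (259, 372, "NoCollege", "M", "Rent"), (631, 221, "NoCollege", "M", "Own"),
    (729, 105, "College", "M", "Rent"), (582, 450, "NoCollege", "M", "Own"),
    (570, 253, "College", "F", "Rent"), (1380, 635, "NoCollege", "F", "Rent"),
    (409, 425, "NoCollege", "M", "Rent"), (569, 232, "NoCollege", "F", "Own"),
    (317, 856, "College", "M", "Rent"), (199, 283, "College", "F", "Own"),
    (624, 564, "NoCollege", "M", "Own"), (1064, 504, "NoCollege", "M", "Own"),
    (821, 169, "NoCollege", "F", "Rent"), (402, 175, "College", "M", "Own"),
    (602, 285, "College", "M", "Rent"), (433, 264, "College", "M", "Rent"),
    (670, 985, "NoCollege", "F", "Own"),
]

COLUMN = {"education": 2, "gender": 3, "residence": 4}  # position of each feature in a row
SPLIT_ORDER = ["gender", "education", "residence"]      # split order used in the question


def segment_summary(rows, keys):
    """Group rows by the given features and return {group: (count, SIR)}."""
    acc = defaultdict(lambda: [0, 0, 0])  # [count, income, expense] per group
    for row in rows:
        group = tuple(row[COLUMN[k]] for k in keys)
        acc[group][0] += 1
        acc[group][1] += row[0]
        acc[group][2] += row[1]
    return {g: (n, spent / earned * 100) for g, (n, earned, spent) in sorted(acc.items())}


# Level 0 (no split) through level 3 (gender, education, residence)
levels = [segment_summary(ROWS, SPLIT_ORDER[:depth]) for depth in range(len(SPLIT_ORDER) + 1)]

for depth, table in enumerate(levels):
    for group, (n, sir) in table.items():
        print(depth, group, n, round(sir, 1))
```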
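What I would like to achieve is a treemap that contains all of the information above. But as you can see, the treemap is far from what I want. What I'd like to get is a map similar to the image at the top, where the size of each rectangle represents the count, the colour represents the SIR, and all levels of the tree are included.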
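One way to read the target picture is as a partition ("icicle") layout: one band per split level, each node's width proportional to its count, and every child drawn inside its parent's span. A hedged Python sketch of computing those spans, assuming the leaf counts from the last table and alphabetical ordering of the category labels (icicle_layout is a hypothetical helper, not part of any package):

```python
from fractions import Fraction

# Leaf counts (gender, education, residence), copied from the last table in the question
LEAF_COUNTS = {
    ("F", "College", "Own"): 1,
    ("F", "College", "Rent"): 2,
    ("F", "NoCollege", "Own"): 3,
    ("F", "NoCollege", "Rent"): 2,
    ("M", "College", "Own"): 1,
    ("M", "College", "Rent"): 5,
    ("M", "NoCollege", "Own"): 4,
    ("M", "NoCollege", "Rent"): 3,
}


def icicle_layout(leaf_counts):
    """Return {node: (depth, x0, x1)} with spans as exact fractions of [0, 1].

    A node is a prefix of a leaf key; () is the root (no split).
    Laying out prefixes in sorted order keeps each child inside its parent's span.
    """
    total = sum(leaf_counts.values())
    max_depth = len(next(iter(leaf_counts)))
    rects = {}
    for depth in range(max_depth + 1):
        counts = {}
        for key, n in leaf_counts.items():
            counts[key[:depth]] = counts.get(key[:depth], 0) + n
        x = Fraction(0)
        for node in sorted(counts):
            width = Fraction(counts[node], total)
            rects[node] = (depth, x, x + width)
            x += width
    return rects


rects = icicle_layout(LEAF_COUNTS)
for node, (depth, x0, x1) in sorted(rects.items(), key=lambda kv: kv[1]):
    print(depth, node, x0, x1)
```

Colouring each rectangle by its SIR is then a lookup of the same node key in the grouped tables above.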


Thanks a lot for your help.

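You can use the treemap package to do the aggregation at the different levels, but the output needs some reformatting. When treemap aggregates successively, it drops all additional variables from the data.table. Because the aggregation function needs extra variables, I created some dummy variables: the variable "index" is used to look up 'expense' and 'income' for each subset. Here is how you could do it: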

library(treemap)
library(data.table)

## Some dummy variables to aggregate by: ALL, i, and index
dat <- as.data.table(df)[, `:=`(total = factor("ALL"), i = 1, index = 1:.N)][]
indexList <- c('total', 'gender', 'education', 'residence')  # order of aggregation

## Function to aggregate at each grouping level (SIR)
agg <- function(index, ...) {
    dots <- list(...)
    expense <- dots[["expense"]][index]
    income <- dots[["income"]][index]
    sum(expense) / sum(income) * 100
}

## Get treemap data
res <- treemap(dat, index=indexList, vSize='i', vColor='index',
               type="value", fun.aggregate = "agg",
               palette = 'RdYlBu',
               income=dat[["income"]],
               expense=dat[["expense"]])  # ... args get passed to fun.aggregate

## The useful variables: level (corresponds to indexList), vSize (bar size), vColor(SIR)
## Create a label variable that is the value of the variable in indexList at each level
out <- res$tm
out$label <- out[cbind(1:nrow(out), out$level)]
out$label <- with(out, ifelse(level==4, substring(label, 1, 1), label))  # shorten labels
out$level <- factor(out$level, levels=sort(unique(out$level), TRUE))     # factor levels

## Time to find label positions, scale to [0, 1] first
## x-value is cumsum by group,  y will just be the level
out$xlab <- out$vSize / max(aggregate(vSize ~ level, data=out, sum)$vSize)
split(out$xlab, out$level) <- lapply(split(out$xlab, out$level), function(x) cumsum(x) - x/2)

## Make plot
library(ggplot2)
ggplot(out, aes(x=level, y=vSize, fill=color, group=interaction(level, label))) +
  geom_bar(stat='identity', position='fill') +  # add another for black rectangles but not legend
  geom_bar(stat='identity', position='fill', color="black", show.legend=FALSE) +
  geom_text(data=out, aes(x=level, y=xlab, label=label), size=6, fontface="bold",
            inherit.aes=FALSE) +
  coord_flip() +
  scale_fill_discrete('SIR', breaks=out$color, labels = round(out$vColor)) +
  theme_minimal() +  # Then just some formatting 
  xlab("") + ylab("") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
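The label-position step (scale each level to [0, 1], then take cumsum(x) - x/2) puts each label at the centre of its segment. Because vSize is a row count here, every level sums to 21, so dividing by the largest per-level sum is the same as dividing by 21. The same arithmetic as a small Python sketch (label_positions is a hypothetical helper; the sizes are counts from the question's tables):

```python
from itertools import accumulate


def label_positions(sizes, total):
    """Midpoint of each stacked segment as a fraction of `total`: cumsum(x) - x/2."""
    shares = [s / total for s in sizes]
    return [end - share / 2 for end, share in zip(accumulate(shares), shares)]


# Level 2 in the R code (gender): F = 8 rows, M = 13 rows
gender_pos = label_positions([8, 13], 21)
# Level 4 (residence): the leaf counts from the question's last table
leaf_pos = label_positions([1, 2, 3, 2, 1, 5, 4, 3], 21)
print(gender_pos)
print(leaf_pos)
```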

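This looks great! I'll try adapting it now, and I'll accept the answer once I'm sure I understand it.

Yes, sorry, it may be hard to follow. I tried to document it but realize it's quite dense. The documentation on how the treemap function uses its aggregation arguments is very sparse. I'm thinking of wrapping this in something easier to use, since I like this treemap layout. By the way, the itreemap function and the d3treeR library are very cool.
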
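A rough Python analogue of the dummy "index" trick in agg may make it easier to follow: the aggregation function receives only the row indices of each subset and looks income and expense up in full-length columns passed through extra arguments (`...` in R, keyword arguments here). This mirrors the idea only, not treemap's internals, and Python indices are 0-based whereas the R code's 1:.N is 1-based:

```python
# Full-length columns from df, in row order
INCOME = [153, 289, 551, 286, 259, 631, 729, 582, 570, 1380, 409,
          569, 317, 199, 624, 1064, 821, 402, 602, 433, 670]
EXPENSE = [2989, 872, 98, 320, 372, 221, 105, 450, 253, 635, 425,
           232, 856, 283, 564, 504, 169, 175, 285, 264, 985]


def agg(index, **dots):
    """Like the R agg: only the subset's row indices arrive as the aggregated values;
    the full columns arrive separately as extra keyword arguments."""
    expense = [dots["expense"][i] for i in index]
    income = [dots["income"][i] for i in index]
    return sum(expense) / sum(income) * 100


# 0-based positions of the female rows (rows 1, 2, 9, 10, 12, 14, 17, 21 of df)
female = [0, 1, 8, 9, 11, 13, 16, 20]
sir_f = agg(female, income=INCOME, expense=EXPENSE)
sir_all = agg(range(len(INCOME)), income=INCOME, expense=EXPENSE)
print(round(sir_f, 1), round(sir_all, 1))  # 138.0 95.8
```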
## Make plot with gradient color for SIR
library(ggplot2)
ggplot(out, aes(x=level, y=vSize, fill=vColor, group=interaction(level, label))) +
  geom_bar(stat='identity', position='fill') +  # add another for black rectangles but not legend
  geom_bar(stat='identity', position='fill', color="black", show.legend=FALSE) +
  geom_text(data=out, aes(x=level, y=xlab, label=label), size=6, fontface="bold",
            inherit.aes=FALSE) +
  coord_flip() +
  scale_fill_gradientn('SIR', colours = c("white", "red")) +
  theme_minimal() +  # Then just some formatting 
  xlab("") + ylab("") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
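scale_fill_gradientn rescales SIR to the range of all rectangles and interpolates between the given colours. The Python sketch below approximates that with plain linear interpolation in RGB between white and red; this is a simplification (ggplot2 may interpolate in a different colour space, so exact shades can differ), and hex_gradient is a hypothetical name:

```python
def hex_gradient(values, low=(255, 255, 255), high=(255, 0, 0)):
    """Map each value to a hex colour between `low` (at the minimum) and `high` (at the maximum).

    Plain linear interpolation in RGB, an approximation of a two-colour gradient scale.
    """
    lo, hi = min(values), max(values)
    colours = []
    for v in values:
        t = (v - lo) / (hi - lo) if hi > lo else 0.0
        r, g, b = (round(a + (z - a) * t) for a, z in zip(low, high))
        colours.append(f"#{r:02X}{g:02X}{b:02X}")
    return colours


# SIR of every rectangle, copied from the question's tables (all levels, top to bottom)
sir_values = [95.8, 138.0, 67.3, 133.1, 139.4, 72.4, 63.9,
              142.2, 131.0, 302.2, 36.5, 43.5, 77.3, 59.9, 73.4]
cols = hex_gradient(sir_values)
for sir, colour in zip(sir_values, cols):
    print(sir, colour)
```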