R: spreading out density plots with ggplot


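I saw this great plot on FiveThirtyEight, where the density plots for different colleges overlap slightly. Check it out:

How can I replicate this plot with ggplot2?

Specifically, how do you get that slight overlap? facet_wrap won't work.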

library(ggplot2)

TestFrame <-
  data.frame(
    Score =
      c(rnorm(100, 0, 1)
        ,rnorm(100, 0, 2)
        ,rnorm(100, 0, 3)
        ,rnorm(100, 0, 4)
        ,rnorm(100, 0, 5))
    ,Group =
      c(rep('Ones', 100)
        ,rep('Twos', 100)
        ,rep('Threes', 100)
        ,rep('Fours', 100)
        ,rep('Fives', 100))
  )

ggplot(TestFrame, aes(x = Score, group = Group)) +
  geom_density(alpha = .75, fill = 'black')

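Although there is already a good, accepted answer, I have finished my contribution as an alternative route that does not require reformatting the data.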


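As always with ggplot, the key is getting the data into the right format; after that, the plotting is quite simple. I'm sure there are other ways to do this, but my approach was to estimate each group's density with density(), and then use geom_ribbon to build a kind of manual geom_density. geom_ribbon takes ymin and ymax, which is what you need to lift each shape off the x-axis.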
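The core idea (one density() curve per group, each lifted to its own baseline and drawn with geom_ribbon) can be sketched in a few lines. This is a minimal illustration, not the answer's full code: the data are made up in the shape of the question's TestFrame, the overlap divisor of 1.5 is an assumed value, and the names `dens`, `spacing` and `layers` are hypothetical.

```r
library(ggplot2)

set.seed(1)
# Illustrative data shaped like the question's TestFrame: five groups, sd 1 to 5
groups <- c("Ones", "Twos", "Threes", "Fours", "Fives")
test_frame <- data.frame(
  Score = rnorm(500, mean = 0, sd = rep(1:5, each = 100)),
  Group = rep(groups, each = 100)
)

# 1. One density estimate per group, stacked into a long data frame
dens <- do.call(rbind, lapply(seq_along(groups), function(i) {
  d <- density(test_frame$Score[test_frame$Group == groups[i]])
  data.frame(Group = groups[i], offset = i, x = d$x, y = d$y)
}))

# 2. Lift each curve off the x-axis: its baseline sits at offset * spacing.
#    The divisor 1.5 is an assumption; a smaller divisor means more space, less overlap
spacing <- max(dens$y) / 1.5
dens$ymin <- dens$offset * spacing
dens$ymax <- dens$ymin + dens$y

# 3. One geom_ribbon layer per group, added from the top group down,
#    so each lower ridge is drawn over the one above it
layers <- lapply(rev(seq_along(groups)), function(i) {
  geom_ribbon(data = dens[dens$offset == i, ],
              aes(x = x, ymin = ymin, ymax = ymax),
              fill = "black", colour = "#F0F0F0")
})

p <- ggplot() + layers +
  scale_y_continuous(breaks = seq_along(groups) * spacing, labels = groups) +
  labs(x = "Score", y = NULL)
p
```

Adding the ribbons as separate layers, rather than one geom_ribbon with a group aesthetic, is what gives explicit control over which ridge ends up in front.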

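The remaining challenge is getting the drawing order right, because ggplot seemed to draw the widest ribbon first; that is why the loop below adds each group as its own layer, from the top group down. Finally, the part that needs the most code is generating the quartiles.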
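If you only need the quartiles and the curve height at the median, a grouped summarise() plus approx() may be shorter than the mutate()/filter()/which.min() route used below. This is a sketch under assumptions: the data mimic the answer's rawdata, density() is called with its default bandwidth rather than the answer's, and `quartiles`, `d_a` and `height_at_median` are hypothetical names.

```r
library(dplyr)

set.seed(1)
# Illustrative data in the same shape as rawdata: 10 groups (A-J) of 1,000 scores
rawdata <- data.frame(
  Score = rnorm(10000, mean = seq(1, 0, length.out = 10), sd = 1),
  Group = factor(rep(LETTERS[1:10], 1000))
)

# One row per group with the three quartiles
quartiles <- rawdata %>%
  group_by(Group) %>%
  summarise(q1 = unname(quantile(Score, 0.25)),
            median = median(Score),
            q3 = unname(quantile(Score, 0.75)))

# Height of group A's density curve at its median, by linear interpolation
# (an alternative to taking the nearest grid point with which.min())
d_a <- density(rawdata$Score[rawdata$Group == "A"])
median_a <- quartiles$median[quartiles$Group == "A"]
height_at_median <- approx(d_a$x, d_a$y, xout = median_a)$y
```

The interpolated height is what a median segment's yend would need, measured from that group's baseline.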

I also made some data that is more consistent with the original plot.

library(ggplot2)  # linewidth = below needs ggplot2 >= 3.4.0 (older versions use size =)
library(dplyr)

# 10 groups (A-J) of 1,000 scores each; the group means step down from 1 to 0
rawdata <- data.frame(Score = rnorm(10000, mean = seq(1, 0, length.out = 10), sd = 1),
                      Group = factor(rep(LETTERS[1:10], 1000)))

df <- rawdata %>% 
  mutate(GroupNum = nlevels(Group) + 1 - as.integer(Group)) %>% # reversed so that A ends up at the top
  group_by(Group, GroupNum) %>% 
  do({
    # density() grid per group, reshaped by hand; the original has quite a large bandwidth
    d <- density(.$Score, bw = diff(range(.$Score)) / 20)
    data.frame(x = d$x, y = d$y)
  }) %>% 
  ungroup() %>% 
  mutate(ymin = GroupNum * (max(y) / 1.5), # this constant controls how much the groups overlap
         ymax = y + ymin,
         ylabel = ymin + min(ymin) / 2,
         xlabel = min(x) - mean(range(x)) / 2) # this constant controls how far to the left the labels are

# Get quartiles, then the density grid point closest to each group's median
labels <- rawdata %>% 
  mutate(GroupNum = nlevels(Group) + 1 - as.integer(Group)) %>% 
  group_by(Group, GroupNum) %>% 
  mutate(q1 = quantile(Score)[2],
         median = quantile(Score)[3],
         q3 = quantile(Score)[4]) %>%
  filter(row_number() == 1) %>% 
  select(-Score) %>% 
  left_join(df, by = c("Group", "GroupNum")) %>% 
  mutate(xmed = x[which.min(abs(x - median))],
         yminmed = ymin[which.min(abs(x - median))],
         ymaxmed = ymax[which.min(abs(x - median))]) %>% 
  filter(row_number() == 1)

p <- ggplot(df, aes(x, ymin = ymin, ymax = ymax)) +
  geom_text(data = labels, aes(xlabel, ylabel, label = Group)) +
  geom_vline(xintercept = 0, linewidth = 1.5, alpha = 0.5, colour = "#626262") + 
  geom_vline(xintercept = c(-2.5, -1.25, 1.25, 2.5), linewidth = 0.75, alpha = 0.25, colour = "#626262") + 
  theme(panel.grid = element_blank(),
        panel.background = element_rect(fill = "#F0F0F0"),
        axis.text.y = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank())

# One set of layers per group, top group first, so each lower ridge is drawn over the one above
for (i in unique(df$GroupNum)) {
  p <- p + geom_ribbon(data = df[df$GroupNum == i,], aes(group = GroupNum), colour = "#F0F0F0", fill = "black") +
    # dashed median line
    geom_segment(data = labels[labels$GroupNum == i,], aes(x = xmed, xend = xmed, y = yminmed, yend = ymaxmed), colour = "#F0F0F0", linetype = "dashed") +
    # baseline extended across the whole x range
    geom_segment(data = labels[labels$GroupNum == i,], x = min(df$x), xend = max(df$x), aes(y = ymin, yend = ymin), linewidth = 1.5, lineend = "round") 
}
p <- p + geom_text(data = labels[labels$Group == "A",], aes(xmed - xlabel/50, ylabel), 
                   label = "Median", colour = "#F0F0F0", hjust = 0, fontface = "italic", size = 4)
p
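EDIT: I noticed that the original actually cheats a little: it extends each distribution with a horizontal line, and if you look closely you can see the join. I added something similar as the second geom_segment in the loop.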
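The same ridge layout can also be sketched without ggplot2, using only base R's density() and polygon(). This is an illustrative sketch, not code from any of the answers: the data are simulated like the question's TestFrame, the 1.5 overlap divisor is an assumption, and names such as `dens`, `spacing` and `baseline` are hypothetical.

```r
set.seed(1)
# Illustrative data: five groups with sd 1 to 5, like the question's TestFrame
groups <- c("Ones", "Twos", "Threes", "Fours", "Fives")
dens <- lapply(seq_along(groups), function(s) density(rnorm(100, 0, s)))
names(dens) <- groups

max_y <- max(sapply(dens, function(d) max(d$y)))
spacing <- max_y / 1.5                          # assumed overlap constant
x_range <- range(sapply(dens, function(d) range(d$x)))

# Empty canvas tall enough for the top ridge
plot(NA, xlim = x_range, ylim = c(spacing, length(dens) * spacing + max_y),
     xlab = "Score", ylab = "", yaxt = "n")

# Draw from the top group down so lower ridges cover the ones above
for (i in rev(seq_along(dens))) {
  d <- dens[[i]]
  baseline <- i * spacing
  # Closed shape: along the curve, then back along the flat baseline
  polygon(c(d$x, rev(d$x)), c(baseline + d$y, rep(baseline, length(d$x))),
          col = "black", border = "#F0F0F0")
}
axis(2, at = seq_along(dens) * spacing, labels = groups, las = 1, tick = FALSE)
```

As in the ggplot2 version, the drawing order of the polygons is what decides which ridge sits in front.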

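Comments:

I feel you would have to program this yourself with grid. If you stick to a rigid set of options for labels, axes, etc., it won't be very complicated, but it is doable. In the long run grid would be the elegant way to do this, but in the short term it is easier with the base R tools density + polygon. Would you accept an answer like that?

We did the same thing on the cover of our report: . I'll see if I can get permission to share the code; otherwise I'll mock something up.

This looks really sharp. Do you know how to add the median and quartiles?

You must have been thinking about this too. Joy plots seem to have gone mainstream.

Using the dedicated geom_joy() from the ggjoy package: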
library(ggplot2)
library(ggjoy)

# dummy data
set.seed(1)
TestFrame <-  
  data.frame(
    Score =
      c(rnorm(100, 0, 1)
        ,rnorm(100, 0, 2)
        ,rnorm(100, 0, 3)
        ,rnorm(100, 0, 4)
        ,rnorm(100, 0, 5))
    ,Group =
      c(rep('Ones', 100)
        ,rep('Twos', 100)
        ,rep('Threes', 100)
        ,rep('Fours', 100)
        ,rep('Fives', 100))
  )

head(TestFrame)
#        Score Group
# 1 -0.6264538  Ones
# 2  0.1836433  Ones
# 3 -0.8356286  Ones
# 4  1.5952808  Ones
# 5  0.3295078  Ones
# 6 -0.8204684  Ones

ggplot(TestFrame, aes(Score, Group)) +
  geom_joy()
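ggjoy has since been superseded by the ggridges package, where geom_joy() became geom_density_ridges(). A sketch of the same plot with a median line in each ridge might look like this. The `scale = 1.5` and colour choices are assumptions picked to mimic the original plot, not values from the answer.

```r
library(ggplot2)
library(ggridges)

set.seed(1)
# Same shape as TestFrame; the factor levels fix the bottom-to-top order of the ridges
TestFrame <- data.frame(
  Score = rnorm(500, 0, rep(1:5, each = 100)),
  Group = factor(rep(c("Ones", "Twos", "Threes", "Fours", "Fives"), each = 100),
                 levels = c("Ones", "Twos", "Threes", "Fours", "Fives"))
)

p <- ggplot(TestFrame, aes(x = Score, y = Group)) +
  geom_density_ridges(scale = 1.5,            # > 1 lets neighbouring ridges overlap
                      quantile_lines = TRUE,  # draw quantile lines inside each ridge
                      quantiles = 2,          # 2 quantiles = a single line at the median
                      fill = "black", colour = "#F0F0F0")
p
```

With `scale = 1`, the tallest ridge just touches the baseline of the ridge above, so values a little above 1 give the slight overlap the question asks about.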