R: Problem plotting a line chart and a bar chart with ggplot (line chart on a secondary axis)

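I started using R two days ago. I have worked through some basic R tutorials and can plot two-dimensional data. I pull the data from an Oracle database. Now I am stuck trying to combine two chart types (line and bar) using a secondary axis.

I have no trouble plotting this data in Excel. The chart is:

I cannot plot it in R. I searched for related examples but could not adapt them to my requirements.

Code: here is what I use to draw the bar chart and the line chart separately.

Bar plot: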


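ggplot is a "high-level" plotting library, meaning it is built to express clear relationships in data rather than to act as a simple system for drawing shapes. One of its basic assumptions is that secondary or dual data axes are usually a bad idea: such figures plot several relationships into the same space, with no guarantee that the two axes actually share a meaningful relationship (see the example).

That said, ggplot does have the ability to define a secondary axis, although using it for the purpose you describe is intentionally difficult. One way to reach your goal is to split the data set into two separate data sets and then plot both in the same ggplot object. This is certainly possible, but note how much extra code is needed to produce the effect you want: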

library(tidyverse)
library(scales)

df.base <- df[c('MONTHS', 'BASE')] %>% 
  mutate(MONTHS = factor(MONTHS, MONTHS, ordered = T))

df.percent <- df[c('MONTHS', 'INTERNETPERCENTAGE', 'SMARTPHONEPERCENTAGE')] %>% 
  gather(variable, value, -MONTHS)

g <- ggplot(data = df.base, aes(x = MONTHS, y = BASE)) +
  geom_col(aes(fill = 'BASE')) +
  geom_line(data = df.percent, aes(x = MONTHS, y = value / 40 * 12500000 + 33500000, color = variable, group = variable)) +
  geom_point(data = df.percent, aes(x = MONTHS, y = value / 40 * 12500000 + 33500000, color = variable)) +
  geom_label(data = df.percent, aes(x = MONTHS, y = value / 40 * 12500000 + 33500000, fill = variable, label = sprintf('%i%%', value)), color = 'white', vjust = 1.6, size = 4) +
  scale_y_continuous(sec.axis = sec_axis(~(. - 33500000) / 12500000 * 40, name = 'PERCENT'), labels = comma) +
  scale_fill_manual(values = c('lightblue', 'red', 'darkgreen')) +
  scale_color_manual(values = c('red', 'darkgreen')) +
  coord_cartesian(ylim = c(33500000, 45500000)) +
  labs(fill = NULL, color = NULL) +
  theme_minimal()
print(g)
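The constants in the plotting code above define a linear map that sends 0% to 33,500,000 and 40% to 46,000,000 on the BASE axis, and `sec_axis()` is given the inverse of that map. A minimal base R sketch of the pair, useful for checking that the two formulas really undo each other; the function names `to_primary` and `to_secondary` are illustrative and do not appear in the answer:

```r
# Forward map: percentage -> position on the primary (BASE) axis.
to_primary <- function(pct) pct / 40 * 12500000 + 33500000

# Inverse map: primary-axis position -> percentage.
# This is the same formula that is passed to sec_axis() in the plot.
to_secondary <- function(y) (y - 33500000) / 12500000 * 40

pct <- c(28, 32, 35)
y <- to_primary(pct)

print(y)                # positions at which the percentage line is drawn
print(to_secondary(y))  # labels the secondary axis would show there
```

If the forward formula in `aes()` is changed, the formula in `sec_axis()` has to change with it, otherwise the secondary axis labels no longer match the plotted line.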

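Note that my answer is based on your original, "uncleaned" data (attached at the bottom of my post).

The key here is to transform the percentage values so that they use the same range as BASE. We then apply the inverse of that transformation to display the original percentage values on the second y-axis.

(Personal) note on secondary axes: I would use facets or two separate plots to avoid a cluttered, overloaded figure. Hadley himself takes a similar view, which is why ggplot2's support for dual axes is (reasonably) limited.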
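The note above mentions facets as an alternative to a secondary axis. A minimal sketch of that idea, assuming ggplot2 is installed; the long-format data frame `long` and its `panel` and `series` columns are names made up for this illustration, and the numbers are copied from the dput() output in the question:

```r
library(ggplot2)

# Values copied from the question's dput() output, in chronological order.
months <- c("Oct-17", "Nov-17", "Dec-17", "Jan-18", "Feb-18", "Mar-18", "Apr-18",
            "May-18", "Jun-18", "Jul-18", "Aug-18", "Sep-18", "Oct-18")
base <- c(40756228, 41088219, 41642601, 42017111, 42439446, 42847468, 43375319,
          43440484, 43464735, 43326823, 43190949, 43015301, 42780071)
internet_pct   <- c(33, 32, 34, 34, 33, 34, 33, 33, 34, 34, 35, 33, 33)
smartphone_pct <- c(28, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31, 32)

# Long format: one row per month and series, plus the panel it belongs to.
long <- rbind(
  data.frame(MONTHS = months, panel = "BASE", series = "BASE", value = base),
  data.frame(MONTHS = months, panel = "PERCENT",
             series = "INTERNETPERCENTAGE", value = internet_pct),
  data.frame(MONTHS = months, panel = "PERCENT",
             series = "SMARTPHONEPERCENTAGE", value = smartphone_pct)
)
# Fix the month order so the x-axis is chronological, not alphabetical.
long$MONTHS <- factor(long$MONTHS, levels = months)

# Bars in one panel, lines in the other; each panel gets its own y scale.
p_facets <- ggplot(long, aes(x = MONTHS, y = value)) +
  geom_col(data = subset(long, panel == "BASE"), fill = "lightblue") +
  geom_line(data = subset(long, panel == "PERCENT"),
            aes(colour = series, group = series)) +
  geom_point(data = subset(long, panel == "PERCENT"), aes(colour = series)) +
  facet_wrap(~ panel, ncol = 1, scales = "free_y") +
  labs(x = NULL, y = NULL, colour = NULL) +
  theme_minimal()

print(p_facets)
```

With `scales = "free_y"` no rescaling formula is needed, at the cost of the bars and lines no longer sharing one plotting area.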

Beyond that, here is one option:

  • First, let's clean the data (remove thousands separators, percent signs, etc.)
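The clean-up step can be sketched on its own before it is folded into a pipeline. The answer's code uses `str_replace_all()` from stringr; the version below swaps in base R `gsub()` and `sub()` so that it runs without extra packages. The vectors `base_raw` and `pct_raw` are small stand-ins with the same shape as the raw columns, not the full data:

```r
# Stand-in for the raw columns, which arrive as factors of formatted strings.
base_raw <- factor(c("40,756,228", "41,088,219"))
pct_raw  <- factor(c("33%", "32%"))

# Convert factor -> character first; as.numeric() on a factor would
# return the internal level codes instead of the printed values.
base_num <- as.numeric(gsub(",", "", as.character(base_raw), fixed = TRUE))
pct_num  <- as.numeric(sub("%", "", as.character(pct_raw), fixed = TRUE))

print(base_num)
print(pct_num)
```

The `as.character()` call matters: skipping it is a common source of silently wrong numbers when the data frame was read with strings as factors.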


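Sample data: df

You need a transformation factor along the lines of the ratio of the maximum of y-axis 1 to the maximum of y-axis 2. Here the secondary y-axis should be 1,000,000 times smaller than the primary y-axis: the line is drawn at INTERNETPERCENTAGE / 0.000001 and the secondary axis multiplies the primary values by 0.000001.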
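As a rough illustration of the ratio rule above, the factor can be estimated from the two column maxima and then rounded to a power of ten so that the secondary axis stays readable. This is a sketch in base R; the names `raw_ratio` and `scale_factor` are assumptions for the example, and rounding to a power of ten is one possible choice rather than something the answer prescribes:

```r
# Minimum and maximum of each column in the sample data.
base_range <- c(40756228, 43464735)   # BASE
pct_range  <- c(32, 35)               # INTERNETPERCENTAGE

# Ratio of the primary-axis maximum to the secondary-axis maximum.
raw_ratio <- max(base_range) / max(pct_range)

# Round to the nearest power of ten to get a convenient factor.
scale_factor <- 10^round(log10(raw_ratio))

# Forward: where the percentage line is drawn on the primary axis.
scaled_pct <- pct_range * scale_factor
# Inverse: what the secondary axis has to show at those positions.
restored_pct <- scaled_pct / scale_factor

print(raw_ratio)
print(scale_factor)
print(scaled_pct)
```

Multiplying by `scale_factor` is equivalent to the division by 0.000001 used in the plotting code, and dividing by it corresponds to the `~ . * 0.000001` formula given to `sec_axis()`.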

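Code

Data

Comments:

p and p1 each work fine on their own, but I cannot merge the two plots so that the line plot's primary y-axis becomes the second y-axis of the merged plot. I could not come up with any code that returns a merged plot of this kind, which is why I did not post a plot in my question.

Could you add the data to your question using dput(head(df,20))?

The ";" does nothing; I guess it is just a habit.

I can also clean the data on the database side, for example by removing the "%" sign or the ",".

Why the downvote? If something is unclear, I am happy to explain.

I did not downvote. In fact, I learned more than one thing from your answer. Thank you very much for your help and time.

Your answer proved very helpful. One more thing: can we move the red labels upward to avoid the overlap? Thanks.
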
    p1 <- ggplot(data = df, aes(x = MONTHS, y = df$INTERNETPERCENTAGE, group = 1)) + 
        geom_line() + 
        geom_point()
    
    > dput(head(df,20))
    structure(list(MONTHS = structure(c(11L, 10L, 3L, 5L, 4L, 8L, 
    1L, 9L, 7L, 6L, 2L, 13L, 12L), .Label = c("Apr-18", "Aug-18", 
    "Dec-17", "Feb-18", "Jan-18", "Jul-18", "Jun-18", "Mar-18", "May-18", 
    "Nov-17", "Oct-17", "Oct-18", "Sep-18"), class = "factor"), BASE = c(40756228L, 
    41088219L, 41642601L, 42017111L, 42439446L, 42847468L, 43375319L, 
    43440484L, 43464735L, 43326823L, 43190949L, 43015301L, 42780071L
    ), INTERNETUSERGREATERTHAN0KB = c(13380576L, 13224502L, 14044105L, 
    14239169L, 14011423L, 14736043L, 14487827L, 14460410L, 14632695L, 
    14896654L, 15019329L, 14141766L, 14209288L), INTERNETPERCENTAGE = c(33L, 
    32L, 34L, 34L, 33L, 34L, 33L, 33L, 34L, 34L, 35L, 33L, 33L), 
        SMARTPHONE = c(11610216L, 11875033L, 12225965L, 12412010L, 
        12760251L, 12781082L, 13142400L, 13295826L, 13422476L, 13408216L, 
        13504339L, 13413596L, 13586438L), SMARTPHONEPERCENTAGE = c(28L, 
        29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 32L
        ), INTERNETUSAGEGREATERTHAN0KB4G = c(829095L, 969531L, 1181411L, 
        1339620L, 1474300L, 1733027L, 1871816L, 1967129L, 2117418L, 
        2288215L, 2453243L, 2624865L, 2817199L)), row.names = c(NA, 
    13L), class = "data.frame")
    
    library(tidyverse)
    df.clean <- df %>%
        mutate_if(is.factor, as.character) %>%
        gather(USAGE, PERCENTAGE, INTERNETPERCENTAGE, SMARTPHONEPERCENTAGE) %>%
        mutate(
            MONTHS = factor(MONTHS, levels = df$MONTHS),
            BASE = as.numeric(str_replace_all(BASE, ",", "")),
            PERCENTAGE = as.numeric(str_replace(PERCENTAGE, "%", "")))
    
    y1 <- min(df.clean$BASE)
    y2 <- max(df.clean$BASE)
    x1 <- min(df.clean$PERCENTAGE)
    x2 <- max(df.clean$PERCENTAGE)
    b <- (y2 - y1) / (x2 - x1)
    a <- y1 - b * x1
    
    df.clean %>%
        mutate(perc.scaled = a + b * PERCENTAGE) %>%
        ggplot(aes(MONTHS, BASE)) +
        geom_col(
            data = df.clean %>% distinct(MONTHS, .keep_all = TRUE),
            aes(MONTHS, BASE), fill = "dodgerblue4") +
        geom_point(aes(MONTHS, perc.scaled, colour = USAGE, group = USAGE)) +
        geom_line(aes(MONTHS, perc.scaled, colour = USAGE, group = USAGE)) +
        geom_label(
            aes(MONTHS, perc.scaled, label = PERCENTAGE, fill = USAGE),
            vjust = 1.4,
            show.legend = F) +
        scale_y_continuous(
                name =  "BASE",
                sec.axis = sec_axis(~ (. - a) / b, name = "Percentage")) +
        coord_cartesian(ylim = c(0.99 * min(df.clean$BASE), max(df.clean$BASE))) +
        theme_minimal() +
        theme(legend.position = "bottom")
    
    df <- structure(list(MONTHS = structure(c(11L, 10L, 3L, 5L, 4L, 8L,
    1L, 9L, 7L, 6L, 2L, 13L, 12L), .Label = c("APR-18", "AUG-18",
    "DEC-17", "FEB-18", "JAN-18", "JUL-18", "JUN-18", "MAR-18", "MAY-18",
    "NOV-17", "OCT-17", "OCT-18", "SEP-18"), class = "factor"), BASE = structure(c(1L,
    2L, 3L, 4L, 5L, 7L, 11L, 12L, 13L, 10L, 9L, 8L, 6L), .Label = c("40,756,228",
    "41,088,219", "41,642,601", "42,017,111", "42,439,446", "42,780,071",
    "42,847,468", "43,015,301", "43,190,949", "43,326,823", "43,375,319",
    "43,440,484", "43,464,735"), class = "factor"), INTERNETUSERGREATERTHAN0KB = structure(c(2L,
    1L, 4L, 7L, 3L, 11L, 9L, 8L, 10L, 12L, 13L, 5L, 6L), .Label = c("13,224,502",
    "13,380,576", "14,011,423", "14,044,105", "14,141,766", "14,209,288",
    "14,239,169", "14,460,410", "14,487,827", "14,632,695", "14,736,043",
    "14,896,654", "15,019,329"), class = "factor"), INTERNETPERCENTAGE = structure(c(2L,
    1L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 3L, 4L, 2L, 2L), .Label = c("32%",
    "33%", "34%", "35%"), class = "factor"), SMARTPHONE = structure(c(1L,
    2L, 3L, 4L, 5L, 6L, 7L, 8L, 11L, 9L, 12L, 10L, 13L), .Label = c("11,610,216",
    "11,875,033", "12,225,965", "12,412,010", "12,760,251", "12,781,082",
    "13,142,400", "13,295,826", "13,408,216", "13,413,596", "13,422,476",
    "13,504,339", "13,586,438"), class = "factor"), SMARTPHONEPERCENTAGE = structure(c(1L,
    2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L), .Label = c("28%",
    "29%", "30%", "31%", "32%"), class = "factor"), INTERNETUSAGEGREATERTHAN0KB4G = structure(c(12L,
    13L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L), .Label = c("1,181,411 ",
    "1,339,620 ", "1,474,300 ", "1,733,027 ", "1,871,816 ", "1,967,129 ",
    "2,117,418 ", "2,288,215 ", "2,453,243 ", "2,624,865 ", "2,817,199 ",
    "829,095 ", "969,531 "), class = "factor")), row.names = c(NA,
    13L), class = "data.frame")
    
    ggplot(df) + 
        geom_col(aes(x = MONTHS, y = BASE)) +
        # apply transformation factor to line plot
        geom_line(aes(x = MONTHS, y = INTERNETPERCENTAGE/0.000001, group = 1), 
                  color = "red", size = 1) +
        theme_minimal() +
        geom_text(aes(x = MONTHS, y = BASE, label=BASE), 
                  vjust=1.6, color="White", size=2.5) +
        # add secondary y-axis that is 1,000,000 times smaller (factor 0.000001)
        scale_y_continuous(sec.axis = sec_axis(~.*0.000001, name = "Internet Percentage in %")) +
        labs(y = "Base", x = "Months")
    
    df <- structure(list(MONTHS = structure(c(17440, 17471, 17501, 17532, 17563, 17591, 17622, 17652, 17683, 17713, 17744, 17775, 17805), class = "Date"), BASE = c(40756228L, 41088219L, 41642601L, 42017111L, 42439446L, 42847468L, 43375319L, 43440484L, 43464735L, 43326823L, 43190949L, 43015301L, 42780071L), INTERNETUSERGREATERTHAN0KB = c(13380576L, 13224502L, 14044105L, 14239169L, 14011423L, 14736043L, 14487827L, 14460410L, 14632695L, 14896654L, 15019329L, 14141766L, 14209288L), INTERNETPERCENTAGE = c(33L, 32L, 34L, 34L, 33L, 34L, 33L, 33L, 34L, 34L, 35L, 33L, 33L), SMARTPHONE = c(11610216L, 11875033L, 12225965L, 12412010L, 12760251L, 12781082L, 13142400L, 13295826L, 13422476L, 13408216L, 13504339L, 13413596L, 13586438L), SMARTPHONEPERCENTAGE = c(28L, 29L, 29L, 30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 31L, 32L), INTERNETUSAGEGREATERTHAN0KB4G = c(829095L, 969531L, 1181411L, 1339620L, 1474300L, 1733027L, 1871816L, 1967129L, 2117418L, 2288215L, 2453243L, 2624865L, 2817199L)), row.names = c(NA, 13L), class = "data.frame")
    
    > ggplot_build(p)[[1]][[2]]
              y     x group PANEL colour size linetype alpha
    1  33000000 17440     1     1    red    1        1    NA
    2  32000000 17471     1     1    red    1        1    NA
    3  34000000 17501     1     1    red    1        1    NA
    4  34000000 17532     1     1    red    1        1    NA
    5  33000000 17563     1     1    red    1        1    NA
    6  34000000 17591     1     1    red    1        1    NA
    7  33000000 17622     1     1    red    1        1    NA
    8  33000000 17652     1     1    red    1        1    NA
    9  34000000 17683     1     1    red    1        1    NA
    10 34000000 17713     1     1    red    1        1    NA
    11 35000000 17744     1     1    red    1        1    NA
    12 33000000 17775     1     1    red    1        1    NA
    13 33000000 17805     1     1    red    1        1    NA