R: Show percentages instead of counts in charts of categorical variables (r, ggplot2)

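I'm plotting a categorical variable, and instead of showing the counts for each category value, I'm looking for a way to get ggplot to display the percentage of values in that category. Of course, it is possible to create another variable with the calculated percentage and plot that one, but I have to do this several dozen times, and I hope to achieve it in a single command.

I was experimenting with something like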

qplot(mydataf) +
  stat_bin(aes(n = nrow(mydataf), y = ..count../n)) +
  scale_y_continuous(formatter = "percent")

but I must be using it incorrectly, because I got errors.

To easily reproduce the setup, here is a simplified example:

mydata <- c ("aa", "bb", NULL, "bb", "cc", "aa", "aa", "aa", "ee", NULL, "cc");
mydataf <- factor(mydata);
qplot (mydataf); #this shows the count, I'm looking to see % displayed.

But all 4 attempts give an error, and the same error also appears for the following simple case:

ggplot (data=mydataf, aes(levels(mydataf))) +
  geom_bar()

So this is clearly a matter of how ggplot interacts with a single vector. I'm scratching my head; googling for that error gives a single result.

This modified code should work:

p = ggplot(mydataf, aes(x = foo)) + 
    geom_bar(aes(y = (..count..)/sum(..count..))) + 
    scale_y_continuous(formatter = 'percent')

If your data has NAs and you don't want them included in the plot, pass na.omit(mydataf) as the data argument to ggplot.
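The snippet above refers to a column named foo, which the question's bare factor does not have. A minimal sketch of the data preparation, assuming the missing entries are meant to be NA (the NULL elements in the question are silently dropped by c()) and that the column is called foo as in the answer:

```r
# The question's example data; NA is an assumed stand-in for the missing
# entries, since NULL elements never make it into the vector.
mydata <- c("aa", "bb", NA, "bb", "cc", "aa", "aa", "aa", "ee", NA, "cc")

# ggplot() expects a data frame, so wrap the factor in one.
# The column name foo matches the aes(x = foo) used in the answer.
mydataf <- data.frame(foo = factor(mydata))

# Drop rows with missing values so they do not get a bar of their own.
mydataf <- na.omit(mydataf)

# Proportions the bars should show, computed independently of ggplot2.
expected <- prop.table(table(mydataf$foo))
print(round(expected, 3))
```

The expected table is only a cross-check: whatever plotting call is used, the bar heights should match these proportions.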

Hope this helps.

Since this was answered, there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

 require(ggplot2)
 require(scales)

 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.0
        scale_y_continuous(labels=percent)

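This question is currently the top hit on Google for "ggplot count vs percentage histogram", so hopefully this helps distill all the information currently housed in the comments on the accepted answer.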

Remark: if hp is not set as a factor, ggplot returns:

If you want percentage labels on the bars but the actual counts (N) on the y axis, try this:

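library(ggplot2)
library(scales)

# Bars show raw counts; the text labels show each bar's share of the total
perbar <- function(xx) {
  ggplot(data = data.frame(xx), aes(x = xx)) +
    geom_bar(aes(y = (..count..)), fill = "orange") +
    # stat = "count" matches geom_bar(); stat = "bin" would bin the values differently
    geom_text(aes(y = (..count..),
                  label = scales::percent((..count..) / sum(..count..))),
              stat = "count", colour = "darkgreen")
}
perbar(mtcars$disp)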

If you want percentages on the y axis and percentage labels on the bars:

library(ggplot2)
library(scales)
ggplot(mtcars, aes(x = as.factor(am))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  geom_text(aes(y = ((..count..)/sum(..count..)), label = scales::percent((..count..)/sum(..count..))), stat = "count", vjust = -0.25) +
  scale_y_continuous(labels = percent) +
  labs(title = "Manual vs. Automatic Frequency", y = "Percent", x = "Automatic Transmission")

When adding bar labels, you may want to omit the y axis for a cleaner chart by adding the following at the end:

  theme(
        axis.text.y=element_blank(), axis.ticks=element_blank(),
        axis.title.y=element_blank()
  )

For ggplot2 version 2.1.0, it is

+ scale_y_continuous(labels = scales::percent)
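If the default number of decimals on the axis is not what you want, the scales package also offers a labeller factory. A small sketch, assuming a version of scales that provides label_percent() (1.1.0 or later, as far as I know); the data frame shares and its columns are made up for illustration:

```r
library(ggplot2)
library(scales)

# Hypothetical, precomputed shares (illustrative values only)
shares <- data.frame(category = c("aa", "bb", "cc"),
                     share    = c(0.5, 0.25, 0.25))

# label_percent() returns a function that formats proportions as percentages;
# accuracy = 1 rounds the labels to whole percents.
p <- ggplot(shares, aes(x = category, y = share)) +
  geom_col() +
  scale_y_continuous(labels = label_percent(accuracy = 1))

# The labeller can also be called directly to see what it produces.
label_percent(accuracy = 1)(c(0.25, 0.5))
```

Passing labels = scales::percent, as in the line above, is the shorter form; the factory is only needed when you want to control rounding or add a prefix or suffix.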

As of March 2017, with ggplot2 2.2.1, I think the best solution is explained in Hadley Wickham's book R for Data Science:

ggplot(mydataf) + stat_count(mapping = aes(x=foo, y=..prop.., group=1))

stat_count computes two variables: count is used by default, but you can choose to use prop, which shows proportions.
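The group = 1 in that call matters. A short sketch of why, assuming ggplot2 3.3.0 or later (for after_stat()) and a small made-up data frame with a column named foo; layer_data() is used only to inspect the values ggplot2 computes:

```r
library(ggplot2)

# Illustrative data; the column name foo follows the thread's examples
df <- data.frame(foo = factor(c("aa", "bb", "bb", "cc", "aa",
                                "aa", "aa", "ee", "cc")))

# With group = 1 all bars share one group, so prop is each bar's
# share of the total.
p_grouped <- ggplot(df) +
  stat_count(aes(x = foo, y = after_stat(prop), group = 1))

# Without it, every x value forms its own group and each bar gets prop = 1.
p_ungrouped <- ggplot(df) +
  stat_count(aes(x = foo, y = after_stat(prop)))

# Inspect the computed proportions for both plots.
prop_grouped   <- layer_data(p_grouped)$prop
prop_ungrouped <- layer_data(p_ungrouped)$prop
print(prop_grouped)
print(prop_ungrouped)
```

This is also why removing group = 1 to get a fill mapping tends to produce bars that all reach 100%.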

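Here is a workaround for faceted data. (The accepted answer by @Andrew does not work in this case.) The idea is to calculate the percentage values using dplyr and then create the plot with geom_col.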

library(ggplot2)
library(scales)
library(magrittr)
library(dplyr)

binwidth <- 30

mtcars.stats <- mtcars %>%
  group_by(cyl) %>%
  mutate(bin = cut(hp, breaks=seq(0,400, binwidth), 
               labels= seq(0+binwidth,400, binwidth)-(binwidth/2)),
         n = n()) %>%
  group_by(cyl, bin) %>%
  summarise(p = n()/n[1]) %>%
  ungroup() %>%
  mutate(bin = as.numeric(as.character(bin)))

ggplot(mtcars.stats, aes(x = bin, y= p)) +  
  geom_col() + 
  scale_y_continuous(labels = percent) +
  facet_grid(cyl~.)

Here is the plot:

Note that if your variable is continuous, you will have to use geom_histogram(), because that function groups the variable into bins.

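In more recent versions of ggplot2, we have access to the convenient after_stat() function. We can do something similar to @Andrew's answer, but without using the .. syntax.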
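A minimal sketch of what that could look like, assuming ggplot2 3.3.0 or later (where after_stat() is available, as far as I know) and the question's example data wrapped in a data frame with a column named foo:

```r
library(ggplot2)

# The question's example data as a data frame (column name foo is assumed)
mydataf <- data.frame(foo = factor(c("aa", "bb", "bb", "cc", "aa",
                                     "aa", "aa", "ee", "cc")))

# after_stat() refers to variables computed by the stat; here each bar's
# count is divided by the total count to give a proportion.
p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

p
```

The expression inside after_stat() plays the same role as (..count..)/sum(..count..) in the older answers.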
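Comments:

The data should be a data frame, not a bare factor.

Adding to hadley's comment: convert your data to a data frame with mydataf = data.frame(mydataf) and rename the column with names(mydataf) = "foo". That will do the trick.

In ggplot2 version 0.9.0 the formatter argument no longer works. Instead, you need something like labels = percent_format().

With 0.9.0 you also need to load the scales library before using percent_format(), otherwise it won't work. 0.9.0 no longer loads supporting packages automatically.

See ?stat_bin. It shows the additional columns that ggplot2 adds to the data frame. All the extra columns are of the form ..variable..

What about replacing aes(y = (..count..)/sum(..count..)) with aes(y = ..density..)? Visually it gives a very similar (but still different) picture.

In ggplot 0.9.3.1.0, you need to load the scales library first and then use scale_y_continuous(labels = percent), as mentioned earlier.

Thanks for the answer. Any ideas on how to do this per class?

As @WAF suggested, this answer does not work for faceted data. See @Erwan's comment.

You may need to prefix percent with the package it comes from to get the above to work (I did): ggplot(mtcars, aes(x = factor(hp))) + geom_bar(aes(y = (..count..)/sum(..count..))) + scale_y_continuous(labels = scales::percent)

To handle facets, use geom_bar(aes(y = (..count..)/tapply(..count.., ..PANEL.., sum)[..PANEL..])). Each facet should then sum to 100%.

Have the variables surrounded by ".." been replaced by the stat() command?

This is the best answer as of June 2017; it works with filling by group and with faceting.

For some reason this doesn't let me use the fill mapping (no error is thrown, but no fill colours are added).

@MaxCandocia To get the fill mapping I had to remove group = 1. Maybe that helps.

If I remove the group parameter, however, it doesn't show the right percentages, because for each unique x value everything belongs to its own group.

Nice solution. But you forgot to multiply by 100 to get %, i.e. geom_histogram(aes(y = 100*(..count..)/sum(..count..))).

Right. I'm editing the answer.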
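The per-facet idea from the comments can be sketched with after_stat() as well. This is an illustration under assumptions, not the commenter's exact code: it assumes ggplot2 3.3.0 or later, uses mtcars with gear on the x axis and cyl as the facet variable, and swaps the tapply(...)[...] indexing for ave(), which returns the per-panel total aligned to each row:

```r
library(ggplot2)

# PANEL is the column ggplot2 adds to the computed data to identify the
# facet panel; dividing by the per-panel total makes each facet sum to 100%.
p <- ggplot(mtcars, aes(x = factor(gear))) +
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  scale_y_continuous(labels = scales::percent) +
  facet_grid(cyl ~ .)

# Cross-check: the computed bar heights should sum to 1 within each panel.
d <- layer_data(p)
panel_sums <- tapply(d$y, d$PANEL, sum)
print(panel_sums)
```

If the proportions should instead be relative to the whole data set, drop the ave() term and divide by sum(count).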

Example for the continuous case (geom_histogram() groups the values into bins):

df <- data.frame(V1 = rnorm(100))

ggplot(df, aes(x = V1)) +  
  geom_histogram(aes(y = 100*(..count..)/sum(..count..))) 

# if you use geom_bar(), with factor(V1), each value of V1 will be treated as a
# different category. In this case this does not make sense, as the variable is 
# really continuous. With the hp variable of the mtcars (see previous answer), it 
# worked well since hp was not really continuous (check unique(mtcars$hp)), and one 
# can want to see each value of this variable, and not to group it in bins.
ggplot(df, aes(x = factor(V1))) +  
  geom_bar(aes(y = (..count..)/sum(..count..)))