R aggregation for percentage counts (r, ggplot2)


I have the data frame df1 shown below:

+-----------------------------------------+
|reg   |make  |model  |year|abs  |gears|fm|
+-----------------------------------------+
|ax1234|Toyota|Corolla|1999|true |6    |0 |
|ax1235|Toyota|Corolla|1999|false|5    |0 |
|ax1236|Toyota|Corolla|1992|false|4    |NA|
|ax1237|Toyota|Camry  |2001|true |7    |1 |
|ax1238|Honda |Civic  |1994|true |5    |NA|
|ax1239|Honda |Civic  |2000|false|6    |0 |
|ax1240|Honda |Accord |1992|false|4    |NA|
|ax1241|Nissan|Sunny  |2001|true |6    |0 |
|ax1242|      |       |1998|false|6    |0 |
|ax1243|NA    |NA     |1992|false|4    |NA|
+-----------------------------------------+
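To make the example reproducible, the table above can be entered roughly as follows. This is only a sketch: it assumes the blank make and model of ax1242 are empty strings and that the NA entries are real missing values, which is consistent with the aggregate output further down. The final line is a base R cross-check of the shares, not the method used in the question.

```r
# Sketch of df1, typed in from the table above.
# Assumption: blanks are empty strings (""), NA entries are real missing values.
df1 <- data.frame(
  reg   = paste0("ax", 1234:1243),
  make  = c("Toyota", "Toyota", "Toyota", "Toyota",
            "Honda", "Honda", "Honda", "Nissan", "", NA),
  model = c("Corolla", "Corolla", "Corolla", "Camry",
            "Civic", "Civic", "Accord", "Sunny", "", NA),
  year  = c(1999, 1999, 1992, 2001, 1994, 2000, 1992, 2001, 1998, 1992),
  abs   = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  gears = c(6, 5, 4, 7, 5, 6, 4, 6, 6, 4),
  fm    = c(0, 0, NA, 1, NA, 0, NA, 0, 0, NA),
  stringsAsFactors = FALSE
)

# Share of rows per make; useNA = "ifany" keeps the missing make as its own group
shares <- prop.table(table(df1$make, useNA = "ifany"))
```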

I need to find the percentage of cars by make. This is what I did:

df2 <- aggregate(reg ~ addNA(make), df1, function(x){ return (length(x)/nrow(df1))})
> df2
  addNA(make) reg
1             0.1
2       Honda 0.3
3      Nissan 0.1
4      Toyota 0.4
5        <NA> 0.1
>


Finally, I plotted a bar chart as follows:

df3 <- df2[order(-df2[,2]),]
ggplot(df3, aes(x=df3[,1], y=df3[,2])) + geom_bar(stat = "identity") +
  xlab("make") + scale_y_continuous(labels = percent, name="perc")

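I suggest reading some of the many ggplot2 tutorials available online. There are many ways to simplify the code or to use different tools, but those will come with time and continued use.

I believe the biggest improvement you can make is to use the plain variable names from the data you pass in the call. You can also, for different geom ...

The second problem you are facing is that, in most cases, ggplot determines the order of a variable from the data. For the discrete data you are passing, you need to create a factor with its levels in the appropriate order. You are almost there, having already sorted the data correctly. You just need to create a factor from it.

Your call: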
df2 <- aggregate(reg ~ addNA(make), df1, function(x){ return (length(x)/nrow(df1))})

df3 <- df2[order(-df2[,2]),]

ggplot(df3, aes(x=df3[,1], y=df3[,2])) + geom_bar(stat = "identity") +
  xlab("make") + scale_y_continuous(labels = percent, name="perc")
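The factor step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the answerer's own code: the summary table df2 is typed in by hand from the proportions shown in the question, and the column names make and perc are chosen here for readability.

```r
library(ggplot2)

# Assumption: a summary table matching the proportions printed in the question
df2 <- data.frame(
  make = c("", "Honda", "Nissan", "Toyota", NA),
  perc = c(0.1, 0.3, 0.1, 0.4, 0.1),
  stringsAsFactors = FALSE
)

# Sort the rows by decreasing share, as in the original call
df3 <- df2[order(-df2$perc), ]

# Turn make into a factor whose levels follow the sorted row order;
# exclude = NULL keeps the missing make as a level of its own
df3$make <- factor(df3$make, levels = unique(df3$make), exclude = NULL)

# Plain column names inside aes() instead of df3[, 1] and df3[, 2]
p <- ggplot(df3, aes(x = make, y = perc)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent, name = "perc") +
  xlab("make")
```

Printing p draws the chart. Because the level order is stored in the factor, the bars follow the sorted order rather than alphabetical order.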

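For the order of the bars, as mentioned above, the trick is to use a factor, because ggplot2 sorts character values alphabetically.

You can take advantage of the dplyr package to manipulate the data without having to store it at every step, using the pipe operator %>%, which passes the data as the first argument of the next function.

Here is my version: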
library(dplyr)    # For data manipulation and the pipe (%>%) operator
library(forcats)  # For factor handling (here fct_reorder())
library(ggplot2)  # For plots
library(scales)   # For percent scale

# Start with the data frame and pass it with the pipe to the next function
df1 %>% 
  # Then we group it by make
  group_by(make) %>% 
  # We summarise by creating a prop variable; n() returns the number of rows per group
  summarise(prop = n()/nrow(df1)) %>% 
  # We then transform the make variable into a factor, the order of the level
  # given by -prop (to have it in decreasing order)
  mutate(make = fct_reorder(make, -prop)) %>% 
  # And we pass it to the plot
  # Notice the transition to + instead of %>% 
  ggplot(aes(x = make, y = prop)) +
    geom_col() +
    scale_y_continuous(labels = percent) +
    labs(x = "Make", y = "Percent")

Also note that, for me, the NAs are stored as NA and not as the string "NA", so they are plotted as the last bar regardless of their value.
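The difference between a real NA and the string "NA" can be checked in base R. The two vectors below are made up for illustration only: a missing value is not a factor level by default, whereas the text "NA" is an ordinary level like any other.

```r
# Hypothetical vectors, for illustration only
make_missing <- c("Toyota", "Honda", NA)    # real missing value
make_string  <- c("Toyota", "Honda", "NA")  # the two-character text "NA"

# A real NA is dropped from the levels by default
levels_missing <- levels(factor(make_missing))

# The string "NA" becomes a regular level and is ordered like any other label
levels_string <- levels(factor(make_string))

# is.na() tells the two cases apart
n_missing <- sum(is.na(make_missing))
n_string  <- sum(is.na(make_string))
```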
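This works well. However, when I use a subset of df2, such as df2[1:10], the ordering is lost, i.e. with something like ggplot(df2[1:10,], aes(x=make, y=reg)). How can we get the sort order by reg within the subset?

Double-check your data and code. If I run a subset like df2[c(2,5),] and plot it, it keeps the order.

When I use ggplot(df2, aes(x=make, y=reg)) + ..., ggplot shows the bars in the order Toyota, Honda, "", Nissan, which is fine. However, when I subset the data by calling ggplot(df2[1:3], aes(x=make, y=reg)) + ..., meaning that I only want the top 3 makes by percentage of the count, ggplot shows Honda, "", Nissan instead of Toyota, Honda, "" as I expected.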
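One possible way to get the top three makes by share while keeping that order on the axis is sketched below. It assumes a summary table like the one printed in the question, typed in by hand, with the column names make and perc chosen for illustration. The idea is to sort first, take the first rows of the sorted table, and rebuild the factor from the subset.

```r
# Assumption: a summary table matching the proportions printed in the question
df2 <- data.frame(
  make = c("", "Honda", "Nissan", "Toyota", NA),
  perc = c(0.1, 0.3, 0.1, 0.4, 0.1),
  stringsAsFactors = FALSE
)

# Row-index subsetting takes rows by position, not by rank
first3 <- df2[1:3, ]

# Sort by decreasing share first, then keep the first three rows
top3 <- head(df2[order(-df2$perc), ], 3)

# Rebuild the factor from the subset so that only the kept makes are levels,
# in the sorted order
top3$make <- factor(top3$make, levels = unique(top3$make), exclude = NULL)
```

Passing top3 to ggplot with aes(x = make, y = perc) should then draw the bars in the level order of the factor.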