How do I generate a frequency table in R with dplyr and plot its values with ggplot?
I need to create a frequency table from two categorical columns of the brfss2013 dataset: one is the 5-year age group and the other is general health status (five levels). I extracted the columns of interest as follows:
> hlthgrpq1 <- brfss2013 %>% select(genhlth, X_ageg5yr)
I can produce a summary table with the by() function:
> by(hlthgrpq1$genhlth, hlthgrpq1$X_ageg5yr, summary)
hlthgrpq1$X_ageg5yr: Age 18 to 24
Excellent Very good Good Fair Poor NA's
6896 10266 7795 1873 303 69
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 25 to 29
Excellent Very good Good Fair Poor NA's
5779 8488 6521 1751 325 46
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 30 to 34
Excellent Very good Good Fair Poor NA's
6412 9958 7977 2295 496 75
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 35 to 39
Excellent Very good Good Fair Poor NA's
6366 10169 8236 2637 638 61
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 40 to 44
Excellent Very good Good Fair Poor NA's
6689 11130 9193 3334 1067 95
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 45 to 49
Excellent Very good Good Fair Poor NA's
7051 12278 10611 4343 1815 112
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 50 to 54
Excellent Very good Good Fair Poor NA's
8545 15254 13761 6354 3120 139
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 55 to 59
Excellent Very good Good Fair Poor NA's
8500 16759 15394 7643 3998 197
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 60 to 64
Excellent Very good Good Fair Poor NA's
8283 16825 16266 8101 3955 229
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 65 to 69
Excellent Very good Good Fair Poor NA's
7479 15764 15600 7749 3200 205
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 70 to 74
Excellent Very good Good Fair Poor NA's
5491 11943 13125 6491 2721 196
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 75 to 79
Excellent Very good Good Fair Poor NA's
3320 8501 10128 5545 2426 173
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 80 or older
Excellent Very good Good Fair Poor NA's
3697 10285 14400 8116 3695 322
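This is where I'm stuck. I have spent several hours trying to turn this output into a single frequency table.

Thanks for your help.

(This is for a specific assignment, so I can only use dplyr and ggplot2; reshape2 and tidyr are not allowed.)

First: for future posts, it is best to always include sample data. The read.table() call at the bottom of this answer shows one way to do that.

A solution in base R: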
# Cross-tabulate both columns, transpose so age groups are rows, and convert to a data frame
as.data.frame.matrix(t(table(df)))
# Fair Good Very good
#Age 50 to 54 0 1 0
#Age 55 to 59 0 1 0
#Age 60 to 64 1 0 1
#Age 65 to 69 0 1 0
Or a tidyverse approach like this:
library(tidyverse)
# count() is dplyr; spread() is tidyr. Combinations that never occur show up as NA.
df %>% count(genhlth, X_ageg5yr) %>% spread(genhlth, n)
## A tibble: 4 x 4
# X_ageg5yr Fair Good `Very good`
# <fct> <int> <int> <int>
#1 Age 50 to 54 NA 1 NA
#2 Age 55 to 59 NA 1 NA
#3 Age 60 to 64 1 NA 1
#4 Age 65 to 69 NA 1 NA
This essentially boils down to reshaping the counts from long to wide format, a topic that has been discussed extensively.
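As a rough sketch of the same long-to-wide step with a newer interface: tidyr 1.0.0 and later provide pivot_wider(), which supersedes spread(). The example below rebuilds the toy data inline so it runs on its own; the object name wide is only illustrative, and the approach assumes tidyr is allowed, which the assignment in the question rules out.

```r
library(dplyr)
library(tidyr)

# Toy data mirroring the sample data of this answer (rebuilt here so the block is self-contained)
df <- data.frame(
  genhlth   = c("Fair", "Good", "Good", "Very good", "Good"),
  X_ageg5yr = c("Age 60 to 64", "Age 50 to 54", "Age 55 to 59",
                "Age 60 to 64", "Age 65 to 69")
)

wide <- df %>%
  count(X_ageg5yr, genhlth) %>%             # long format: one row per observed pair
  pivot_wider(names_from  = genhlth,        # one column per health status
              values_from = n,
              values_fill = list(n = 0L))   # combinations that never occur become 0 instead of NA

wide
```

Filling with 0 rather than NA is a design choice: for a frequency table, a combination that never occurs has a count of zero, not a missing value.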
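Sample data: df is created by the read.table() call further down.

Another suggestion: have a look at the dplyr verbs group_by() and summarise().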
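The group_by() and summarise() hint can be sketched roughly as follows, using dplyr only. The toy data is rebuilt inline, the names freq and prop are illustrative, and the .groups argument assumes dplyr 1.0.0 or later. The result is a long-format frequency table with one row per age group and health status.

```r
library(dplyr)

# Toy data mirroring the sample data of the first answer
df <- data.frame(
  genhlth   = c("Fair", "Good", "Good", "Very good", "Good"),
  X_ageg5yr = c("Age 60 to 64", "Age 50 to 54", "Age 55 to 59",
                "Age 60 to 64", "Age 65 to 69")
)

freq <- df %>%
  group_by(X_ageg5yr, genhlth) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # result stays grouped by X_ageg5yr
  mutate(prop = n / sum(n)) %>%                  # share of each status within its age group
  ungroup()

freq
```

On the real data, adding filter(!is.na(genhlth), !is.na(X_ageg5yr)) before group_by() would drop the NA counts that appear in the by() output, if they are not wanted.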
# Variant without tidyr: count with dplyr, then cross-tabulate with base R's xtabs()
library(dplyr)
df2 <- df %>%
  count(genhlth, X_ageg5yr)
df2 <- as.data.frame.matrix(xtabs(n ~ X_ageg5yr + genhlth, data = df2))
# Fair Good Very good
#Age 50 to 54 0 1 0
#Age 55 to 59 0 1 0
#Age 60 to 64 1 0 1
#Age 65 to 69 0 1 0
# Sample data used in the examples above
df <- read.table(text =
    "genhlth X_ageg5yr
Fair 'Age 60 to 64'
Good 'Age 50 to 54'
Good 'Age 55 to 59'
'Very good' 'Age 60 to 64'
Good 'Age 65 to 69'", header = TRUE)
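The question also asks how to plot the counts with ggplot2, which the answers above do not cover. One possible sketch, not a definitive solution: keep the counts in long format, since ggplot2 expects that shape, and draw grouped bars. The toy data is rebuilt inline; the names freq and p and the axis labels are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# Toy data mirroring the sample data above
df <- data.frame(
  genhlth   = c("Fair", "Good", "Good", "Very good", "Good"),
  X_ageg5yr = c("Age 60 to 64", "Age 50 to 54", "Age 55 to 59",
                "Age 60 to 64", "Age 65 to 69")
)

# Long-format frequency table: one row per observed (age group, health status) pair
freq <- df %>% count(X_ageg5yr, genhlth)

# Grouped bar chart: age group on the x axis, one bar per health status
p <- ggplot(freq, aes(x = X_ageg5yr, y = n, fill = genhlth)) +
  geom_col(position = "dodge") +
  labs(x = "Age group", y = "Count", fill = "General health") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p
```

With the full brfss2013 columns, replacing df with hlthgrpq1 should give thirteen age groups with up to five bars each; position = "fill" instead of "dodge" would show proportions within each age group rather than raw counts.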