Stacked bar chart of unrelated variables converted to percentages based on presence/absence in ggplot2 (R)
Here is a sample data frame:
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"))
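But how do I convert this to percentages when plotting?
I have looked at many melt options, but these variables share no common criterion that could be used to create a common x-axis.
Finally, how would the answer change if I wanted to plot 5 variables from a data frame made up of 1000 such column variables?
Edit: Thank you for the answers so far! I have modified the question slightly.
I have just added one more variable to the data frame: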
df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1, 0.5, 0.7, 0, 0, 0, 0.5, 0.2),
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2),
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
                 Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))
This should generalize well. Of course, you can be more selective about which variables you pick.
library(dplyr)
library(tidyr)
library(ggplot2)

# Keep the ID and the presence/absence columns, then reshape to long format
mdf = df %>% select(SampleID, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# position = "fill" scales every bar to 100%
ggplot(mdf, aes(x = Var, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
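For the "5 variables out of 1000" case, one possible sketch is to list the wanted columns in a character vector and pass it to select() through all_of(). The vector name `wanted` and the object name `mdf_sel` are hypothetical and not part of the question; the data frame is the sample data from the question, reduced to the columns used here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (presence/absence columns only)
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)

# Hypothetical vector of column names to plot; with 1000 columns,
# this is where the 5 wanted names would go
wanted <- c("Var1PA", "Var2PA")

# Same reshaping as above, restricted to the chosen columns
mdf_sel <- df %>%
  select(SampleID, all_of(wanted)) %>%
  gather(key = Var, value = PA, -SampleID) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

ggplot(mdf_sel, aes(x = Var, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
```

all_of() raises an error if a listed name is missing from the data frame, which is probably what you want when picking columns by hand from a very wide table.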
You can add percentage columns to the long data frame:
mdf %>% group_by(Var) %>%
  mutate(p_present = mean(PA == "Present"),
         p_absent = mean(PA == "Absent"))
# Source: local data frame [16 x 5]
# Groups: Var [2]
#
# SampleID Var PA p_present p_absent
# <dbl> <chr> <fctr> <dbl> <dbl>
# 1 1 Var1PA Present 0.625 0.375
# 2 2 Var1PA Present 0.625 0.375
# 3 3 Var1PA Present 0.625 0.375
# 4 4 Var1PA Absent 0.625 0.375
# 5 5 Var1PA Absent 0.625 0.375
# 6 6 Var1PA Absent 0.625 0.375
# 7 7 Var1PA Present 0.625 0.375
# 8 8 Var1PA Present 0.625 0.375
# 9 1 Var2PA Absent 0.500 0.500
# 10 2 Var2PA Absent 0.500 0.500
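If the frequency counts are wanted as well, and split by the Disease column added in the edit, a minimal sketch could use count() followed by a grouped mutate(). The names `summary_df`, `n`, and `pct` are illustrative; `n` is simply the default column name that count() produces. This is one way to do it, not necessarily the approach used elsewhere in this thread.

```r
library(dplyr)
library(tidyr)

# Sample data from the edited question (presence/absence columns plus Disease)
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control")
)

summary_df <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID, -Disease) %>%
  count(Var, Disease, PA) %>%            # frequency of Present/Absent per variable and group
  group_by(Var, Disease) %>%
  mutate(pct = 100 * n / sum(n)) %>%     # percentage within each Var x Disease combination
  ungroup()

summary_df
```

With the sample data, Var1PA is Present in 3 of the 4 cases, so that row has n = 3 and pct = 75. The same long data could be plotted per group by adding facet_wrap(~ Disease) to the bar chart.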
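Present/Absent should be tabulated as percentages, thanks!
Sorry, I can't test with vars. A mildly hacky way: library(tidyverse); df %>% gather(var, pa, ends_with('PA')) %>% group_by(var) %>% do(pa = names(table(.$pa)), pct = prop.table(table(.$pa)) * 100) %>% unnest() %>% ggplot(aes(var, pct, fill = pa)) + geom_bar(stat = 'identity')
@rawr Sorry for the late reply, and thank you for your answer, it helped me a lot! If I add another variable, Disease, to df, is there an easy way with prop.table to get the percentages for cases (present and absent) and controls (present and absent) separately within each variable?
Hi, thank you very much for your answer, and sorry for the very late reply. Life took over. I tried your solution, but I found it a bit hard to follow, because gather and mutate are both new functions that I have never used before. Also, I would like to actually see the frequency counts in the final table (mdf) that I plot from your answer. I have edited the question a little; please let me know if you have any suggestions for that. I have upvoted your answer, thanks!
All mutate does is create new columns; gather is more or less equivalent to melt. (gather has slightly fewer features but a simpler syntax.) I will add a few lines that put the percentages into the data.
I have posted a new question here: . Thanks for your time!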