Stacked bar plot in ggplot2 (R): converting unrelated variables into percentages based on presence/absence


Here is a sample data frame:

df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
                 Var1 = c(0.1 , 0.5,    0.7,    0,  0,  0,  0.5,    0.2), 
                 Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent",  "Present", "Present"), 
                 Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2), 
                 Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"))
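But how do I convert this to percentages when plotting? I have looked at many melt options, but these variables share no common criterion from which to build a common x-axis.

Finally, how would the answer change if I wanted to plot 5 variables from a data frame with 1000 such columns?

Edit: Thanks for the answers so far! I have modified the question slightly: I just added one more variable to the data frame.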

df <- data.frame(SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
             Var1 = c(0.1 , 0.5,    0.7,    0,  0,  0,  0.5,    0.2), 
             Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent",  "Present", "Present"), 
             Var2 = c(0, 0, 0, 0, 0.1, 0.5, 0.7, 0.2), 
             Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
             Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control"))

This should generalize well. Of course, you can be more selective about which variables you pick.

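library(dplyr)
library(tidyr)
library(ggplot2)  # provides ggplot(), geom_bar(), scale_y_continuous()

# Keep the ID and the presence/absence columns, then reshape to long format
mdf = df %>% select(SampleID, ends_with("PA")) %>%
    gather(key = Var, value = PA, -SampleID) %>%
    mutate(PA = factor(PA, levels = c("Present", "Absent")))

# position = "fill" scales every bar to 1, so the y-axis can be labelled in percent
ggplot(mdf, aes(x = Var, fill = PA)) +
    geom_bar(position = "fill") +
    scale_y_continuous(labels = scales::percent)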
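
For the "5 variables out of 1000" case, one possible approach is to name the wanted columns explicitly instead of relying on `ends_with("PA")`. The sketch below is only an illustration: the vector `keep` and the object names `mdf_sub` and `p_sub` are hypothetical, and the data are a trimmed copy of the sample data frame from the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Trimmed copy of the question's sample data (presence/absence columns only)
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)

# Hypothetical: the names of the presence/absence columns you want to plot
keep <- c("Var1PA", "Var2PA")

# Select only those columns, then reshape to long format as in the answer
mdf_sub <- df %>%
  select(SampleID, all_of(keep)) %>%
  gather(key = Var, value = PA, -SampleID) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# Same plot as before, restricted to the chosen variables
p_sub <- ggplot(mdf_sub, aes(x = Var, fill = PA)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent)
```

With a wider data frame, only `keep` would need to change; `all_of()` raises an error if any listed column is missing, which helps catch typos in the names.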

You can add the percentage columns to the long data frame:

mdf %>% group_by(Var) %>%
    mutate(p_present = mean(PA == "Present"),
           p_absent = mean(PA == "Absent"))
# Source: local data frame [16 x 5]
# Groups: Var [2]
# 
#    SampleID    Var      PA p_present p_absent
#       <dbl>  <chr>  <fctr>     <dbl>    <dbl>
# 1         1 Var1PA Present     0.625    0.375
# 2         2 Var1PA Present     0.625    0.375
# 3         3 Var1PA Present     0.625    0.375
# 4         4 Var1PA  Absent     0.625    0.375
# 5         5 Var1PA  Absent     0.625    0.375
# 6         6 Var1PA  Absent     0.625    0.375
# 7         7 Var1PA Present     0.625    0.375
# 8         8 Var1PA Present     0.625    0.375
# 9         1 Var2PA  Absent     0.500    0.500
# 10        2 Var2PA  Absent     0.500    0.500
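
If a compact table of frequency counts next to the percentages is more useful than repeating the proportions on every row, a summary along the following lines may work. This is a sketch rather than part of the original answer; `summary_tbl`, `n`, and `pct` are names chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the question's sample data (presence/absence columns only)
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present")
)

# Long format, as in the answer
mdf <- df %>%
  select(SampleID, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# One row per variable and presence/absence level:
# n is the frequency count, pct the share within that variable
summary_tbl <- mdf %>%
  count(Var, PA, name = "n") %>%
  group_by(Var) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

print(summary_tbl)
```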

The presence/absence should be shown as percentages, thanks!

Sorry, can't test this. A mildly hacky way:
library(tidyverse); df %>% gather(var, pa, ends_with('PA')) %>% group_by(var) %>% do(pa = names(table(.$pa)), pct = prop.table(table(.$pa)) * 100) %>% unnest() %>% ggplot(aes(var, pct, fill = pa)) + geom_bar(stat = 'identity')
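@rawr Sorry for the late reply, and thanks for your answer, it helped me a lot! If I add another variable, Disease, to df, is there an easy way to use prop.table to get the percentages for cases (present and absent) and controls (present and absent) separately within each variable?

Hi, thank you very much for your answer, and sorry for the late reply. Life took over. I tried your solution, but I had some trouble understanding it, because gather and mutate are both new functions that I have never used before. Also, I would like to actually see the frequency counts in the final table (mdf) that I plot from your answer. I have edited the question a little; please let me know if you have any suggestions. I upvoted your answer, thanks!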
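All mutate does is create new columns, and gather is more or less equivalent to melt. (gather is slightly less capable, but its syntax is simpler.) I will add a few lines that put the percentages into the data.

I have posted a new question here: . Thanks for your time!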
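
For the follow-up about the added Disease column, one possible way to get counts and percentages for cases and controls separately is to carry Disease through the reshape and group by it as well. This is a sketch that was not part of the thread's answers; `mdf_d`, `by_disease`, and `p_disease` are names chosen here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the edited question (presence/absence columns plus Disease)
df <- data.frame(
  SampleID = c(1, 2, 3, 4, 5, 6, 7, 8),
  Var1PA = c("Present", "Present", "Present", "Absent", "Absent", "Absent", "Present", "Present"),
  Var2PA = c("Absent", "Absent", "Absent", "Absent", "Present", "Present", "Present", "Present"),
  Disease = c("Case", "Control", "Case", "Control", "Case", "Control", "Case", "Control")
)

# Keep Disease as an identifier column while reshaping to long format
mdf_d <- df %>%
  select(SampleID, Disease, ends_with("PA")) %>%
  gather(key = Var, value = PA, -SampleID, -Disease) %>%
  mutate(PA = factor(PA, levels = c("Present", "Absent")))

# Counts and percentages within each variable and disease group
by_disease <- mdf_d %>%
  count(Var, Disease, PA, name = "n") %>%
  group_by(Var, Disease) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

# One panel per variable, one 100% stacked bar per disease group
p_disease <- ggplot(mdf_d, aes(x = Disease, fill = PA)) +
  geom_bar(position = "fill") +
  facet_wrap(~ Var) +
  scale_y_continuous(labels = scales::percent)
```

Faceting by `Var` keeps cases and controls side by side on a shared percentage axis; swapping the roles of `Disease` and `Var` would be an equally reasonable layout.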