R: linear model for geom_histogram data


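I'm working with a dataset that has a continuous variable x and categorical variables derived from y and z (positive.y and positive.z). It looks roughly like this: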

set.seed(222)
df = data.frame(x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
           y = rep(50, times = 101)-(seq(0, 50, by = 0.5))+rnorm(101, mean = 30, sd = 20),
           z = rnorm(101, mean = 50, sd= 10))

df$positive.y = sapply(df$y,
                         function(x){
                           if (x >= 50){"Yes"} else {"No"}
                         })

df$positive.z = sapply(df$z,
                       function(x){
                         if (x >= 50){"Yes"} else {"No"}
                       })
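
Then, using this dataset, I can create histograms to see whether there is a correlation between x and positive.y (or positive.z). With 10 bins it is clear that x is correlated with positive.y but not with positive.z: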

library(ggplot2)

ggplot(df,
       aes(x = x, fill = positive.y))+
  geom_histogram(position = "fill", bins = 10)

ggplot(df,
       aes(x = x, fill = positive.z))+
  geom_histogram(position = "fill", bins = 10)
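One way to get the "actual data points" behind a fill histogram is to recompute the share of "Yes" in each bin yourself and pass those values to base R's cor.test(). This is a minimal sketch, assuming 10 equal-width bins spanning the observed range of x. These edges are an assumption and need not match the ones ggplot2 chooses for bins = 10. The names breaks, bin and binned are illustrative. Correlating about ten bin proportions also discards most of the information in the raw rows.

```r
# Rebuild the example data from the question (same seed and formulas)
set.seed(222)
df <- data.frame(x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
                 y = rep(50, times = 101) - seq(0, 50, by = 0.5) +
                   rnorm(101, mean = 30, sd = 20),
                 z = rnorm(101, mean = 50, sd = 10))
df$positive.y <- ifelse(df$y >= 50, "Yes", "No")

# Assumed binning: 10 equal-width bins over the observed range of x
breaks <- seq(min(df$x), max(df$x), length.out = 11)
bin <- cut(df$x, breaks = breaks, include.lowest = TRUE)

# One row per bin: bin midpoint and the share of "Yes" in that bin
binned <- data.frame(
  bin_mid  = head(breaks, -1) + diff(breaks) / 2,
  prop_yes = as.numeric(tapply(df$positive.y == "Yes", bin, mean))
)
print(binned)

# Pearson correlation between bin position and share of "Yes"
ct <- cor.test(binned$bin_mid, binned$prop_yes)
print(ct)
```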

Now I want two things:

  • Extract the actual data points so I can feed them to corr.test() or a similar function

  • Add geom_smooth(method = "lm") to my plot

I tried adding a "bin" column to df like this:

    df$bin = sapply(df$x,
                    function(x){
                      if (x <= 10) {1}
                      else if (x > 10 && x <= 20) {20}
                      # else if ... (remaining bins elided)
                    })
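The nested if/else chain above can be replaced by a single vectorized call to base R's cut(). This is a minimal sketch, assuming the elided branches continue in steps of 10 up to 100. It numbers the bins 1 to 10. The original's second branch returns 20, so change the labels if that value was intended. bin_id is an illustrative name.

```r
set.seed(222)
x <- c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100)

# Right-closed intervals (-Inf,10], (10,20], ..., (90,Inf] mirror
# "x <= 10", "x > 10 & x <= 20", ...; the open ends also catch any x
# that the noise pushed slightly below 0 or above 100.
# labels = FALSE returns the integer bin index instead of a factor.
bin_id <- cut(x, breaks = c(-Inf, seq(10, 90, by = 10), Inf), labels = FALSE)

table(bin_id)
```

On the question's data frame the same call would be df$bin <- cut(df$x, breaks = c(-Inf, seq(10, 90, by = 10), Inf), labels = FALSE).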
    
I don't see a good reason to add an lm line. Logistic regression is the appropriate model here, and it doesn't require any binning:

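    # Logistic regression of positive.y on x (no binning needed)
    df$positive.y <- factor(df$positive.y)  # levels "No", "Yes": models P(Yes)
    mod <- glm(positive.y ~ x, data = df, family = "binomial")
    summary(mod)  # coefficient table, including the test for the slope on x
    anova(mod)    # analysis of deviance table
    
    library(ggplot2)
    ggplot(df,
           aes(x = x, fill = positive.y)) +
      geom_histogram(position = "fill", bins = 10) +
      # Overlay the fitted probability P(Yes | x) from the logistic model
      stat_function(fun = function(x) predict(mod, newdata = data.frame(x = x),
                                              type = "response"),
                    linewidth = 2)  # ggplot2 >= 3.4.0; use size = 2 in older versions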
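The same logistic model can be fit for positive.z to check the visual impression that x is related to positive.y but not to positive.z. This is a minimal sketch, assuming the question's data. The helper slope_p is hypothetical; it pulls the Wald p-value for the x coefficient out of summary().

```r
# Rebuild the example data from the question (same seed and formulas)
set.seed(222)
df <- data.frame(x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
                 y = rep(50, times = 101) - seq(0, 50, by = 0.5) +
                   rnorm(101, mean = 30, sd = 20),
                 z = rnorm(101, mean = 50, sd = 10))

# factor() orders levels alphabetically ("No", "Yes"), so glm models P(Yes)
df$positive.y <- factor(ifelse(df$y >= 50, "Yes", "No"))
df$positive.z <- factor(ifelse(df$z >= 50, "Yes", "No"))

mod_y <- glm(positive.y ~ x, data = df, family = "binomial")
mod_z <- glm(positive.z ~ x, data = df, family = "binomial")

# Hypothetical helper: Wald z-test p-value for the slope on x
slope_p <- function(m) coef(summary(m))["x", "Pr(>|z|)"]

pvals <- c(positive.y = slope_p(mod_y), positive.z = slope_p(mod_z))
print(pvals)
```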
    

Using the hist function?

I would not add an lm line. Logistic regression seems like a better, more appropriate model. You don't need any binning.
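If the goal is still a geom_smooth() line, as the question asks, ggplot2 can fit the logistic model itself when method = "glm" is combined with a binomial family passed through method.args. This is a minimal sketch, assuming the response is first recoded to a numeric 0/1 column (yes.y is a hypothetical name). The smooth is drawn on the probability scale, so it is a logistic curve rather than an lm line.

```r
library(ggplot2)

# Rebuild the example data from the question (same seed and formulas)
set.seed(222)
df <- data.frame(x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
                 y = rep(50, times = 101) - seq(0, 50, by = 0.5) +
                   rnorm(101, mean = 30, sd = 20),
                 z = rnorm(101, mean = 50, sd = 10))

# geom_smooth() needs a numeric response: 1 = "Yes" (y >= 50), 0 = "No"
df$yes.y <- as.numeric(df$y >= 50)

p <- ggplot(df, aes(x = x, y = yes.y)) +
  geom_point(alpha = 0.4) +
  # Logistic regression fitted by ggplot2; the curve shows fitted probabilities
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial"))
print(p)
```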