Linear model for geom_histogram data in R
I am working with a dataset that has a continuous variable x and categorical variables y and z. It looks roughly like this:
set.seed(222)
df = data.frame(x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
                y = rep(50, times = 101) - (seq(0, 50, by = 0.5)) + rnorm(101, mean = 30, sd = 20),
                z = rnorm(101, mean = 50, sd = 10))
# Flag rows where y (or z) is at least 50
df$positive.y = sapply(df$y,
                       function(x){
                         if (x >= 50){"Yes"} else {"No"}
                       })
df$positive.z = sapply(df$z,
                       function(x){
                         if (x >= 50){"Yes"} else {"No"}
                       })
# Assign each x to a bin (the remaining branches are elided in the original)
df$bin = sapply(df$x,
                function(x){
                  if (x <= 10){1}
                  else if (x > 10 & x <= 20) {20}
                  # else if .......
                })
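Using this dataset, I can then create histograms to see whether x is correlated with positive.y (or positive.z). With 10 bins, it is clear that x is correlated with positive.y but not with positive.z: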
library(ggplot2)
ggplot(df,
       aes(x = x, fill = positive.y)) +
  geom_histogram(position = "fill", bins = 10)
ggplot(df,
       aes(x = x, fill = positive.z)) +
  geom_histogram(position = "fill", bins = 10)
Now I want two things:
I don't see a good reason to add an lm line. Logistic regression is the appropriate model here, and it does not require binning:
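# Logistic regression needs a factor (or 0/1) response
df$positive.y <- factor(df$positive.y)
mod <- glm(positive.y ~ x, data = df, family = "binomial")
summary(mod)
anova(mod)

library(ggplot2)
ggplot(df,
       aes(x = x, fill = positive.y)) +
  geom_histogram(position = "fill", bins = 10) +
  # Overlay the fitted probability of "Yes" as a function of x.
  # Note: ggplot2 >= 3.4.0 uses linewidth instead of size for lines.
  stat_function(fun = function(x) predict(mod, newdata = data.frame(x = x),
                                          type = "response"),
                size = 2)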
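To read the fitted model as numbers rather than as a curve, one option is to look at the odds ratio for x and at predicted probabilities for a few x values. The following is only a minimal sketch: it rebuilds the question's data so that it runs on its own, and the names `odds_ratio`, `grid`, and `p_yes` are illustrative assumptions, not part of the original post.

```r
# Rebuild the example data from the question (same seed, same draw order).
set.seed(222)
df <- data.frame(
  x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
  y = rep(50, times = 101) - seq(0, 50, by = 0.5) + rnorm(101, mean = 30, sd = 20),
  z = rnorm(101, mean = 50, sd = 10)
)
df$positive.y <- factor(ifelse(df$y >= 50, "Yes", "No"))

# Logistic regression: models P(positive.y == "Yes") as a function of x.
mod <- glm(positive.y ~ x, data = df, family = "binomial")

# Multiplicative change in the odds of "Yes" per unit increase in x
# (odds_ratio is an illustrative name, not from the original post).
odds_ratio <- exp(coef(mod)[["x"]])
print(odds_ratio)

# Predicted probability of "Yes" at a few x values (illustrative grid).
grid <- data.frame(x = c(0, 25, 50, 75, 100))
p_yes <- predict(mod, newdata = grid, type = "response")
print(data.frame(grid, p_yes = round(p_yes, 3)))
```

An odds ratio below 1 means the odds of "Yes" shrink as x grows, which is what the filled histogram suggests visually.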
Use the hist function? I wouldn't add an lm line. Logistic regression seems like a better, more appropriate model. You don't need any binning.
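If the per-bin numbers behind the filled histogram are still wanted, they can be computed directly in base R. This is a hedged sketch, not the method from the answer: it assumes ten equal-width bins over the range of x, which is not necessarily where geom_histogram places its bin edges, so the values may differ slightly from the plot. The names `breaks`, `counts`, `prop_yes`, and `h` are illustrative.

```r
# Rebuild the example data from the question (same seed, same draw order).
set.seed(222)
df <- data.frame(
  x = c(0, c(1:99) + rnorm(99, mean = 0, sd = 0.5), 100),
  y = rep(50, times = 101) - seq(0, 50, by = 0.5) + rnorm(101, mean = 30, sd = 20),
  z = rnorm(101, mean = 50, sd = 10)
)
df$positive.y <- ifelse(df$y >= 50, "Yes", "No")

# Assumption: ten equal-width bins spanning the observed range of x.
breaks <- seq(min(df$x), max(df$x), length.out = 11)
df$bin <- cut(df$x, breaks = breaks, include.lowest = TRUE)

# Cross-tabulate bin by outcome, then take the share of "Yes" in each bin.
counts <- table(df$bin, df$positive.y)
prop_yes <- prop.table(counts, margin = 1)[, "Yes"]
print(round(prop_yes, 2))

# hist() with plot = FALSE returns the total count per bin without drawing.
h <- hist(df$x, breaks = breaks, plot = FALSE)
print(h$counts)
```

Fitting lm to these ten proportions would discard the within-bin information, which is why the logistic regression on the raw rows is the more natural choice.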