R 如何使数据集变量的柱状图与它重叠';s正态分布更简单更奇特?
好的,我正在使用R中内置的经典虹膜数据集。 我使用的是可变萼片。宽度在鸢尾属“刚毛”亚种中。我绘制了概率直方图,它看起来可以遵循正态分布(钟形曲线),所以我收集了萼片宽度的平均值和萼片宽度的标准偏差,这就是绘制曲线所需的全部内容 正态/高斯分布公式: f(x)=1/[σ*√(2*π)]e^[-(x-μ)^2/(2*σ^2)] 式中,e=欧拉数,π=π,σ=标准偏差,μ=平均值 这是我的代码:R 如何使数据集变量的柱状图与它重叠';s正态分布更简单更奇特?,r,R,好的,我正在使用R中内置的经典虹膜数据集。 我使用的是可变萼片。宽度在鸢尾属“刚毛”亚种中。我绘制了概率直方图,它看起来可以遵循正态分布(钟形曲线),所以我收集了萼片宽度的平均值和萼片宽度的标准偏差,这就是绘制曲线所需的全部内容 正态/高斯分布公式: f(x)=1/[σ*√(2*π)]e^[-(x-μ)^2/(2*σ^2)] 式中,e=欧拉数,π=π,σ=标准偏差,μ=平均值 这是我的代码: > hist(iris$Sepal.Width[iris$Species == "setosa"]
> hist(iris$Sepal.Width[iris$Species == "setosa"], probability = TRUE,
main = "Histogram of Sepal Width for Setosa Species Overlayed with Normal Distribution",
xlab = "Sepal Width (cm)", ylab = "Probability", cex.main = 0.9)
> sepal_width_setosa <- iris$Sepal.Width[iris$Species == "setosa"]
> sepal_width_setosa_mean <- mean(sepal_width_setosa)
> sepal_width_setosa_mean
[1] 3.428
> sepal_width_setosa_sd <- sd(sepal_width_setosa)
> sepal_width_setosa_sd
[1] 0.3790644
> sepal_width_setosa_variance = var(sepal_width_setosa)
> range(sepal_width_setosa)
[1] 2.3 4.4
> sepal_width_setosa_gaussian_distribution <- function(x){
1 / (sepal_width_setosa_sd*sqrt(2*pi))*exp(-(x - sepal_width_setosa_mean)^2 / (2 * sepal_width_setosa_variance))
}
> curve(sepal_width_setosa_gaussian_distribution, from = 2.0, to = 4.5,
col = "darkblue", lwd = 2, add = TRUE)
和正态性的统计检验
> library(nortest)
> ad.test(sepal_width_setosa)
安德森-达林正态性检验
数据:萼片\宽度\刚毛A=0.49096,p值=0.2102 哇!p值大于0.05,所以我猜变量不是正态分布的
问题:有没有一种方法可以把它放在一个循环中,得到所有鸢尾属物种的萼片宽度的直方图及其叠加的相关正态分布,然后并排显示图?如何将标准偏差标记1添加到3 为了回答您关于for循环和直方图的问题,我编写了一些我通常会做的示例代码
species_list <- split(iris, iris$Species) # split data.frame into lists by Species
par(mfrow = c(1,length(species_list))) # set the grid of the plot to be 1 row, 3 columns
for(i in 1:length(species_list)){ # using length(species_list) for the # of species
# subsetting lists with double brackets `[[`
hist(species_list[[i]]$Sepal.Width, probability = T,
main = "", xlab = "Sepal Width (cm)", ylab = "Probability", # `main` is empty
col = c("cyan","green3","yellow")[i]) # picks a new color each time (here out of 3)
# my favorite function for plot titles
mtext(toupper(names(species_list)[i]), # upper case species name for title
side = 3, # 1 is bottom, 2 is left, 3 is top, 4 is right
line = 0, # do not shift up or down
font = 2) # 2 is bold
# Simpler than building from scratch use the built in function:
curve(dnorm(x, mean = mean(species_list[[i]]$Sepal.Width),
sd = sd(species_list[[i]]$Sepal.Width)),
yaxt = "n", add = TRUE, col = "darkblue", lwd = 2)
}
# At the end post a new title
mtext("Histograms of Sepal Width by Species Overlayed with a Normal Distribution",
outer = TRUE, # this puts it outside of the 3 plots
side = 3, line = -1, # shift it down so we can see it
font = 2)
par(mfrow = c(1,1)) # set the plot parameters back to a single plot when finished
species\u list为了回答你关于for循环和直方图的问题,我写了一些我通常会做的示例代码
species_list <- split(iris, iris$Species) # split data.frame into lists by Species
par(mfrow = c(1,length(species_list))) # set the grid of the plot to be 1 row, 3 columns
for(i in 1:length(species_list)){ # using length(species_list) for the # of species
# subsetting lists with double brackets `[[`
hist(species_list[[i]]$Sepal.Width, probability = T,
main = "", xlab = "Sepal Width (cm)", ylab = "Probability", # `main` is empty
col = c("cyan","green3","yellow")[i]) # picks a new color each time (here out of 3)
# my favorite function for plot titles
mtext(toupper(names(species_list)[i]), # upper case species name for title
side = 3, # 1 is bottom, 2 is left, 3 is top, 4 is right
line = 0, # do not shift up or down
font = 2) # 2 is bold
# Simpler than building from scratch use the built in function:
curve(dnorm(x, mean = mean(species_list[[i]]$Sepal.Width),
sd = sd(species_list[[i]]$Sepal.Width)),
yaxt = "n", add = TRUE, col = "darkblue", lwd = 2)
}
# At the end post a new title
mtext("Histograms of Sepal Width by Species Overlayed with a Normal Distribution",
outer = TRUE, # this puts it outside of the 3 plots
side = 3, line = -1, # shift it down so we can see it
font = 2)
par(mfrow = c(1,1)) # set the plot parameters back to a single plot when finished
物种列表确定有一种方法。。。到目前为止你试过什么?您是否检查了绘图
功能?您以前是否对
-循环使用过。您可以使用par
并排放置图形。或者使用pdf
打印多页,等等。是的,我以前使用过循环,但在R中没有做过很多。我不太熟悉“par”的用法函数,谢谢你提到它。你似乎误解了p值。我对p值的理解有什么错?正态性AD检验的无效假设是:数据是正态分布的。替代假设:数据不是正态分布的。当p值为.21时,我们无法拒绝无效假设,并得出数据是正态分布的结论。。。到目前为止你试过什么?您是否检查了绘图
功能?您以前是否对
-循环使用过。您可以使用par
并排放置图形。或者使用pdf
打印多页,等等。是的,我以前使用过循环,但在R中没有做过很多。我不太熟悉“par”的用法函数,谢谢你提到它。你似乎误解了p值。我对p值的理解有什么错?正态性AD检验的无效假设是:数据是正态分布的。替代假设:数据不是正态分布的。当p值为.21时,我们无法拒绝无效假设,并得出数据为正态分布的结论。