How to run a Shapiro test on subsets of data defined by 2 or more variables
I have a data frame with the following structure:
str(Ehen)
'data.frame': 412 obs. of 5 variables:
$ DATE : Date, format: "2012-09-11" "2012-09-19" ...
$ Population: Factor w/ 9 levels "Brathay","Clun",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Fish : Factor w/ 3 levels "C","S","T": 2 2 2 2 2 2 2 2 2 2 ...
$ Length : int NA 70 70 80 70 60 70 60 60 70 ...
$ Width : int NA 60 50 70 60 50 60 50 50 60 ...
I want to test whether Length is normally distributed for each population, grouping the data by date and fish.
I tried:
aggregate(Ehen$Length ~ Ehen$Fish + Ehen$DATE, FUN =shapiro.test)
Ehen$Fish Ehen$DATE Ehen$Length
1 C 2012-09-19 0.7975819
2 S 2012-09-19 0.8164554
3 S 2012-09-25 0.7935195
4 S 2012-10-04 0.9006435
5 C 2012-10-09 0.8411583
6 S 2012-10-09 0.913051
7 S 2012-10-11 0.8525953
8 C 2012-10-18 0.9084524
9 S 2012-10-18 0.9415459
10 C 2012-10-24 0.9592422
11 S 2012-10-24 0.9774688
12 C 2012-11-02 0.9536037
13 S 2012-11-02 0.9607917
14 C 2012-11-12 0.9570341
15 S 2012-11-12 0.9728865
This is more or less what I want, but how do I get the p-value of the Shapiro test instead of the W value?
I can go date by date:
shapiro.test(Ehen$Length[Ehen$DATE=="2012-10-24"])
data: Ehen$Length[Ehen$DATE == "2012-10-24"]
W = 0.9761, p-value = 0.2868
But that is not enough, so I tried:
lapply(split(Ehen$Length, Ehen$Fish, drop = TRUE),shapiro.test)
$C
Shapiro-Wilk normality test
data: X[[1L]]
W = 0.9219, p-value = 1.548e-07
$S
Shapiro-Wilk normality test
data: X[[2L]]
W = 0.9201, p-value = 2.056e-10
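However, I don't know how to include the date as a second variable when subsetting the data for the test.
I may have been going about this all wrong, or I may be close to the answer! Thanks in advance.

You could try: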
res <- aggregate(cbind(P.value=Length) ~ Fish + DATE, Ehen,
FUN = function(x) shapiro.test(x)$p.value)
head(res,3)
# Fish DATE P.value
#1 C 2012-09-19 0.25510132 #####
#2 S 2012-09-19 0.11941675
#3 C 2012-09-20 0.04459457
shapiro.test(Ehen$Length[Ehen$DATE=='2012-09-19' & Ehen$Fish=='C'])
# Shapiro-Wilk normality test
#data: Ehen$Length[Ehen$DATE == "2012-09-19" & Ehen$Fish == "C"]
# W = 0.9414, p-value = 0.2551 ######
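The question's own `lapply(split(...))` attempt can also be extended to two grouping variables by passing a list of factors to `split`. The following is only a sketch under assumptions: it rebuilds simulated data of the same shape as the example in this answer, and the names `groups` and `p_values` are made up for illustration. It relies on `shapiro.test` returning an object whose `p.value` element holds the p-value.

```r
# Simulated stand-in for the real Ehen data (assumption: same columns as the question)
set.seed(25)
Ehen <- data.frame(
  DATE   = sample(seq(as.Date("2012-09-19"), length.out = 10, by = "1 day"),
                  412, replace = TRUE),
  Fish   = sample(c("C", "S"), 412, replace = TRUE),
  Length = sample(c(NA, 60:80), 412, replace = TRUE)
)

# Split Length by every Fish x DATE combination that actually occurs;
# drop = TRUE discards combinations with no rows
groups <- split(Ehen$Length, list(Ehen$Fish, Ehen$DATE), drop = TRUE)

# Run the test on each subset and keep only the p-value;
# the result is a named vector with names such as "C.2012-09-19"
p_values <- sapply(groups, function(x) shapiro.test(x)$p.value)
head(p_values, 3)
```

This should give the same p-values as the `aggregate` call, just as a named vector rather than a data frame, which may be more convenient when only a quick look at the groups is needed.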
Example data used above:
set.seed(25)
Ehen <- data.frame(
  DATE   = sample(seq(as.Date('2012-09-19'), length.out = 10, by = '1 day'),
                  412, replace = TRUE),
  Fish   = sample(c("C", "S"), 412, replace = TRUE),
  Length = sample(c(NA, 60:80), 412, replace = TRUE)
)
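With real data, some Fish/DATE groups may be too small for the test: `shapiro.test` needs between 3 and 5000 non-missing values and stops with an error otherwise, which would abort the whole `aggregate` call. Below is a hedged sketch of one way to guard against that while returning W, the p-value and the group size together. The helper `shapiro_summary`, the result name `res2` and the extra single-row "T" group are hypothetical and added only for illustration.

```r
# Simulated data as above, plus one deliberately tiny group (Fish "T", a single row)
set.seed(25)
Ehen <- data.frame(
  DATE   = sample(seq(as.Date("2012-09-19"), length.out = 10, by = "1 day"),
                  412, replace = TRUE),
  Fish   = sample(c("C", "S"), 412, replace = TRUE),
  Length = sample(c(NA, 60:80), 412, replace = TRUE)
)
Ehen <- rbind(Ehen,
              data.frame(DATE = as.Date("2012-09-19"), Fish = "T", Length = 70L))

# Hypothetical helper: W, p-value and number of non-missing values for one group.
# Returns NA for W and the p-value when shapiro.test() would refuse the input.
shapiro_summary <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (n < 3 || n > 5000 || length(unique(x)) < 2) {
    return(c(W = NA_real_, p.value = NA_real_, n = n))
  }
  test <- shapiro.test(x)
  c(W = unname(test$statistic), p.value = test$p.value, n = n)
}

# na.pass keeps rows with missing Length so the helper does the NA handling itself
res2 <- aggregate(Length ~ Fish + DATE, data = Ehen,
                  FUN = shapiro_summary, na.action = na.pass)

# aggregate() stores the three values as a matrix column; flatten it into
# ordinary columns named Length.W, Length.p.value and Length.n
res2 <- do.call(data.frame, res2)
head(res2, 3)
```

Groups that cannot be tested show up as rows with `NA` in `Length.W` and `Length.p.value` instead of stopping the run, and `Length.n` makes it easy to see which results rest on very few observations.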