How to run a Shapiro test on subsets of data defined by more than two variables


I have a data frame with the following structure:

str(Ehen)
'data.frame':   412 obs. of  5 variables:
 $ DATE      : Date, format: "2012-09-11" "2012-09-19" ...
 $ Population: Factor w/ 9 levels "Brathay","Clun",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Fish      : Factor w/ 3 levels "C","S","T": 2 2 2 2 2 2 2 2 2 2 ...
 $ Length    : int  NA 70 70 80 70 60 70 60 60 70 ...
 $ Width     : int  NA 60 50 70 60 50 60 50 50 60 ...
I want to test whether Length is normally distributed within each population, with the data grouped by date and fish.

I tried:

aggregate(Ehen$Length ~ Ehen$Fish + Ehen$DATE, FUN =shapiro.test) 


  Ehen$Fish  Ehen$DATE Ehen$Length
1          C 2012-09-19   0.7975819
2          S 2012-09-19   0.8164554
3          S 2012-09-25   0.7935195
4          S 2012-10-04   0.9006435
5          C 2012-10-09   0.8411583
6          S 2012-10-09    0.913051
7          S 2012-10-11   0.8525953
8          C 2012-10-18   0.9084524
9          S 2012-10-18   0.9415459
10         C 2012-10-24   0.9592422
11         S 2012-10-24   0.9774688
12         C 2012-11-02   0.9536037
13         S 2012-11-02   0.9607917
14         C 2012-11-12   0.9570341
15         S 2012-11-12   0.9728865
This is more or less what I want, but how do I get the p-value of the Shapiro test instead of the W statistic?
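For reference, shapiro.test() returns an object of class "htest", which is a named list: the statistic element holds W and the p.value element holds the p-value. The table above only shows W, so the fix is to pull out $p.value explicitly. A minimal sketch on a simulated vector (x is hypothetical, not the Ehen data):

```r
set.seed(1)
x <- rnorm(30)          # hypothetical sample, not the Ehen data
sw <- shapiro.test(x)   # an "htest" object (a named list)
class(sw)               # "htest"
sw$statistic            # the W statistic, a number named "W"
sw$p.value              # the p-value as a plain number
```

Any FUN passed to aggregate(), sapply() or similar can therefore return shapiro.test(x)$p.value instead of the whole test object.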

I can do it one date at a time:

shapiro.test(Ehen$Length[Ehen$DATE=="2012-10-24"])
data:  Ehen$Length[Ehen$DATE == "2012-10-24"]
W = 0.9761, p-value = 0.2868
But that isn't enough, so I tried:

lapply(split(Ehen$Length, Ehen$Fish, drop = TRUE),shapiro.test)

$C
        Shapiro-Wilk normality test

data:  X[[1L]]
W = 0.9219, p-value = 1.548e-07

$S
        Shapiro-Wilk normality test

data:  X[[2L]]
W = 0.9201, p-value = 2.056e-10
However, I don't know how to include DATE as a second variable when splitting the data into subsets for the test.
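The split() approach can take both variables at once: when f is a list of factors, split() groups by their interaction, so each Fish/DATE combination becomes one subset. A sketch under the assumption that the data look like the simulated Ehen from the answer (the column names match the question; the values are made up):

```r
set.seed(25)
# Hypothetical stand-in for Ehen with the same column names
Ehen <- data.frame(
  DATE   = sample(seq(as.Date("2012-09-19"), by = "1 day", length.out = 10), 412, replace = TRUE),
  Fish   = sample(c("C", "S"), 412, replace = TRUE),
  Length = sample(c(NA, 60:80), 412, replace = TRUE)
)

# A list of grouping variables: one subset per Fish/DATE combination.
# drop = TRUE discards combinations that never occur in the data.
groups <- split(Ehen$Length, list(Ehen$Fish, Ehen$DATE), drop = TRUE)

# shapiro.test() removes NA values itself; keep only the p-value of each test.
p_values <- sapply(groups, function(x) shapiro.test(x)$p.value)
head(p_values)
```

The names of p_values combine both keys (for example "C.2012-09-19"), which is less convenient than the data frame that aggregate() returns but needs no formula.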

I may have been going about this the wrong way all along, or I may be close to the answer! Thanks in advance.

Answer: you could try

# cbind(P.value = Length) names the output column; FUN keeps only the p-value
res <- aggregate(cbind(P.value = Length) ~ Fish + DATE, Ehen,
                 FUN = function(x) shapiro.test(x)$p.value)

head(res,3)
#  Fish       DATE    P.value
#1    C 2012-09-19 0.25510132  # compare with the single-subset test below
#2    S 2012-09-19 0.11941675
#3    C 2012-09-20 0.04459457

To check, run the test directly on one Fish/DATE subset:

shapiro.test(Ehen$Length[Ehen$DATE == '2012-09-19' & Ehen$Fish == 'C'])

#   Shapiro-Wilk normality test

#data:  Ehen$Length[Ehen$DATE == "2012-09-19" & Ehen$Fish == "C"]
# W = 0.9414, p-value = 0.2551   # same p-value as row 1 of res
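If you want W and the p-value side by side, FUN can return a named vector of length 2; aggregate() then stores the results as a matrix column, which cbind() can flatten into ordinary columns. A sketch on the same simulated data as the answer (res2 is a hypothetical name):

```r
set.seed(25)
# Same simulated data as in the answer (hypothetical stand-in for Ehen)
Ehen <- data.frame(DATE = sample(seq(as.Date("2012-09-19"), length.out = 10, by = "1 day"), 412, replace = TRUE),
                   Fish = sample(c("C", "S"), 412, replace = TRUE),
                   Length = sample(c(NA, 60:80), 412, replace = TRUE))

# Return both numbers from each test; unname() avoids a "W.W" name
res2 <- aggregate(Length ~ Fish + DATE, data = Ehen,
                  FUN = function(x) {
                    sw <- shapiro.test(x)
                    c(W = unname(sw$statistic), p.value = sw$p.value)
                  })

# res2$Length is a two-column matrix; flatten it into columns W and p.value
res2 <- cbind(res2[c("Fish", "DATE")], res2$Length)
head(res2)
```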
Example data:

# Simulated data with the same column names as the question's Ehen
set.seed(25)
Ehen <- data.frame(DATE = sample(seq(as.Date('2012-09-19'), length.out = 10,
                     by = '1 day'), 412, replace = TRUE),
                   Fish = sample(c("C", "S"), 412, replace = TRUE),
                   Length = sample(c(NA, 60:80), 412, replace = TRUE))
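The question also asks for a result per Population. The same formula accepts a third grouping variable, but with more groups some subsets may become too small: shapiro.test() stops with an error when a subset has fewer than 3 or more than 5000 non-missing values, or when all values are identical. A sketch that returns NA for such groups instead; the data set Ehen2, the population labels "PopA"/"PopB"/"PopC", and the helper safe_shapiro_p() are all hypothetical:

```r
set.seed(42)
# Hypothetical data shaped like str(Ehen); population labels are placeholders
Ehen2 <- data.frame(
  DATE       = sample(as.Date(c("2012-09-19", "2012-10-09")), 412, replace = TRUE),
  Population = sample(c("PopA", "PopB", "PopC"), 412, replace = TRUE),
  Fish       = sample(c("C", "S"), 412, replace = TRUE),
  Length     = sample(c(NA, 60:80), 412, replace = TRUE)
)

# Hypothetical helper: p-value, or NA when shapiro.test() would stop with an error
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) == 1) return(NA_real_)
  shapiro.test(x)$p.value
}

# One row per Population/Fish/DATE combination present in the data
res3 <- aggregate(cbind(P.value = Length) ~ Population + Fish + DATE, data = Ehen2,
                  FUN = safe_shapiro_p)
head(res3)
```

Returning NA keeps one row per group, so a single degenerate subset does not abort the whole aggregate() call.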