How to apply the Kruskal-Wallis test to a censored variable in R


r statistics


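I have a set of chromium measurements from different brands of bottled water. I want to apply the Kruskal-Wallis H test to determine whether chromium levels differ significantly between brands, but the measurements contain many censored values.

Is there a way to apply the Kruskal-Wallis H test to this censored variable? Our data set df is pasted below: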

df <- structure(list(
       Brand = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
           1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
           2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
           3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
           4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
           5L, 5L, 5L, 5L), .Label = c("B1", "B2", "B3", "B4", "B5"), class = "factor"), 
       Chromium = c(0.4, 0.4, 0.4, 0.9, 0.4, 1.3, 1.3, 0.4, 2.6, 
           0.4, 0.6, 0.6, 0.4, 2.1, 0.4, 0.4, 0.4, 0.4, 0.6, 0.4, 1.3, 
           1.3, 0.4, 2.6, 0.4, 0.7, 0.7, 0.4, 1.7, 0.4, 0.6, 0.4, 0.4,            
           0.4, 0.4, 1.3, 1.3, 0.4, 2.6, 0.4, 1.1, 1, 0.4, 1.5, 0.4, 
           0.7, 0.4, 0.4, 1, 0.4, 1.3, 1.3, 0.4, 2.6, 0.4, 1, 1.1, 0.4, 
           2.2, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 1.3, 1.3, 0.4, 2.6, 0.4, 
           0.6, 0.7, 0.4, 1.8, 0.4)), .Names = c("Brand", "Chromium"), 
           class = "data.frame", row.names = c(NA, -75L))

head(df)
#   Brand Chromium
# 1    B1      0.4
# 2    B1      0.4
# 3    B1      0.4
# 4    B1      0.9
# 5    B1      0.4
# 6    B1      1.3

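<0.4 is the smallest value that Chromium can take in your data. The Kruskal-Wallis test depends on the ranks of the values, not on the values themselves. This means you can simply replace <0.4 with 0.39, because the observations will keep the same ranks as before. In fact, you can replace <0.4 with any value smaller than 0.4. This works because there is a single reporting limit, so all censored values tie at the lowest rank.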

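In code, this would be:
# Assumes Chromium is stored as character, with censored entries recorded as "< 0.4"
df$Chromium[df$Chromium == "< 0.4"] <- 0.39
df$Chromium <- as.numeric(df$Chromium)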
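As a quick check of the rank argument, here is a minimal sketch on hypothetical toy data (the vectors raw and brand and the helper substitute_censored are made up for illustration, not taken from the question). It assumes a single reporting limit of 0.4 and shows that two different substitutes below that limit give the same ranks and therefore the same test result.

```r
# Hypothetical toy data: censored readings are recorded as the string "< 0.4"
raw <- c("< 0.4", "0.9", "< 0.4", "1.3",
         "0.6", "< 0.4", "2.6", "0.7",
         "< 0.4", "1.1", "0.5", "1.8")
brand <- factor(rep(c("B1", "B2", "B3"), each = 4))

# Illustrative helper: replace censored entries with a value below the limit
substitute_censored <- function(x, value) {
  censored <- x == "< 0.4"
  out <- numeric(length(x))
  out[censored] <- value
  out[!censored] <- as.numeric(x[!censored])
  out
}

y_a <- substitute_censored(raw, 0.39)   # substitute just below the limit
y_b <- substitute_censored(raw, 0.001)  # substitute far below the limit

res_a <- kruskal.test(x = y_a, g = brand)
res_b <- kruskal.test(x = y_b, g = brand)

# The ranks are the same, so the statistic and p-value are the same too
print(identical(rank(y_a), rank(y_b)))
print(all.equal(res_a$statistic, res_b$statistic))
print(all.equal(res_a$p.value, res_b$p.value))
```

The equivalence would break if some detected value fell below the reporting limit, or if there were several different reporting limits.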

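While I agree with the first answer (substitution), you could also consider the cendiff function from the NADA package. According to its documentation, it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test (a generalized Wilcoxon test). It is a score test that uses survival-analysis methods to handle data censored at multiple reporting limits. Section 9.4 of Dennis R. Helsel's book "Statistics for Censored Environmental Data Using Minitab and R" gives a fuller description.

Do you want to ignore the censored values?

No, Eli, I want to take the censored values into account.

I see that your point is about the loss of power. However, wouldn't that require "artificially censoring" the data and losing even more information than has already been lost?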
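The score-test suggestion can be sketched roughly as follows. This is only an illustration on hypothetical toy data: the vectors obs, censored and groups are made up, and the censoring flag assumes that every reading of exactly 0.4 is really a "< 0.4" non-detect, which the pasted data do not confirm. The call uses the cendiff(obs, censored, groups) form from the NADA package and is skipped if the package is not installed.

```r
# Hypothetical toy data; 0.4 is assumed to be the single reporting limit
obs <- c(0.4, 0.9, 0.4, 1.3,
         0.6, 0.4, 2.6, 0.7,
         0.4, 1.1, 0.5, 1.8)

# Assumption: a reading of exactly 0.4 stands for a censored "< 0.4" value
censored <- obs == 0.4
groups <- factor(rep(c("B1", "B2", "B3"), each = 4))

# Run the test only when NADA is available
if (requireNamespace("NADA", quietly = TRUE)) {
  suppressPackageStartupMessages(library(NADA))
  # Score test for differences between the groups' distributions
  print(cendiff(obs, censored, groups))
}
```

With real data, the censoring indicator should come from the laboratory's non-detect flags rather than from comparing values against the limit.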
kruskal.test(Chromium ~ Brand, 
         data = df)
# Kruskal-Wallis rank sum test

# data:  Chromium by Brand
# Kruskal-Wallis chi-squared = 0.51334, df = 4, p-value = 0.9722

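# 2 x 3 table of counts: Low/High scores (rows) by brand (columns)
xm <- rbind(c(8, 8, 4), c(7, 7, 1))
dimnames(xm) <- list(scores = c("Low", "High"), brand = c("B1", "B2", "B3"))
print(xm)
# Chi-squared test with a Monte Carlo p-value; the outer parentheses print the result
(xmcs <- chisq.test(xm, simulate.p.value = TRUE))

> print(xm)
      brand
scores B1 B2 B3
  Low   8  8  4
  High  7  7  1
> (xmcs <- chisq.test(xm, simulate.p.value = TRUE))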

    Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data:  xm
X-squared = 1.2444, df = NA, p-value = 0.7216
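A Low/High table like the one above could be built from raw measurements by splitting them at the reporting limit. The sketch below is an assumption about how such a table might be derived, not the answer's actual procedure: the toy vectors chromium and brand are hypothetical, and the cut at 0.4 is chosen only because it is the reporting limit in the question.

```r
set.seed(1)  # the simulated p-value is random; fix the seed for repeatability

# Hypothetical toy data; readings at 0.4 are treated as non-detects
chromium <- c(0.4, 0.9, 0.4, 1.3,
              0.6, 0.4, 2.6, 0.7,
              0.4, 1.1, 0.5, 1.8)
brand <- factor(rep(c("B1", "B2", "B3"), each = 4))

# Dichotomize at the assumed reporting limit: at or below it is "Low"
scores <- factor(ifelse(chromium > 0.4, "High", "Low"),
                 levels = c("Low", "High"))

# Cross-tabulate the scores by brand
tab <- table(scores = scores, brand = brand)
print(tab)

# Chi-squared test with a Monte Carlo p-value, as in the example above
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(res)
```

Collapsing the detected values into a single "High" class discards their ordering, which is the loss of information raised in the comments.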