How to perform ks.test by group on a data frame in R

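I want to perform a two-sample Kolmogorov-Smirnov (KS) test in R on the example data frame "data" below:

To run the KS test between two individual columns, the code is as follows: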

> ks.test(data$Protein1, data$Protein2, data=data)

    Two-sample Kolmogorov-Smirnov test

data:  data$Protein1 and data$Protein2
D = 0.42308, p-value = 0.01905
alternative hypothesis: two-sided

Warning message:
In ks.test(data$Protein1, data$Protein2, data = data) :
  cannot compute exact p-value with ties
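However, I want to do this for every column, by group. With t.test or wilcox.test this is easy, because you can write either of the following:

t.test(y1, y2)
t.test(y ~ x)   # where y is numeric and x is a binary factor

But I cannot find an equivalent form of ks.test that takes a binary factor. Can anyone help?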

Ultimately, I want to do this across the whole data frame for all proteins, as I can already do successfully with the t-test, but using ks.test instead:

t_tests <- lapply(
  data[, -1], # apply the function to every variable *other than* the first one (group)
  function(x) { t.test(x ~ HealthGroups, data = data) }
)

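Here is a very simple approach. It uses a loop, which is generally frowned upon in R circles. However, it is simple and self-explanatory, which is a plus for new users, and the speed of the loop is not a problem in this case. (Note that you could use lapply() if you prefer, but that is still a loop; it just looks different from the outside.)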
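For readers who prefer the lapply() form mentioned above, it could look roughly like the sketch below. The data frame here is made up (random numbers, three protein columns, 13 rows per group) because the original data are not shown in the thread; the names d, Group, Control and Patient follow the answer's code, while protein_cols and ks_tests are names chosen only for this illustration.

```r
# Hypothetical stand-in data: the original data frame is not available.
set.seed(1)
d <- data.frame(
  Protein1 = rnorm(26),
  Protein2 = rnorm(26),
  Protein3 = rnorm(26),
  Group    = rep(c("Control", "Patient"), each = 13)
)

# Columns to test: everything except the grouping column
protein_cols <- setdiff(names(d), "Group")

# One two-sample ks.test per protein column, Control versus Patient
ks_tests <- lapply(d[protein_cols], function(x) {
  ks.test(x[d$Group == "Control"], x[d$Group == "Patient"])
})

# Each list element is an ordinary "htest" object
ks_tests$Protein1
```

The result is a named list with one test per column, which mirrors the shape of the t_tests list in the question.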

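Just create two new data frames that are subsets with the same variables. Then loop over the columns of those data frames, calling ks.test on each pair. The output is not very user friendly, since it only refers to the index j, so I added a call that prints the name of the variable being tested.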

# I am assuming the original data frame is called d
dc <- d[d$Group=="Control",]
dp <- d[d$Group=="Patient",]
for(j in 1:8){  
  writeLines(names(dc)[j])
  print(ks.test(dc[,j], dp[,j]))  
}
# Protein1
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.30769, p-value = 0.5882
# alternative hypothesis: two-sided
# 
# Protein2
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.23077, p-value = 0.8793
# alternative hypothesis: two-sided
# 
# Protein3
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.23077, p-value = 0.8793
# alternative hypothesis: two-sided
# 
# Protein4
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.46154, p-value = 0.1254
# alternative hypothesis: two-sided
# 
# Protein5
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.23077, p-value = 0.8793
# alternative hypothesis: two-sided
# 
# Protein6
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.15385, p-value = 0.9992
# alternative hypothesis: two-sided
# 
# Protein7
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.38462, p-value = 0.2999
# alternative hypothesis: two-sided
# 
# Protein8
# 
#   Two-sample Kolmogorov-Smirnov test
# 
# data:  dc[, j] and dp[, j]
# D = 0.46154, p-value = 0.1265
# alternative hypothesis: two-sided
# 
# Warning messages:
# 1: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties
# 2: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties
# 3: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties
# 4: In ks.test(dc[, j], dp[, j]) : cannot compute exact p-value with ties

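Thank you very much. This is really helpful. I just wanted to ask whether you could help me convert it into a matrix. The t-test code I gave above produces a list, which I can unlist and then give colnames and rownames. In this case the results come out as commented lines rather than a list. So is it possible to get a final result whose rows are Protein 1, 2, 3, etc., and whose columns contain the corresponding D and p-value?

@Letin, take a look at the documentation: the Value section tells you what the function returns. If you assign the output to a variable instead of printing it, you can extract the statistic (D) and the p-value and store them in a list, matrix, data frame, etc. (you will probably like the latter best, because it can also hold a variable with the protein names).

Thanks @gung for this information. I will try this.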
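As a small illustration of the advice in the comment above, the sketch below assigns the result of ks.test to a variable and pulls out the two components. The vectors x and y are random numbers invented for this example, and res is just an illustrative name.

```r
# Made-up samples standing in for one protein in each group
set.seed(1)
x <- rnorm(13)
y <- rnorm(13)

# Assign the result instead of printing it
res <- ks.test(x, y)

names(res)       # components of the returned "htest" list
res$statistic    # the KS statistic D (a named numeric of length 1)
res$p.value      # the p-value as a plain number
```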
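# Collect D and the p-value for each protein in a data frame instead of printing.
# dc and dp are the Control and Patient subsets of d.
dk <- data.frame(Protein=character(8), D=numeric(8), p=numeric(8), stringsAsFactors=FALSE)
for(j in 1:8){  
  k <- ks.test(dc[,j], dp[,j])   # keep the test result rather than printing it
  dk$Protein[j] <- names(dc)[j]
  dk$D[j]       <- k$statistic   # the KS statistic D
  dk$p[j]       <- k$p.value
}
dk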
#    Protein         D         p
# 1 Protein1 0.3076923 0.5881961
# 2 Protein2 0.2307692 0.8793244
# 3 Protein3 0.2307692 0.8793244
# 4 Protein4 0.4615385 0.1253895
# 5 Protein5 0.2307692 0.8793244
# 6 Protein6 0.1538462 0.9992124
# 7 Protein7 0.3846154 0.2999202
# 8 Protein8 0.4615385 0.1264877
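
If a matrix with the proteins as row names and D and the p-value as columns is preferred, as asked in the comments, one possible variation uses vapply() and transposes the result. This is only a sketch on made-up data (random numbers, three protein columns); protein_cols and ks_mat are names introduced for this illustration, while dc, dp and Group follow the code above.

```r
# Hypothetical stand-in data: the original data frame is not available.
set.seed(1)
d <- data.frame(
  Protein1 = rnorm(26),
  Protein2 = rnorm(26),
  Protein3 = rnorm(26),
  Group    = rep(c("Control", "Patient"), each = 13)
)
dc <- d[d$Group == "Control", ]
dp <- d[d$Group == "Patient", ]
protein_cols <- setdiff(names(d), "Group")

# vapply() returns a 2 x n matrix (one column per protein); t() flips it
# so that each protein becomes a row.
ks_mat <- t(vapply(protein_cols, function(v) {
  k <- ks.test(dc[[v]], dp[[v]])
  c(D = unname(k$statistic), p.value = k$p.value)
}, FUN.VALUE = c(D = 0, p.value = 0)))

ks_mat
```

Selecting the columns by name rather than by the positions 1:8 means the code does not depend on where the grouping column sits in the data frame.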