Measures of association in R: Kendall's tau-b and tau-c


r statistics


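Is there an R package for calculating Kendall's tau-b and tau-c, together with their associated standard errors? My searches on Google and Rseek turned up nothing, but surely someone has implemented these in R.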

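Have you tried the cor function? Its method argument can be set to "kendall" (with "pearson" and "spearman" also available if needed). I'm not sure this covers all the standard errors you are looking for, but it should get you started.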

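Just to expand on Stedy's answer: cor(x, y, method = "kendall") will give you the correlation, and cor.test(x, y, method = "kendall") will give you a p-value. Note that cor.test reports a confidence interval only for Pearson's correlation, so no CI is returned for Kendall's tau.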
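As a quick illustration of the two calls above, here is a minimal sketch on the built-in mtcars data. The choice of the mpg and hp columns is only an assumption for the example; exact = FALSE is passed because these columns contain ties.

```r
# Kendall correlation between two columns of the built-in mtcars data.
tau <- cor(mtcars$mpg, mtcars$hp, method = "kendall")

# cor.test() returns the estimate, a z statistic and a p-value.
# exact = FALSE requests the normal approximation, which R would fall
# back to anyway because the data contain ties.
res <- cor.test(mtcars$mpg, mtcars$hp, method = "kendall", exact = FALSE)

print(tau)
print(res$p.value)

# The result carries no confidence interval for method = "kendall".
print(is.null(res$conf.int))
```

If an interval is needed, it has to come from somewhere else, for example a package that reports one or a resampling approach.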

Also have a look at the Kendall package, which provides a function that claims a better approximation:

> library(Kendall)
> Kendall(x,y)

The Deducer package also has a cor.matrix function, which can be used for pretty printing:

> library(Deducer)
> cor.matrix(variables=d(mpg,hp,wt),,
+ data=mtcars,
+ test=cor.test,
+ method='kendall',
+ alternative="two.sided",exact=F)

                          Kendall's rank correlation tau                          

           mpg     hp      wt     
mpg    cor 1       -0.7428 -0.7278
         N 32      32      32     
    stat**         -5.871  -5.798 
   p-value         0.0000  0.0000 
----------                        
 hp    cor -0.7428 1       0.6113 
         N 32      32      32     
    stat** -5.871          4.845  
   p-value 0.0000          0.0000 
----------                        
 wt    cor -0.7278 0.6113  1      
         N 32      32      32     
    stat** -5.798  4.845          
   p-value 0.0000  0.0000         
----------                        
    ** z
    HA: two.sided

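The psych package has a routine for Kendall's coefficient: corr.test(x, method = "kendall"). This function can be applied to a data.frame, and it also displays the p-value for each pair of variables. I guess it displays the tau-a coefficient. The only downside is that it is really just a wrapper around the cor() function.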

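There is a Wikipedia article on Kendall's coefficient that is worth checking. Try the sos package and its findFn() function. I got a lot of hits when querying "tau a" and "tau b", but none of them panned out. As Ian suggested, the search results seem to converge on the Kendall package.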

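There are three Kendall tau statistics (tau-a, tau-b, and tau-c).

They are not interchangeable, and none of the answers posted so far deals with the last two, which are the subject of the OP's question.

I was unable to find functions to calculate tau-b or tau-c either in the R Standard Library (stats et al.) or in any of the packages available on CRAN or other repositories. I used the excellent R package sos to search, so I believe the results returned were reasonably thorough.

So that is the short answer to the OP's question: there is no built-in or packaged function for tau-b or tau-c.
But it is easy to roll your own.
Writing R functions for the Kendall statistics is just a matter of translating these equations into code: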

Kendall_tau_a = (P - Q) / (n * (n - 1) / 2)
Kendall_tau_b = (P - Q) / ((P + Q + Y0) * (P + Q + X0)) ^ 0.5
Kendall_tau_c = (P - Q) * ((2 * m) / (n ^ 2 * (m - 1)))
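tau-a: equal to concordant minus discordant pairs (P - Q), divided by a factor that accounts for the total number of pairs, n(n - 1)/2, where n is the sample size.
tau-b: explicitly accounts for ties, i.e. pairs whose two members have the same value on a variable. It equals concordant minus discordant pairs divided by the geometric mean of the number of pairs not tied on x (P + Q + Y0) and the number of pairs not tied on y (P + Q + X0), where X0 and Y0 count the pairs tied only on x and only on y.
tau-c: a variant intended for larger tables that is also adjusted for non-square tables. It equals concordant minus discordant pairs multiplied by a factor that adjusts for table size, where m is the smaller of the number of rows and the number of columns.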
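The three definitions above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a function from any package: kendall_taus is a hypothetical helper name, the input is assumed to be a two-way contingency table of counts with ordered rows and columns, and the 2 x 5 table is the one used in the DescTools answer further down this thread.

```r
# Sketch: Kendall's tau-a, tau-b and tau-c from a two-way contingency table.
# kendall_taus is a hypothetical helper name, not part of any package.
kendall_taus <- function(tab) {
  tab <- unclass(as.matrix(tab))
  storage.mode(tab) <- "double"   # avoid integer overflow in the products
  rows <- seq_len(nrow(tab))
  cols <- seq_len(ncol(tab))
  n <- sum(tab)                   # sample size
  m <- min(nrow(tab), ncol(tab))  # smaller table dimension, used by tau-c

  # Concordant (P) and discordant (Q) pairs: pair each cell with the cells
  # in later rows that lie to its right (P) or to its left (Q).
  P <- 0
  Q <- 0
  for (i in rows) {
    for (j in cols) {
      P <- P + tab[i, j] * sum(tab[rows > i, cols > j])
      Q <- Q + tab[i, j] * sum(tab[rows > i, cols < j])
    }
  }

  # Pairs tied only on the row variable (X0) or only on the column variable (Y0).
  X0 <- sum(rowSums(tab)^2 - rowSums(tab^2)) / 2
  Y0 <- sum(colSums(tab)^2 - colSums(tab^2)) / 2

  c(tau_a = (P - Q) / (n * (n - 1) / 2),
    tau_b = (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0)),
    tau_c = (P - Q) * (2 * m) / (n^2 * (m - 1)))
}

tab <- as.table(rbind(c(26, 26, 23, 18, 9), c(6, 7, 9, 14, 23)))
res <- kendall_taus(tab)
print(round(res, 4))
```

For this table the three values should agree with the DescTools estimates quoted in that later answer. The double loop is quadratic in the number of cells, which is fine for ordinary contingency tables; the sketch gives point estimates only, with no standard errors.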
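So how do Kendall's tau statistics relate to the other statistical tests used in categorical data analysis?
All three Kendall tau statistics, along with Goodman and Kruskal's gamma, measure association for ordinal and binary data. (The Kendall tau statistics are more sophisticated than gamma, which uses only P and Q.)
So Kendall's tau and gamma are the counterparts of the simple chi-square test and Fisher's exact test, both of which (as far as I know) are suitable only for nominal data.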
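Example:

cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)
dfx = data.frame(CPA=cpa_group, LCV=revenue_per_customer_group, freq=weight)

# Reshape data frame so 1 row for each event
# (prerequisite step to create contingency table).
dfx2 = data.frame(lapply(dfx, function(x) { rep(x, dfx$freq)}))

# The formula must use the column names of the data frame (LCV and CPA).
# As in the original call, the table is built from dfx, so freq is ignored;
# pass dfx2 instead to weight each group by freq.
t = xtabs(~ LCV + CPA, dfx)

# kendall_tau_c is a user-written tau-c function (its body is not shown here).
kc = kendall_tau_c(t)   # Returns -.35.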
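The example calls kendall_tau_c(), whose body the answer does not show. The following is a hypothetical stand-in, a sketch of Stuart's tau-c computed from a two-way table using the formula given earlier; it is not the author's code. It also builds the table twice, once from the unexpanded data frame and once from the frequency-expanded one, because the two give different results.

```r
# Hypothetical stand-in for kendall_tau_c(): Stuart's tau-c from a two-way table.
kendall_tau_c <- function(tab) {
  tab <- unclass(as.matrix(tab))
  storage.mode(tab) <- "double"
  rows <- seq_len(nrow(tab))
  cols <- seq_len(ncol(tab))
  n <- sum(tab)                   # sample size
  m <- min(nrow(tab), ncol(tab))  # smaller table dimension
  pq <- 0                         # running value of P - Q
  for (i in rows) {
    for (j in cols) {
      pq <- pq + tab[i, j] *
        (sum(tab[rows > i, cols > j]) - sum(tab[rows > i, cols < j]))
    }
  }
  pq * (2 * m) / (n^2 * (m - 1))
}

dfx <- data.frame(
  CPA  = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1),
  LCV  = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2),
  freq = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)
)

# One row per event: repeat each row of dfx freq times.
dfx2 <- dfx[rep(seq_len(nrow(dfx)), dfx$freq), ]

tc_unweighted <- kendall_tau_c(xtabs(~ LCV + CPA, dfx))   # freq ignored
tc_weighted   <- kendall_tau_c(xtabs(~ LCV + CPA, dfx2))  # freq applied

print(round(tc_unweighted, 2))  # about -0.35, the value quoted in the example
print(round(tc_weighted, 2))
```

The -.35 quoted in the example corresponds to the table built from dfx, that is, with the freq column ignored. Which of the two tables is appropriate depends on whether each row is meant to be one observation or a group of freq observations.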

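I stumbled upon this page today while looking for an implementation of Kendall tau-b in R.
For anyone else looking for the same thing:
tau-b is in fact part of the stats package.
More details are at the page linked from the original answer.
I tried it and it works fine:

library(stats)
x <- c(1, 1, 2)
y <- c(1, 2, 3)
cor.test(x, y, method = "kendall", alternative = "greater")

data: x and y
z = 1.2247, p-value = 0.1103
alternative hypothesis: true tau is greater than 0
sample estimates:
      tau
0.8164966

Warning message:
In cor.test.default(x, y, method = "kendall", alternative = "greater") :
  Cannot compute exact p-value with ties

Ignore the warning message. The tau is in fact tau-b.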
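One way to check the claim that the reported tau is tau-b is to count the pair types by hand for the same three points and compare both formulas with cor(). This is a small sketch; the variable names are only illustrative.

```r
x <- c(1, 1, 2)
y <- c(1, 2, 3)

# Enumerate all pairs i < j and classify them by the signs of the differences.
idx <- combn(length(x), 2)
dx <- sign(x[idx[2, ]] - x[idx[1, ]])
dy <- sign(y[idx[2, ]] - y[idx[1, ]])

P  <- sum(dx * dy > 0)         # concordant pairs
Q  <- sum(dx * dy < 0)         # discordant pairs
X0 <- sum(dx == 0 & dy != 0)   # pairs tied on x only
Y0 <- sum(dy == 0 & dx != 0)   # pairs tied on y only

tau_a <- (P - Q) / choose(length(x), 2)
tau_b <- (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0))
tau_r <- cor(x, y, method = "kendall")

print(c(tau_a = tau_a, tau_b = tau_b, cor = tau_r))
print(isTRUE(all.equal(tau_b, tau_r)))
```

With one pair tied on x, tau-a is 2/3 while tau-b is 2/sqrt(6), and the value returned by cor() matches the latter. Without ties the two statistics coincide, which is why the distinction is easy to miss.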
It has been a while, but all three functions are implemented in DescTools:

library(DescTools)

# example in:
# http://support.sas.com/documentation/cdl/en/statugfreq/63124/PDF/default/statugfreq.pdf
# p. 1821
tab <- as.table(rbind(c(26,26,23,18,9),c(6,7,9,14,23)))

# tau-a
KendallTauA(tab, conf.level=0.95)
    tau_a    lwr.ci    ups.ci
0.2068323 0.1771300 0.2365346

# tau-b
KendallTauB(tab, conf.level=0.95)
    tau_b    lwr.ci    ups.ci
0.3372567 0.2114009 0.4631126

# tau-c
StuartTauC(tab, conf.level=0.95)
     tauc    lwr.ci    ups.ci
0.4110953 0.2546754 0.5675151

# alternative for tau-b:
d.frm <- Untable(tab, dimnames = list(1:2, 1:5))
cor(as.numeric(d.frm$Var1), as.numeric(d.frm$Var2), method="kendall")
[1] 0.3372567

# but no confidence intervals for tau-b! Check:
unclass(cor.test(as.numeric(d.frm$Var1), as.numeric(d.frm$Var2), method="kendall"))

According to this r-tutor page, tau-b is in fact computed by the base R function.

Doug's answer here is incorrect: the Kendall package can be used to compute tau-b. The Kendall function in the Kendall package (and apparently also cor(x, y, method = "kendall")) uses the tau-b formula to handle ties. For vectors with ties, however, the Kendall package gives more correct p-values. See page 4 of the Kendall documentation, where D refers to the denominator in the Kendall computation:

D = n(n − 1)/2. S is called the score and D, the denominator, is the maximum possible value of S. When there are ties, the formula for D is more complicated (Kendall, 1974, Ch. 3), and this general formula for ties in both rankings is implemented in our function. In the case of no ties, the p-value of tau under the null hypothesis of no association is computed using an exact algorithm given by Best and Gipps (1974). When ties are present, a normal approximation with continuity correction is used by taking S as normally distributed with mean zero and variance var(S), where var(S) is given by Kendall (1976, eq. 4.4, p. 55). Unless ties are very extensive and/or the data is very short, this approximation is adequate. If extensive ties are present, then the bootstrap provides an expedient solution (Davis and Hinkley, 1997). Alternatively, an exact method based on exhaustive enumeration is also available (Valz and Thompson, 1994), but this is not implemented in this package.

I originally edited Doug's answer on this question, but it was w…
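Since cor.test() gives no interval for Kendall's tau and the Kendall documentation quoted above points to the bootstrap when ties are extensive, here is a minimal base R sketch of a percentile bootstrap interval for tau-b. The helper name boot_tau_ci, the number of resamples B, the seed, and the choice of mtcars columns are all assumptions made for illustration; this is not how DescTools or the Kendall package compute their results.

```r
# Percentile bootstrap interval for Kendall's tau-b, base R only.
# boot_tau_ci is a hypothetical helper name; B and the seed are arbitrary.
boot_tau_ci <- function(x, y, B = 2000, conf.level = 0.95) {
  n <- length(x)
  taus <- replicate(B, {
    i <- sample.int(n, n, replace = TRUE)   # resample (x, y) pairs together
    cor(x[i], y[i], method = "kendall")     # tau-b when ties are present
  })
  alpha <- 1 - conf.level
  quantile(taus, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE, names = FALSE)
}

set.seed(1)
est <- cor(mtcars$mpg, mtcars$wt, method = "kendall")
ci  <- boot_tau_ci(mtcars$mpg, mtcars$wt)

print(est)
print(ci)
```

A percentile interval is the simplest choice and can be poor for small samples; the boot package offers other interval types if a more careful treatment is needed.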