How to run a statistical test by group with dplyr and then turn the result into a tibble with broom

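I have the following data frame:

library(tidyverse)
dat
#> # A tibble: 32 x 3
#>    charge.Group3 hydrophobicity.Group3 class
#>  1         0.167                 0.267 negative
#>  2         0.167                 0.467 negative
#>  3         0.1                   0.067 positive
#>  4         0.067                 0.167 positive
#>  5         0.033                 0.267 positive
#>  6         0.033                 0.1   positive
#>  7         0.067                 0.367 positive
#>  8         0.133                 0.233 negative
#>  9         0.2                   0.367 positive
#> 10         0.067                 0.233 positive
#> # ... with 22 more rows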

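For each feature, charge.Group3 and hydrophobicity.Group3, I want to run wilcox.test between the negative and positive classes, and end up with the p-values in a data frame or tibble:

features               pvalue
charge.Group3          0.1088
hydrophobicity.Group3  0.03895
# I computed these by hand.

Note that the real data has more than two features.

How can I do this?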

Here is one way, using dplyr::summarize_at and tidyr::gather:

library(tidyverse)
dat %>%
  summarize_at(c("charge.Group3","hydrophobicity.Group3"),
               ~wilcox.test(.x ~ .y)$p.value, .$class) %>%
  gather(features, pvalue)

# # A tibble: 2 x 2
#                features pvalue
#                   <chr>  <dbl>
# 1         charge.Group3  0.109
# 2 hydrophobicity.Group3  0.039
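
The same idea can be sketched without naming any feature column. summarize_at and gather have since been superseded in dplyr and tidyr, so this version assumes dplyr >= 1.0 and tidyr >= 1.0 and uses across() and pivot_longer() instead. The definition of dat is not shown in the question, so sim below is a hypothetical, simulated stand-in with the same column names; its p-values will not match the ones above.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `dat`: simulated values, not the asker's data.
set.seed(1)
sim <- tibble(
  charge.Group3         = runif(32),
  hydrophobicity.Group3 = runif(32),
  class                 = rep(c("negative", "positive"), each = 16)
)

pvals <- sim %>%
  # `-class` selects every feature column, so none of them is hard-coded
  summarise(across(-class, ~ wilcox.test(.x ~ class)$p.value)) %>%
  # one row per feature instead of one column per feature
  pivot_longer(everything(), names_to = "features", values_to = "pvalue")

pvals
```

Because the selection is "everything except class", adding more feature columns to the data needs no change to the pipeline.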

If you only need the p-value of the test, you don't actually need broom:

library(tidyverse)


dat %>% 
  gather(group, value, -class) %>%    # reshape data            
  nest(-group) %>%                    # for each group nest data
  mutate(pval = map_dbl(data, ~wilcox.test(value ~ class, data = .)$p.value)) %>%  # get p value for wilcoxon test
  select(-data)                       # remove data column


# # A tibble: 2 x 2
#   group                   pval
#   <chr>                  <dbl>
# 1 charge.Group3         0.109 
# 2 hydrophobicity.Group3 0.0390        

If you really want to involve broom, you can do this:

library(broom)

dat %>% 
   gather(group, value, -class) %>%  
   nest(-group) %>%                  
   mutate(results = map(data, ~tidy(wilcox.test(value ~ class, data = .)))) %>%
   select(-data) %>%
   unnest(results)

# # A tibble: 2 x 5
# group                 statistic p.value method                                            alternative
#   <chr>                     <dbl>   <dbl> <chr>                                             <chr>      
# 1 charge.Group3              170.  0.109  Wilcoxon rank sum test with continuity correction two.sided  
# 2 hydrophobicity.Group3      183   0.0390 Wilcoxon rank sum test with continuity correction two.sided 

This returns more columns, but you can keep just the p-value if that is all you need.
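
As a rough sketch of keeping only the p-value from the broom output, the pipeline can end with a select(). This version assumes tidyr >= 1.0, where nest() takes a named column specification and pivot_longer() replaces gather(). The data frame sim is a hypothetical, simulated stand-in for dat, so the numbers will differ from those shown above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical stand-in for `dat`: simulated values, not the asker's data.
set.seed(1)
sim <- tibble(
  charge.Group3         = runif(32),
  hydrophobicity.Group3 = runif(32),
  class                 = rep(c("negative", "positive"), each = 16)
)

tidy_pvals <- sim %>%
  pivot_longer(-class, names_to = "group", values_to = "value") %>%
  nest(data = -group) %>%          # one nested data frame per feature
  mutate(results = map(data, ~ tidy(wilcox.test(value ~ class, data = .x)))) %>%
  select(-data) %>%
  unnest(results) %>%
  select(group, p.value)           # drop statistic, method and alternative

tidy_pvals
```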
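
Comments:

Thanks. But how can I generalize your code? There are more than two features, so I can't hard-code them in summarize_at.

Yes. Everything except class.

Changed to formula notation, as it's more compact (inspired by @AntoniosK).

I think you can make it really pretty and idiomatic by skipping the nesting; grouping is enough here: dat %>% gather(group, value, -class) %>% group_by(group) %>% summarize(pval = wilcox.test(value ~ class)$p.value) (upvoted in any case)

Indeed. That was my idea, but for some reason today I used summarize(results = wilcox.test(value ~ class, data = .)$p.value), and it didn't work because of the data = . ! :( Thanks for the reminder.
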
dat %>% 
  gather(group, value, -class) %>% 
  group_by(group) %>% 
  summarize(results = wilcox.test(value ~ class)$p.value)
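
One plausible reading of why data = . gets in the way: inside a pipe, . stands for the whole data frame coming from the left-hand side, not for the current group, so wilcox.test would run once on the pooled rows and return the same p-value for every feature. The sketch below illustrates that on sim, a hypothetical simulated stand-in for dat (assuming dplyr >= 1.0 and tidyr >= 1.0); it is an illustration of the behaviour, not a confirmed account of what the commenter saw.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for `dat`: simulated values, not the asker's data.
set.seed(1)
sim <- tibble(
  charge.Group3         = runif(32),
  hydrophobicity.Group3 = runif(32),
  class                 = rep(c("negative", "positive"), each = 16)
)

long <- sim %>%
  pivot_longer(-class, names_to = "group", values_to = "value")

# Without `data`, `value` and `class` are looked up within each group.
per_group <- long %>%
  group_by(group) %>%
  summarise(pval = wilcox.test(value ~ class)$p.value)

# With `data = .`, the formula is evaluated against the whole piped data
# frame, so every group receives the p-value of one pooled test.
pooled <- long %>%
  group_by(group) %>%
  summarise(pval = wilcox.test(value ~ class, data = .)$p.value)

per_group
pooled
```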
