Applying purrr::map2 to two columns of a nested tibble


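This question concerns operations in the tidyverse. I am trying to use tidyr::nest and purrr::map2 to apply a bivariate function to two columns of a tibble and replace those columns with the two columns the function returns. The operation computes an ROC curve from the values of a statistic under H0 and H1, which yields two new values (i.e., columns), FPR and TPR. Here is a working example: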

library(tidyverse)
library(purrr)
# function to compute the rejection rates
get_reject_freq <- function(Tstat, th_vec, twosided = TRUE) {
  # Tstat is a vector of test statistics; th_vec is a vector of thresholds
  if (twosided) Tstat <- abs(Tstat)
  sapply(th_vec, function(th) mean(Tstat > th))
}

# function to compute the ROC
get_ROC <- function(T0, T1, twosided = TRUE) {
  T0_sorted <- sort(unique(T0), decreasing = TRUE)
  tibble(FPR = get_reject_freq(T0, T0_sorted, twosided = twosided), 
         TPR = get_reject_freq(T1, T0_sorted, twosided = twosided))
}

n = m = 15
run_sims_one_iter <- function(j) {
  x = rt(n, df=5, ncp=0)
  y = list(H0=rt(m, df=5, ncp=0), H1=rt(m, df=5, ncp=1))

  result = list()
  for (h in c("H0","H1")) {
    result[[h]] = tibble(method="t_test", H=h, 
                         test_stat=t.test(x,y[[h]])$statistic) %>% 
      add_row(method="wilcoxon", H=h, 
              test_stat=wilcox.test(x,y[[h]], alternative = "two.sided")$statistic)
  }
  return( bind_rows(result) )
}

result = bind_rows( lapply(1:100, run_sims_one_iter) )


#### The following can hopefully be improved ###
temp = result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>%
  ungroup()


roc_results = bind_rows( 
  lapply(seq_len(nrow(temp)), function(i) {
    get_ROC( temp[[i,"H0"]]$test_stat, temp[[i,"H1"]]$test_stat) %>% 
      add_column(method = temp[i,]$method)
  }
))
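To make the ROC step concrete, here is a small hand-checkable run of the two helper functions on made-up statistics (the numbers are illustrative only, not from the simulation). It uses twosided = FALSE and positive values so the arithmetic is easy to follow: each threshold taken from the sorted unique H0 statistics yields one (FPR, TPR) pair.

```r
library(tibble)

# Same helpers as in the question
get_reject_freq <- function(Tstat, th_vec, twosided = TRUE) {
  if (twosided) Tstat <- abs(Tstat)
  sapply(th_vec, function(th) mean(Tstat > th))
}

get_ROC <- function(T0, T1, twosided = TRUE) {
  T0_sorted <- sort(unique(T0), decreasing = TRUE)
  tibble(FPR = get_reject_freq(T0, T0_sorted, twosided = twosided),
         TPR = get_reject_freq(T1, T0_sorted, twosided = twosided))
}

# Hypothetical toy statistics; the thresholds become 3, 2, 1
T0 <- c(1, 2, 3)        # statistics under H0
T1 <- c(2.5, 3.5, 4)    # statistics under H1
roc <- get_ROC(T0, T1, twosided = FALSE)
print(roc)
# Threshold 3: FPR 0,   TPR 2/3
# Threshold 2: FPR 1/3, TPR 1
# Threshold 1: FPR 2/3, TPR 1
```

The output has one row per distinct H0 statistic, which is why the number of ROC rows per method depends on the data.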

The intermediate temp produced above has the following form:

# A tibble: 2 x 3
  method   H0                 H1                
  <chr>    <list>             <list>            
1 t_test   <tibble [100 × 1]> <tibble [100 × 1]>
2 wilcoxon <tibble [100 × 1]> <tibble [100 × 1]>
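The shape above comes from nest() followed by pivot_wider(). As a sketch on a tiny made-up long table (the values are hypothetical, and the layout simply mirrors result), the same two steps give one row per method, with H0 and H1 as list-columns whose cells are tibbles:

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical long table laid out like `result`:
# two statistics per (method, H) combination
long <- tibble(
  method    = rep(c("t_test", "wilcoxon"), each = 4),
  H         = rep(c("H0", "H0", "H1", "H1"), times = 2),
  test_stat = c(0.1, -0.3, 1.2, 2.0, 40, 55, 70, 90)
)

temp_toy <- long %>%
  group_by(method, H) %>%
  nest() %>%                                          # one row per (method, H); stats go into `data`
  pivot_wider(names_from = H, values_from = data) %>% # H0 and H1 become list-columns
  ungroup()

print(temp_toy)
# Each cell of H0/H1 is itself a tibble with a single test_stat column
print(temp_toy$H0[[1]])
```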


Ideally, I would like to replace the construction of temp and roc_results with a one-liner of the form:

temp = result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>%
  ungroup() %>%
  mutate(res=map2(unlist(H0), unlist(H1), get_ROC)) %>% unnest(res)

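but this does not work. My guess is that the problem may be that the size of get_ROC's output can change from row to row (?). Do you know how I can do all of this with tidyverse methods?

You are on the right track, but you have to unlist inside the function passed to map2 rather than in its arguments:
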
library(dplyr)
library(tidyr)

result %>% 
  group_by(method,H) %>% 
  nest() %>%
  pivot_wider(names_from = H, values_from = data) %>% 
  mutate(res = purrr::map2(H0, H1, ~get_ROC(unlist(.x), unlist(.y)))) %>%
  unnest(res) %>%
  select(-c(H0, H1))

#  method   FPR   TPR
#   <chr>  <dbl> <dbl>
# 1 t_test  0.01  0.49
# 2 t_test  0.06  0.59
# 3 t_test  0.08  0.65
# 4 t_test  0.1   0.74
# 5 t_test  0.11  0.77
# 6 t_test  0.13  0.82
# 7 t_test  0.19  0.84
# 8 t_test  0.21  0.84
# 9 t_test  0.22  0.85
#10 t_test  0.24  0.86
# … with 156 more rows
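The same pattern can be checked on toy data. This is a sketch under assumptions: toy and pair_summary are hypothetical stand-ins for temp and get_ROC. It shows that the formula shorthand (~ with .x and .y) and an ordinary anonymous function give identical results, and that unnest() stacks per-row results of different sizes without complaint, so varying output sizes are not what breaks the original one-liner.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical toy table shaped like `temp`: H0/H1 hold tibbles of different sizes
toy <- tibble(
  method = c("a", "b"),
  H0 = list(tibble(test_stat = c(1, 2, 3)), tibble(test_stat = c(5, 6))),
  H1 = list(tibble(test_stat = c(2, 4)),    tibble(test_stat = c(7, 8, 9)))
)

# Hypothetical stand-in for get_ROC: returns as many rows as x has elements
pair_summary <- function(x, y) tibble(x = unname(x), y_mean = mean(y))

# Formula shorthand: .x is the current H0 cell, .y the matching H1 cell
res_formula <- toy %>%
  mutate(res = map2(H0, H1, ~ pair_summary(unlist(.x), unlist(.y))))

# The same call written with an ordinary anonymous function
res_function <- toy %>%
  mutate(res = map2(H0, H1, function(a, b) pair_summary(unlist(a), unlist(b))))

# unnest() stacks the per-row results even though they have 3 and 2 rows
out <- res_formula %>% unnest(res) %>% select(-c(H0, H1))
print(out)
```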

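Great! Do you have a pointer to documentation on the .x notation? An explanation of the ~ usage would also help. Could you also explain why your version works and the other one does not?

@PasseBy51 Yes, ~ is a formula-style syntax for applying a function; it is an alternative to an anonymous function. You can refer to the map documentation.

OK, thanks. So this is an issue with purrr's map functions? I still don't understand why my original version doesn't work.

No, it has nothing to do with map. Compare the output of unlist(temp$H0) and unlist(temp$H0[1]). What you are doing is the former, whereas what you need is the latter. You are unlisting a list of data frames, whereas my answer unlists only one data frame.

I see. Thanks, much appreciated!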
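A minimal sketch of the difference described in the last comment, using a hypothetical two-element stand-in for temp$H0. Unlisting the whole list-column flattens every tibble into one long vector, while unlisting a single element keeps only that method's statistics. With the real data, map2(unlist(H0), unlist(H1), get_ROC) therefore walks over all 200 individual statistics (100 per method) instead of the 2 rows of temp, so mutate() receives a list whose length does not match the number of rows and should stop with a size error.

```r
library(tibble)

# Hypothetical stand-in for temp$H0: one tibble per method
H0 <- list(tibble(test_stat = c(1, 2, 3)),
           tibble(test_stat = c(4, 5, 6)))

# Unlisting the whole column flattens every tibble into one vector
all_stats <- unlist(H0)
print(length(all_stats))   # one element per statistic, across both methods

# Unlisting a single element keeps only that method's statistics
one_stats <- unlist(H0[1])
print(length(one_stats))
```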