Mapping t.test over a split data frame with purrr

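I'm new to the functional programming style that purrr promotes. I'm trying to take a grouped and split data frame and run a t-test on one variable. An example using a sample dataset might look like this: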

mtcars %>% 
  dplyr::select(cyl, mpg) %>% 
  group_by(as.character(cyl)) %>% 
  split(.$cyl) %>% 
  map(~ t.test(.$`4`$mpg, .$`6`$mpg))

This results in the following error:

Error in var(x) : 'x' is NULL
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In mean.default(x) : argument is not numeric or logical: returning NA

Am I just misunderstanding how map works? Or is there a better way to think about this?

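I don't fully understand the expected result, but this may be a starting point for an answer. purrr's map uses .x in its formula argument to refer to the current element.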
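To illustrate the point about .x, here is a minimal sketch, assuming the first goal is a one-sample t-test of mpg within each cylinder group. Inside the formula, .x (or .) is a single element of the split list, that is, the data frame for one value of cyl, so .$`4` evaluates to NULL there, which is consistent with the 'x' is NULL error in the question. Comparing two groups means indexing the split list itself. The names by_cyl, per_group and four_vs_six are introduced here for illustration only.

```r
library(purrr)

# Split mtcars into a named list of data frames, one per cyl value ("4", "6", "8")
by_cyl <- split(mtcars, mtcars$cyl)

# map() calls the formula once per element; .x is that element's data frame
per_group <- map(by_cyl, ~ t.test(.x$mpg))

# A two-sample comparison works on the list itself, not inside map()
four_vs_six <- t.test(by_cyl[["4"]]$mpg, by_cyl[["6"]]$mpg)
```

Each element of per_group is an ordinary htest object, so per_group[["4"]]$p.value and similar accessors work as usual.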

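Here is one way to achieve what I think you are trying to do with purrr

However, purrr::by_slice pairs nicely with dplyr::group_by

Alternatively, you can skip purrr entirely with dplyr::summarise

If the nested data.frame is confusing, broom can help us get a simple data.frame summary of the results

purrr + broom + tidyr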
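As a rough sketch of how purrr, broom and tidyr can be combined for the per-group case, assuming a reasonably recent tidyverse in which nest() and unnest() accept the calls below: the data are nested by cyl, a tidied t-test is stored in a list column, and that column is unnested into a flat table. This is not necessarily the code the answer originally showed; the names per_group_tidy and tt are made up for the example.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(broom)

per_group_tidy <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                                           # one row per cyl, remaining columns in `data`
  ungroup() %>%
  mutate(tt = map(data, ~ tidy(t.test(.x$mpg)))) %>%   # one-row tidy summary per group
  select(-data) %>%
  unnest(tt)                                           # flatten the list column into regular columns
```

The result has one row per cylinder count, with the estimate, test statistic, p-value and confidence limits as ordinary columns.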

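dplyr + broom

Edited to include a response to the comments

With pipes, it is easy to get carried away. I think Walt's answer does a good job, but I wanted to make sure I offered a short answer. I hope the use of pipeR is not too confusing.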

library(purrr)
library(dplyr)
library(broom)
library(tidyr)
library(pipeR)

mtcars %>>%
  (split(.,.$cyl)) %>>%
  (split_cyl~
    names(split_cyl) %>>%
     (
       cross_d(
         list(against=.,tested=.),
         .filter = `==`
       )
     ) %>>%
     by_row(
       ~tidy(t.test(split_cyl[[.x$tested]]$mpg,split_cyl[[.x$against]]$mpg))
     )
  ) %>>%
  unnest()

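To perform two-sample t-tests, you have to create the combinations of cylinder counts. I don't think the combinations can be created with purrr functions. However, an approach that uses only purrr and base R functions is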

library(purrr)
t_test2 <- mtcars %>% split(.$cyl) %>%
          transpose() %>%
          .[["mpg"]] %>%
          (function(x) combn(names(x), m=2, function(y) t.test(flatten_dbl(x[y[1]]), flatten_dbl(x[y[2]])) , simplify=FALSE))

although this seems a bit contrived.

A similar approach is to use only base R functions with chaining

t_test <- mtcars %>% split(.$cyl) %>%
                          (function(x) combn(names(x), m=2, function(y) x[y], simplify=FALSE)) %>%
                           lapply( function(x) t.test(x[[1]]$mpg, x[[2]]$mpg))

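Especially when dealing with pipelines that need multiple inputs (we don't have Haskell's arrows here), I find it easier to reason about types/signatures first, then wrap the logic in functions that can be unit tested, and only then write a concise chain.

In this case, you want to compare all possible pairs of vectors, so I would set the goal of writing a function that takes a pair of vectors (i.e. a list of 2 vectors) and returns their two-sided t-test.

Once you have that, you only need some glue. So the plan is: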

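1. Write a function that takes a list of vectors and performs the two-sided t-test.
2. Write a function/pipeline that fetches the vectors from mtcars (easy).
3. Map the above over the list of pairs.

It is important to have this plan before writing any code. The fact that R is not strongly typed muddies things somewhat, but this way you reason about types first and about the implementation second.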

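Step 1. t.test takes dots (...), so we use purrr::lift to make it take a list. Because we don't want to match on the names of the list elements, we use .unnamed = TRUE. We also make it extra clear that we are using t.test with arity 2, although the code does not need this extra step to work.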

t.test2 <- function(x, y) t.test(x, y)
liftedTT <- lift(t.test2, .unnamed = TRUE)

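There is a lot to clean up here, mainly using the factor levels and keeping them in the output, and not using a global variable in the second function, but I think the core of what you want is here. In my experience, the trick to not getting lost is to work from the inside out.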
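One possible shape for that cleanup, offered as a sketch rather than as the answer's own code: the data are passed in explicitly instead of read from a global, and the group names are kept in the output. It calls t.test directly instead of going through lift, and builds the result row by hand instead of using broom. The names pair_test, mpg_by_cyl, pairs and pairwise_df are hypothetical.

```r
library(purrr)

# Hypothetical helper: Welch t-test between two named groups
# taken from a named list of numeric vectors
pair_test <- function(groups, pair) {
  tt <- t.test(groups[[pair[1]]], groups[[pair[2]]])
  data.frame(
    group1    = pair[1],
    group2    = pair[2],
    statistic = unname(tt$statistic),
    p.value   = tt$p.value
  )
}

# Named list of mpg vectors, one per cylinder count
mpg_by_cyl <- split(mtcars$mpg, mtcars$cyl)

# All unordered pairs of group names, as a list of length-2 character vectors
pairs <- combn(names(mpg_by_cyl), 2, simplify = FALSE)

# The data are passed as an argument, so pair_test() does not rely on a global
pairwise_df <- do.call(rbind, map(pairs, ~ pair_test(mpg_by_cyl, .x)))
```

Because pair_test takes both the data and the pair as arguments, it can be unit tested on any named list of numeric vectors.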

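One thing I noticed is that the examples in the map documentation show map operating on each item of the split list separately, whereas your example tries to operate across list items.

Yes, exactly right. Do you know of a simple way to operate across list items?

I don't know how to make it work with map, but you can capture the result and then use lapply on it. My idea is as follows: if you capture the split result in mtcars_split, you can do something like lapply(names(mtcars_split)[2:length(mtcars_split)], function(x) {t.test(mtcars_split[['4']]$mpg, mtcars_split[[x]]$mpg)}). I suspect there is a cleaner, that is, more readable, way to do this.

This may be helpful additional reading.

This is great, thank you very much! I wasn't aware of that use of dplyr::data_frame. That will be handy.

The broom update is very useful! A quick clarification: this result set shows a t-test for each slice or group. How would I compare one group against another, something like t.test(`4`$mpg, `6`$mpg)?

Skipping purrr entirely with dplyr::summarise does not work: Error: Variables must be length 1 or 9. Problem variables: 'as.character(cyl)'. summarise does not like the data frame that is returned.

I like the purrr + dplyr solution, but by_slice and by_row in purrr are now deprecated. So the solution that works now is to use dplyr + broom to summarise the statistics by group.

This does not work for me. Error in UseMethod("extract_"): no applicable method for 'extract_' applied to an object of class "list". The packages I am using: dplyr_0.5.0, purrr_0.2.2, readr_1.0.0, tidyr_0.6.0, tibble_1.2, ggplot2_2.1.0.9001, tidyverse_1.0.0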

# dplyr + broom: one-sample t-test of mpg within each cyl group
library(dplyr)
library(broom)

mtcars %>% 
  dplyr::select(cyl, mpg) %>% 
  group_by(as.character(cyl)) %>%
  do(tidy(t.test(.$mpg)))

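doTT <- function(pair) {
  mtcars %>%
    split(as.character(.$cyl)) %>%   # named list of data frames: "4", "6", "8"
    map(~ .x$mpg) %>%                # keep only the mpg vector of each group
    magrittr::extract(pair) %>%      # magrittr's alias for `[`; a bare extract() is masked when tidyr is attached
    liftedTT %>%                     # t.test on the two selected vectors
    broom::tidy
}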

1:length(unique(mtcars$cyl)) %>% 
  combn(2) %>% 
  as.data.frame %>% 
  as.list %>% 
  map(~ doTT(.))

$V1
  estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
1 6.920779  26.66364  19.74286  4.719059 0.0004048495  12.95598 3.751376  10.09018

$V2
  estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
1 11.56364  26.66364      15.1  7.596664 1.641348e-06  14.96675 8.318518  14.80876

$V3
  estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
1 4.642857  19.74286      15.1  5.291135 4.540355e-05  18.50248 2.802925  6.482789