Dplyr and purrr with dynamic column names: selecting and copying by group


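The code below needs to take a column name from a variable and then operate selectively on rows, using the column named there. Here is my simple example, which creates the column res so that it matches the column target: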

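I took an iterative approach: loop over the unique values in the grp column, create temporary columns, then sum those columns to get the final result. Clunky, but it got there in the end.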
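The question's own code is not reproduced on this page. As a rough sketch only, the loop described above might look something like the following. The data frame mirrors the tst example used in the answers; the temporary column names (tmp_a and so on) are assumptions made for illustration, not the asker's actual names.

```r
# Example data matching the tst table used in the answers (base data frame)
tst <- data.frame(
  grp = c("a", "a", "b", "b", "c", "c"),
  a = rep(2, 6), b = rep(4, 6), c = rep(8, 6),
  target = c(2, 2, 4, 4, 8, 8),
  stringsAsFactors = FALSE
)

tmp_cols <- character(0)
for (g in unique(tst$grp)) {
  tmp <- paste0("tmp_", g)  # hypothetical temporary column name
  # Keep the value of column g on rows of group g, zero elsewhere
  tst[[tmp]] <- ifelse(tst$grp == g, tst[[g]], 0)
  tmp_cols <- c(tmp_cols, tmp)
}

# Sum the temporary columns row by row, then drop them
tst$res <- unname(rowSums(tst[tmp_cols]))
tst[tmp_cols] <- NULL
print(tst)
```

It works, but it needs one pass and one temporary column per group, which is what the answers below avoid.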

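I'm sure there is a more elegant way to do this with a member of purrr's map family. Can someone show me how to do it with purrr, without a loop? With that approach I really struggled to get the dynamic column name part to work. Thanks in advance.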

Maybe:

tst %>% 
  # [[ ]] extracts the column as a plain vector, so sapply returns a numeric vector
  mutate(res = sapply(seq_len(nrow(tst)), function(x) tst[[as.character(tst$grp[x])]][x]))


# A tibble: 6 x 6
    grp     a     b     c target   res
  <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>
1     a     2     4     8      2     2
2     a     2     4     8      2     2
3     b     2     4     8      4     4
4     b     2     4     8      4     4
5     c     2     4     8      8     8
6     c     2     4     8      8     8

There's no need to write anything with a loop:

library(tidyverse)

tst <- tibble(grp = c("a","a","b","b","c","c"), a = rep(2,6), b = rep(4,6), 
              c = rep(8,6), target = c(2,2,4,4,8,8), res = rep(0,6))

tst %>% 
  mutate(res = 
           case_when(
             grp == "a" ~ a,
             grp == "b" ~ b,
             grp == "c" ~ c
           ))

# A tibble: 6 x 6
  grp       a     b     c target   res
  <chr> <dbl> <dbl> <dbl>  <dbl> <dbl>
1 a         2     4     8      2     2
2 a         2     4     8      2     2
3 b         2     4     8      4     4
4 b         2     4     8      4     4
5 c         2     4     8      8     8
6 c         2     4     8      8     8

Note: if needed, you can use your own formula in place of ~ a. For more help, see ?case_when.

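You can use imap, which iterates over a column's values together with their names. The values are those of grp; since grp has no names, the "names" are simply the positions 1, ..., 6.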

In addition, you have to supply the data frame itself to imap as an extra argument, df = ., which imap forwards to the function. Altogether:

tst %>% 
  mutate(res = purrr::imap_dbl(grp, df = ., 
    .f = function(g, i, df) df[i,g][[1]] # [[1]] turns the result from tibble into a double
  ))
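To make the roles of the two function arguments concrete, here is a small illustrative sketch (not part of the original answer) of what imap hands to the function when the input vector is unnamed. The vector grp and the message text are made up for the example.

```r
library(purrr)

grp <- c("a", "a", "b")

# For an unnamed vector, imap passes each value (g) together with its position (i)
seen <- imap_chr(grp, function(g, i) paste0("row ", i, " -> column ", g))
print(seen)
```

In the answer above, i plays the role of the row index and g the role of the column name in df[i, g].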

Edit: I timed this solution on a larger table:

tst <- tst[sample(nrow(tst), 50000, TRUE),]


It takes about 50 seconds.

Here is a base R solution, which is not any longer to write either:

# Save all source columns in a matrix. This enables indexing by another matrix
x <- as.matrix(tst[, unique(tst$grp)])
# Matrix of (row, column) pairs to extract from x
i <- cbind(seq_len(nrow(tst)), match(tst$grp, colnames(x)))
tst$res <- x[i]
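The key step is indexing a matrix with a two-column matrix of (row, column) pairs, which picks out one element per pair. A tiny standalone sketch of that base R behaviour, with made-up values:

```r
# 3 x 2 matrix: column "a" holds 1, 2, 3 and column "b" holds 4, 5, 6
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# Each row of idx is one (row, column) pair to extract
idx <- cbind(c(1, 2, 3), c(1, 2, 1))

picked <- m[idx]  # elements m[1, 1], m[2, 2], m[3, 1]
print(picked)
```

Because all the lookups happen in a single vectorised subsetting call, there is no per-row function call at all.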

Edit: run time on the larger table:

tst <- tst[sample(nrow(tst), 50000, TRUE),]


0.008 to 0.015 seconds
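For anyone who wants to repeat the measurement, one possible way to time the matrix-indexing approach is with system.time. This is only a sketch: the fixed seed and the variable name big are assumptions added here, and the elapsed time will vary by machine.

```r
# Same example data as in the answers, built as a base data frame
tst <- data.frame(
  grp = c("a", "a", "b", "b", "c", "c"),
  a = rep(2, 6), b = rep(4, 6), c = rep(8, 6),
  target = c(2, 2, 4, 4, 8, 8),
  stringsAsFactors = FALSE
)

set.seed(42)  # assumption: fixed seed so the resampling is repeatable
big <- tst[sample(nrow(tst), 50000, TRUE), ]

elapsed <- system.time({
  x <- as.matrix(big[, unique(big$grp)])
  i <- cbind(seq_len(nrow(big)), match(big$grp, colnames(x)))
  big$res <- x[i]
})
print(elapsed[["elapsed"]])  # seconds; machine dependent
```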

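Thanks, that's just what I was after.

@ScottSimpson If you have a large table with more than 10,000 rows, you may want to look at the base R solution, which is about 5000 times faster.

Thanks, very useful.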