R: fastest way to paste all rows together


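I want to paste all the rows together, column by column, so that each column ends up in a single cell.

For example, I have a table like this:

Col1  Col2  Col3
AA    AA    AB
AB    AB    BB
BC    BB    AA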
library(tibble)
tibble::tribble(
  ~Col1, ~Col2, ~Col3,
  "AA", "AA", "AB",
  "AB", "AB", "BB",
  "BC", "BB", "AA"
)
The output I want is a 3 x 1 table like this:

Col1 AAABBC

Col2 AAABBB

Col3 ABBBAA
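The real case is more complicated, though, because my original table has 600,000 rows and 2,000 columns. What is the fastest way to do this? I tried a loop, but pasting row by row took a very long time to finish.

Any help is appreciated, thanks.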

library(data.table)
dt
#>    result
#> 1: AAABBC
#> 2: AAABBB
#> 3: ABBBAA
# or
dt
#>        v1
#> 1: AAABBC
#> 2: AAABBB
#> 3: ABBBAA
Created on 2021-03-18 by the reprex package (v0.3.0)

The second approach should be faster than the first.
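The code for the two approaches is not reproduced in this answer, so purely as an illustration, here is one way a column-wise paste could be written with data.table. This is an assumption about the general shape, not the answerer's actual code, and the names `wide` and `result` are made up for the example.

```r
library(data.table)

# Example data from the question.
dt <- data.table(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA")
)

# Collapse every column into a single string; this gives a 1 x 3 data.table.
wide <- dt[, lapply(.SD, paste, collapse = "")]

# Reshape to one row per original column (hypothetical output layout).
result <- data.table(
  col_names = names(wide),
  result    = unlist(wide, use.names = FALSE)
)
```

On a table with many columns, it would be worth timing this against the alternatives on a subset of the real data before committing to one of them.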

lapply(df, paste, collapse = "")

This returns a list. If you need a vector, use sapply instead of lapply. If you need a data frame, wrap the whole call in data.frame().

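To make the list, vector and data frame variants concrete, here is a small base R sketch on the example data from the question. The object names (`as_list`, `as_vector`, `as_frame`, `long`) are only illustrative.

```r
# Example data from the question, as a plain data frame.
df <- data.frame(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA")
)

# lapply: a named list with one collapsed string per column.
as_list <- lapply(df, paste, collapse = "")

# sapply: the same strings as a named character vector.
as_vector <- sapply(df, paste, collapse = "")

# data.frame around the lapply call: a 1 x 3 data frame.
as_frame <- data.frame(lapply(df, paste, collapse = ""))

# If a 3 x 1 layout is wanted, the vector can be stacked with its names.
long <- data.frame(col_names = names(as_vector), values = unname(as_vector))
```

Here `as_vector[["Col1"]]` is "AAABBC", which matches the output asked for in the question.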
If you have enough memory to hold several copies of the data, this approach using the doParallel package can work. I'm using the tidyverse family of packages here.


library(tidyverse)
library(doParallel)

n <- 1000
# `data` is the 3 x 3 example tibble from the question.
# Stack it n times (3,000 rows), then double the columns 10 times (3 * 2^10 = 3,072 columns).
big_table <- do.call("rbind", replicate(n, data, simplify = FALSE))
for (i in 1:10) big_table <- bind_cols(big_table, big_table)

# Get the list of column names
col_list <- names(big_table)
# Define number of cores you want to process
number_of_parallel_cores <- 4
col_group <- split(col_list, sort(rep_len(1:number_of_parallel_cores, length(col_list))))

# Running the code with timer
system.time({
  registerDoParallel(number_of_parallel_cores)
  combine_data <- bind_rows(foreach(i_col_group = col_group) %dopar% {
    big_table %>%
      select(all_of(i_col_group)) %>%
      summarize(across(everything(), ~ paste(.x, collapse = ""))) %>%
      pivot_longer(cols = everything(), names_to = "col_names", values_to = "values")
  })
})
Output

   col_names values                                                                                                                                                                                                               
   <chr>     <chr>                                                                                                                                                                                                                
 1 Col1...1  AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
 2 Col2...2  AAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAA…
 3 Col3...3  ABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAAB…
 4 Col1...4  AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
 5 Col2...5  AAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAA…
 6 Col3...6  ABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAAB…
 7 Col1...7  AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
 8 Col2...8  AAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAA…
 9 Col3...9  ABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAAB…
10 Col1...10 AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
# … with 3,062 more rows
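The same split-the-columns-across-workers idea can also be sketched with the `parallel` package that ships with R. This is not the answerer's code, just a minimal illustration on the 3 x 3 example; `n_cores`, `chunks` and `pieces` are made-up names, and two workers are assumed to be available.

```r
library(parallel)

# Example data from the question.
df <- data.frame(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA")
)

n_cores <- 2  # assumed number of workers

# Assign column names to contiguous groups, one group per worker.
groups <- split(names(df), sort(rep_len(seq_len(n_cores), ncol(df))))

# Ship only the columns each worker needs, not the whole table.
chunks <- lapply(groups, function(cols) df[cols])

cl <- makeCluster(n_cores)
pieces <- parLapply(cl, chunks, function(chunk) {
  # Collapse each column of this chunk into one string.
  vapply(chunk, paste, character(1), collapse = "")
})
stopCluster(cl)

# Flatten back to a single named character vector, in column order.
result <- unlist(unname(pieces))
```

Starting workers and copying data to them has a cost, so on small inputs the plain single-process version will usually win; the parallel version only has a chance of paying off on very wide tables.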

We can use dapply from the collapse package, which is optimized for row operations:

library(collapse)
dapply(df1, paste, collapse="", MARGIN = 1)
#[1] "AAAAAB" "ABABBB" "BCBBAA"
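According to ?dapply:

dapply efficiently applies functions to columns or rows of matrix-like objects and by default returns an object of the same type and with the same attributes. Alternatively it is possible to return the result in a plain matrix or data.frame. A simple parallelism is also available.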
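Note that MARGIN = 1 pastes across each row, which is why the output above is "AAAAAB" "ABABBB" "BCBBAA" rather than the per-column strings asked for in the question. The difference is easy to see with base R's `apply`, used here only as a stand-in because it follows the same 1 = rows, 2 = columns convention; as far as I know, 2 is also dapply's documented default.

```r
# Example data from the question, as a character matrix.
m <- cbind(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA")
)

# MARGIN = 1: one string per row.
by_row <- apply(m, 1, paste, collapse = "")
# "AAAAAB" "ABABBB" "BCBBAA"

# MARGIN = 2: one string per column, the layout the question asks for.
by_col <- apply(m, 2, paste, collapse = "")
# Col1 = "AAABBC", Col2 = "AAABBB", Col3 = "ABBBAA"
```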


Oh my god, you're amazing. This is really helpful!!! Thank you so much!!!
system.time(
  big_table %>%
    select(all_of(col_list)) %>%
    summarize(across(everything(), ~ paste(.x, collapse = ""))) %>%
    pivot_longer(cols = everything(), names_to = "col_names", values_to = "values")
)

   user  system elapsed 
  0.021   0.000   0.022 