R: fastest way to paste all rows together
I want to paste all rows together, column by column, into a single cell. For example, I have a table like the one below:
Col1 Col2 Col3
AA   AA   AB
AB   AB   BB
BC   BB   AA
library(tibble)
tibble::tribble(
  ~Col1, ~Col2, ~Col3,
  "AA", "AA", "AB",
  "AB", "AB", "BB",
  "BC", "BB", "AA"
)
The output I want is a 3x1 table like this:
Col1 AAABBC
Col2 AAABBB
Col3 ABBBAA
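However, the real case is more complicated, because my original table has 600,000 rows and 2,000 columns. I would like to know the fastest way to achieve this. I tried a loop, but it took a very long time to finish pasting row by row.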
Thanks for your help.

library(data.table)
dt
#>    result
#> 1: AAABBC
#> 2: AAABBB
#> 3: ABBBAA
# or
dt
#>        V1
#> 1: AAABBC
#> 2: AAABBB
#> 3: ABBBAA
Created on 2021-03-18 by the reprex package (v0.3.0)
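A minimal sketch of two data.table calls that give output of this shape. It assumes the question's table is stored as dt. The column names result and V1 are taken from the output above, but the two expressions themselves are assumptions and may differ from the ones originally posted.

```r
library(data.table)

# Sample table from the question (rebuilt here as an assumption)
dt <- data.table(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA")
)

# Variant 1: collapse every column inside j and name the resulting column
res1 <- dt[, .(result = sapply(.SD, paste, collapse = ""))]

# Variant 2: collapse into a one-row table, then transpose it into one column (V1)
res2 <- data.table::transpose(dt[, lapply(.SD, paste, collapse = "")])

print(res1)
print(res2)
```

Both variants call paste() once per column, so the work grows with the number of columns rather than the number of rows.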
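The second approach should be faster than the first.

lapply(df, paste, collapse = "")
This returns a list. If you need a vector, use sapply instead of lapply. If you need a data frame, wrap the whole call in data.frame().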
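A short base-R sketch of the three return shapes described above. It assumes the question's table is rebuilt as a data frame named df. The result names as_list, as_vector and as_frame are illustrative only.

```r
# Sample table from the question (rebuilt here as an assumption)
df <- data.frame(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA"),
  stringsAsFactors = FALSE
)

# List with one collapsed string per column
as_list <- lapply(df, paste, collapse = "")

# Named character vector with one element per column
as_vector <- sapply(df, paste, collapse = "")

# 3 x 1 data frame; the column names of df become the row names
as_frame <- data.frame(value = as_vector, stringsAsFactors = FALSE)

print(as_frame)
```

Passing the sapply() result as a single named argument gives the 3x1 shape asked for in the question. Wrapping the lapply() list directly in data.frame() would give one row with three columns instead.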
If you have enough memory to hold multiple copies of the data, this approach using the doParallel package can work. Here I am using the tidyverse family.
library(tidyverse)
library(doParallel)

# Sample data from the question
data <- tribble(
  ~Col1, ~Col2, ~Col3,
  "AA", "AA", "AB",
  "AB", "AB", "BB",
  "BC", "BB", "AA"
)

n <- 1000
# Build a test table with 3,000 rows (the 3 sample rows repeated n times)
big_table <- do.call("rbind", replicate(n, data, simplify = FALSE))
# Double the number of columns 10 times: 3 * 2^10 = 3,072 columns
for (i in 1:10) big_table <- bind_cols(big_table, big_table)
# Get the list of column names
col_list <- names(big_table)
# Define the number of cores to use
number_of_parallel_cores <- 4
# Split the column names into one group per core
col_group <- split(col_list, sort(rep_len(1:number_of_parallel_cores, length(col_list))))
# Run the code with a timer
system.time({
  registerDoParallel(number_of_parallel_cores)
  combine_data <- bind_rows(foreach(i_col_group = col_group, .packages = c("dplyr", "tidyr")) %dopar% {
    big_table %>%
      select(all_of(i_col_group)) %>%
      summarize(across(everything(), ~ paste(.x, collapse = ""))) %>%
      pivot_longer(cols = everything(), names_to = "col_names", values_to = "values")
  })
})
Output
col_names values
<chr> <chr>
1 Col1...1 AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
2 Col2...2 AAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAA…
3 Col3...3 ABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAAB…
4 Col1...4 AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
5 Col2...5 AAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAA…
6 Col3...6 ABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAAB…
7 Col1...7 AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
8 Col2...8 AAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAA…
9 Col3...9 ABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAABBBAAAB…
10 Col1...10 AAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAAABBCAA…
# … with 3,062 more rows
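The way the parallel code above divides the column names among workers can be checked on a small case. This is a minimal sketch using ten made-up column names and four groups; the names and sizes are illustrative assumptions, not taken from the real table.

```r
# Ten hypothetical column names
col_list <- paste0("Col", 1:10)
n_groups <- 4

# rep_len() cycles 1..4 over the columns; sort() makes each group contiguous
group_id <- sort(rep_len(1:n_groups, length(col_list)))
col_group <- split(col_list, group_id)

# Number of columns each worker would receive
group_sizes <- lengths(col_group)
print(group_sizes)
```

The groups differ in size by at most one column, so each worker gets a nearly equal share of the paste() calls.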
We can use dapply from collapse, which is optimized for row operations:
library(collapse)
dapply(df1, paste, collapse="", MARGIN = 1)
#[1] "AAAAAB" "ABABBB" "BCBBAA"
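According to ?dapply:
dapply efficiently applies functions to columns or rows of matrix-like objects and by default returns an object of the same type and with the same attributes. Alternatively, it is possible to return the result in a plain matrix or data.frame. A simple parallelism is also available.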
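MARGIN follows the same convention as base R's apply(): 1 means rows and 2 means columns. Here is a minimal base-R sketch of the difference, assuming the question's table is rebuilt as a data frame named df1 (the construction below is an assumption). The question asks for one string per column, which corresponds to margin 2.

```r
# Sample table from the question (rebuilt here as an assumption)
df1 <- data.frame(
  Col1 = c("AA", "AB", "BC"),
  Col2 = c("AA", "AB", "BB"),
  Col3 = c("AB", "BB", "AA"),
  stringsAsFactors = FALSE
)
m <- as.matrix(df1)

# Margin 1: one string per row
by_row <- apply(m, 1, paste, collapse = "")

# Margin 2: one string per column, the shape requested in the question
by_col <- apply(m, 2, paste, collapse = "")

print(by_row)
print(by_col)
```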
Wow, you are amazing. This is really helpful!!! Thank you so much!!!
system.time(
  big_table %>%
    select(all_of(col_list)) %>%
    summarize(across(everything(), ~ paste(.x, collapse = ""))) %>%
    pivot_longer(cols = everything(), names_to = "col_names", values_to = "values")
)
   user  system elapsed
  0.021   0.000   0.022