Bash: optimizing a paste loop

bash unix optimization for-loop


I have 1000 files in /myfolder, each about 8 MB, with 500K rows and 2 columns, like this:

file1.txt
Col1 Col2
a 0.1
b 0.3
c 0.2
...

file2.txt
Col1 Col2
a 0.8
b 0.9
c 0.4
...

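I need to drop the first column, Col1, from every file and paste all the files side by side. The order of the files does not matter.

I have the following code running, and it has already been running for 4 hours. Is there any way to speed it up?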

for i in /myfolder/*; do
    paste all.txt <(cut -f2 "$i") > temp.txt
    mv temp.txt all.txt
done

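I think this task becomes much easier if you iterate over the files in parallel. For each line of each file, just cut off the first part, then print the concatenation of the results.

In Python, this would look something like: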

import glob

# Open all *.txt files in parallel
files = [open(fn, 'r') for fn in glob.glob('*.txt')]
while True:
    # Try reading one line from each file, collecting into 'allLines';
    # stop as soon as any file runs out of lines
    try:
        allLines = [next(f).strip() for f in files]
    except StopIteration:
        break

    # Chop off everything up to (including) the first space for each line
    # (this assumes the two columns are separated by a space)
    secondColumns = (l[l.find(' ') + 1:] for l in allLines)

    # Print the columns, interspersing space characters
    print(' '.join(secondColumns))

# Close the input files once all rows have been printed
for f in files:
    f.close()
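
The same parallel-read idea can probably be sketched without leaving the shell, since paste already reads all of its input files in step. This is only a minimal sketch under some assumptions: the files are tab-separated with exactly two columns (which the cut -f2 in the question suggests), the function name paste_second_columns is made up for illustration, and pasting 1000 files at once assumes the open-file limit (ulimit -n) allows that many.

```bash
#!/usr/bin/env bash

# Hypothetical helper: paste all given files side by side in one pass,
# then keep only the even-numbered fields (the Col2 of each input file).
# Assumes tab-separated input with exactly two columns per file.
paste_second_columns() {
    paste "$@" | awk -F'\t' '{
        line = $2                        # second column of the first file
        for (i = 4; i <= NF; i += 2)     # second column of every later file
            line = line "\t" $i
        print line
    }'
}

# Possible usage (writes the result outside /myfolder so the glob
# does not pick it up):
#   paste_second_columns /myfolder/* > all.txt
```

Each input file is read once and the output is written once, whereas the loop in the question re-reads and rewrites the growing all.txt for every file.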

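Alas, turning allLines in the Python code above into a generator does not seem to work: for some reason the next calls then do not raise a StopIteration error that the try block can catch (presumably because a generator expression is evaluated lazily, so next only runs later, outside the try block).

I won't give a complete answer, but if you give the following a try, you may well succeed.
For example, to merge 4 files on their first column:
join -1 1 -2 1 temp1 temp2 | join - temp3 | join - temp4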

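So you could write a script that first builds the skeleton of this command from all of the files, and then executes the command.
Hope this helps.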
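
One way such a command-building script might look is sketched below. It is an illustration only: the function name build_join_cmd is hypothetical, it assumes at least two input files, and it assumes every file is already sorted on its first column, which join requires. The resulting pipeline contains one join process per file, so for 1000 files it may not be any faster than paste.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a "join ... | join - ... | join - ..." pipeline
# covering all files given as arguments (at least two are expected).
build_join_cmd() {
    local first=$1 second=$2
    shift 2

    local cmd f
    # Start with the first two files; %q quotes the names for later eval.
    printf -v cmd 'join -1 1 -2 1 %q %q' "$first" "$second"

    # Chain every remaining file onto the pipeline via standard input.
    for f in "$@"; do
        printf -v cmd '%s | join - %q' "$cmd" "$f"
    done

    printf '%s\n' "$cmd"
}

# Possible usage:
#   cmd=$(build_join_cmd temp1 temp2 temp3 temp4)
#   eval "$cmd" > merged.txt
```

Note that join keeps the shared first column in its output, so a final cut (or similar) would still be needed to drop it.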
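Is the first column the same in all files?
Yes, it is the same in all files. Are you suggesting that join is more efficient than paste? Also, the files are not sorted.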