How to split a csv file into multiple files using a delimiter? python

I have a tab-delimited file:

this is a sentence.	abb
what is this foo bar.	bev
hello foo bar blah black sheep.	abb

In a Unix terminal I can use cut -f1 and cut -f2 to split it into two files:

this is a sentence.
what is this foo bar.
hello foo bar blah black sheep.

and:

abb
bev
abb

But can the same be done in Python? Would it be faster?

I have been doing this:

[i.split('\t')[0] for i in open('in.txt', 'r')]

But can the same be done in Python?

Yes, you can:

l1, l2 = [], []

with open('in.txt', 'r') as f:
    for i in f:
        # drop the trailing newline so it does not end up in the second column
        # unpacking fails loudly unless the line has exactly two columns
        left, right = i.rstrip('\n').split('\t')
        l1.append(left)
        l2.append(right)

print("\n".join(l1))
print("\n".join(l2))
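The snippet above prints both columns to standard output, whereas the question asks for separate files. A minimal sketch of that variant follows, still reading the input only once. The function name split_columns and the output file names are hypothetical, and it assumes a plain tab-separated file without quoted fields.

```python
import csv
from contextlib import ExitStack


def split_columns(in_path, out_paths, delimiter="\t"):
    """Write column k of in_path to out_paths[k], reading the input once.

    Hypothetical helper: assumes every line has exactly len(out_paths)
    fields and that fields are not quoted.
    """
    with ExitStack() as stack:
        # newline="" lets the csv module handle line endings itself
        src = stack.enter_context(open(in_path, newline="", encoding="utf-8"))
        outs = [
            stack.enter_context(open(p, "w", encoding="utf-8"))
            for p in out_paths
        ]
        reader = csv.reader(src, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row_number, row in enumerate(reader, start=1):
            if len(row) != len(outs):
                raise ValueError(
                    f"line {row_number}: expected {len(outs)} fields, got {len(row)}"
                )
            # one field per output file, one line per input row
            for out, field in zip(outs, row):
                out.write(field + "\n")
```

For the two-column file in the question this could be called as split_columns("in.txt", ["col1.txt", "col2.txt"]). Writing as it goes also avoids holding both columns in memory, which may matter for large inputs.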

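Would it be faster?

Unlikely: cut is a C program optimized for exactly this kind of processing, while Python is a general-purpose language that offers a lot of flexibility but is not necessarily fast.

That said, the one advantage you may gain with the algorithm I wrote is that you read the file only once, whereas with cut you read it twice. That could make a difference.

We would need to run some benchmarks to be 100% sure, though.

Here is a small benchmark on my laptop, for what it's worth: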

>>> timeit.timeit(stmt=lambda: t("file_of_606251_lines"), number=1)
1.393364901014138

compared with:

% time cut -d' ' -f1 file_of_606251_lines > /dev/null
cut -d' ' -f1 file_of_606251_lines > /dev/null  0.74s user 0.02s system 98% cpu 0.775 total
% time cut -d' ' -f2 file_of_606251_lines > /dev/null
cut -d' ' -f2 file_of_606251_lines > /dev/null  1.18s user 0.02s system 99% cpu 1.215 total

The two cut runs add up to 0.775 + 1.215 = 1.990 seconds.

So the Python version, at about 1.39 seconds, is indeed faster than expected ;-)
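The benchmark does not show how t was defined or how the test file was built, so the following is only a sketch of how a comparable measurement might be reproduced. It assumes t wraps the single-pass loop from the answer, uses synthetic tab-separated data (the original runs used a space delimiter via -d' '), and times cut only when it is found on the PATH. The names time_cut and benchmark are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
import timeit


def t(path):
    """Assumed stand-in for the benchmarked function: split two columns in one pass."""
    l1, l2 = [], []
    with open(path) as f:
        for line in f:
            left, right = line.rstrip("\n").split("\t")
            l1.append(left)
            l2.append(right)
    return l1, l2


def time_cut(path, field):
    """Time a single `cut -f<field>` run with its output discarded."""
    cmd = ["cut", f"-f{field}", path]
    return timeit.timeit(
        lambda: subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True),
        number=1,
    )


def benchmark(n_lines=100_000):
    """Return timings in seconds for the Python pass and, if available, both cut passes."""
    fd, path = tempfile.mkstemp(suffix=".tsv")
    try:
        # synthetic two-column data, not the file from the original benchmark
        with os.fdopen(fd, "w") as f:
            for i in range(n_lines):
                f.write(f"sentence number {i}.\ttag{i % 3}\n")
        results = {"python": timeit.timeit(lambda: t(path), number=1)}
        if shutil.which("cut"):
            results["cut_total"] = time_cut(path, 1) + time_cut(path, 2)
        return results
    finally:
        os.remove(path)
```

Calling benchmark() returns a dictionary of timings. The numbers will vary with machine, file size and disk cache state, so a single run like this is indicative at best.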