How do I split a text file into multiple *.txt files on Linux?

linux bash


I have a text file, file.txt (12 MB), containing:

something1
something2
something3
something4
(...)

Is there a way to split file.txt into 12 *.txt files, for example file2.txt, file3.txt, file4.txt (...)?

You can use the Linux core utility split:

split -b 1M -d  file.txt file

Note that both M and MB are accepted, but they denote different sizes: MB is 1000*1000 bytes, while M is 1024^2 bytes.
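The difference between the two suffixes is easy to check on a throwaway file. The following is a minimal sketch, assuming GNU split; the 2 MiB test file and the prefixes m_ and mb_ are made up for illustration.

```shell
# Assumes GNU coreutils split; works in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical test file of exactly 2 MiB (2 * 1024^2 = 2097152 bytes).
head -c 2097152 /dev/zero > file.bin

split -b 1M  -d file.bin m_     # 1M  = 1024^2 bytes per piece
split -b 1MB -d file.bin mb_    # 1MB = 1000^2 bytes per piece

count_m=$(ls m_* | wc -l)
count_mb=$(ls mb_* | wc -l)
echo "1M pieces: $count_m, 1MB pieces: $count_mb"
```

With 1M the file divides into two full pieces; with 1MB the leftover 97152 bytes spill into a third piece.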

To split by lines instead, use the -l option.

Update

lines=$(( ($(wc -l < file.txt) + 11) / 12 )) ; split -l "$lines" -d file.txt file
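Integer division rounds down, so whenever the line count is not a multiple of 12 the leftover lines land in a 13th file. Rounding the per-file line count up instead keeps the result to at most 12 files. The sketch below compares the two on a hypothetical 100-line file; the prefixes floor_ and ceil_ are illustrative only.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 100 > file.txt               # hypothetical 100-line input
n=12
total=$(wc -l < file.txt)

floor=$(( total / n ))             # 100 / 12 = 8
ceil=$(( (total + n - 1) / n ))    # (100 + 11) / 12 = 9

split -l "$floor" -d file.txt floor_
split -l "$ceil"  -d file.txt ceil_

floor_files=$(ls floor_* | wc -l)
ceil_files=$(ls ceil_* | wc -l)
echo "rounded down: $floor_files files, rounded up: $ceil_files files"
```

Rounding up guarantees no more than n files, but for small inputs it can produce fewer than n.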

Following a suggestion, another solution is:

split -n l/12 file.txt

Note that this is the letter l, not the digit one. split -n accepts several forms, such as N, K/N, l/N, l/K/N, r/N and r/K/N.
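As a rough illustration of two of these forms, the sketch below assumes a GNU split that supports -n and uses a hypothetical 100-line file. l/12 writes twelve pieces without cutting any line in half, while l/3/12 prints only the third of those twelve pieces to standard output.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 100 > file.txt                 # hypothetical 100-line input

# l/12: twelve pieces, cut only at line boundaries.
split -n l/12 -d file.txt part_

# l/3/12: write only the 3rd of the 12 pieces to standard output.
third=$(split -n l/3/12 file.txt)

pieces=$(ls part_* | wc -l)
lines=$(cat part_* | wc -l)
echo "$pieces pieces, $lines lines in total"
```

The pieces are balanced by size in bytes, not by line count, so the number of lines per piece can vary when line lengths differ.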

Using bash:

readarray -t LINES < file.txt
COUNT=${#LINES[@]}
for I in "${!LINES[@]}"; do
    INDEX=$(( (I * 12 - 1) / COUNT + 1 ))
    echo "${LINES[I]}" >> "file${INDEX}.txt"
done

Unlike split, this approach keeps the number of lines per file as even as possible.
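To see what "even" means here, the sketch below applies the same index formula to a hypothetical 100-line file and compares it with split -l 9. It reads the file line by line instead of using readarray so that it also runs in a plain POSIX shell; that substitution, and the bylines_ prefix, are assumptions of this example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 100 > file.txt                 # hypothetical 100-line input
total=$(wc -l < file.txt)

# Same index formula as the loop above, applied while reading line by line.
i=0
while IFS= read -r line; do
    index=$(( (i * 12 - 1) / total + 1 ))
    printf '%s\n' "$line" >> "file${index}.txt"
    i=$(( i + 1 ))
done < file.txt

# Smallest and largest line count among the generated files.
counts=$(for f in file[0-9]*.txt; do wc -l < "$f"; done | sort -n)
files=$(echo "$counts" | wc -l)
min=$(echo "$counts" | head -n 1)
max=$(echo "$counts" | tail -n 1)

# For comparison: fixed 9-line pieces leave a short final piece.
split -l 9 -d file.txt bylines_
last=$(wc -l < bylines_11)

echo "loop: $files files with $min to $max lines; split -l 9: last file has $last"
```

Because the loop appends with >>, the output files need to be removed before it is run again.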

$ split -l 100 input_file output_file

where -l is the number of lines in each file. This will produce:

output_fileaa
output_fileab
output_fileac
output_filead
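The alphabetic suffixes come from split's default naming. A small sketch on a hypothetical 250-line input shows the names and the shorter final piece:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 250 > input_file               # hypothetical 250-line input
split -l 100 input_file output_file

names=$(ls output_file*)
last_lines=$(wc -l < output_fileac)
echo "$names"
echo "last piece has $last_lines lines"
```

The number of output files depends on the input length: 250 lines at 100 per file gives three pieces, the last one holding the remaining 50 lines.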

split -b 1M -d file.txt file --additional-suffix=.txt
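The following sketch shows the names this produces. It assumes a GNU split recent enough to support --additional-suffix, and the 2.5 MiB input is generated only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 2.5 MiB text file (2621440 bytes).
yes something | head -c 2621440 > file.txt

# -b 1M: 1 MiB per piece; -d: numeric suffixes; every piece ends in .txt
split -b 1M -d --additional-suffix=.txt file.txt file

names=$(ls file[0-9]*.txt)
echo "$names"
```

Note that -b cuts at byte offsets, so a line can be divided between two pieces; -l keeps lines whole.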

> split -b 10M -d  system.log system_split.log

awk -v c=1 'NR % 1000000 == 0 { ++c } { print $0 > (c ".txt") }' Datafile.txt

for filename in *.txt; do mv "$filename" "Prefix_$filename"; done;
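In the awk one-liner the counter advances when the line number is a multiple of the chunk size, before that line is printed, so the first file ends up one line shorter than the rest. Below is a hedged variant that starts a new file on lines 1, n+1, 2n+1 and so on; the chunk size of 3, the part prefix and the 10-line input are all made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 10 > Datafile.txt              # hypothetical 10-line input

# Start a new output file on lines 1, n+1, 2n+1, ... so every piece
# except possibly the last holds exactly n lines.
awk -v n=3 '
    (NR - 1) % n == 0 {
        if (out != "") close(out)    # keep only one output file open
        out = "part" (++c) ".txt"
    }
    { print > out }
' Datafile.txt

first=$(wc -l < part1.txt)
last=$(wc -l < part4.txt)
echo "part1.txt: $first lines, part4.txt: $last lines"
```

Closing each finished file matters for large inputs, since some awk implementations limit the number of files that can be open at once.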



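My version of split supports neither the -n nor the --additional-suffix command-line option.
Instead, I used this:
split -d -l NUM_LINES really_big_file.txt split_files.txt.

where -d appends a numeric suffix to the output prefix split_files.txt. and -l specifies the number of lines per file.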

For example, suppose I have a really big file like this:

$ ls -laF
total 1391952
drwxr-xr-x 2 user.name group         40 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt

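This file has 100,000 lines, and I want to split it into files of at most 30,000 lines each. The following command runs split and appends a number to the output prefix split_files.txt. (note the trailing dot).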

$ split -d -l 30000 really_big_file.txt split_files.txt.

The resulting files are split correctly, with at most 30,000 lines each:
$ ls -laF
total 2783904
drwxr-xr-x 2 user.name group        156 Sep 14 15:43 ./
drwxr-xr-x 3 user.name group       4096 Sep 14 15:39 ../
-rw-r--r-- 1 user.name group 1425352817 Sep 14 14:01 really_big_file.txt
-rw-r--r-- 1 user.name group  428604626 Sep 14 15:43 split_files.txt.00
-rw-r--r-- 1 user.name group  427152423 Sep 14 15:43 split_files.txt.01
-rw-r--r-- 1 user.name group  427141443 Sep 14 15:43 split_files.txt.02
-rw-r--r-- 1 user.name group  142454325 Sep 14 15:43 split_files.txt.03


$ wc -l *.txt*
    100000 really_big_file.txt
     30000 split_files.txt.00
     30000 split_files.txt.01
     30000 split_files.txt.02
     10000 split_files.txt.03
    200000 total
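Beyond counting lines, one way to confirm that nothing was lost is to concatenate the pieces and compare them with the original. This is a minimal sketch on a small hypothetical stand-in for really_big_file.txt, split into 30-line pieces.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 100 > really_big_file.txt      # hypothetical stand-in, 100 lines
split -d -l 30 really_big_file.txt split_files.txt.

# The shell expands the glob in sorted order (.00, .01, ...), so the
# concatenation should be byte-for-byte identical to the original.
if cat split_files.txt.* | cmp -s - really_big_file.txt; then
    result=identical
else
    result=different
fi
echo "$result"
```

The same check works on the real files, although reading 1.4 GB twice takes a moment.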

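If you want every part to have the same number of lines, for example 22, here is my solution:

split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

This puts the first 22 lines in file02.txt, the next 22 lines in file03.txt, and so on (numeric suffixes are two digits wide by default).
Thanks to @hamruta takawale, @dror-s and @stackoverflowuser2010.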
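A small run makes the numbering visible. The sketch assumes a GNU split that supports --numeric-suffixes=FROM and --additional-suffix, and uses a hypothetical 50-line file; with the default suffix length of two, the pieces start at file02.txt.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50 > file.txt                  # hypothetical 50-line input
split --numeric-suffixes=2 --additional-suffix=.txt -l 22 file.txt file

names=$(ls file[0-9]*.txt)
last_lines=$(wc -l < file04.txt)
echo "$names"
echo "last piece has $last_lines lines"
```

Only the final piece can be shorter than 22 lines; here it holds the remaining 6.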
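split can do this too.
@JohnSmith Yes, we did not spot that option sooner.
@JohnSmith I take that back. How do we make sure the lines are even, without using wc -l and calculating it, of course? Otherwise we could just use bash itself or awk. That is actually why I wrote the script without considering split.
Could you update the answer to split by number of files using split, without breaking lines?
You can use wc -l to get the total number of lines and then run something like this: a=(`wc -l yourfile`) ; lines=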

n=12 ; size=$(wc -c < file.txt) ; bytes=$(( (size + n - 1) / n )) ; split -b "$bytes" -d file.txt file
