How to remove duplicate lines from a large text file in batch on Windows

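I have a text file larger than 50GB. It contains many lines, each averaging about 15 characters. I want every line to be unique (case-sensitive). So if a line is exactly the same as another line, it must be removed, without changing the order of the other lines or sorting the file in any way.

My question differs from the others because my file is huge and cannot be handled by the other solutions I found.

I tried:

awk !seen[$0]++ bigtextfile.txt > dublicatesremoved.txt

It starts out fast and fine, but soon I get the following error:

awk: (FILENAME=bigtextfile.txt FNR=19083509) fatal: more_nodes: nextfree: can't allocate 4000 bytes of memory (Not enough space)

The error above appears when the output file is about 200MB.

Is there another fast way to do the same thing on Windows?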

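You can do this on a UNIX box, or with Cygwin on Windows:

$ cat file
Speed, bonnie boat, like a bird on the wing,
Onward! the sailors cry;
Carry the lad that's born to be King
Over the sea to Skye.

Loud the winds howl, loud the waves roar,
Speed, bonnie boat, like a bird on the wing,
Thunderclaps rend the air;
Onward! the sailors cry;
Baffled, our foes stand by the shore,
Carry the lad that's born to be King
Follow they will not dare.
Over the sea to Skye.

$ cat -n file | sort -k2 -u | sort -n | cut -f2-
Speed, bonnie boat, like a bird on the wing,
Onward! the sailors cry;
Carry the lad that's born to be King
Over the sea to Skye.

Loud the winds howl, loud the waves roar,
Thunderclaps rend the air;
Baffled, our foes stand by the shore,
Follow they will not dare.
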
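The only command above that tries to handle the whole file at once is sort, and sort is designed to handle huge files by using paging and similar techniques, so I think this is the best way you will be able to do this.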
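
Because sort spills to temporary files when the data does not fit in memory, it can help to control where those files go and how much memory sort may use. The following is a minimal sketch, assuming GNU sort (as shipped with Cygwin). The function name dedupe_keep_order, its argument order, and the 64M default are illustrative choices and not part of the answer above; -T (temporary directory) and -S (buffer size) are GNU sort options.

```shell
# Sketch only, assuming GNU sort. dedupe_keep_order is a hypothetical
# helper; the defaults below are placeholders, not tuned values.
dedupe_keep_order() {
    infile=$1
    tmpdir=${2:-${TMPDIR:-/tmp}}   # where sort writes its temporary files
    bufsize=${3:-64M}              # memory sort may use before spilling

    # LC_ALL=C makes sort compare raw bytes, so matching is case-sensitive
    # and does not depend on locale rules.
    cat -n "$infile" |
        LC_ALL=C sort -k2 -u -S "$bufsize" -T "$tmpdir" |
        LC_ALL=C sort -n -S "$bufsize" -T "$tmpdir" |
        cut -f2-
}
```

It could be called as dedupe_keep_order bigtextfile.txt /some/dir/with/space 2G > dublicatesremoved.txt, where the directory and buffer size are placeholders. The temporary directory should sit on a drive with plenty of free space, since the numbered copy of the input is larger than the input itself.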

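Start with cat -n file, then add each command to the pipeline one at a time to see what it does (see below). All it does is add line numbers first, so that we can sort uniquely by content to get the unique lines, then sort by the original line numbers to restore the original order, and finally remove the line numbers we added in the first step.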

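You have a 50GB log file (if it is a log file), so you seriously need to look for a log rotator to help you keep your environment clean.

It is not a log file.

Windows doesn't come with awk, so I think it's safe to say they can install GNU sort and uniq.

Posting sample input/output would help demonstrate your requirements.

Can you install Cygwin on Windows so that you can run UNIX tools? Again, post some sample input/output as a starting point for getting help.

Thanks very much for the information, but I don't know which tools are available on Windows, nor the Windows quoting rules, nor how to pipe the output of one command into the input of another, and so on. So I personally can't provide an answer that runs directly on Windows, but hopefully someone else can.

Have you tried one of the existing solutions?

After a few seconds I get the following error: sort: string comparison failed: Invalid or incomplete multibyte or wide character sort: Set LC_ALL='C' to work around the problem. I did try LC_ALL='C' but got the same error. Am I doing something wrong?

Oh, I forgot to export the variable.

You might also need to run dos2unix or similar (e.g. sed 's/\r$//' file) if the input file was created on Windows and has \r\n line endings instead of \n, though I don't think you do, since the commands shown should treat \r like any other character. Did it work for you?

I tried your suggestion and it filled up my SSD, so I had to cancel it. I have now freed up more space and will try again. (I had more than 70GB free, but for some reason it needs more.) I'll report back here when it finishes.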
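
Two points from the comments above, exporting LC_ALL and stripping Windows line endings, can be sketched as follows. This assumes GNU sed, which understands \r in a pattern; strip_cr is a hypothetical helper name. Stripping is only needed when the file mixes \r\n and \n endings, because otherwise every line carries the same trailing \r and still compares equal to its duplicates.

```shell
# Sketch, assuming GNU sed. strip_cr is a hypothetical helper name.
strip_cr() {
    # Delete a carriage return at the end of each line (CRLF -> LF).
    sed 's/\r$//' "$1"
}

# A bare LC_ALL=C only sets a shell variable unless LC_ALL is already
# exported; export passes it on to child processes such as sort.
export LC_ALL=C
```
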
$ cat -n file
     1  Speed, bonnie boat, like a bird on the wing,
     2  Onward! the sailors cry;
     3  Carry the lad that's born to be King
     4  Over the sea to Skye.
     5
     6  Loud the winds howl, loud the waves roar,
     7  Speed, bonnie boat, like a bird on the wing,
     8  Thunderclaps rend the air;
     9  Onward! the sailors cry;
    10  Baffled, our foes stand by the shore,
    11  Carry the lad that's born to be King
    12  Follow they will not dare.
    13  Over the sea to Skye.
    14
$ cat -n file | sort -k2 -u
     5
    10  Baffled, our foes stand by the shore,
     3  Carry the lad that's born to be King
    12  Follow they will not dare.
     6  Loud the winds howl, loud the waves roar,
     2  Onward! the sailors cry;
     4  Over the sea to Skye.
     1  Speed, bonnie boat, like a bird on the wing,
     8  Thunderclaps rend the air;
$ cat -n file | sort -k2 -u | sort -n
     1  Speed, bonnie boat, like a bird on the wing,
     2  Onward! the sailors cry;
     3  Carry the lad that's born to be King
     4  Over the sea to Skye.
     5
     6  Loud the winds howl, loud the waves roar,
     8  Thunderclaps rend the air;
    10  Baffled, our foes stand by the shore,
    12  Follow they will not dare.
$ cat -n file | sort -k2 -u | sort -n | cut -f2-
Speed, bonnie boat, like a bird on the wing,
Onward! the sailors cry;
Carry the lad that's born to be King
Over the sea to Skye.

Loud the winds howl, loud the waves roar,
Thunderclaps rend the air;
Baffled, our foes stand by the shore,
Follow they will not dare.
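
Before starting a long run on the full file, it may be worth checking on a small sample that the pipeline gives the same result as the awk one-liner from the question. The sketch below does that. The helper name same_result is hypothetical, and the sample must be small enough for awk to hold in memory (for instance the first few hundred thousand lines taken with head).

```shell
# Sketch: compare the sort-based pipeline with the awk one-liner on a
# small sample. same_result is a hypothetical helper name.
same_result() {
    sample=$1
    a=$(mktemp) || return 2
    b=$(mktemp) || return 2

    # Reference result: keep the first occurrence of every line.
    awk '!seen[$0]++' "$sample" > "$a"

    # Pipeline from the answer, with byte-wise comparison forced.
    cat -n "$sample" | LC_ALL=C sort -k2 -u | LC_ALL=C sort -n | cut -f2- > "$b"

    # cmp -s is silent and returns 0 only when both files are identical.
    cmp -s "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

A zero exit status from same_result sample.txt means the two approaches agree on that sample.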