Linux: how to compare two unsorted text files whose lines differ slightly


Tags: linux, windows, diff


I have the following two files:

File1

the quick brown fox jumps
jumps over the very lazy dog
brown fox jumps over the
lorem ipsum dolor

File2

jumps over the very lazy *chicken*
brown fox jumps over the
the quick brown fox *swims*
an apple a day keeps the doctor away

I need to diff the two files and extract the lines that are unique to each of them.

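The catch is:

1. The lines in both files are unsorted.
2. Lines may (or may not) be identical.
3. When comparing lines, only the first four words matter; everything from the fifth word on is "don't care". In the example above, the File2 lines ending in *chicken* and *swims* therefore count as "present" in File1.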

So, given the conditions above, the expected output is:

File1

lorem ipsum dolor

File2

an apple a day keeps the doctor away

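Does anyone know a quick, efficient way to diff these? (Shortest solution, easy-to-read output.) What I tried was putting the two files side by side in Excel and comparing them visually, but I have to do this for a couple of log files, and it would take a long time to get through them all.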

Any better suggestions would be much appreciated.

Thanks and best regards.

Why not write a small program for the job that works on both platforms? It is easy to implement in some platform-independent C code:

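#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One input line: the original text, a tokenized copy, and pointers
   to the first nwords words inside that copy. */
typedef struct Line
{
    char *line;
    char *tokens;
    size_t nwords;
    const char **words;
} Line;

/* Portable strdup() replacement; exits on allocation failure. */
char *copystr(const char *s)
{
    char *r = malloc(strlen(s) + 1);
    if (!r) exit(EXIT_FAILURE);
    strcpy(r, s);
    return r;
}

/* qsort() comparator: compares two lines word by word. */
int compareLines(const void *a, const void *b)
{
    const Line *line1 = a;
    const Line *line2 = b;
    size_t mw = line1->nwords;
    if (line2->nwords < mw) mw = line2->nwords;
    for (size_t i = 0; i < mw; ++i)
    {
        int r = strcmp(line1->words[i], line2->words[i]);
        if (r) return r;
    }
    /* All shared words are equal: the line with more words sorts last. */
    if (line1->nwords > mw) return 1;
    if (line2->nwords > mw) return -1;
    return 0;
}

/* Reads all lines of f, keeping at most wordCount words per line. */
size_t readFile(Line **linesptr, FILE *f, size_t wordCount)
{
    size_t cap = 256;
    size_t n = 0;
    char buf[1024];
    Line *lines = malloc(cap * sizeof(Line));
    if (!lines) exit(EXIT_FAILURE);
    while (fgets(buf, 1024, f))
    {
        if (n == cap)
        {
            cap *= 2;
            lines = realloc(lines, cap * sizeof(Line));
            if (!lines) exit(EXIT_FAILURE);
        }
        lines[n].line = copystr(buf);
        lines[n].tokens = copystr(buf);
        lines[n].words = malloc(wordCount * sizeof(const char *));
        if (!lines[n].words) exit(EXIT_FAILURE);
        size_t c = 0;
        /* Split on spaces, tabs and line endings so the last word has no '\n'. */
        char *word = strtok(lines[n].tokens, " \t\r\n");
        while (word && c < wordCount)
        {
            lines[n].words[c++] = word;
            word = strtok(NULL, " \t\r\n");
        }
        lines[n].nwords = c;
        ++n;
    }
    *linesptr = lines;
    return n;
}

/* ... main() is not included in this listing ... */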


If you use gcc, compile it with:
gcc -s -g0 -O2 -std=c11 -Wall -Wextra -pedantic -o finduniq finduniq.c

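Demo:
$ ./finduniq 4 test1.txt test2.txt
test2.txt: an apple a day keeps the doctor away
test1.txt: lorem ipsum dolor
$ ./finduniq 6 test1.txt test2.txt
test2.txt: an apple a day keeps the doctor away
test2.txt: jumps over the very lazy *chicken*
test1.txt: jumps over the very lazy dog
test1.txt: lorem ipsum dolor
test2.txt: the quick brown fox *swims*
test1.txt: the quick brown fox jumps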

$ diff file1 file2 | grep '^[<>]' | sed -E 's/^(<|>) //g' | sort | uniq -w5 -u

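diff - compares file1 and file2
grep and sed - remove the extra lines and the < / > markers
sort - sorts the remaining strings
uniq - prints the unique strings (-w5 compares only the first 5 characters of each line, an attempt to handle point 3 of the question's list)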