Linux: how to compare files with different columns in Unix?
I want to compare the filenames in Today.txt and Main.txt. If there is a match, print all 6 columns of the matched file from Main.txt into a new file, e.g. matched.txt. For the files that have no match in Main.txt, list the filename and time from Today.txt in another new file, e.g. unmatched.txt.
Main.txt
date filename timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
Today.txt
filename time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt 09:36
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
TRIPS11_20161104.txt 09:16
Desired output:
matched.txt
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
unmatched.txt
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
Can you help me with this?
Thanks in advance.
With awk, one command each for matched and unmatched:
$ awk 'NR==FNR{a[$1]; next} $3 in a{print > "matched.txt"}' Today.txt Main.txt
$ cat matched.txt
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
$ awk 'NR==FNR{a[$3]; next} !($1 in a) && FNR>1{print > "unmatched.txt"}' Main.txt Today.txt
$ cat unmatched.txt
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
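- The logic is the same in both: initialize the array a with the required column of the first file argument
- then print to the desired output file depending on whether the filename from the second file is (or is not) present in a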
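A self-contained sketch of the two commands above, run against a trimmed copy of the question's data in a scratch directory (mktemp -d and the extra file EmptyMain.txt are assumptions made for illustration). It makes one deliberate swap: the first file is recognized with FILENAME == ARGV[1] instead of NR==FNR, because NR==FNR is also true for every line of the second file when the first file is empty, which would silently turn Today.txt into the lookup table and report nothing.

```shell
# Scratch directory and trimmed sample data are for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
EOF

cat > Today.txt <<'EOF'
filename time
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
EOF

# matched: load column 1 of Today.txt (header skipped) as keys of a,
# then print every Main.txt row whose column 3 is one of those keys.
awk 'FILENAME == ARGV[1] { if (FNR > 1) a[$1]; next }
     $3 in a' Today.txt Main.txt > matched.txt

# unmatched: load column 3 of Main.txt as keys of a, then print the
# Today.txt rows (header skipped) whose column 1 is NOT a key of a.
awk 'FILENAME == ARGV[1] { if (FNR > 1) a[$3]; next }
     FNR > 1 && !($1 in a)' Main.txt Today.txt > unmatched.txt

# With an empty first file the FILENAME test still reports every
# Today.txt row; an NR == FNR version would print nothing here.
: > EmptyMain.txt
awk 'FILENAME == ARGV[1] { if (FNR > 1) a[$3]; next }
     FNR > 1 && !($1 in a)' EmptyMain.txt Today.txt > unmatched_empty.txt
```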
Using a combination of grep and awk:
$ grep -Ff <(awk 'NR>1{print $1}' Today.txt) Main.txt
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
$ grep -vFf <(awk 'NR>1{print $3}' Main.txt) Today.txt | tail -n+2
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
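A sketch of the grep/awk combination run on part of the question's data, assuming the only mismatch between the two name columns is the leading + on in-progress files. As written above, the unmatched command would report CHCK01_20161104.txt as unmatched, because the pattern +CHCK01_20161104.txt does not occur in Today.txt; stripping the + while building the pattern list avoids that. The pattern lists go to ordinary files (today_names.txt and main_names.txt are names chosen for illustration), so no bash process substitution is needed, and -w (whole-word match, supported by GNU and BSD grep but not part of POSIX) keeps a name from matching inside a longer one.

```shell
# Scratch directory and trimmed sample data are for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

cat > Today.txt <<'EOF'
filename time
CHCK01_20161104.txt 06:03
AR01_20161104.txt 09:36
CHCK05_20161104.txt 11:10
EOF

# Build the pattern lists (headers skipped).
awk 'NR > 1 { print $1 }' Today.txt > today_names.txt
# Drop the leading "+" (in-progress marker) so the names compare equal.
awk 'NR > 1 { sub(/^[+]/, "", $3); print $3 }' Main.txt > main_names.txt

# -F: fixed strings, so "." is literal; -w: whole words only ("+" is a
# non-word character, so "+CHCK01_..." still matches "CHCK01_...").
grep -wFf today_names.txt Main.txt > matched.txt
# -v keeps the non-matching lines; tail drops the Today.txt header.
grep -vwFf main_names.txt Today.txt | tail -n +2 > unmatched.txt
```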
awk to the rescue:
$ awk 'FNR==1{next}
NR==FNR{a[$1]=$2; next}
$3 in a{print; delete a[$3]}
END{for(k in a) print k,a[k] > "unmatched"}' today main > matched
$ head *matched
==> matched <==
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
==> unmatched <==
ifs01_20161104.txt 21:16
CHCK09_20161104.txt 21:46
CHCK05_20161104.txt 11:10
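The END loop above uses for (k in a), and awk does not define an iteration order, which is why unmatched comes out in a different order from today. A sketch that keeps today's input order by also recording each name in a second array (order is a name chosen here) and walking it by index; the sample files are a trimmed copy of the data as used in this answer:

```shell
# Scratch directory and trimmed sample data are for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > today <<'EOF'
filename time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
AR01_20161104.txt 09:36
ifs01_20161104.txt 21:16
EOF

cat > main <<'EOF'
date filename timestamp space count status
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

awk 'FNR == 1 { next }                   # skip the header line of each file
     NR == FNR {                         # first file (today): name -> time
       if (!($1 in a)) order[++n] = $1   # remember first-seen order
       a[$1] = $2
       next
     }
     $3 in a { print; delete a[$3] }     # second file (main): matched row
     END {                               # names still in a never matched
       for (i = 1; i <= n; i++)
         if (order[i] in a) print order[i], a[order[i]] > "unmatched"
     }' today main > matched
```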
Here is an answer using the power of pipes:
tail -n +2 /tmp/today | while read a b; do \
if ! grep $a /tmp/main >> /tmp/matched; then \
echo $a $b; \
fi; \
done > /tmp/unmatched
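Explanation
Print /tmp/today, except the first line:
tail -n +2 /tmp/today
Read each line into two variables:
while read a b
grep for $a in /tmp/main and append the matching line to a file:
grep $a /tmp/main >> /tmp/matched
If grep returns non-zero (no match), echo $a and $b:
echo $a $b
Output: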
root@do:~# cat /tmp/matched
Nov 4 CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
root@do:~# cat /tmp/unmatched
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
ifs01_20161104.txt 21:16
root@do:~#
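A slightly hardened sketch of the same loop, run on part of the question's data in a scratch directory. The changes are suggestions, not part of the answer above: read -r keeps backslashes literal, quoting "$name" avoids word splitting and globbing, grep -F treats the dot in the file names literally instead of as "any character", -w (GNU/BSD, not POSIX) requires a whole-word hit, and -- protects against a name that starts with a dash. matched is truncated first because the loop only appends to it with >>.

```shell
# Scratch directory and trimmed sample data are for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > today <<'EOF'
filename time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
AR01_20161104.txt 09:36
EOF

cat > main <<'EOF'
date filename timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

: > matched                        # start empty; the loop only appends
tail -n +2 today | while read -r name stamp; do
  [ -n "$name" ] || continue       # skip blank lines
  # -F: fixed string, -w: whole word ("+" counts as a separator),
  # --: end of options, in case a name starts with "-"
  if ! grep -wF -- "$name" main >> matched; then
    printf '%s %s\n' "$name" "$stamp"   # no match: keep name and time
  fi
done > unmatched
```

This still runs one grep per line of today, which is fine for small files; for large inputs the single awk pass is cheaper.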
For tab-delimited output, you can set -v OFS='\t'
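For example, with the one-pass awk command, -v OFS='\t' goes before the program text. OFS is only used where print joins several expressions (print k,a[k]) or where a record is rebuilt after a field is modified, so the unmodified Main.txt rows printed with a bare print keep their original spacing. A small sketch on a trimmed copy of the data (scratch directory assumed):

```shell
# Scratch directory and trimmed sample data are for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > today <<'EOF'
filename time
CHCK05_20161104.txt 11:10
AR01_20161104.txt 09:36
EOF

cat > main <<'EOF'
date filename timestamp space count status
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

# Same program as the one-pass answer; only OFS changes, so the
# "name<TAB>time" lines in unmatched are tab-separated.
awk -v OFS='\t' 'FNR==1{next}
NR==FNR{a[$1]=$2; next}
$3 in a{print; delete a[$3]}
END{for(k in a) print k,a[k] > "unmatched"}' today main > matched
```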
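I have a question for you. I am printing the files in the inprogress directory with a plus sign (+), as shown in the example: Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time. In-progress files get a plus sign (+) prepended; all other files keep the same name in main.txt. I want both the files with the + symbol and the other files in my desired output (matched). Could you suggest how to compare main.txt and Today.txt to get matched.txt and unmatched.txt? Thanks!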
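One possible way to handle the + prefix, sketched in awk under the assumption that the only difference between the two name columns is a single leading + on the Main.txt side: strip the + from a copy of column 3 (key is a name chosen here) for the lookup, but print the original row, so the + is kept in matched.txt. The sample files are the question's data; as in the one-pass answer, the order of unmatched.txt is unspecified.

```shell
# Scratch directory is for illustration only; the data is from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > Main.txt <<'EOF'
date filename timestamp space count status
Nov 4 +CHCK01_20161104.txt 06:39 2.15M 17153 on_time
Nov 4 TRIPS11_20161104.txt 09:03 0.00M 24 On_Time
Nov 4 AR02_20161104.txt 09:31 0.00M 7 On_Time
Nov 4 AR01_20161104.txt 09:31 0.04M 433 On_Time
EOF

cat > Today.txt <<'EOF'
filename time
CHCK01_20161104.txt 06:03
CHCK05_20161104.txt 11:10
CHCK09_20161104.txt 21:46
AR01_20161104.txt 09:36
AR02_20161104.txt 09:36
ifs01_20161104.txt 21:16
TRIPS11_20161104.txt 09:16
EOF

awk 'FNR == 1 { next }                       # skip both header lines
     NR == FNR { t[$1] = $2; next }           # Today.txt: name -> time
     {
       key = $3
       sub(/^[+]/, "", key)                   # drop the "+" for the lookup only
       if (key in t) { print; delete t[key] } # row printed as-is, "+" kept
     }
     END { for (k in t) print k, t[k] > "unmatched.txt" }' Today.txt Main.txt > matched.txt
```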