Compare two files with awk and print matching and non-matching lines


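I have two CSV files that contain both matching and non-matching fields.
I want to compare the second, third and fourth columns and, based on that, add a result column marking each row as Match (M), Not Match (NM) or Not Found (NF, with NULL for the missing value).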

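a) If columns 2, 3 and 4 all match exactly, the row is a match (M).
b) If columns 2 and 3 match but column 4 does not, the row is a non-match (NM).
c) If column 2 or column 3 itself does not match, the row is a not-found case (NF).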
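A minimal sketch of checking the three rules in order, assuming each row is a plain comma-separated string with no quoted commas. The helper name classify is hypothetical, and it compares a single pair of rows, which is the same as comparing the two files line by line:

```shell
#!/bin/sh
# classify ROW1 ROW2  (hypothetical helper, not from the question)
# Applies rules (a)-(c) to one pair of comma-separated rows, in order:
#   M  if $2, $3 and $4 all match
#   NM if $2 and $3 match (so $4 must differ)
#   NF otherwise ($2 or $3 differs)
classify() {
    awk -v r1="$1" -v r2="$2" 'BEGIN {
        split(r1, x, ","); split(r2, y, ",")
        if (x[2] == y[2] && x[3] == y[3] && x[4] == y[4]) print "M"
        else if (x[2] == y[2] && x[3] == y[3])           print "NM"
        else                                              print "NF"
    }'
}

classify 'SL_2344,personal_details,Age,22Y' 'SL_12332,personal_details,Age,22Y'  # M
classify 'SL_2344,personal_details,sex,M'   'SL_12332,personal_details,sex,F'    # NM
classify 'SL_2344,education,stream,Maths'   'SL_12332,NULL,height,5.3 ft'        # NF
```

Because rule (a) is tested first, the NM branch only runs when $4 differs, so it does not need to test $4 again.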

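1.csv 2.csv Expected output

I tried combining $2, $3 and $4 as keys of an awk associative array using NR and FNR, but could not get the desired result.
Some records, such as line 5 of 2.csv, have only an attribute (no class object), and their value is kept in column 3; this is where my code fails. NULL or blank can be used as $2 for such records.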
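One way to make such records line up before comparing, assuming a record without a class has exactly three fields (id, attribute, value), is to insert NULL as $2. The helper name pad_missing_class is hypothetical:

```shell
#!/bin/sh
# pad_missing_class [FILE...]  (hypothetical helper)
# Assumption: a row with no class has exactly three fields: id,attribute,value.
# Such rows get NULL inserted as the class column ($2); other rows pass through.
pad_missing_class() {
    awk -F, -v OFS=, 'NF == 3 { $0 = $1 OFS "NULL" OFS $2 OFS $3 } 1' "$@"
}

printf '%s\n' 'SL_12332,personal_details,Age,22Y' 'SL_12332,height,5.3 ft' | pad_missing_class
# SL_12332,personal_details,Age,22Y
# SL_12332,NULL,height,5.3 ft
```

With no file arguments the helper reads standard input, so it can also be run as pad_missing_class 2.csv.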

Using GNU awk:

 awk -F, 'NR==FNR { map[FNR]=$0;next } { split(map[FNR],map1,",");if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { print $0",M" } else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) { print $0",NM" } else { print $0",NF" } }' 1.csv 2.csv
Explanation:

awk -F, 'NR==FNR {                                                           # -F, sets the field delimiter to ","
                   map[FNR]=$0;                                              # First file (NR==FNR): store each line in map, indexed by its record number FNR
                   next
                 }
                 {
                   split(map[FNR],map1,",");                                  # Second file: split the first-file line with the same FNR into map1 on ","
                   if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) {
                      print $0",M"                                            # Rule (a): $2, $3 and $4 all match
                   }
                   else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) {    # Rule (b): $2 and $3 match but $4 differs
                      print $0",NM"
                   }
                   else {
                      print $0",NF"                                           # Rule (c): $2 or $3 differs
                   }
                  }' 1.csv 2.csv
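Rule (b) requires both $2 and $3 to match, so any condition that mixes || and && is worth checking against awk's precedence rules: as in C, && binds tighter than ||, so a || b && c is read as a || (b && c). A quick check with arbitrary values:

```shell
#!/bin/sh
# && has higher precedence than || in awk (as in C).
# a, b and c are arbitrary illustrative values.
awk 'BEGIN {
    a = 1; b = 0; c = 0
    print (a || b && c)     # read as a || (b && c)  -> 1
    print ((a || b) && c)   # explicit grouping      -> 0
}'
```

Adding parentheses, or avoiding a mix of || and && altogether, keeps the condition aligned with the rule it is meant to express.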

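Raman, thanks for trying this, but when I run it I get the following output - SL_12332,personal_details,name,Samantha,NF SL_12332,personal_details,address,Park Street Mumbai,NF SL_12332,personal_details,Age,22Y,NF SL_12332,personal_details,sex,F,NF height,5.8 ft,M SL_12332,class,section,3D, SL_12332,candidate details,sports,stateLevelBasketballrepresentation,NF,NF,,,M
Which version of awk are you running? This works fine when I test it locally with your test data.
GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
OK, I'm running 4.0.2, and I see where the problem is. I'll take another look. Please don't multi-post -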
Expected output:

Class,Attributes,2344,12332,Remarks
personal_details,name,Andrew,Samantha,NM
personal_details,address,G-101 SSR New-Delhi,Park Street Mumbai,NM
personal_details,Age,22Y,22Y,M
personal_details,sex,M,F,NM
personal_details,height,5.8 ft,NULL,NF
education,Roll_number,22345,NULL,NF
education,stream,ScienceandMaths,NULL,NF
class,section,3D,3D,M
NULL,height,NULL,5.3 ft,NF
candidate_Other_details,NULL,sports,stateLevelBasketballrepresentation,NF
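Pairing rows by line number (FNR) only works when both files list the same class/attribute pairs in the same order. A minimal sketch of a key-based alternative, assuming plain CSV without quoted commas and using a hypothetical helper name compare_csv: each row is keyed on ($2,$3), M, NM or NF is decided per key, and NULL stands in for the missing side. The sample files are made up for illustration, loosely following the expected output above.

```shell
#!/bin/sh
# compare_csv FILE1 FILE2  (hypothetical helper)
# Keys each row on ($2,$3) and prints: class,attribute,value1,value2,remark
#   M  - key in both files and $4 equal
#   NM - key in both files but $4 differs
#   NF - key in only one file; the missing value is printed as NULL
compare_csv() {
    awk -F, -v OFS=, '
    NR == FNR {                        # first file: remember value and order
        k = $2 SUBSEP $3
        if (!(k in v1)) ord1[++n1] = k
        v1[k] = $4; cls[k] = $2; att[k] = $3
        next
    }
    {                                  # second file: remember value and order
        k = $2 SUBSEP $3
        if (!(k in v2)) ord2[++n2] = k
        v2[k] = $4; cls[k] = $2; att[k] = $3
    }
    END {
        for (i = 1; i <= n1; i++) {    # keys of the first file, in file order
            k = ord1[i]
            if (k in v2)
                print cls[k], att[k], v1[k], v2[k], (v1[k] == v2[k] ? "M" : "NM")
            else
                print cls[k], att[k], v1[k], "NULL", "NF"
        }
        for (i = 1; i <= n2; i++) {    # keys found only in the second file
            k = ord2[i]
            if (!(k in v1))
                print cls[k], att[k], "NULL", v2[k], "NF"
        }
    }' "$1" "$2"
}

# Hypothetical sample data for illustration only
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' 'SL_2344,personal_details,name,Andrew' \
              'SL_2344,personal_details,Age,22Y' \
              'SL_2344,personal_details,height,5.8 ft' \
              'SL_2344,class,section,3D' > "$f1"
printf '%s\n' 'SL_12332,personal_details,name,Samantha' \
              'SL_12332,personal_details,Age,22Y' \
              'SL_12332,class,section,3D' \
              'SL_12332,NULL,height,5.3 ft' > "$f2"
compare_csv "$f1" "$f2"
# personal_details,name,Andrew,Samantha,NM
# personal_details,Age,22Y,22Y,M
# personal_details,height,5.8 ft,NULL,NF
# class,section,3D,3D,M
# NULL,height,NULL,5.3 ft,NF
rm -f "$f1" "$f2"
```

Rows come out in first-file order, followed by keys that appear only in the second file. NR == FNR is true only while the first file is being read, provided that file is not empty.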