Compare 2 files using awk and print matched and non-matched lines


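I have two CSV files that contain both matching and non-matching fields.
I want to compare the 2nd, 3rd and 4th columns and, based on that, add a column marking each row as matched (M), not matched (NM) or not found (NF, with NULL).

a) If columns 2, 3 and 4 all match, the row is a match.
b) If columns 2 and 3 match but column 4 does not, it should be a non-match.
c) If column 2 or column 3 itself does not match, it should be a not-found case.

1.csv

2.csv

Expected output

I tried using NR and FNR with an awk associative array keyed on $2, $3 and $4, but could not get the desired result.
Some records, such as line 5 of 2.csv, have only an attribute (no class object), with the value kept in column 3, and that is where the code fails. NULL or blank can be used for $2 in such records.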
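One way to apply the NULL convention mentioned above is to fill an empty second column before comparing. This is a minimal sketch, assuming the missing class shows up as an empty field; the function name `fill_null` and the sample row are made up for illustration.

```shell
# fill_null: hypothetical helper that writes NULL into an empty column 2.
# Assigning to $2 makes awk rebuild the line, so OFS must also be ",".
fill_null() {
    awk 'BEGIN { FS = OFS = "," } $2 == "" { $2 = "NULL" } 1'
}

# Hypothetical row with no class object, for illustration only.
printf 'SL_12332,,height,5.3 ft\n' | fill_null
```

For the sample row this prints `SL_12332,NULL,height,5.3 ft`; rows whose second column is already filled pass through unchanged.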

Using GNU awk:

 awk -F, 'NR==FNR { map[FNR]=$0;next } { split(map[FNR],map1,",");if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { print $0",M" } else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) { print $0",NM" } else { print $0",NF" } }' 1.csv 2.csv

Explanation:

awk -F, 'NR==FNR {                                                           # -F, sets the field delimiter to ","; NR==FNR is true only while reading the first file (1.csv)
                   map[FNR]=$0;                                              # Store each line of 1.csv in the array map, indexed by its line number within the file (FNR)
                   next                                                      # Skip the second block for lines of 1.csv
                 } 
                 { 
                   split(map[FNR],map1,",");                                  # For each line of 2.csv, split the 1.csv line with the same line number into map1 using "," as the delimiter
                   if ( $2==map1[2] && $3==map1[3] && $4==map1[4]) { 
                      print $0",M"                                            # Columns 2, 3 and 4 all match: print the line with "M"
                   } 
                   else if ( $2==map1[2] && $3==map1[3] && $4!=map1[4] ) {    # Columns 2 and 3 match but column 4 differs: "NM" (&& rather than ||, to follow rule b)
                      print $0",NM" 
                   } 
                   else { 
                      print $0",NF"                                           # Column 2 or column 3 does not match: "NF"
                   } 
                  }' 1.csv 2.csv
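
The command above pairs rows by line number, so it relies on 1.csv and 2.csv listing their rows in the same order. Below is a minimal sketch of an alternative, assuming rows should instead be paired on the combination of columns 2 and 3. The function name `classify`, the array name `val` and the sample rows are hypothetical, modelled on values quoted in this thread rather than taken from the real files.

```shell
# Hypothetical sample data, for illustration only.
cat > 1.csv <<'EOF'
SL_2344,personal_details,name,Andrew
SL_2344,personal_details,Age,22Y
SL_2344,education,stream,ScienceandMaths
EOF

cat > 2.csv <<'EOF'
SL_12332,personal_details,Age,22Y
SL_12332,personal_details,name,Samantha
SL_12332,class,section,3D
EOF

# classify FILE1 FILE2: label each row of FILE2 as M, NM or NF.
classify() {
    awk -F, '
    # First file: remember column 4 under the key "col2,col3".
    NR == FNR { val[$2 FS $3] = $4; next }
    # Second file: look the key up and compare column 4 as a string.
    {
        key = $2 FS $3
        if (!(key in val))                  status = "NF"  # key absent from first file
        else if ((val[key] "") == ($4 ""))  status = "M"   # columns 2, 3 and 4 agree
        else                                status = "NM"  # columns 2 and 3 agree, column 4 differs
        print $0 "," status
    }' "$1" "$2"
}

classify 1.csv 2.csv
```

With the sample rows this prints `M` for the Age row, `NM` for the name row and `NF` for the class row, even though the two files list their rows in a different order. If a key occurs more than once in the first file, only its last value is kept in this sketch.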

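Raman, thanks for trying this, but when I run it I get the following output: SL_12332,personal_details,name,Samantha,NF SL_12332,personal_details,address,Park Street Mumbai,NF SL_12332,personal_details,Age,22Y,NF SL_12332,personal_details,sex,F,NF height,5.8 ft,M SL_12332,class,section,3D,SL_12332,candidate_Other_details,sports,stateLevelBasketballrepresentation,NF,NF,,,M

Which version of awk are you running? It works fine when I test it locally with your test data.

GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)

OK, I am running 4.0.2, and I can see where the problem is. I will take another look.

Please don't multi-post -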

Class,Attributes,2344,12332,Remarks  
personal_details,name,Andrew,Samantha,NM  
personal_details,address,G-101 SSR New-Delhi,Park Street Mumbai,NM  
personal_details,Age,22Y,22Y,M  
personal_details,sex,M,F,NM  
personal_details,height,5.8 ft,NULL,NF  
education,Roll_number,22345,NULL,NF  
education,stream,ScienceandMaths,NULL,NF  
class,section,3D,3D,M  
NULL,height,NULL,5.3 ft,NF  
candidate_Other_details,NULL,sports,stateLevelBasketballrepresentation,NF
