awk比较两个文件并合并输出 原始问题
我有两个文件awk比较两个文件并合并输出 原始问题,awk,Awk,我有两个文件1.csv和2.csv $ cat alpha1.csv AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095 ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592 B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3 $ cat
1.csv
和2.csv
$ cat alpha1.csv
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
$ cat SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
1.csv:-
AK,BA,Alpha,1095
ALL,SA,Alpha,9592
2.csv:-
AK,BA,SPAM,10
我想合并文件,这样它将打印输出文件如下
输出:-
AK,BA,Alpha,1095,SPAM,10
AL,SA,Alpha,9592,NA,NA
更新问题 我有两个文件
alpha1.csv
和SPAM1.csv
$ cat alpha1.csv
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
$ cat SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
预期产出:
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,**NA,NA**
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,**NA,NA**
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,**NA,NA**
B-MOBILE_BRUNEI,BRUNEI,**NA,NA**,Alphanumeric_A_MSISDN_blocking,3
我的命令仅打印文件2与文件1的匹配案例,而不打印不匹配案例:
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (i=a[$1,$2]?a[$1,$2]:"NA,NA")}' alpha1.csv SPAM1.csv
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA
您可以使用此选项,例如:
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")}' f2 f1
AK,BA,Alpha,1095,SPAM,10
ALL,SA,Alpha,9592,NA,NA
解释
将输入和输出字段分隔符设置为逗号BEGIN{FS=OFS=“,”}
在数组FNR==NR{a[$1,$2]=$3fs$4;next}
中存储第三个和第四个值,数组的索引是元组a[]
($1,$2)
打印该行以及数组中匹配的项。如果没有这样的元素,则打印{print$0,($1,$2)在a?a[$1,$2]:“NA,NA”)}
NA,NA
i=
的目的是什么?另外,a[$1,$2]
作为三元表达式中的条件,如果数组包含的值的计算结果为0,则该条件的计算结果将为false-您确实应该在a中测试($1,$2)。非常正确。。。我不知怎的认为需要给变量赋值。那么,{print$0,($1,$2)在a?a[$1,$2]:“NA,NA”)}
是正确的吗?它工作得很好,但我不知道是否还有其他不符合要求的地方。是的,就是这样做的。软呢帽,我想它还是不起作用。如果我更改了这两个文件的顺序,那么输出中的行数是不同的。因此,在这两种情况下,它都会忽略一些数据。@user2761031您的意思是什么?我编写的代码基于示例输入和预期输出。为了澄清,请使用更多信息更新您的问题。但如果这是一个新的要求,你最好。