awk: compare two files and merge the output

Original question


I have two files, 1.csv and 2.csv.

1.csv:-

AK,BA,Alpha,1095  
ALL,SA,Alpha,9592  
2.csv:-

AK,BA,SPAM,10  
I want to merge the files so that the output file looks like this:

Output:-

AK,BA,Alpha,1095,SPAM,10  
ALL,SA,Alpha,9592,NA,NA

Updated question

I have two files, alpha1.csv and SPAM1.csv.

$ cat alpha1.csv  
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592  
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3  


$ cat SPAM1.csv  
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1  
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593  
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218  
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111  
Expected output:

AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA
B-MOBILE_BRUNEI,BRUNEI,NA,NA,Alphanumeric_A_MSISDN_blocking,3
My command only prints the cases where file 2 is matched against file 1; it does not print the unmatched cases:

$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (i=a[$1,$2]?a[$1,$2]:"NA,NA")}' alpha1.csv SPAM1.csv  
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1,NA,NA  
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16,Alphanumeric_A_MSISDN_blocking,1095  
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593,Alphanumeric_A_MSISDN_blocking,9592  
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218,NA,NA  
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111,NA,NA  

You can use this, for example:

$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3 FS $4; next} {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")}' f2 f1
AK,BA,Alpha,1095,SPAM,10
ALL,SA,Alpha,9592,NA,NA
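Explanation
  • BEGIN{FS=OFS=","}
    sets the input and output field separators to a comma.
  • FNR==NR {a[$1,$2]=$3 FS $4; next}
    while the first file is being read, stores the third and fourth fields in the array a[], indexed by the tuple ($1,$2).
  • {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")}
    prints the line together with the matching entry from the array. If there is no such element, it prints NA,NA instead.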
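
The membership test in the last bullet matters whenever a stored value can evaluate to false. The following is a minimal sketch of that difference, not part of the original answer. It assumes only a numeric count is stored per key (the answer itself stores `$3 FS $4`), and the file names `lookup.csv` and `main.csv` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data: the count stored for the key AK,BA is 0.
tmp=$(mktemp -d)
printf 'AK,BA,0\n' > "$tmp/lookup.csv"
printf 'AK,BA,Alpha,1095\nALL,SA,Alpha,9592\n' > "$tmp/main.csv"

# Stored value used as the condition: the numeric-looking "0" is false,
# so the existing entry for AK,BA is reported as NA.
by_value=$(awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3; next}
  {print $0, (a[$1,$2] ? a[$1,$2] : "NA")}' "$tmp/lookup.csv" "$tmp/main.csv")

# Membership test: true whenever the key exists, whatever its value is.
by_key=$(awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1,$2]=$3; next}
  {print $0, (($1,$2) in a ? a[$1,$2] : "NA")}' "$tmp/lookup.csv" "$tmp/main.csv")

printf 'value as condition:\n%s\n' "$by_value"
printf 'membership test:\n%s\n' "$by_key"
rm -rf "$tmp"
```

With the value as the condition, the AK,BA row ends in NA even though the key is present. With `in`, it ends in 0. Referencing `a[$1,$2]` in a condition also creates that element as a side effect, which `in` avoids.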

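Isn't this a duplicate of your previous question?

Yes, but much simplified. I am deleting the previous question.

Ah... so the idea is to compare the files by their first and second fields, right?

Yes, but the point is that no rows should be dropped: if there is no match in the second file, NA or 0 should be printed in that position. My command only prints the matching cases: $ awk -F, 'BEGIN{OFS=",";} NR==FNR{a[$1,$2]=$0;} NR!=FNR{print $0,a[$1,$2],$3;}' 1.csv 2.csv | cut -d, -f1,2,3,4,7,8 Output: AK,BA,SPAM,10,Alpha,1095

What is the purpose of i=? Also, with a[$1,$2] as the condition of the ternary expression, the condition is false whenever the array holds a value that evaluates to 0. You really should test ($1,$2) in a.

Very true... I somehow thought the value had to be assigned to a variable. So is {print $0, (($1,$2) in a?a[$1,$2]:"NA,NA")} correct? It works fine, but I don't know whether there is anything else that does not meet the requirements.

Yes, that is the way to do it.

Fedora, I think it still doesn't work. If I change the order of the two files, the number of lines in the output is different. So in both cases it ignores some of the data.

@user2761031 What do you mean? The code I wrote is based on the sample input and the expected output. To clarify, please update your question with more information. But if this is a new requirement, you had better ...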
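
The expected output in the updated question includes a row that exists only in alpha1.csv (B-MOBILE_BRUNEI). A one-directional lookup cannot produce that row in either file order, which matches the last complaint in the comments. One possible sketch of a two-directional merge follows. It is not taken from the thread and assumes that each ($1,$2) key occurs at most once per file; the helper array `seen` is a name introduced here for illustration.

```shell
#!/bin/sh
# Recreate the sample files from the updated question in a temporary directory.
tmp=$(mktemp -d)
cat > "$tmp/alpha1.csv" <<'EOF'
AKTEL_BANGLADESH,BANGLADESH,Alphanumeric_A_MSISDN_blocking,1095
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,Alphanumeric_A_MSISDN_blocking,9592
B-MOBILE_BRUNEI,BRUNEI,Alphanumeric_A_MSISDN_blocking,3
EOF
cat > "$tmp/SPAM1.csv" <<'EOF'
AIN_AIS_GLOBAL_COMMUNICATIONS,THAILAND,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),1
AKTEL_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),16
ALJAWAL_SAUDI_TELECOM_COMPANY,SAUDI_ARABIA,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),10593
AT&T_WIRELESS,UNITED_STATES,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),218
BANGLALINK_SHEBA_BANGLADESH,BANGLADESH,SPAM_CHAIN_SMS_REJECT(Spam_Detection_and_Blocking),111
EOF

merged=$(awk 'BEGIN{FS=OFS=","}
  # First file: remember fields 3 and 4 under the key ($1,$2).
  FNR==NR {a[$1,$2]=$3 FS $4; next}
  # Second file: append the remembered fields, or NA,NA when the key is unknown.
  {
    if (($1,$2) in a) { print $0, a[$1,$2]; seen[$1,$2] = 1 }
    else              { print $0, "NA,NA" }
  }
  # Afterwards: emit first-file keys that never appeared in the second file.
  # The iteration order of "for (k in a)" is unspecified in awk.
  END {
    for (k in a)
      if (!(k in seen)) { split(k, p, SUBSEP); print p[1], p[2], "NA,NA", a[k] }
  }' "$tmp/alpha1.csv" "$tmp/SPAM1.csv")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

On this sample only one key is left over, so the result has the six rows of the expected output with B-MOBILE_BRUNEI last. With several leftover keys, piping the result through sort would be one way to get a stable order.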