Python: merging two files on matching dates


I have two files, file1 and file2, and I need to merge them into filex based on the date. Here is an example:

file1:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5
file2:

20150122,735620,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150125,735623,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
The output, filex, should look like this:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
I tried:

os.system("awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2 >output")
But it doesn't work! Any help?

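Your awk code does not work because a[$1]=$2 FS $3 stores only the second and third fields of the first file, using $1 as the key. The solution below uses the composite key $1 OFS $2 (if that is not right, remove OFS $2 from the hash references), strips the key from $0, and hashes the rest of the line as the data.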

Try this:

$ awk 'BEGIN{FS=OFS=","} NR==FNR{k=$1 OFS $2;sub(/^([^,]+,){2}/,"");a[k]=$0;next}{print $0 (a[$1 OFS $2]==""?"":OFS) a[$1 OFS $2];delete a[$1 OFS $2]}END{for(i in a)print i,a[i]}' file2 file1
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
Explained:

$ awk '
BEGIN { FS=OFS="," }                                # delimiters
NR==FNR {                                           # file2
    k=$1 OFS $2                                     # construct key for hashing
    sub(/^([^,]+,){2}/,"")                          # remove 2 first fields
    a[k]=$0                                         # hash
    next
}
{                                                   # file1
    print $0 (a[$1 OFS $2]==""?"":OFS) a[$1 OFS $2] # merge and print
    delete a[$1 OFS $2]                             # delete hash entry
}
END {                                               # process non-referred hash entries
    for(i in a)
        print i,a[i]
}' file2 file1
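
For readers who would rather stay in Python, the same hash-and-lookup idea can be sketched as below. This is only an illustration, not part of the original answer: the function name merge_on_key and the key_fields parameter are made up here, and the sketch assumes each key occurs at most once per file, as in the sample data.

```python
def merge_on_key(lines1, lines2, key_fields=2):
    """Append to each line of lines1 the non-key fields of its match in lines2.

    The key is the first `key_fields` comma-separated fields (date and day
    number in the sample data). Unmatched lines from either input are kept.
    """
    # Hash the second input: key -> remaining fields (like a[k]=$0 in awk).
    rest_by_key = {}
    for line in lines2:
        fields = line.rstrip("\n").split(",")
        rest_by_key[tuple(fields[:key_fields])] = fields[key_fields:]

    merged = []
    for line in lines1:
        fields = line.rstrip("\n").split(",")
        # pop() returns the match and removes it, like the awk `delete`.
        extra = rest_by_key.pop(tuple(fields[:key_fields]), [])
        merged.append(",".join(fields + extra))

    # Entries still in the hash were never referenced (the awk END block).
    for key, rest in rest_by_key.items():
        merged.append(",".join(list(key) + rest))
    return merged


# Two lines from each sample file are enough to show all three cases.
sample1 = ["20150124,735622,iamSelected,B,20150124,735622,7,-3",
           "20150125,735623,iamSelected,K,20150125,735622,10,6.5"]
sample2 = ["20150125,735623,iamSelected,B,20150125,735623,7,-3",
           "20150126,735624,iamSelected,KK,20150126,735624,10,6.5"]
for row in merge_on_key(sample1, sample2):
    print(row)
```

Setting key_fields=1 corresponds to dropping OFS $2 from the awk hash references, so that only the date is used as the key.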

A short solution using the join command:

join -j1 -t, -a1 -a2  file1 file2 > filex
Contents of filex:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,735620,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,735623,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
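
Two details of the join output are easy to miss: join expects both inputs to be sorted on the join field, and it drops only that one field from the second file, which is why 735620, 735621 and 735623 appear again in the matched lines. The sketch below imitates that behaviour in Python with a two-pointer walk over sorted inputs. The function name outer_join_sorted is hypothetical, and the sketch assumes unique keys per file and at least two fields per line; it is not a full reimplementation of join.

```python
def outer_join_sorted(lines1, lines2, sep=","):
    """Full outer join of two lists of lines, both sorted on their first field."""
    out = []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        key1, _, rest1 = lines1[i].partition(sep)
        key2, _, rest2 = lines2[j].partition(sep)
        if key1 == key2:
            # Matched: the key once, then the remainder of both lines.
            out.append(sep.join([key1, rest1, rest2]))
            i += 1
            j += 1
        elif key1 < key2:
            out.append(lines1[i])  # present only in the first input (-a1)
            i += 1
        else:
            out.append(lines2[j])  # present only in the second input (-a2)
            j += 1
    # One input is exhausted; the tail of the other has no partner.
    out.extend(lines1[i:])
    out.extend(lines2[j:])
    return out


left = ["20150124,735622,iamSelected,B,20150124,735622,7,-3",
        "20150125,735623,iamSelected,K,20150125,735622,10,6.5"]
right = ["20150125,735623,iamSelected,B,20150125,735623,7,-3",
         "20150126,735624,iamSelected,KK,20150126,735624,10,6.5"]
for row in outer_join_sorted(left, right):
    print(row)
```

If the repeated second field is unwanted, the matched branch could skip one more field of the second line, which gives the layout asked for in the question.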
Python code:

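def file_contents(file_name):
    # Return the file's non-empty lines without trailing newlines.
    with open(file_name, 'r') as fn:
        return [line.rstrip('\n') for line in fn if line.strip()]

f1_cont = sorted(file_contents('file1'))
f2_cont = sorted(file_contents('file2'))

# Index file2 by date (the first field) for direct lookup.
f2_by_date = {line.split(',')[0]: line for line in f2_cont}

merged = []
for f in f1_cont:
    match = f2_by_date.pop(f.split(',')[0], None)
    if match is not None:
        # Same date in both files: append file2's fields after its first two.
        f += ',' + ','.join(match.split(',')[2:])
    merged.append(f)
# Whatever is left in the index has a date that never appears in file1.
merged.extend(f2_by_date.values())

with open('filex', 'w') as out_put:
    for line in sorted(merged):
        out_put.write(line + '\n')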
This produces the result you asked for:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5

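I don't know why you are programming in Python if your script is in bash.
No, I use the command line, but we could do it with a Python program!
When the problem statement is simply "it doesn't work", it is hard to provide a solution. Please edit your question to describe more completely what you expected to happen and how that differs from the actual results. See … for hints on what makes a good explanation.
Thanks a lot. How do I put them into a file?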
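awk ... file2 file1 > filex stores the output of awk in filex. Or, if you mean the awk code itself, cut and paste everything between the single quotes into a file named program.awk and run it as awk -f program.awk file2 file1 > filex.
Thanks a lot.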
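
Since the question calls awk from Python through os.system, the same redirection can also be done from Python without a shell. The sketch below is a hedged illustration: run_to_file is a made-up helper name, and program.awk, file2, file1 and filex are simply the file names used in this thread. It relies only on subprocess.run from the standard library.

```python
import subprocess


def run_to_file(command, out_path):
    """Run `command` (a list of arguments, no shell) and write its stdout to out_path."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(command, stdout=out, check=True)


# Equivalent of: awk -f program.awk file2 file1 > filex
AWK_COMMAND = ["awk", "-f", "program.awk", "file2", "file1"]
# run_to_file(AWK_COMMAND, "filex")  # uncomment where awk and the files exist
```

Passing the arguments as a list avoids the nested quoting that the os.system call in the question needs.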