How to combine two CSV files in Python on a common column value when the files have different numbers of rows
file1.csv contains 2 columns: c11;c12
file2.csv contains 2 columns: c21;c22
Common column: c11, c21
Example:
f1.csv:
a;text_a
b;text_b
f;text_f
x;text_x
f2.csv:
a;path_a
c;path_c
d;path_d
k;path_k
l;path_l
m;path_m
Output f1+f2:
a;text_a;path_a
b;text_b;''
c;'';path_c
d;'';path_d
f;text_f;''
k;'';path_k
l;'';path_l
m;'';path_m
x;text_x;''
How can I do this in Python?
It is easy to do this with the csv module:

import csv

# Read each file into a dict keyed by the common (first) column.
with open('file1.csv', newline='') as f:
    r = csv.reader(f, delimiter=';')
    dict1 = {row[0]: row[1] for row in r}

with open('file2.csv', newline='') as f:
    r = csv.reader(f, delimiter=';')
    dict2 = {row[0]: row[1] for row in r}

# Union of the keys from both files, sorted so the output order is stable.
keys = sorted(set(dict1) | set(dict2))

# Python 3: open in text mode with newline='' instead of 'wb'.
with open('output.csv', 'w', newline='') as f:
    w = csv.writer(f, delimiter=';')
    w.writerows([key, dict1.get(key, "''"), dict2.get(key, "''")]
                for key in keys)

For merging several files (even more than two) on one or more common columns, one good and efficient option in Python is brewery. You can even specify which fields to consider when merging and which fields to keep.

Create a list of all fields, and add a file name field to record where each data record came from, by going through the source definitions and collecting their fields. The imports and source definitions look like this:

import brewery
from brewery import ds
import sys

sources = [
    {"file": "grants_2008.csv",
     "fields": ["receiver", "amount", "date"]},
    {"file": "grants_2009.csv",
     "fields": ["id", "receiver", "amount", "contract_number", "date"]},
    {"file": "grants_2010.csv",
     "fields": ["receiver", "subject", "requested_amount", "amount", "date"]}
]
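
The field-collection step described in this answer can be sketched with plain Python lists, without relying on brewery itself. This is only an illustration under assumptions: `collect_fields` is a hypothetical helper name, not part of brewery's API, and it expects `sources` in the same shape as the list above. It starts with a `file` field for the record origin and appends each source's fields in first-seen order.

```python
def collect_fields(sources):
    """Return the union of all source fields, in first-seen order.

    A leading "file" field is added to record where each row came from.
    (Hypothetical helper for illustration; not a brewery function.)
    """
    all_fields = ["file"]
    for source in sources:
        for field in source["fields"]:
            if field not in all_fields:
                all_fields.append(field)
    return all_fields


sources = [
    {"file": "grants_2008.csv",
     "fields": ["receiver", "amount", "date"]},
    {"file": "grants_2009.csv",
     "fields": ["id", "receiver", "amount", "contract_number", "date"]},
    {"file": "grants_2010.csv",
     "fields": ["receiver", "subject", "requested_amount", "amount", "date"]},
]

all_fields = collect_fields(sources)
print(all_fields)
```

A plain list keeps the column order predictable, which matters when the merged rows are later written out under a single header.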
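
If that is all you need, have a look at the command-line join tool.
Thanks for the suggestion, but an example of how to use the join command in this case would be great.
Thank you for your response. I have one more question: if file2.csv has 3 columns instead of 2, all other things being equal, does that have a big impact on the code?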
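
On the last question, here is a minimal sketch of how the csv approach might be adapted when a file has more than two columns, assuming the key is always the first column. Each row's remaining values are kept as a list, and rows missing from one file are padded with `''` placeholders to that file's width. The helper names `read_keyed` and `outer_join` are hypothetical, the `size_*` column is made up for illustration, and in-memory `io.StringIO` objects stand in for the real files so the example is self-contained.

```python
import csv
import io


def read_keyed(f, delimiter=";"):
    """Map first column -> list of remaining columns; also return the width."""
    rows = {}
    width = 0
    for row in csv.reader(f, delimiter=delimiter):
        if not row:  # skip blank lines
            continue
        rows[row[0]] = row[1:]
        width = max(width, len(row) - 1)
    return rows, width


def outer_join(f1, f2, filler="''", delimiter=";"):
    """Full outer join of two keyed CSV streams, sorted by key."""
    d1, w1 = read_keyed(f1, delimiter)
    d2, w2 = read_keyed(f2, delimiter)
    merged = []
    for key in sorted(set(d1) | set(d2)):
        left = d1.get(key, [])
        right = d2.get(key, [])
        # Pad short or missing rows to each file's column count.
        left = left + [filler] * (w1 - len(left))
        right = right + [filler] * (w2 - len(right))
        merged.append([key] + left + right)
    return merged


# Stand-ins for file1.csv (2 columns) and a 3-column file2.csv.
file1 = io.StringIO("a;text_a\nb;text_b\nf;text_f\nx;text_x\n")
file2 = io.StringIO("a;path_a;size_a\nc;path_c;size_c\nd;path_d;size_d\n")

out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(
    outer_join(file1, file2))
print(out.getvalue())
```

With real files, the same functions can be given file objects opened with `newline=''`. The impact on the code is small: the dict values become lists instead of single strings, and the filler is repeated once per missing column.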