Python: combine multiple CSV files
I have 3 CSV files and I want to write all three of them into a single CSV file. For example, given file1.csv, file2.csv and file3.csv, the desired output looks like this:
a b c d e f g h i j k l
1 2 3 4 13 14 15 16 9 10 11 12
5 6 7 8 17 18 19 20 21 22 23 24
These people are right, you should not ask for code. Nevertheless, I found the task compelling enough to spend three minutes on solving it:
import csv

allColumns = []
for dataFileName in ['a.csv', 'b.csv', 'c.csv']:
    with open(dataFileName) as dataFile:
        # Transpose the rows of this file into columns.
        fileColumns = zip(*list(csv.reader(dataFile, delimiter=' ')))
        allColumns += fileColumns
# Transpose back: the combined columns become the output rows.
allRows = zip(*allColumns)
with open('combined.csv', 'w') as resultFile:
    writer = csv.writer(resultFile, delimiter=' ')
    for row in allRows:
        writer.writerow(row)
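Note that this solution may not work well for large inputs, because it reads every file into memory. It also assumes that all files have the same number of rows and may break if they do not.
One idea is to use the zip function: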
file1 = "a b c d\n1 2 3 4\n5 6 7 8"
file2 = "e f g h\n13 14 15 16\n17 18 19 20"
file3 = "i j k l\n9 10 11 12\n21 22 23 24"
# Join the matching lines of the three inputs with single spaces.
merged_file = [i + " " + j + " " + k
               for i, j, k in zip(file1.split('\n'), file2.split('\n'), file3.split('\n'))]
for i in merged_file:
    print(i)
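Both zip-based snippets above assume that every input has the same number of lines; plain zip stops at the shortest input. What follows is a minimal sketch of one way to handle inputs of different lengths, using itertools.zip_longest and padding the missing cells. The helper name merge_ragged, its fill parameter and the sample strings are assumptions made for illustration, not part of the original answers.

```python
from itertools import zip_longest


def merge_ragged(texts, fill=""):
    """Join whitespace-delimited texts side by side, padding short ones.

    `texts` is a list of strings, each holding one file's content.
    Hypothetical helper for illustration only.
    """
    tables = [[line.split() for line in text.splitlines()] for text in texts]
    # Width of each table, taken from its header row, so that missing rows
    # can be padded with the right number of cells.
    widths = [len(table[0]) if table else 0 for table in tables]
    merged = []
    for rows in zip_longest(*tables):
        cells = []
        for row, width in zip(rows, widths):
            # zip_longest yields None once a table has run out of rows.
            cells.extend(row if row is not None else [fill] * width)
        merged.append(" ".join(cells))
    return merged


file1 = "a b c d\n1 2 3 4\n5 6 7 8"
file2 = "e f g h\n13 14 15 16"  # one row shorter than file1
for line in merge_ragged([file1, file2], fill="-"):
    print(line)
```

With the sample strings, the last printed line is padded to `5 6 7 8 - - - -` instead of being dropped.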
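This assumes that all files have the same number of lines. The solution also works for large inputs, because only 3 lines (one per file) are held in memory at a time.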
import csv

with open('foo1.txt') as f1, open('foo2.txt') as f2, \
        open('foo3.txt') as f3, open('out.txt', 'w') as f_out:
    writer = csv.writer(f_out, delimiter=' ')
    readers = [csv.reader(x, delimiter=' ') for x in (f1, f2, f3)]
    while True:
        try:
            # Take the next row from every reader and flatten them into one row.
            writer.writerow([y for w in readers for y in next(w)])
        except StopIteration:
            break
A for-loop based version of the code above. It has to iterate over one of the files first to get the number of lines:
import csv

with open('foo1.txt') as f1, open('foo2.txt') as f2, \
        open('foo3.txt') as f3, open('out.txt', 'w') as f_out:
    writer = csv.writer(f_out, delimiter=' ')
    lines = sum(1 for _ in f1)  # Number of lines in f1
    f1.seek(0)  # Move the file pointer back to the start of the file
    readers = [csv.reader(x, delimiter=' ') for x in (f1, f2, f3)]
    for _ in range(lines):
        writer.writerow([y for w in readers for y in next(w)])
The Pythonic way to go.
(A slightly improved version of the code posted above.)
inputs = 'file1.csv', 'file2.csv', 'file3.csv'
with open('out.csv', 'w') as output:
    for line in zip(*map(open, inputs)):
        output.write('%s\n' % ' '.join(i.strip() for i in line))
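The short version above never closes the input files explicitly. Here is a minimal sketch of the same row-by-row idea that closes every file deterministically with contextlib.ExitStack. The function name merge_columns and its parameters are assumptions made for illustration; like zip, it stops at the shortest input.

```python
import csv
from contextlib import ExitStack


def merge_columns(in_paths, out_path, delimiter=" "):
    """Write the rows of several delimited files side by side.

    Only one row per input file is held in memory at a time.
    Hypothetical helper; stops at the shortest input, like zip().
    Returns the number of rows written.
    """
    with ExitStack() as stack:
        # ExitStack closes every file opened here when the block exits.
        readers = [
            csv.reader(stack.enter_context(open(p, newline="")), delimiter=delimiter)
            for p in in_paths
        ]
        out = stack.enter_context(open(out_path, "w", newline=""))
        writer = csv.writer(out, delimiter=delimiter)
        rows_written = 0
        for rows in zip(*readers):
            # Flatten one row from each reader into a single output row.
            writer.writerow([cell for row in rows for cell in row])
            rows_written += 1
    return rows_written
```

A call might look like `merge_columns(['file1.csv', 'file2.csv', 'file3.csv'], 'out.csv')`, and the list of inputs can be of any length.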
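The Unix command-line way to go.
If you are using a UNIX-type OS and only care about merging the files, a command-line tool can do the job.
Godspeed.
You can use a data processing tool: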
import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df3 = pd.read_csv('file3.csv')
# axis=1 places the frames side by side instead of stacking their rows.
df_combined = pd.concat([df1, df2, df3], axis=1)
df_combined.to_csv('output.csv', index=False)
You then get the combined CSV file output.csv.
Edit: here is a verbose version of the zip-based snippet:
inputs = 'file1.csv', 'file2.csv', 'file3.csv'
# open all input files
inputs = map(open, inputs)
with open('out.csv', 'w') as output:
    # iterate over all the input files at the same time
    for line in zip(*inputs):
        # format the output line from the input lines
        line = ' '.join(i.strip() for i in line)
        output.write('%s\n' % line)
In addition to the first answer (which is the correct one), you can handle any number of CSV files in a folder like this:
import os
import pandas as pd

folder = r"C:\MyFolder"
frames = [pd.read_csv(os.path.join(folder, name))
          for name in os.listdir(folder) if name.endswith('.csv')]
# Note: pd.concat stacks rows by default; pass axis=1 to place the frames side by side.
merged = pd.concat(frames)
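os.listdir returns names in arbitrary order, so the column order of a merged file can differ between machines. Below is a standard-library sketch that collects the *.csv files of a folder in sorted order and merges them side by side. The names merge_folder and out_name are hypothetical, and the sketch assumes comma-delimited files with the same number of rows that fit into memory.

```python
import csv
from pathlib import Path


def merge_folder(folder, out_name="merged.csv"):
    """Merge every *.csv file in `folder` column-wise into `out_name`.

    Hypothetical helper: assumes comma-delimited files of equal length
    and loads them fully into memory. Returns the merged file names.
    """
    folder = Path(folder)
    out_path = folder / out_name
    # Sort by name for a reproducible column order; skip our own output.
    paths = sorted(p for p in folder.glob("*.csv") if p != out_path)
    tables = []
    for path in paths:
        with open(path, newline="") as f:
            tables.append(list(csv.reader(f)))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for rows in zip(*tables):
            # Flatten one row from each table into a single output row.
            writer.writerow([cell for row in rows for cell in row])
    return [p.name for p in paths]
```

Sorting by file name is only one possible convention; any explicit ordering works as long as it is deterministic.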
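The first thing that comes to mind is the pandas module, as in waitingkuo's answer. But I guess you could also use the csv module's DictReader and DictWriter: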
import csv

header = list('abcdefghijkl')
files = ['file1.csv', 'file2.csv', 'file3.csv']

# Compile the contents of all three files into a single dictionary,
# outputdict, that maps each column name to the list of its values.
outputdict = {key: [] for key in header}
for fname in files:
    with open(fname, newline='') as f:
        for line in csv.DictReader(f):
            for k in line:
                outputdict[k].append(line[k])

# Transfer the contents of outputdict into a CSV file, one row at a time.
with open('final_output.csv', 'w', newline='') as f_out:
    output = csv.DictWriter(f_out, fieldnames=header)
    output.writeheader()
    for values in zip(*(outputdict[k] for k in header)):
        output.writerow(dict(zip(header, values)))
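Can anyone provide the Python code?
Wow, no. This is stackoverflow.com, not domyworksforyou.com. Show us your code and we will be happy to help.
Then please read some Python tutorials on file handling. That will get you started.
To understand the basics, you need to learn. You will not get anywhere by asking for code.
It includes an extra newline.
Open out.txt as wb.
@thefour Are you sure? I tested it on both Python 2 and Python 3 and got no extra newline.
I tried it on Python 2 and got an extra newline, so Jon's suggestion of using wb works well.
Please provide more details.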
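The extra blank line discussed in these comments most likely comes from newline translation applied on top of the line terminator that csv.writer already emits. On Python 2 the usual fix was binary mode ('wb'), as suggested above; on Python 3 the csv documentation recommends opening the file with newline=''. A short sketch, in which the helper name write_rows is a hypothetical choice:

```python
import csv


def write_rows(path, rows, delimiter=" "):
    """Write rows with csv.writer without doubling line endings.

    Hypothetical helper. newline="" turns off newline translation, so
    the writer's own line terminator reaches the file unchanged.
    """
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=delimiter).writerows(rows)
```

Calling `write_rows('out.txt', [['a', 'b'], ['1', '2']])` and reading the file back in binary mode gives `b'a b\r\n1 2\r\n'`, with exactly one terminator per row.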