Python: converting 1000 text files into a single CSV file
I want to convert multiple text files into a single CSV file. The files are named file1.txt, file2.txt ... file1000.txt. A text file (file1.txt) has the following format:
Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25
Employee id,Employee name,department,Age
us98621,Andy Gonzalez,Support & services,25
I want the output to be:
Employee id,Employee name,department,Age
us51243,Mark santosh,engineering,25//(file1.txt values)
...................................//(file2.txt values)
But in the output I only get the values from file1000.txt, as shown below:
Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25
Employee id,Employee name,department,Age
us98621,Andy Gonzalez,Support & services,25
Here is my code:
import csv
import os
for x in range(1,1001):
    filepath=os.path.normpath('C:\\Text\\file{}.txt'.format(x))
    with open(filepath) as f, open('emp.csv', 'w',newline='') as file:
        writer = csv.writer(file)
        val = zip(*[l.rstrip().split(': ') for l in f])
        writer.writerows(val)
Please note: I also want the header (Employee id, Employee name, department, Age) to appear only once.
Try the following:
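import csv
import os

FIELDS = ('Employee id', 'Employee name', 'department', 'Age')

def read_file(file, keys):
    # Start with every expected field set to None
    output = dict.fromkeys(keys)
    for line in file:
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only; the space after it is optional in the data
        key, _, value = line.rstrip().partition(':')
        output[key.strip()] = value.strip()
    return output

# Open the output once, so rows from earlier files are not overwritten
with open('emp.csv', 'w', newline='') as destiny:
    writer = csv.DictWriter(destiny, FIELDS)
    writer.writeheader()
    for x in range(1, 1001):
        with open(os.path.normpath('C:\\Text\\file{}.txt'.format(x))) as origin:
            writer.writerow(read_file(origin, FIELDS))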
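One detail worth checking in the sample data: some lines use a colon followed by a space (Employee id: us51243) while others use a bare colon (department:engineering), so splitting on ': ' would not separate every line. Below is a minimal sketch of a tolerant line parser. The name parse_record is hypothetical and only for illustration; it assumes each line holds one key and one value separated by the first colon.

```python
def parse_record(lines):
    """Turn 'key: value' lines into a dict, splitting on the first colon only."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, sep, value = line.partition(':')
        if not sep:
            continue  # no colon on this line, so it is not a key/value pair
        record[key.strip()] = value.strip()
    return record


# The four lines from the question's sample file
sample = [
    "Employee id: us51243\n",
    "Employee name: Mark santosh\n",
    "department:engineering\n",
    "Age:25\n",
]
record = parse_record(sample)
print(record)
```

Because only the first colon is used as the separator, a value that itself contains a colon would be kept whole.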
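You are currently reopening the output file for every new text file, which causes everything written so far to be overwritten. You can also use the csv library to read the text files, by specifying a colon (:) as the delimiter and skipping any extra spaces: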
import csv
import os

header = ["Employee id", "Employee name", "department", "Age"]

# Open the output file once and write the header a single time
with open('emp.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)
    for x in range(1, 1001):
        filepath = os.path.normpath(r'C:\Text\file{}.txt'.format(x))
        with open(filepath, 'r', newline='') as f_text:
            # Treat each "key: value" line as a two-column row
            csv_text = csv.reader(f_text, delimiter=':', skipinitialspace=True)
            # Keep only the value column; each input file becomes one output row
            csv_output.writerow(row[1] for row in csv_text)
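To try this approach without 1000 real files, here is a self-contained sketch that wraps the same idea in a function and runs it on two small files in a temporary directory. The function name merge_to_csv, the choice to skip missing files, and the guard against blank lines are assumptions added for illustration, not part of the answer above.

```python
import csv
import os
import tempfile

HEADER = ["Employee id", "Employee name", "department", "Age"]


def merge_to_csv(paths, out_path, header=HEADER):
    """Write one CSV row per input file; the header is written once."""
    with open(out_path, "w", newline="") as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerow(header)
        for path in paths:
            if not os.path.exists(path):
                continue  # assumption: silently skip missing files
            with open(path, "r", newline="") as f_text:
                csv_text = csv.reader(f_text, delimiter=":", skipinitialspace=True)
                # Keep the value column; ignore blank or colon-less lines
                csv_output.writerow([row[1] for row in csv_text if len(row) > 1])


# Demo: two small input files in a temporary directory
tmp_dir = tempfile.mkdtemp()
contents = [
    "Employee id: us51243\nEmployee name: Mark santosh\ndepartment:engineering\nAge:25\n",
    "Employee id: us51244\nEmployee name: Any santosh\ndepartment:engineering\nAge:24\n",
]
paths = []
for i, text in enumerate(contents, start=1):
    path = os.path.join(tmp_dir, "file{}.txt".format(i))
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

out_path = os.path.join(tmp_dir, "emp.csv")
merge_to_csv(paths, out_path)

# Read the result back to inspect it
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Note that with ':' as the delimiter, a value that itself contains a colon would be split into further columns, and row[1] would keep only the part before that second colon.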
First, let's create two files:
s1 = u"""Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25"""
s2 = u"""Employee id: us51244
Employee name: Any santosh
department:engineering
Age:24"""
with open("file1.txt", "w") as f:
    f.write(s1)
with open("file2.txt", "w") as f:
    f.write(s2)
Now let's use pandas:
import pandas as pd
# Filelist
filelist = ["file1.txt","file2.txt"]
# Create dataframe
df = pd.DataFrame(columns=["Employee id","Employee name","department","Age","file"])
# Loop through files
for ind,file in enumerate(filelist):
    # Column 1 holds the values; skipinitialspace drops the space after the colon
    data = pd.read_csv(file, header=None, sep=":", skipinitialspace=True).iloc[:,1]
    df.loc[ind] = data.tolist() + [file]
print(df)
Output:
  Employee id Employee name   department Age       file
0     us51243  Mark santosh  engineering  25  file1.txt
1     us51244   Any santosh  engineering  24  file2.txt
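Does each file have only 4 static fields? 1000 files, with 4 fields in each?
I think the old rows get overwritten every time. Use ('emp.csv','w+',newline='') to add rows instead of rewriting them. ('emp.csv','a',newline='') is also an option.
@RomanPerekhrest Yes, each file has only 4 static fields.
I could suggest a short command-line solution that would be much faster than any Python approach, unconditionally so. But I see that your path C:\\Text\\file points to Windows, and Windows means "problems" and "inconvenience".
@Hanseffranz This one, ('emp.csv','a',newline=''), works, but I want the header (Employee id, Employee name, department, Age) to appear only once, at the start.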
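On the file modes discussed in these comments: in Python, 'w+' opens the file for reading and writing but still truncates it, just like 'w', so only 'a' actually appends. If you do want append mode and the header only once, one option is to write the header only when the file does not exist yet or is empty. Below is a minimal sketch of that idea; the helper name append_row is hypothetical, and the demo writes to a temporary directory rather than to the real emp.csv.

```python
import csv
import os
import tempfile

FIELDS = ["Employee id", "Employee name", "department", "Age"]


def append_row(csv_path, values, header=FIELDS):
    """Append one row; write the header first only if the file is new or empty."""
    need_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerow(values)


# Demo: two appends to a fresh file produce one header and two data rows
csv_path = os.path.join(tempfile.mkdtemp(), "emp.csv")
append_row(csv_path, ["us51243", "Mark santosh", "engineering", "25"])
append_row(csv_path, ["us51244", "Any santosh", "engineering", "24"])

with open(csv_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Keep in mind that append mode keeps rows from earlier runs, so running the script twice would duplicate the data. Opening the output once in 'w' mode outside the loop, as in the answers above, avoids that.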