Merging single-line .dat files into one .csv file using Python
I am a beginner in programming and would like some tips on how to approach a problem. I have about 10,000 .dat files, each containing a single line with the following structure:
Attribute1=Value&Attribute2=Value&Attribute3=Value…AttributeN=Value
I have been trying to use Python and the csv library to convert these .dat files into a single .csv file. So far I have managed to write something that reads all the files, stores the content of each file on a new line, and replaces "&" with ",". However, since Attribute1, Attribute2…AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from all the other rows. Any suggestions?
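Thank you all!

Put the .dat files in a folder named myDats. Place this script, together with a file named temp.txt, next to the myDats folder. You will also need output.csv (the script creates both files if they are missing). [That is, output.csv, myDats and mergeDats.py will all be in the same folder.]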
mergeDats.py
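import csv
import os

# Step 1: copy the single line of every .dat file into temp.txt, one record per line
with open("temp.txt", "w") as g:
    for name in sorted(os.listdir("myDats")):
        with open(os.path.join("myDats", name), "r") as f:
            g.write(f.readline().strip() + "\n")

# Step 2: turn each record into a dict that maps attribute -> value
rows = []
with open("temp.txt", "r") as h:
    for record in h:
        record = record.strip()
        if not record:
            continue  # skip blank lines that come from empty files
        # split on the first "=" only, so a value may itself contain "="
        rows.append(dict(pair.split("=", 1) for pair in record.split("&")))

# Step 3: write the header once, then one row per file
# (in Python 2.x, use open('output.csv', 'wb') instead)
with open("output.csv", "w", newline="") as output:
    w = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
    w.writeheader()
    w.writerows(rows)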
"but since Attribute1, Attribute2…AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from every other line"
For the first file, do this once:
','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))
For the content of every file:
','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))
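As an illustration of the two expressions above, here is a small sketch that applies them to a made-up sample line. The helper names header_line and value_line and the sample values are assumptions for this example, not part of the question. str.strip() is added to drop the trailing newline that a line read from a file usually carries.

```python
def header_line(text):
    # attribute names: the part before the first '=' of every pair
    return ','.join(pair.split('=', 1)[0] for pair in text.strip().split('&'))


def value_line(text):
    # values: the part after the first '=' of every pair
    return ','.join(pair.split('=', 1)[1] for pair in text.strip().split('&'))


sample = "Attribute1=a&Attribute2=b&Attribute3=c\n"  # made-up content of one .dat file
print(header_line(sample))  # Attribute1,Attribute2,Attribute3
print(value_line(sample))   # a,b,c
```

Calling header_line once and value_line for every file gives the header row followed by one data row per file.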
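You may need to trim the strings additionally; I don't know how clean your input is.

Since you are a beginner, I prepared some code that works and is also easy to understand.
I assume you have all the files in a folder named "input". The code below should be in a script file next to that folder.
Keep in mind that this code is meant to help you understand how to solve this kind of problem. Optimizations and sanity checks have been left out on purpose.
You may also want to check what happens when a value is missing in some line, when an attribute is missing, when the input is corrupted, and so on :) Good luck!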
import os

# this function splits "attribute=value&attribute=value..." into two lists:
# the first list holds all the attributes, the second list holds all the values
def getAttributesAndValues(line):
    attributes = []
    values = []
    # first we split the input over the '&'
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split attribute=value over the first '=' sign only
        # the left part goes to split[0], the value goes to split[1]
        split = attrVal.split('=', 1)
        attributes.append(split[0])
        values.append(split[1])
    # return the attributes list and the values list
    return attributes, values

# test the function using the lines beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends the values of a single input file to the output file
def writeToCsv(inFile, wfile="outFile.csv", delim=","):
    # the header is needed only if the output file is missing or empty
    needsHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0
    # 'r' only reads the input; 'a' appends text to the output
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            header, values = getAttributesAndValues(line)
            # we write the header only once, at the top of the output file
            if needsHeader:
                f_out.write(delim.join(header) + "\n")
                needsHeader = False
            # we write the values
            f_out.write(delim.join(values) + "\n")

# read all the file names in the input folder, keeping only the .dat files
allInputFiles = [name for name in os.listdir('input/') if name.endswith('.dat')]

# loop through all the files and write their values to the csv file
for singleFile in sorted(allInputFiles):
    writeToCsv('input/' + singleFile)
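The checks mentioned above (missing values, missing attributes, corrupted input) can be sketched with the standard csv module. This is only one possible approach and not the method used in the answer: the function names parse_line and merge_records, the choice to skip malformed pairs, and the empty-string default for missing attributes are all assumptions made for illustration.

```python
import csv
import io


def parse_line(line):
    """Parse 'A=1&B=2' into a dict, skipping pairs that have no '='."""
    record = {}
    for pair in line.strip().split('&'):
        if '=' not in pair:
            continue  # assumption: a malformed pair is ignored, not fatal
        key, value = pair.split('=', 1)
        record[key] = value
    return record


def merge_records(lines, out):
    """Write one CSV row per input line to the file-like object out."""
    records = [parse_line(line) for line in lines]
    records = [r for r in records if r]  # drop lines with no usable pairs
    # the header is the union of all attribute names, in first-seen order
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in attributes that are missing from a record
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(records)


# made-up sample lines: a complete one, one with a missing attribute,
# a corrupted one, and one with an empty value
buffer = io.StringIO()
merge_records(["A=1&B=2", "A=3", "garbage", "A=5&B=&C=7"], buffer)
print(buffer.getvalue())
```

With real files, the lines would come from reading each .dat file, and out would be a file opened with newline='' as the csv module documentation recommends.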
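Thank you very much! Just as you hoped, this code helped me solve my problem and gave me something to learn from. Thanks!
When I run this, I get: "IOError: [Errno 2] No such file or directory: '1.dat'"
Fixed it, please try again.
That's an interesting approach! I'll give it a try and let you know what happens. Thanks a lot.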
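An IOError like the one quoted in the comments is typical of opening a name returned by os.listdir without its folder, because os.listdir returns bare file names rather than paths. Whether that was the exact cause here is an assumption. The sketch below uses a throwaway folder and a made-up file to show the difference.

```python
import os
import tempfile

# build a throwaway folder containing one sample file (names are made up)
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "1.dat"), "w") as f:
    f.write("Attribute1=a&Attribute2=b")

for name in os.listdir(folder):
    # name is just "1.dat"; open(name) would look in the current directory
    path = os.path.join(folder, name)
    with open(path) as f:
        print(path, "->", f.read())
```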