Merging single-line .dat files into one .csv file with Python


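I'm a beginner at programming and would like some tips on how to tackle a problem. I have about 10,000 .dat files, and each one contains a single line with the following structure: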

Attribute1=Value&Attribute2=Value&Attribute3=Value…AttributeN=Value

I have been trying to convert these .dat files into a single .csv file using Python and the csv library.

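So far I have managed to write something that reads all the files, stores the content of each file on a new line and replaces "&" with ",". However, since Attribute1, Attribute2…AttributeN are exactly the same in every file, I would like to turn them into column headers and remove them from all the other lines.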

Any suggestions?


Thanks, everyone!
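The line format in the question (name=value pairs joined by "&") is the same format URL query strings use, so instead of splitting by hand, Python 3's standard urllib.parse.parse_qsl can produce the (attribute, value) pairs. This is a minimal sketch assuming the values are plain text: parse_qsl also decodes %XX escapes and turns "+" into a space, which may not be what you want, and it drops empty values unless keep_blank_values=True is passed. The sample line is made up.

```python
from urllib.parse import parse_qsl

# A made-up example line; Attribute2 deliberately has an empty value
line = "Attribute1=foo&Attribute2=&Attribute3=42\n"

# parse_qsl splits on '&' and '=' and returns (attribute, value) pairs in file order;
# keep_blank_values=True keeps "Attribute2=" instead of silently dropping it
pairs = parse_qsl(line.strip(), keep_blank_values=True)

header = [name for name, _ in pairs]    # column names
values = [value for _, value in pairs]  # one CSV row
print(header)
print(values)
```

In Python 2 the same function lives in the urlparse module.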

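Put the .dat files in a folder called myDats. Put this script next to the myDats folder, along with a file called temp.txt. You will also need output.csv. [That is, you will have output.csv, myDats and mergeDats.py in the same folder.] Opening a file in "w" mode creates it, so temp.txt and output.csv don't have to exist beforehand.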

mergeDats.py

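import csv
import os

# Step 1: copy the single line of every .dat file into temp.txt, one line per file
with open("temp.txt", "w") as g:
    for file_name in sorted(os.listdir('myDats')):
        if not file_name.endswith('.dat'):
            continue  # skip hidden files such as .DS_Store
        with open(os.path.join('myDats', file_name), "r") as f:
            tempData = f.readline().strip()
        if tempData:  # skip empty files
            g.write(tempData + "\n")

# Step 2: read temp.txt back and turn every line into a dict {attribute: value}
rows = []
with open("temp.txt", "r") as h:
    for line in h.read().splitlines():
        my_dict = {}
        for pair in line.split("&"):
            key, value = pair.split("=", 1)
            my_dict[key] = value
        rows.append(my_dict)

# Step 3: the attributes are the same in every file, so the first row's keys become the header
# (dicts keep insertion order in Python 3.7+, so the columns follow the order in the file)
with open('output.csv', 'w', newline='') as output:  # Python 2.x: open('output.csv', 'wb')
    w = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
    w.writeheader()
    w.writerows(rows)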
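csv.DictWriter raises ValueError when a row contains a key that is not in fieldnames (the default extrasaction='raise'), but it silently writes restval (an empty string by default) for keys that are missing. If you want to confirm the question's assumption that every file has exactly the same attributes, one option is to compare each row's keys with the header before writing. A minimal sketch: write_rows is a hypothetical helper name, it assumes at least one row, and the example rows are made up.

```python
import csv
import io

def write_rows(rows, output):
    """Write {attribute: value} dicts as CSV, using the first row's keys as the header.

    Raises ValueError if any row's attribute names differ from the header.
    """
    fieldnames = list(rows[0].keys())
    for index, row in enumerate(rows):
        if set(row) != set(fieldnames):
            raise ValueError("row %d has attributes %s, expected %s"
                             % (index, sorted(row), sorted(fieldnames)))
    writer = csv.DictWriter(output, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # values are written in header order, whatever the dict order

# Usage with an in-memory buffer instead of a real file (made-up rows)
buffer = io.StringIO()
write_rows([{"Attribute1": "a", "Attribute2": "b"},
            {"Attribute2": "d", "Attribute1": "c"}], buffer)
print(buffer.getvalue())
```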
"But since Attribute1, Attribute2…AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from all the other rows."

Do this once, for the first file:

','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))
Then, for the content of each file:

','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))

You may also need to trim the strings; I don't know how clean your input is.
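Put together, the two join expressions above become a loop that writes the header from the first file and one row of values per file. This is a sketch, not the answerer's code: merge_dat_files is a hypothetical name, it assumes the files end in .dat, it strips the trailing newline (the trimming mentioned above), and the sample files in the usage part are made up.

```python
import os
import tempfile

def merge_dat_files(folder, output_path):
    """Write the attribute names once as a header, then one row of values per .dat file."""
    # sorted() gives a predictable row order; only files ending in .dat are read
    file_names = sorted(f for f in os.listdir(folder) if f.endswith(".dat"))
    header_written = False
    with open(output_path, "w") as out:
        for file_name in file_names:
            with open(os.path.join(folder, file_name)) as f:
                text = f.read().strip()  # trim the trailing newline and spaces
            if not text:
                continue  # skip empty files
            # split on '&', then split each pair on the first '='
            pairs = [s.split("=", 1) for s in text.split("&")]
            if not header_written:
                out.write(",".join(k for k, v in pairs) + "\n")
                header_written = True
            out.write(",".join(v for k, v in pairs) + "\n")

# Usage sketch: two made-up files in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    for name, content in [("1.dat", "A=1&B=2\n"), ("2.dat", "A=3&B=4\n")]:
        with open(os.path.join(tmp, name), "w") as f:
            f.write(content)
    out_path = os.path.join(tmp, "output.csv")
    merge_dat_files(tmp, out_path)
    with open(out_path) as f:
        result = f.read()
print(result)
```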

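Since you're a beginner, I've prepared some code that works and is also easy to understand.

I assume you have all the files in a folder called "input". The code below should go in a script file next to that folder.

Keep in mind that this code is meant to help you understand how to approach this kind of problem. Optimizations and sanity checks were deliberately left out.

You may also want to check what happens when some lines are missing values, when an attribute is missing, when the input is corrupted, and so on :)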
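The checks suggested here (missing values, missing attributes, corrupt input) could look roughly like the following. It is only a sketch: check_line is a hypothetical helper, and expected_attributes is assumed to be the header list taken from the first file.

```python
def check_line(line, expected_attributes):
    """Return a list of human-readable problems found in one attribute=value line."""
    problems = []
    found = []
    for pair in line.strip().split("&"):
        if "=" not in pair:
            problems.append("malformed pair %r (no '=')" % pair)  # corrupt input
            continue
        name, value = pair.split("=", 1)
        if value == "":
            problems.append("attribute %r has no value" % name)  # missing value
        found.append(name)
    missing = [a for a in expected_attributes if a not in found]
    extra = [a for a in found if a not in expected_attributes]
    if missing:
        problems.append("missing attributes: %s" % ", ".join(missing))
    if extra:
        problems.append("unexpected attributes: %s" % ", ".join(extra))
    return problems

expected = ["Attribute1", "Attribute2", "Attribute3"]
print(check_line("Attribute1=x&Attribute2=y&Attribute3=z", expected))  # []
print(check_line("Attribute1=x&Attribute2=&garbage", expected))
```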

Good luck!

import os

# this function splits the attribute=value pairs into two lists:
# the first list holds all the attributes, the second all the values
def getAttributesAndValues(line):
    attributes = []
    values = []

    # first we split the input over the '&' (strip() removes the trailing newline)
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split each attribute=value over the first '=' sign
        # the left part goes to split[0], the value goes to split[1]
        split = attrVal.split('=', 1)
        attributes.append(split[0])
        values.append(split[1])

    # return the attributes list and values list
    return attributes, values

# test the function using the line beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# this function appends the contents of a single file to the output file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
    # the header is needed only if the output doesn't exist yet or is still empty
    writeHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0

    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            header, values = getAttributesAndValues(line)

            # we write the header only once, into an empty output file
            if writeHeader:
                f_out.write(delim.join(header) + "\n")
                writeHeader = False

            # we write the values
            f_out.write(delim.join(values) + "\n")

# Read all the .dat files in the input folder
# (filtering by extension skips hidden files such as .DS_Store; listdir order is arbitrary, so sort)
allInputFiles = sorted(f for f in os.listdir('input/') if f.endswith('.dat'))

# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
    writeToCsv('input/' + singleFile)
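The code above joins values with the delimiter by hand, so a value that itself contains a comma ends up split across two columns. The csv module's csv.writer quotes such values automatically. A sketch of the same per-file append using it, assuming Python 3 (newline='' is required for csv files there); write_to_csv_quoted is a hypothetical name and the sample data is made up.

```python
import csv
import os
import tempfile

def write_to_csv_quoted(in_file, wfile="outFile.csv"):
    """Append one .dat file to wfile, letting csv.writer quote values that contain commas."""
    # write the header only if the output doesn't exist yet or is empty
    write_header = not os.path.exists(wfile) or os.path.getsize(wfile) == 0
    with open(in_file, "r") as f_in, open(wfile, "a", newline="") as f_out:
        writer = csv.writer(f_out)
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            pairs = [pair.split("=", 1) for pair in line.strip().split("&")]
            if write_header:
                writer.writerow([name for name, _ in pairs])
                write_header = False
            writer.writerow([value for _, value in pairs])

# Usage sketch: one made-up file whose value contains a comma
with tempfile.TemporaryDirectory() as tmp:
    dat_path = os.path.join(tmp, "1.dat")
    with open(dat_path, "w") as f:
        f.write("City=Paris, France&Year=2020\n")
    csv_path = os.path.join(tmp, "outFile.csv")
    write_to_csv_quoted(dat_path, csv_path)
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
print(rows)
```

Reading the result back with csv.reader gives the comma-containing value as a single field, which a plain delimiter join would not.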

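Thank you so much! As you intended, this code helped me solve my problem and gave me something to learn from. Thanks!
When I run this, I get: "IOError: [Errno 2] No such file or directory: '1.dat'"
Fixed it, please try again.
This is an interesting approach! I'll try it and let you know what happens. Thanks a lot.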