Merging single-line .dat files into one .csv file with Python

python csv

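I'm a beginner in programming and would like some tips on how to approach a challenge. I currently have about 10,000 .dat files, each containing a single line with the following structure:

Attribute1=Value&Attribute2=Value&Attribute3=Value…AttributeN=Value

I have been trying to use Python and the csv library to convert these .dat files into a single .csv file.

So far I have managed to write something that reads all the files, stores the contents of each file on a new line, and replaces "&" with ",". However, since Attribute1, Attribute2…AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from all the other rows.

Any suggestions?

Thanks, everyone!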

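Put the .dat files in a folder named myDats. Place this script next to the myDats folder; it uses a scratch file named temp.txt in the same location and writes its result to output.csv. [That is, you will end up with output.csv, myDats, and mergeDats.py in the same folder.]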

mergeDats.py

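import csv
import os

# Collect the single line from every .dat file into temp.txt, one line per file
with open("temp.txt", "w") as g:
    for name in sorted(os.listdir("myDats")):
        with open(os.path.join("myDats", name), "r") as f:
            g.write(f.readline().strip() + "\n")

# Turn each line into a dict such as {"Attribute1": "Value", ...}
rows = []
with open("temp.txt", "r") as h:
    for line in h:
        line = line.strip()
        if line:
            rows.append(dict(pair.split("=", 1) for pair in line.split("&")))

# Python 3: text mode with newline=""; use 'wb' in Python 2.x
with open("output.csv", "w", newline="") as output:
    # the attributes of the first file become the column headers
    w = csv.DictWriter(output, fieldnames=list(rows[0].keys()))
    w.writeheader()
    w.writerows(rows)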
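
To see what the dict-per-line idea produces without touching the file system, here is a minimal in-memory sketch. The helper name `line_to_row` and the two sample lines are made up for illustration; they are not part of the answer above.

```python
import csv
import io


def line_to_row(line):
    """Parse 'A=1&B=2' into {'A': '1', 'B': '2'} (hypothetical helper)."""
    return dict(pair.split("=", 1) for pair in line.strip().split("&"))


# Two made-up lines standing in for the contents of two .dat files
lines = ["Attribute1=a&Attribute2=b", "Attribute1=c&Attribute2=d"]
rows = [line_to_row(line) for line in lines]

# Write to an in-memory buffer instead of output.csv
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

The attribute names appear once, as the header row, and each input line contributes one row of values.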

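"However, since Attribute1, Attribute2…AttributeN are exactly the same for every file, I would like to turn them into column headers and remove them from all the other rows"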

For the first file, do this once:

','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))

For the contents of every file:

','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))

You may need to trim the strings as well; I don't know how clean your input is.
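
The two expressions above can be wrapped into a small runnable sketch. The function names `header_line`, `values_line`, and `merge` are hypothetical, and the sketch assumes values never contain a comma, because a plain join does no CSV quoting.

```python
def header_line(text):
    """Join the attribute names of one line with commas."""
    return ','.join(k for (k, v) in map(lambda s: s.split('=', 1),
                                        text.strip().split('&')))


def values_line(text):
    """Join the values of one line with commas."""
    return ','.join(v for (k, v) in map(lambda s: s.split('=', 1),
                                        text.strip().split('&')))


def merge(contents):
    """Build CSV text: header from the first item, then one row per item."""
    lines = [header_line(contents[0])]
    lines.extend(values_line(text) for text in contents)
    return '\n'.join(lines) + '\n'


# Made-up contents of two .dat files; strip() handles the trailing newline
sample = ["A=1&B=2\n", "A=3&B=4"]
result = merge(sample)
print(result)
```

Here `strip()` is the trimming mentioned above; it removes the newline that would otherwise end up in the last value.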

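Since you are a beginner, I have prepared some code that works and is also easy to understand.

I assume you have all the files in a folder named "input". The code below should go in a script file placed next to that folder.

Keep in mind that this code is meant to help you understand how to solve this kind of problem. Optimizations and sanity checks have been left out on purpose.

You may also want to check what happens when a value is missing in some line, when an attribute is missing, when the input is corrupted, and so on :)

Good luck!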

import os

# This function splits "attribute=value" pairs into two lists:
# the first list holds all the attributes,
# the second list holds all the values.
def getAttributesAndValues(line):
    attributes = []
    values = []

    # first we split the input over the '&', dropping the trailing newline
    attributeValues = line.strip().split('&')
    for attrVal in attributeValues:
        # we split attribute=value over the first '=' sign only,
        # so a value that itself contains '=' stays intact
        attribute, value = attrVal.split('=', 1)
        attributes.append(attribute)
        values.append(value)

    # return the attributes list and values list
    return attributes, values

# test the function using the lines beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Value&AttributeN=Value"
# print(getAttributesAndValues(line))

# This function appends the contents of a single input file to the output file
def writeToCsv(inFile, wfile="outFile.csv", delim=","):
    # the header is needed only if the output file is missing or still empty
    needHeader = not os.path.exists(wfile) or os.path.getsize(wfile) == 0

    # 'with' closes both files for us, even if an error occurs
    with open(inFile, 'r') as f_in, open(wfile, 'a') as f_out:
        # loop through every line in the file and write its values
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            header, values = getAttributesAndValues(line)

            # we write the header only once, before the first row
            if needHeader:
                f_out.write(delim.join(header) + "\n")
                needHeader = False

            # we write the values
            f_out.write(delim.join(values) + "\n")

# Read all the .dat files in the folder (names only, without the directory)
allInputFiles = [name for name in sorted(os.listdir('input')) if name.endswith('.dat')]

# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
    writeToCsv(os.path.join('input', singleFile))
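
The checks mentioned in this answer (missing values, missing attributes, corrupted input) could be approached roughly as follows. This is only a sketch under assumptions: the helper names `parse_line` and `to_csv` are invented, a "corrupted" line is taken to mean one containing a pair without "=", and the expected column names are passed in explicitly.

```python
import csv
import io


def parse_line(line):
    """Return a dict for one line, or None if any pair lacks '='."""
    row = {}
    for pair in line.strip().split("&"):
        if "=" not in pair:
            return None  # treat the whole line as corrupted
        key, value = pair.split("=", 1)
        row[key] = value
    return row


def to_csv(lines, fieldnames):
    """Write parsable lines as CSV text and count the skipped ones."""
    buffer = io.StringIO()
    # restval fills in missing attributes; extrasaction drops unknown ones
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    skipped = 0
    for line in lines:
        row = parse_line(line)
        if row is None:
            skipped += 1
            continue
        writer.writerow(row)
    return buffer.getvalue(), skipped


# Made-up inputs: complete, missing attribute, corrupted, extra attribute
text, skipped = to_csv(["A=1&B=2", "A=3", "garbage", "A=5&B=6&C=7"], ["A", "B"])
print(text)
print("skipped lines:", skipped)
```

Whether to skip a bad line, stop with an error, or log it is a design choice that depends on how much the data can be trusted; skipping and counting is just one option.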

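Thank you very much! Just as you hoped, this code solved my problem and gave me something to learn from. Thanks!

When I run this, I get: "IOError: [Errno 2] No such file or directory: '1.dat'"

Fixed it, please try again.

That's an interesting approach! I'll give it a try and let you know what happens. Many thanks.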
