Python: getting data from a set of files

python python-2.7 csv

I have about 200 files and want to grab the data from each of them, then show all of it in a single .csv file.

For example, the file list is

#targeted folder
a001.opd
a002.opd
a003.opd
...
..
.
a200.opd

Every file has the same data structure, which looks like this:

model <many spaces>          1 <many spaces>       0.003    0.002  # Title
mo(data,1) <many spaces>     1 <many spaces>       0.2      0.0001 # 1
mo(data,1) <many spaces>     2 <many spaces>      -0.1      0.04   # 2
mo(data,1) <many spaces>     3 <many spaces>      -0.4      0.005  # 3
....................................
................
.............                                                      # n-1
......                                                             # n

Here is the code I have written so far:

import os

def openfolder(path, outputfile='grab_result.csv'):  # get .opd file from folder and open an output file
    if os.path.isdir(path): 
        fo = open(outputfile, 'wb')
        fo.write('filename') # write title here
        for filename in [os.path.abspath(path)+'\\'+each for each in os.listdir(path) if each.endswith('.opd')]:                         
            return openfile(filename)
    else:
        print "path unavailable"

    openfolder('C:\\path', 'C:\\path\\grab_result.csv')     


def openfile(filename):   # open file.opd
    if os.path.isfile(filename) and filename.endswith('.opd'): 
        return grabdata(open(filename, 'rb').read())         
    else:
        print "invalid file"
        return []       


def grabdata(string):   # start to grab data
    ret = []
    idx_data = string.find('model')
    # then I stop here....

Does anyone know how to grab the data from these files?

Here is my sample file.

It could look something like this:

def grabdata(filename):
    # collect every line of the file that starts with "model"
    matches = []
    with open(filename) as f:
        for line in f:
            # keep the line only if it matches
            if line.startswith("model"):  # or: line.find("model") != -1
                matches.append(line.strip())
    return matches
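
Once the matching lines are collected, each one still has to be split into fields. A minimal sketch, assuming the columns are separated by runs of whitespace and that everything after `#` is a trailing comment, as in the sample in the question. `parse_line` is a hypothetical helper name, not part of the original answer:

```python
def parse_line(line):
    """Split one .opd data line into a list of string fields.

    Assumes whitespace-separated columns and an optional trailing
    '# ...' comment, as in the sample shown in the question.
    """
    data = line.split('#', 1)[0]   # drop the trailing comment, if any
    return data.split()            # split on any run of whitespace


fields = parse_line("mo(data,1)      2        -0.1      0.04   # 2")
print(fields)  # ['mo(data,1)', '2', '-0.1', '0.04']
```

The values stay strings here; convert them with float() only if the numbers are needed for calculations rather than just copied into the .csv file.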

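If you have many files with a lot of content, I would use generators. That way nothing has to be loaded into memory all at once. Here is how I would go about it: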

import glob
import os

def get_all_files(path):
    ## get a generator with all .opd file names in path
    return glob.iglob(os.path.join(path, '*.opd'))

def get_all_data(files):
    ## get a generator with all the lines from all the files
    for fil in files:
        with open(fil, 'r') as the_file:
            for line in the_file:
                yield line

def write_lines_to_file(lines, outfile):
    with open(outfile, 'w') as the_file:
        for line in lines:
            ## add an if statement here if not all lines should be written to outfile
            ## lines read from a file already end with '\n', so strip it before adding one
            the_file.write(line.rstrip('\n') + '\n')

path = 'blah blah'
outfile = 'blah.csv'
files = get_all_files(path)
lines = get_all_data(files)
write_lines_to_file(lines, outfile)
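
Since the target is a .csv file, the standard csv module can take care of delimiters and quoting. The following is only a sketch under assumptions: `iter_rows` and `opd_to_csv` are hypothetical names, columns are taken to be whitespace-separated with an optional `#` comment at the end, and each output row is prefixed with the base name of the file it came from.

```python
import csv
import glob
import os


def iter_rows(path):
    """Yield [file name, field1, field2, ...] for every data line in path/*.opd."""
    for filename in sorted(glob.glob(os.path.join(path, '*.opd'))):
        # 'a001' for '.../a001.opd'
        name = os.path.splitext(os.path.basename(filename))[0]
        with open(filename) as the_file:
            for line in the_file:
                fields = line.split('#', 1)[0].split()
                if fields:  # skip blank or comment-only lines
                    yield [name] + fields


def opd_to_csv(path, outfile):
    """Write every row produced by iter_rows(path) to outfile as CSV."""
    # newline='' is the Python 3 idiom; on Python 2.7 open the file with 'wb' instead
    with open(outfile, 'w', newline='') as the_file:
        csv.writer(the_file).writerows(iter_rows(path))
```

A label such as mo(data,1) contains a comma, so csv.writer quotes it automatically; writing the fields joined by hand with ',' would split that label into two columns.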

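This is not an answer, just an extended comment.

Why do you want to lay the results out from left to right (200 x 5 columns wide)? If you transposed the data, wouldn't that give you more flexibility to add extra columns later? For example: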

a001   model   mo(data,1)   mo(data,1)   mo(data,1)
         1          1           2            3
         0.003      0.2        -0.1         -0.2
         0.002      0.0001      0.04         0.003

a002   model   mo(data,1)   mo(data,1)   mo(data,1)
         1          1           2            3
...

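The difficulty with making it 200 x 5 columns wide is that the columns need to be padded. If one file is missing information, it can break your whole structure. You would also have to build every output row from a slice taken across all 200 files.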
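
To illustrate the padding problem, here is a small sketch of what the wide layout requires. The two value lists are made-up stand-ins for the contents of two files, and `side_by_side` is a hypothetical helper name:

```python
from itertools import zip_longest  # Python 3; on Python 2.7 the name is izip_longest


def side_by_side(columns, fill=''):
    """Turn per-file columns into output rows, padding short columns with fill."""
    return [list(row) for row in zip_longest(*columns, fillvalue=fill)]


# Two hypothetical files: the second one is missing its last value.
a001 = ['0.003', '0.2', '-0.1']
a002 = ['0.004', '0.3']
for row in side_by_side([a001, a002]):
    print(row)
# ['0.003', '0.004']
# ['0.2', '0.3']
# ['-0.1', '']
```

Every column has to be held in memory before the first row can be written, which is the cost the comment above points at; writing one block per file avoids both the padding and the buffering.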

You probably don't want to call openfolder('C:\\path', 'C:\\path\\grab_result.csv') inside the openfolder function, unless you want an infinite loop.

You can skip lines = f.readlines() and iterate over the file object instead (that way you don't put the whole file in memory).

@JulienSpronck, oh, my mistake! Thanks.

Honestly, I have to say this program is more elegant and faster than mine! One question: what if I want to add the file name (a001, a002, ... etc.) to outfile.csv as a title, I mean a title before each file's data? Do you know how to do that? Thanks.

To add the file name before each file's data, you can yield the file name before yielding the file's contents. In get_all_data(), add the following line right after the for fil in files: line:

yield fil

You can change it to any string you want instead of the file name.
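
Putting the suggestion from the comments together, a sketch of get_all_data() that emits a header before each file's contents. Using the base name without its extension (a001 rather than the full path) is an assumption about what the asker wants, and stripping the trailing newline is a choice made here so that the writer can add exactly one per line:

```python
import os


def get_all_data(files):
    """Yield each file's base name, then its lines, for every file in files."""
    for fil in files:
        # header line: 'a001' for '.../a001.opd'
        yield os.path.splitext(os.path.basename(fil))[0]
        with open(fil, 'r') as the_file:
            for line in the_file:
                # drop the newline so the writer can add exactly one
                yield line.rstrip('\n')
```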