Reading and writing data in Python (for Matlab users)

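I am trying to learn Python coming from Matlab, so I am starting at the beginning, with reading and writing data. I have been spoiled by Matlab's excellent self-contained documentation, and I am having a hard time finding the best approach to follow in Python for what, in Matlab, is mostly done with fopen, textscan, fgetl, regexp and fprintf. I have seen some people advocate numpy.loadtxt (and savetxt), while others advocate the "with open(...) as f: for line..." approach. In this case I would probably need the latter, to read a column of string headers followed by a matrix of floats of unknown size. I have written example Matlab code that does the following: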

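1. Read the path and name of the file to be read (as listed in a text file) and combine them into a single string.

2. Determine the number of headers in the file from step 1, and the size of the comma-delimited matrix of floats below the headers.

3. Read the headers and the matrix from the file in step 1 into two separate variables.

4. Write the headers and the matrix to another file.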

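Some of these steps, such as 2-3, could be combined in practice, but keeping them separate here will help me with some different tasks. This is probably more of a "please share with me the best Python coding practices for this general task" request than a very specific question, but I hope it will be useful to other new Python users too. Thanks for any specific Python code and/or references.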

%%
function ReadWrite()
tic
f=readPaths();
[t,n]=pullSize(f);
[hdr,d]=readData(f,t,n);
writeData(hdr,d);
toc
end
%%
function f=readPaths
fid=fopen('Paths.txt','r');
f=textscan(fid,'%s%s','delimiter','\t','headerlines',1);
fclose(fid);
f=char(fullfile(f{1},f{2}));
end
%%
function [t,n]=pullSize(f)
n=0;
fid=fopen(f,'r');
l=fgetl(fid);
h=isempty(regexp(l,',','once')); % headers are not comma delimited
while h
    n=n+1;
    l=fgetl(fid);
    h=isempty(regexp(l,',','once'));
end
fclose(fid);
t=length(regexp(l,','))+1; % file is comma delimited
end
%%
function [hdr,d]=readData(f,t,n)
fid=fopen(f,'r');
hdr=textscan(fid,'%s',n);
d=textscan(fid,repmat('%f',1,t),'delimiter',',');
fclose(fid);
d=[d{:}];
hdr=[hdr{:}];
end
%%
function writeData(hdr,d)
fid=fopen('DataTest.csv','w');
for i=1:length(hdr)
    fprintf(fid,'%s\n',hdr{i});
end
fprintf(fid,[repmat('%.4f,',1,size(d,2)-1),'%.4f\n'],d');
fclose(fid);
end

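This sounds like a case where you might want to use the Pandas library. Pandas has a read_csv method that does exactly what its name suggests and stores the data in what is called a DataFrame, which you can basically think of as an Excel spreadsheet.

The functionality is similar to R's data frames or its data.table package.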
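As a rough illustration of how read_csv could be applied to the layout described in the question, here is a minimal sketch. It assumes the number of free-text header lines (`n_header`) is already known, that those lines sit above a purely numeric comma-separated block, and that four decimals are wanted on output as in the Matlab version. The function names `read_with_pandas` and `write_with_pandas` are hypothetical, not part of Pandas.

```python
from itertools import islice

import pandas as pd


def read_with_pandas(path, n_header):
    """Read n_header free-text lines, then the comma-separated numeric block."""
    with open(path, 'r') as fid:
        header_lines = [line.rstrip('\n') for line in islice(fid, n_header)]
    # header=None: the numeric block has no column-name row of its own
    df = pd.read_csv(path, skiprows=n_header, header=None)
    return header_lines, df


def write_with_pandas(path, header_lines, df):
    """Write the header lines back, followed by the matrix with 4 decimals."""
    with open(path, 'w', newline='') as fid:
        for line in header_lines:
            fid.write(line + '\n')
        df.to_csv(fid, header=False, index=False, float_format='%.4f')
```

If a NumPy array is needed afterwards, `df.to_numpy()` returns the matrix without the DataFrame wrapper.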

You can do this with the csv module from the Python standard library.

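import csv

# Keep both files open together: DictReader reads lazily, so the input
# must still be open while its rows are written out.
with open('path/to/file.csv', 'r', newline='') as f, \
        open('path/to/output.csv', 'w', newline='') as w:
    dict_reader = csv.DictReader(f)
    dict_writer = csv.DictWriter(w, dict_reader.fieldnames)
    dict_writer.writeheader()
    dict_writer.writerows(dict_reader)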
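DictReader expects a single row of column names, whereas the file in the question has several free-text header lines above a numeric matrix. A sketch of how csv.reader and csv.writer might handle that layout follows. It assumes, as the question's code does, that header lines contain no commas and that every data row has at least two columns. The function names are hypothetical.

```python
import csv


def read_headers_and_rows(path):
    """Split a file into leading text lines and rows of floats."""
    headers, rows = [], []
    with open(path, 'r', newline='') as fid:
        for record in csv.reader(fid):
            if not record:
                continue  # skip blank lines
            if not rows and len(record) == 1:
                headers.append(record[0])  # no comma yet: still in the header
            else:
                rows.append([float(value) for value in record])
    return headers, rows


def write_headers_and_rows(path, headers, rows):
    """Write the header lines, then each row formatted to 4 decimals."""
    with open(path, 'w', newline='') as fid:
        for line in headers:
            fid.write(line + '\n')
        writer = csv.writer(fid, lineterminator='\n')
        writer.writerows([f'{value:.4f}' for value in row] for row in rows)
```

This version needs nothing beyond the standard library, at the cost of returning plain lists of floats rather than an array.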

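Thanks, everyone. I initially looked into the csv module and some of the other things you suggested, but ended up mainly using numpy's loadtxt and savetxt, together with standard Python's readline and a few miscellaneous libraries. I only got back to this today, and it took me a while to figure out nested functions, read/write formats and so on, but I have replicated the code, and it is only slightly slower than my Matlab version. It is included below for posterity: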

import numpy as np
import os
from itertools import islice
import time


def readwrite():
    tic = time.time()
    f = read_paths('Paths.txt')
    n = pull_size(f)
    hdr, d = read_data(f, n)
    write_data('Data_Py.txt', hdr, d)
    toc = time.time()
    with open('Runtime_Py.txt', 'w') as fid:
        fid.write("Elapsed time is %.6f seconds." % (toc - tic))


def read_paths(f):
    f = np.loadtxt(f, dtype='str', delimiter='\t', skiprows=1)
    return os.path.join(f[0], f[1])


def pull_size(f):
    """Count the leading header lines (headers are not comma delimited)."""
    n = 0
    with open(f, 'r') as fid:
        for l in fid:
            if ',' in l:  # first comma-delimited line starts the data
                break
            n += 1  # stops at end of file even if no data line is found
    # the column count, if needed, would be l.count(',') + 1
    return n


def read_data(f, n):
    with open(f, 'r') as fid:
        hdr = ''.join(list(islice(fid, n)))
    hdr = hdr.rstrip('\n')
    d = np.loadtxt(f, dtype='float', delimiter=',', skiprows=n)
    return hdr, d


def write_data(f, hdr, d):
    np.savetxt(f, d, fmt='%.2f', delimiter=',', newline='\n', header=hdr,
               comments='')

readwrite()
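To try the script above, it needs a Paths.txt and a data file in the layout the code implies. The following sketch builds a small pair of such files. The layout is inferred from the calls above (one title line skipped by skiprows=1, then a tab-separated folder and file name; header lines without commas above comma-separated floats), and the helper name `make_sample_inputs`, the file name `sample_data.txt` and the sample values are all made up for illustration.

```python
import os


def make_sample_inputs(workdir):
    """Create Paths.txt and a small data file inside workdir."""
    data_name = 'sample_data.txt'  # hypothetical file name
    with open(os.path.join(workdir, 'Paths.txt'), 'w') as fid:
        fid.write('Folder\tFile\n')  # one title line, skipped via skiprows=1
        fid.write(f'{workdir}\t{data_name}\n')
    with open(os.path.join(workdir, data_name), 'w') as fid:
        fid.write('Header line one\nHeader line two\n')  # no commas here
        fid.write('1.0,2.0,3.0\n4.0,5.0,6.0\n')
    return os.path.join(workdir, data_name)
```

Because the script opens 'Paths.txt' by relative name, the files would need to be created in the directory the script is run from, for example with `make_sample_inputs(os.getcwd())`.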

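You may also want to add the matlab tag, for users who are familiar with reading your Matlab code.

Thanks, but I did add the tag originally, and a moderator removed the matlab flag.

There is a Code Review Stack Exchange site for this kind of question.

Thanks. I googled pandas and saw some people say it is much faster than numpy's loadtxt and savetxt, so I may look into it more deeply later.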