Python np.loadtxt for a file containing multiple matrices


python numpy matrix


I have a file that looks like this:

some text
the grids are 
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1

new matrix  3 3
7 2 1
1 3 4
2 3 2


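...and the file continues, with several more 3x3 matrices appearing in the same way. Each matrix is preceded by a line of text with a unique ID, although the IDs are not particularly important to me. I want to create a matrix of these matrices. Can I do that with loadtxt?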

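Here is my best attempt. The 6 in this code could be replaced with an iteration variable that starts at 6 and is incremented by the number of rows in each matrix. I thought skiprows would accept a list, but apparently it only accepts an integer.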

np.loadtxt(fl, skiprows = [x for x in range(nlines) if x not in (np.array([1,2,3])+ 6)])

TypeError                                 Traceback (most recent call last)
<ipython-input-23-7d82fb7ef14a> in <module>()
----> 1 np.loadtxt(fl, skiprows = [x for x in range(nlines) if x not in (np.array([1,2,3])+ 6)])

/usr/local/lib/python2.7/site-packages/numpy/lib/npyio.pyc in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin)
    932 
    933         # Skip the first `skiprows` lines
--> 934         for i in range(skiprows):
    935             next(fh)
    936
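The traceback shows skiprows being passed to range(), which is why a list fails. As a minimal sketch of a variant that keeps skiprows an integer, the block below calls loadtxt once per matrix and limits each call with max_rows (this assumes NumPy 1.16 or later, where max_rows was added). The sample text, the names headers and n_rows, and the use of the word "matrix" to spot header lines are assumptions made for illustration.

```python
import io
import numpy as np

# Hypothetical stand-in for the real file.
text = """some text
more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1
"""

n_rows = 3  # rows per matrix (assumed to be known in advance)

# 0-based line numbers of the header lines that precede each matrix
headers = [i for i, line in enumerate(text.splitlines()) if "matrix" in line]

# One loadtxt call per block: skip everything up to and including the header,
# then read exactly n_rows rows.
blocks = [
    np.loadtxt(io.StringIO(text), skiprows=i + 1, max_rows=n_rows)
    for i in headers
]

stacked = np.stack(blocks)  # shape: (number of matrices, n_rows, n_cols)
print(stacked.shape)
```

Each call re-reads the text from the beginning, so this is only reasonable for small files; the approaches in the answers scan the file once.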


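You need to change your processing workflow to two steps: first extract the substring that corresponds to the matrix you want, then call numpy.loadtxt on it. The best way to do this is to:

Find the start and end of the matrix with re

Load the matrix within that range

Reset your range and continue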

Your matrix markers seem to vary, so you could use regular expressions like these:

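import re
# header line: any text containing the word "matrix" and ending in two integers
start = re.compile(r"^.*\bmatrix\b.*?(\d+)[ \t]+(\d+)[ \t]*\n", re.MULTILINE)
end = re.compile(r"\n\n")  # a blank line ends a matrix block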

You can then find each start/end pair and load the text of that matrix:

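import io
import numpy as np

# read our data
with open("/path/to/file.txt") as f:
    data = f.read()

def load_matrix(data, pos=0, **kwargs):
    # find start and end bounds, searching from offset pos
    s = start.search(data, pos)
    if not s:
        # no matrix left over
        return None, pos
    e = end.search(data, s.end())
    e_index = e.end() if e else len(data)

    # load the text between the header line and the blank line
    buf = io.StringIO(data[s.end():e_index])
    matrix = np.loadtxt(buf, **kwargs)    # other loadtxt arguments go here

    # return the matrix and the offset at which to resume searching
    return matrix, e_index

The Idea

In this case, the regular expression that marks the start of a matrix has capture groups (\d+) for the two numbers on the header line, so you can get the MxN representation of the matrix if you wish. I also search for lines that contain the word "matrix", with arbitrary leading text and two numbers separated by whitespace at the end.

The match for the end is "\n\n", that is, two consecutive newlines (if the file has Windows line endings, you may also need to account for "\r").

Automating This

Now that we have a way to find a single matrix, all you need to do is repeat it for as long as you keep getting matches, and populate a list of matrices:

matrices = []
pos = 0    # offset into data at which to resume searching

while True:
    result, pos = load_matrix(data, pos)    # pass loadtxt keyword arguments here
    if result is None:
        break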
    matrices.append(result)
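
The same idea can also be sketched as a single pass with re.finditer, which avoids tracking offsets by hand. This is an illustrative variant, not the answer's exact code: the pattern block_re, the in-memory sample text, and the assumption that every matrix row starts with a digit are all choices made for this example.

```python
import io
import re
import numpy as np

# Hypothetical in-memory copy of the file from the question.
text = """some text
the grids are
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1

new matrix  3 3
7 2 1
1 3 4
2 3 2
"""

# Normalise Windows line endings so "\n" is the only line terminator.
text = text.replace("\r\n", "\n")

block_re = re.compile(
    r"^[^\n]*matrix[^\n]*?(\d+)[ \t]+(\d+)[ \t]*\n"  # header line, two numbers captured
    r"((?:[ \t]*\d[^\n]*(?:\n|\Z))+)",                # the numeric rows that follow
    re.MULTILINE,
)

headers = []   # the two integers found on each header line
matrices = []  # one 2-D array per block
for m in block_re.finditer(text):
    headers.append((int(m.group(1)), int(m.group(2))))
    matrices.append(np.loadtxt(io.StringIO(m.group(3))))

stacked = np.stack(matrices)  # works because all blocks have the same shape
print(headers)
print(stacked.shape)
```

Because the row group stops at the first line that does not start with a digit, the blank line between blocks ends each match without a separate end pattern.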

Maybe I have misunderstood, but if you can match the line that precedes each 3x3 matrix, you can create a generator to feed to loadtxt:

import itertools
import numpy as np

def get_matrices(fs):
    for line in fs:  # iterating the file ends cleanly at EOF
        if 'matrix' in line:  # or whatever matches the line before a matrix
            # yield the three rows that follow the marker line
            yield from itertools.islice(fs, 3)


with open('matrices.dat') as fs:
    g = get_matrices(fs)
    M = np.loadtxt(g)

M = M.reshape((M.size//9, 3, 3))
print(M)

If you feed it:

some text
the grids are 
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1

new matrix  3 3
7 2 1
1 3 4
2 3 2

new matrix  7 6
1 0 1
2 0 3
0 1 2

you get an array of matrices:

[[[ 3.  2.  4.]
  [ 7.  4.  2.]
  [ 9.  1.  1.]]

 [[ 9.  4.  1.]
  [ 1.  3.  4.]
  [ 4.  3.  1.]]

 [[ 7.  2.  1.]
  [ 1.  3.  4.]
  [ 2.  3.  2.]]

 [[ 1.  0.  1.]
  [ 2.  0.  3.]
  [ 0.  1.  2.]]]
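
For matrices that are not 3x3, the same generator idea can be parameterised by the number of rows. The following is a sketch under the assumption that every block has the same, known row count; the function name matrix_rows, the parameter n_rows, and the 2x2 sample data are hypothetical.

```python
import io
import itertools
import numpy as np

def matrix_rows(lines, n_rows):
    """Yield the n_rows lines that follow every line containing 'matrix'."""
    lines = iter(lines)
    for line in lines:
        if "matrix" in line:
            # islice consumes from the same iterator, so the outer loop
            # resumes after the rows that were just yielded
            yield from itertools.islice(lines, n_rows)

# Hypothetical sample with 2x2 blocks.
sample = io.StringIO(
    "header text\n"
    "\n"
    "matrix a 1\n"
    "1 2\n"
    "3 4\n"
    "\n"
    "matrix b 2\n"
    "5 6\n"
    "7 8\n"
)

n = 2
M = np.loadtxt(matrix_rows(sample, n))  # 2-D array of all rows, stacked
M = M.reshape(-1, n, M.shape[1])        # split into blocks of n rows each
print(M.shape)
```

Using -1 in reshape lets NumPy infer the number of matrices, so the 9 in M.size//9 does not have to be adjusted by hand for larger blocks.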

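Alternatively, if you just want to yield every line that looks like it could be a row of a 3x3 integer matrix, match against a regular expression: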

import re

def get_matrices(fs):
    for line in fs:  # iterating the file ends cleanly at EOF
        # keep only lines made of exactly three whitespace-separated integers
        if re.match(r'\d+\s+\d+\s+\d+\s*$', line):
            yield line

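A single call won't do it. I suggest you read the file yourself with readlines and pass simple blocks of numbers (lines with a consistent number of columns) to loadtxt, or even parse them directly. These 9-number blocks should be easy to parse.

@hpaulj How do I pass a block of numbers to loadtxt? Your suggestion is basically what I tried to do in my post. Also, keep in mind that this is a simplified problem. My real case has 30x30 matrices.

Try passing it a list of strings/lines.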
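
As a rough sketch of that last suggestion, the block below groups consecutive numeric lines and hands each group to loadtxt as a list of strings, so the matrix size does not need to be known in advance. The helper names is_numeric_row and read_blocks and the sample text are assumptions made for illustration; any stray line of pure numbers in the surrounding text would be picked up as a block too.

```python
import numpy as np

def is_numeric_row(line):
    """True if the line is non-empty and every field parses as a number."""
    fields = line.split()
    if not fields:
        return False
    try:
        for field in fields:
            float(field)
    except ValueError:
        return False
    return True

def read_blocks(lines):
    """Parse each run of consecutive numeric lines with np.loadtxt."""
    blocks, current = [], []
    for line in lines:
        if is_numeric_row(line):
            current.append(line)
        elif current:
            # a non-numeric line closes the current block
            blocks.append(np.loadtxt(current))
            current = []
    if current:  # block that runs to the end of the input
        blocks.append(np.loadtxt(current))
    return blocks

# Hypothetical in-memory copy of the file from the question.
text = """some text
the grids are
       3 x 3

more text

matrix marker 1 1
3 2 4
7 4 2
9 1 1

new matrix  2 4
9 4 1
1 3 4
4 3 1

new matrix  3 3
7 2 1
1 3 4
2 3 2
"""

blocks = read_blocks(text.splitlines())
stacked = np.stack(blocks)
print(stacked.shape)
```

With a real file, the lines returned by readlines can be passed to read_blocks in place of text.splitlines().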