Python 3.x: Fastest way to find a pattern in a multivariate time series / matrix


I have the following code:

a = [[1,2,3], [4,5,6], [7,8,9]]

patterns = ('a',[[1,2],[8,9]])

for pattern in patterns[1]:
    for x in a:
        for index in range(len(x)):
            if x[index:index+len(pattern)]==pattern:
                x[index:index+len(pattern)]=[patterns[0] for p in pattern]

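This code finds the patterns row by row, but it does not take the alignment of the pattern across rows into account, and it does not wait until the complete pattern has been found before transforming anything in the matrix. I have not yet figured out how to do that.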
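
To make the two shortcomings concrete, here is a minimal sketch that wraps the loop above in a function. The name `naive_replace` and the second test matrix are assumptions introduced only for illustration.

```python
def naive_replace(matrix, label, row_patterns):
    """Same logic as the loop above: each pattern row is handled on its own."""
    for pattern in row_patterns:
        for row in matrix:
            for index in range(len(row)):
                if row[index:index + len(pattern)] == pattern:
                    row[index:index + len(pattern)] = [label] * len(pattern)
    return matrix


# [8, 9] starts one column to the right of [1, 2], yet both get replaced.
unaligned = naive_replace([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 'a', [[1, 2], [8, 9]])
print(unaligned)  # [['a', 'a', 3], [4, 5, 6], [7, 'a', 'a']]

# [8, 9] is absent, but [1, 2] is still replaced (partial match).
partial = naive_replace([[1, 2, 3], [4, 5, 6], [7, 0, 0]], 'a', [[1, 2], [8, 9]])
print(partial)  # [['a', 'a', 3], [4, 5, 6], [7, 0, 0]]
```

Both results show why the replacement has to be deferred until every pattern row has been located in the same column.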

Formally, the problem is as follows:

I have a matrix

matrix=[
[1,2,3],
[4,5,6],
[7,8,9]]

and I want to find a pattern like

[1,2]
[any,5]

that is, the pattern [1,2] in one row and, in any row below it, an arbitrary first value followed by a 5

or

[1]
[4]

that is, a 1 and a 4 in the same column

or

[2,3]
[8,9]

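that is, 2 and 3 adjacent within a row and 8 and 9 adjacent within a row, with 2 and 8 in the same column and 3 and 9 in the same column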

to transform the matrix into (given the first pattern and transforming it into 'a') 
output = [
[a,a,3],
[4,a,6],
[7,8,9]]

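I have looked at related questions, but either I am not searching with the right keywords or this problem is new.

On my own I would use something like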

if matrix[index:index+len(pattern)]==pattern

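with an extra check on the lower rows once a pattern row has been found, but this is too slow, because each row is tens of thousands of values long and there are almost a thousand rows.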
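
One way to avoid testing every position is to index the matrix once and look up only the cells that can start a match. The following is a sketch under assumptions, not a benchmarked solution: the names `build_index`, `row_matches`, `candidate_anchors` and `find_first` are hypothetical, the wildcard is spelled `'any'`, and later pattern rows may sit in any lower matrix row at the same column. It only finds a match; the index goes stale as soon as the matrix is modified.

```python
from collections import defaultdict

ANY = 'any'  # wildcard marker (an assumption for this sketch)


def build_index(matrix):
    """Map each value to the (row, col) cells where it occurs, in row-major order."""
    index = defaultdict(list)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            index[value].append((r, c))
    return index


def row_matches(matrix_row, col, pattern_row):
    """True if pattern_row fits matrix_row starting at col (ANY matches anything)."""
    if col < 0 or col + len(pattern_row) > len(matrix_row):
        return False
    return all(p == ANY or matrix_row[col + i] == p
               for i, p in enumerate(pattern_row))


def candidate_anchors(index, first_row):
    """Yield (row, col) start cells based on the first non-wildcard value.

    Yields nothing if the first pattern row consists only of wildcards.
    """
    for offset, value in enumerate(first_row):
        if value != ANY:
            for r, c in index.get(value, []):
                yield r, c - offset
            return


def find_first(matrix, index, pattern):
    """Return [(row, start_col), ...] per pattern row for the first match, else None."""
    for r, c in candidate_anchors(index, pattern[0]):
        if not row_matches(matrix[r], c, pattern[0]):
            continue
        placed = [(r, c)]
        next_row = r + 1
        for pattern_row in pattern[1:]:
            # Taking the earliest lower row that fits leaves the most room for the rest.
            found = next((rr for rr in range(next_row, len(matrix))
                          if row_matches(matrix[rr], c, pattern_row)), None)
            if found is None:
                break
            placed.append((found, c))
            next_row = found + 1
        else:
            return placed
    return None


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
idx = build_index(m)
print(find_first(m, idx, [[1, 2, ANY], [ANY, 8, 9]]))  # [(0, 0), (2, 0)]
print(find_first(m, idx, [[2], [7]]))                  # None
```

Building the index costs one pass over the matrix; after that, each search only visits the cells holding the key value of the first pattern row instead of every column of every row.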

I need to repeat this search-and-replace operation on the same matrix several times, ending up with a matrix like the following:

Given:
input = [
[1,2,3],
[4,5,6],
[1,8,9]]
and
a=[[1,2,any],
   [any,8,9]]
b=[[3],
   [6]]
c=[[4,5],
   [1,any]]

Output = [
[a,a,b],
[c,c,b],
[c,a,a]]

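Thank you for your attention. If there are any mistakes in my formatting, please let me know; this is my first post on Stack.

Here is one possible approach. It tries to match the pattern one row at a time; whenever a row matches, it moves on to the next pattern row until the complete pattern has matched. Only then does it replace the saved indices with the replacement string that was passed in.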

def replace_pattern(matrix, pattern_list, replacement, start_row, col=None, idxs=None):
    # idxs collects the (row, [cols]) pairs matched so far; avoid a mutable default argument
    if idxs is None:
        idxs = []

    # if the pattern list is empty we found our pattern, so replace the saved idxs
    if not pattern_list:
        replace_idx(matrix, replacement, idxs)
        return True

    n_rows = len(matrix)
    n_cols = len(matrix[0])
    pattern = pattern_list[0]
    pattern_size = len(pattern)

    # impossible to complete the pattern if it has more rows remaining than the matrix
    if start_row + len(pattern_list) > n_rows:
        return False

    for row in range(start_row, n_rows):
        # if we already found part of the pattern previously we only need to check a fixed position
        if col is not None:
            new_idxs = idxs + [(row, filter_idx(pattern, col))]
            if match(matrix[row][col : (col + pattern_size)], pattern) and replace_pattern(matrix, pattern_list[1:], replacement, row + 1, col, new_idxs):
                return True
        # if we have not found part of the pattern yet, we can search in every position of the current row
        else:
            for pos in range(0, n_cols - pattern_size + 1):
                new_idxs = idxs + [(row, filter_idx(pattern, pos))]
                if match(matrix[row][pos : (pos + pattern_size)], pattern) and replace_pattern(matrix, pattern_list[1:], replacement, row + 1, pos, new_idxs):
                    return True
    return False


This function relies on a few helper functions.

This one determines whether a pattern matches some values:

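def match(values, pattern):
    # a slice cut short by the edge of the matrix cannot match the whole pattern row
    if len(values) != len(pattern):
        return False
    for i in range(len(values)):
        # 'any' on either side acts as a wildcard
        if values[i] == 'any' or pattern[i] == 'any':
            continue
        if values[i] != pattern[i]:
            return False
    return True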
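
As a quick illustration of the wildcard behaviour, the helper is restated compactly below so the snippet runs on its own; the explicit length check is an addition in this sketch.

```python
def match(values, pattern):
    """Compare cell by cell; 'any' on either side matches everything."""
    if len(values) != len(pattern):
        return False
    return all(v == 'any' or p == 'any' or v == p
               for v, p in zip(values, pattern))


print(match([7, 8, 9], ['any', 8, 9]))  # True: the 7 is covered by the wildcard
print(match([4, 5, 6], ['any', 8, 9]))  # False: 5 != 8
```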

This one filters out the pattern positions that are 'any', because you do not want those cells to be replaced:

def filter_idx(pattern, col):
    pattern_size = len(pattern)
    l = []
    for i in range(col, col + pattern_size):
        if pattern[i - col] != 'any':
            l.append(i)
    return l
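
A short example of what this helper returns may help. The list-comprehension form below is an equivalent restatement, written only so that the snippet is self-contained.

```python
def filter_idx(pattern, col):
    """Matrix columns covered by the non-wildcard cells of a pattern row placed at col."""
    return [col + i for i, value in enumerate(pattern) if value != 'any']


print(filter_idx([1, 2, 'any'], 0))  # [0, 1]
print(filter_idx(['any', 8, 9], 0))  # [1, 2]
print(filter_idx([3], 2))            # [2]
```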

The last one replaces the (row, [cols]) pairs with the replacement string that was passed in:


def replace_idx(matrix, replacement, idxs):
    for entry in idxs:
        row = entry[0]
        for col in entry[1]:
            matrix[row][col] = replacement
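
To show the shape of the `idxs` argument, here is a small self-contained sketch. The index list is written out by hand and corresponds to what a match of [[1, 2, 'any'], ['any', 8, 9]] on the 3x3 example would collect.

```python
def replace_idx(matrix, replacement, idxs):
    """Overwrite every listed cell; idxs is a list of (row, [cols]) pairs."""
    for row, cols in idxs:
        for col in cols:
            matrix[row][col] = replacement


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
replace_idx(grid, 'a', [(0, [0, 1]), (2, [1, 2])])
print(grid)  # [['a', 'a', 3], [4, 5, 6], [7, 'a', 'a']]
```

The wildcard cells (column 2 of row 0 and column 0 of row 2) are untouched because they never enter the index list.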

With the following input:

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

pattern_a = [[1, 2, 'any'],['any', 8, 9]]
pattern_b =[[3], [6]]
pattern_c = [[4, 5], [7, 'any']]

replace_pattern(m, pattern_a, 'a', 0)
replace_pattern(m, pattern_b, 'b', 0)
replace_pattern(m, pattern_c, 'c', 0)
print(m)

I get the output:

[['a', 'a', 'b'], ['c', 'c', 'b'], ['c', 'a', 'a']]

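This is a requirements dump. Please show what you have done, and use real notation, because what you have at the moment is ambiguous/meaningless.

What do you mean by real notation? That term sounds very vague.

For example, in [,5], what is the empty entry?

Ah, right, my bad, I will replace it with NA. I just did not want to write actual Python.

Write actual Python, and show your attempt.

This is perfect, thank you. The way you solved it is really clever. Now that I see it, it looks very simple, but I was stuck on this for days, so thanks again.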