Python 3.x: fastest way to find patterns in a multivariate time series / matrix


I have the following code:

a = [[1,2,3], [4,5,6], [7,8,9]]

patterns = ('a',[[1,2],[8,9]])

for pattern in patterns[1]:
    for x in a:
        for index in range(len(x)):
            if x[index:index+len(pattern)]==pattern:
                x[index:index+len(pattern)]=[patterns[0] for p in pattern]


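This code finds patterns across multiple rows, but it does not take the column alignment of the pattern rows into account, and it does not find the complete pattern before transforming anything in the matrix, as it should. I have not yet figured out how to do that.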


Formally, the problem is as follows.

I have a matrix

matrix=[
[1,2,3],
[4,5,6],
[7,8,9]]
and I want to find a pattern like

[1,2]
[any,5]

that is, the pattern [1,2] with, in any row below it, any value in the first column and a 5 in the second column

or

[1]
[4]
that is, a 1 and a 4 in the same column

or

[2,3]
[8,9]
that is, 2 and 3 adjacent within a row, 8 and 9 adjacent within a row, with 2 and 8 in the same column and 3 and 9 in the same column

and then transform the matrix into the following (given the first pattern, replacing its matched cells with 'a'):
output = [
[a,a,3],
[4,a,6],
[7,8,9]]
I have looked at related questions, but either I am not searching with the right keywords or this problem is new.

On my own I would use something like

if matrix[index:index+len(pattern)]==pattern
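with extra checks in the lower rows once a pattern row is found. That is too slow, though, because each row is tens of thousands of values long and there are close to a thousand rows.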
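To make the speed concern concrete, here is a minimal sketch of one way to vectorize the per-row check with NumPy. It is not from the original post: `ANY`, `row_hits` and `find_match` are hypothetical names, the matrix is assumed to be purely numeric and rectangular, and the remaining loop over columns is still plain Python, so treat it as a starting point rather than a tuned solution.

```python
import numpy as np

ANY = None  # hypothetical wildcard marker, playing the role of "any"

def row_hits(matrix, pattern_row):
    """hits[r, c] is True when pattern_row matches matrix[r, c:c+len(pattern_row)]."""
    m = np.asarray(matrix)
    n_rows, n_cols = m.shape
    width = n_cols - len(pattern_row) + 1
    if width <= 0:
        return np.zeros((n_rows, 0), dtype=bool)
    hits = np.ones((n_rows, width), dtype=bool)
    for offset, value in enumerate(pattern_row):
        if value is ANY:
            continue  # a wildcard puts no constraint on this cell
        # compare one pattern cell against every candidate position at once
        hits &= (m[:, offset:offset + width] == value)
    return hits

def find_match(matrix, pattern):
    """Return (rows, col) for the first aligned match, or None.

    Pattern rows must start in the same column and appear in order,
    but need not be in directly adjacent matrix rows.
    """
    hits = [row_hits(matrix, p) for p in pattern]
    width = min(h.shape[1] for h in hits)
    for col in range(width):
        rows, start = [], 0
        for h in hits:
            candidates = np.flatnonzero(h[start:, col])
            if candidates.size == 0:
                break  # this pattern row cannot be placed in this column
            rows.append(start + int(candidates[0]))
            start = rows[-1] + 1
        else:
            return rows, col
    return None

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(find_match(m, [[1, 2], [ANY, 5]]))  # ([0, 1], 0)
print(find_match(m, [[2, 3], [8, 9]]))    # ([0, 2], 1)
```

Taking the earliest possible matrix row for each pattern row is enough here, because an earlier row never rules out a placement that a later row would allow.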

I need to repeat this search-and-replace operation many times on the same matrix, ending up with a matrix like this:

Given:
input = [
[1,2,3],
[4,5,6],
[1,8,9]]
and
a=[[1,2,any],
   [any,8,9]]
b=[[3],
   [6]]
c=[[4,5],
   [1,any]]

Output = [
[a,a,b],
[c,c,b],
[c,a,a]]

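Thanks for your attention. If I have made any formatting mistakes, please let me know; this is my first post on Stack.

Here is one possible approach. It tries to match one row of the pattern at a time and, on success, moves on to the next pattern row until a complete match is reached. When that happens, it replaces the saved indices with the replacement string that was passed in.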

def replace_pattern(matrix, pattern_list, replacement, start_row, col=None, idxs=None):
    """Replace the first complete match of pattern_list with replacement.

    Returns True if a match was found (and replaced in place), else False.
    """
    idxs = [] if idxs is None else idxs

    # An empty pattern list means every pattern row has been matched: replace the saved indices.
    if not pattern_list:
        replace_idx(matrix, replacement, idxs)
        return True

    n_rows = len(matrix)
    n_cols = len(matrix[0])
    pattern = pattern_list[0]
    pattern_size = len(pattern)

    # The pattern cannot be completed if it has more rows left than the matrix does.
    if start_row + len(pattern_list) > n_rows:
        return False

    for row in range(start_row, n_rows):
        if col is not None:
            # Part of the pattern was already found, so only one column position needs checking.
            new_idxs = idxs + [(row, filter_idx(pattern, col))]
            if match(matrix[row][col:col + pattern_size], pattern) and replace_pattern(
                    matrix, pattern_list[1:], replacement, row + 1, col, new_idxs):
                return True
        else:
            # Nothing has been matched yet, so try every position in the current row.
            for pos in range(n_cols - pattern_size + 1):
                new_idxs = idxs + [(row, filter_idx(pattern, pos))]
                if match(matrix[row][pos:pos + pattern_size], pattern) and replace_pattern(
                        matrix, pattern_list[1:], replacement, row + 1, pos, new_idxs):
                    return True
    return False
This function uses a few helper functions.

This one determines whether a pattern row matches a slice of values:

def match(values, pattern):
    for i in range(len(values)):
        if values[i] == 'any' or pattern[i] == 'any':
            continue
        else:
            if values[i] != pattern[i]:
                return False
    return True
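One thing to be aware of: `match` only compares as many cells as `values` contains, so a slice that was cut short at the end of a matrix row can still "match" a longer pattern row. If pattern rows of different lengths are used, a stricter check may be safer. A minimal sketch, where `match_strict` is a hypothetical name and not part of the original answer:

```python
def match_strict(values, pattern):
    """Like match, but a slice whose length differs from the pattern never matches."""
    if len(values) != len(pattern):
        return False
    # 'any' on either side is a wildcard, exactly as in match
    return all(p == 'any' or v == 'any' or v == p
               for v, p in zip(values, pattern))

print(match_strict([4, 5], ['any', 5]))  # True
print(match_strict([3], [3, 'any']))     # False: the slice is shorter than the pattern
```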
This one filters out the pattern positions that hold 'any', because you do not want those cells to be replaced:

def filter_idx(pattern, col):
    pattern_size = len(pattern)
    l = []
    for i in range(col, col + pattern_size):
        if pattern[i - col] != 'any':
            l.append(i)
    return l
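For illustration, the same index filtering can be written as a list comprehension. This is meant as an equivalent sketch of the helper above, restated here only so the snippet runs on its own:

```python
def filter_idx(pattern, col):
    """Matrix column indices covered by the pattern row, skipping 'any' cells."""
    return [col + i for i, value in enumerate(pattern) if value != 'any']

print(filter_idx([1, 2, 'any'], 0))  # [0, 1]
print(filter_idx(['any', 8, 9], 0))  # [1, 2]
print(filter_idx([4, 5], 1))         # [1, 2]
```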
The last one replaces the cells given as (row, [cols]) pairs with the replacement string that was passed in:

def replace_idx(matrix, replacement, idxs):
    for entry in idxs:
        row = entry[0]
        for col in entry[1]:
            matrix[row][col] = replacement
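A small usage sketch may help show the shape of `idxs`: a list of (row, [cols]) pairs, one per matched pattern row. The helper is restated with tuple unpacking only so the snippet is self-contained; the index list is made up for the example.

```python
def replace_idx(matrix, replacement, idxs):
    """Overwrite every listed cell with replacement (modifies matrix in place)."""
    for row, cols in idxs:
        for col in cols:
            matrix[row][col] = replacement

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# cells (0,0), (0,1) from the first pattern row and (2,1), (2,2) from the second
replace_idx(m, 'a', [(0, [0, 1]), (2, [1, 2])])
print(m)  # [['a', 'a', 3], [4, 5, 6], [7, 'a', 'a']]
```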
With the following input:

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

pattern_a = [[1, 2, 'any'],['any', 8, 9]]
pattern_b =[[3], [6]]
pattern_c = [[4, 5], [7, 'any']]

replace_pattern(m, pattern_a, 'a', 0)
replace_pattern(m, pattern_b, 'b', 0)
replace_pattern(m, pattern_c, 'c', 0)
print(m)
I get the output:

[['a', 'a', 'b'], ['c', 'c', 'b'], ['c', 'a', 'a']]
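Note that `replace_pattern` stops after the first complete match. If every occurrence of a pattern should be replaced, one option is to separate finding from replacing and loop until nothing is found. The following is a compact, self-contained sketch under that assumption; `find_pattern` and `replace_all` are hypothetical names, and the matching here is strict about slice length.

```python
ANY = 'any'

def match(values, pattern):
    """True when values has the pattern's length and agrees on every non-wildcard cell."""
    return len(values) == len(pattern) and all(
        p == ANY or v == p for v, p in zip(values, pattern))

def find_pattern(matrix, pattern_rows, start_row=0, col=None):
    """Return a list of (row, col) anchors, one per pattern row, or None."""
    if not pattern_rows:
        return []
    head, rest = pattern_rows[0], pattern_rows[1:]
    n_cols = len(matrix[0])
    # leave at least one matrix row for each remaining pattern row
    for row in range(start_row, len(matrix) - len(rest)):
        cols = range(n_cols - len(head) + 1) if col is None else [col]
        for c in cols:
            if match(matrix[row][c:c + len(head)], head):
                tail = find_pattern(matrix, rest, row + 1, c)
                if tail is not None:
                    return [(row, c)] + tail
    return None

def replace_all(matrix, pattern_rows, replacement):
    """Replace every occurrence in place; return how many were replaced."""
    count = 0
    while True:
        found = find_pattern(matrix, pattern_rows)
        if found is None:
            return count
        changed = False
        for (row, c), pattern_row in zip(found, pattern_rows):
            for offset, value in enumerate(pattern_row):
                if value != ANY and matrix[row][c + offset] != replacement:
                    matrix[row][c + offset] = replacement
                    changed = True
        if not changed:
            return count  # guard against looping forever on an all-wildcard pattern
        count += 1

m = [[1, 2, 3], [4, 5, 6], [1, 2, 9], [7, 5, 0]]
print(replace_all(m, [[1, 2], [ANY, 5]], 'a'))  # 2
print(m)  # [['a', 'a', 3], [4, 'a', 6], ['a', 'a', 9], [7, 'a', 0]]
```

Because matched cells are overwritten with the replacement, the same occurrence is not found again on the next pass, as long as the replacement value does not itself appear in the pattern.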

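This is a requirements dump. Please show what you have done, and use real notation, because what you have right now is ambiguous/meaningless.

What do you mean by real notation? That term sounds very vague.

For example, [,5]: what is that?

Ah, right, my mistake, I will replace it with NA; I just did not want to write actual Python.

And show your attempt.

This is perfect, thank you. It is really clever how you solved it. Now that I see it, it looks simple, but I had been stuck on this for days, so thanks again.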