Merging two different Python regex searches

A Python regex question…

import re

mystring = "<https://myurl:6001/alerts|ebe182d2> [Open] Oracle Memory Alert - [Alerting] Oracle Memory Alert"
a = re.compile(r'<.*\|(.*)>.*\[(.*)\].*([O|o]racle).*(Memory Alert)')

matches = a.search(mystring)
if matches:
    print("matching")
    print("ID=", matches.group(1), "Status=", matches.group(2), "alert=", matches.group(3))
else:
    print("no match")

mystring = "<https://myurl:6001/alerts|xvf381h1> [Open] Mongo Disk Alert - [Causing] Disk is full"
a = re.compile(r'<.*\|(.*)>.*\[(.*)\].*([M|m]ongo).*(Disk Alert)')
matches = a.search(mystring)
if matches:
    print("matching")
    print("ID=", matches.group(1), "Status=", matches.group(2), "alert=", matches.group(3))
else:
    print("no match")

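1. I need to merge these two regex searches into a single one (which should be possible) that handles both of my inputs (mystring).

2. I don't understand why the first search always picks up the second set of brackets (see the Oracle alert).

3. My second search, however, correctly returns the first set of brackets (see Mongo).

Thanks for your help…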

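Here is a version of your code that fixes problem #2 by making the matches less greedy. It is best to change both patterns, because that makes the second one more correct as well, even though it happens to give you the right result either way: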

import re

mystring = "<https://myurl:6001/alerts|ebe182d2> [Open] Oracle Memory Alert - [Alerting] Oracle Memory Alert"
# .*? and (.*?) are lazy, so each one stops at the first place the rest can match.
# [Oo] replaces [O|o]: inside a character class, | is a literal pipe, not "or".
a = re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?([Oo]racle).*?(Memory Alert)')

matches = a.search(mystring)
if matches:
    print("matching")
    print("ID=", matches.group(1), "Status=", matches.group(2), "alert=", matches.group(3))
else:
    print("no match")

mystring = "<https://myurl:6001/alerts|xvf381h1> [Open] Mongo Disk Alert - [Causing] Disk is full"
a = re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?([Mm]ongo).*?(Disk Alert)')
matches = a.search(mystring)
if matches:
    print("matching")
    print("ID=", matches.group(1), "Status=", matches.group(2), "alert=", matches.group(3))
else:
    print("no match")
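To see the difference in isolation, here is a small sketch that runs a cut-down greedy pattern and its lazy counterpart against the Oracle string. The shortened patterns and the variable names are mine, for illustration only; they are not the patterns from the question.

```python
import re

text = ("<https://myurl:6001/alerts|ebe182d2> [Open] Oracle Memory Alert"
        " - [Alerting] Oracle Memory Alert")

# Greedy: .* grabs as much as it can, so \[ ends up at the LAST bracket pair
# that still leaves an "Oracle" after it.
greedy = re.compile(r'>.*\[(.*)\].*Oracle')

# Lazy: .*? grabs as little as it can, so \[ lands on the FIRST bracket pair.
lazy = re.compile(r'>.*?\[(.*?)\].*?Oracle')

greedy_status = greedy.search(text).group(1)
lazy_status = lazy.search(text).group(1)

print("greedy:", greedy_status)  # Alerting
print("lazy:  ", lazy_status)    # Open
```

The Mongo string does not show the problem because nothing that the rest of the pattern needs appears after its second bracket pair, so the greedy version is forced to backtrack to the first one.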

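To give you an idea of what I was getting at with my question (in my comment)… If you care about speed, you only want to search for these two cases, and you believe the substrings "Oracle Memory Alert" and "Mongo Disk Alert" will show up rarely, then you are better off looking for those strings without any regex at all. If you find one, then apply the regex test to see whether the candidate really matches. Depending on what you know about your dataset, there are many ways to highly optimize the test. For example, you don't need to look for the two patterns at the very beginning of the string. Presumably you can pick a position that you are comfortable these strings never appear before, and start searching from there.

If what you want to match occurs relatively rarely, then all you care about for performance is how quickly you can tell that a string does not match in most (but not necessarily all) cases. You need to think about how to rule out a large number of strings quickly. Once you have failed to rule a string out, it doesn't matter how long it takes to decide whether it really matches, as long as that only happens once in a blue moon.

If you don't care about speed and only care about readability, then the best approach is probably to test each case separately, using most of what you have now. Most optimizations make code less readable.

There are probably dozens, if not hundreds, of potentially "correct" solutions to this problem. Which one is best depends entirely on the characteristics of your dataset. It's not often that you run into a problem like this one, where there isn't just one right answer or a few.

Give me some more information and I can help you optimize for your requirements.

Oh… here is one answer to #1. I'm additionally assuming that a) you will never see lowercase mongo or oracle, and b) the space between words is always exactly one character. If you can make those two assumptions, then this expression should be noticeably faster than your original two. You will have to adjust your logic to take only the first word of the two main matches. Again, though, if you hardly ever see these strings, the time it takes is irrelevant.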
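A minimal sketch of that pre-filter idea, assuming the two alert names are the only ones you care about. The function name `parse_alert`, the tuple `ALERT_NAMES`, and the `start` parameter are made up for illustration:

```python
import re

ALERT_NAMES = ("Oracle Memory Alert", "Mongo Disk Alert")
PATTERN = re.compile(
    r'<.*\|(.*)>.*?\[(.*?)\].*?(Oracle Memory Alert|Mongo Disk Alert)'
)


def parse_alert(line, start=0):
    """Return (id, status, alert) for an alert line, or None.

    `start` is a hypothetical offset before which you are sure
    the alert names never appear.
    """
    # Cheap check first: plain substring search, no regex involved.
    if not any(line.find(name, start) != -1 for name in ALERT_NAMES):
        return None
    # Only candidates that passed the cheap check pay for the regex.
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


oracle_line = ("<https://myurl:6001/alerts|ebe182d2> [Open] Oracle Memory Alert"
               " - [Alerting] Oracle Memory Alert")
print(parse_alert(oracle_line))
print(parse_alert("some unrelated log line"))
```

Whether this is actually faster than running the regex on every line depends on your data, so it is worth measuring on a realistic sample before committing to it.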
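If readability is the goal, one way to keep the cases separate without repeating the if/else block is to loop over a small table of patterns. This is only a sketch; the labels, `CASES`, and `classify` are names I made up:

```python
import re

# One entry per case; each pattern stays readable on its own.
CASES = [
    ("oracle", re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?([Oo]racle).*?(Memory Alert)')),
    ("mongo", re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?([Mm]ongo).*?(Disk Alert)')),
]


def classify(line):
    """Return (label, id, status) for the first case that matches, else None."""
    for label, pattern in CASES:
        m = pattern.search(line)
        if m:
            return label, m.group(1), m.group(2)
    return None


mongo_line = ("<https://myurl:6001/alerts|xvf381h1> [Open] Mongo Disk Alert"
              " - [Causing] Disk is full")
print(classify(mongo_line))
```

Adding a new alert type then means adding one line to `CASES`, at the cost of trying each pattern in turn.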

a = re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?(Oracle Memory Alert|Mongo Disk Alert)')
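Here is roughly how that single expression could be used on both of your inputs, including the adjustment of taking only the first word of the alert name. The `samples` and `results` names are just for the example:

```python
import re

a = re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?(Oracle Memory Alert|Mongo Disk Alert)')

samples = [
    "<https://myurl:6001/alerts|ebe182d2> [Open] Oracle Memory Alert - [Alerting] Oracle Memory Alert",
    "<https://myurl:6001/alerts|xvf381h1> [Open] Mongo Disk Alert - [Causing] Disk is full",
]

results = []
for mystring in samples:
    matches = a.search(mystring)
    if matches:
        # group(3) is the whole alert name; its first word is the product.
        product = matches.group(3).split()[0]
        results.append((matches.group(1), matches.group(2), product))
        print("ID=", matches.group(1), "Status=", matches.group(2), "alert=", product)
    else:
        print("no match")
```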

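What are your constraints? How important is each of these: 1) speed of positive matches, 2) speed of negative matches, 3) readability, 4) the ability to add more cases (how many in the foreseeable future?), 5) coolness of the algorithm. What is the approximate ratio of successes to total strings tested?

The answer to #2 and #3 is that you are using greedy matching, which causes the first case to find the second instance of "Oracle Memory Alert" rather than the first. That doesn't happen in the second case because "Mongo Disk Alert" appears only once in the test string.

Thanks for the quick turnaround, Steve. I haven't yet collected the full list of items similar to the examples I shared. I believe the list will grow to 50 alerts.

Solving #1 for 50 alerts makes it a different problem from the one you showed me. That said, I would hope you can generalize about your data and come up with a single expression. Just don't get too hung up on that. There is nothing inherently wrong with running multiple tests if you can usually rule out the need to run them at all. I'm really curious: do you know what the eventual ratio of matches to non-matches will be?

Yes, you're right! I'll do more testing to work out all the combinations and find a common expression. Thanks! I believe the non-match cases will be far fewer, but I haven't pinned that down yet.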
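If the list does grow to around 50 alerts and they all share the same overall layout, one possible way to keep a single expression is to build the alternation from a list. This is a sketch under that assumption; `ALERTS`, `build_pattern`, and the third alert name are hypothetical and not from your data:

```python
import re

# Hypothetical list; the third entry is invented to show how it would grow.
ALERTS = ["Oracle Memory Alert", "Mongo Disk Alert", "Redis CPU Alert"]


def build_pattern(alerts):
    """Compile one regex whose last group matches any of the alert names."""
    # Longest names first, so a longer name wins over a shorter one
    # that happens to be its prefix.
    ordered = sorted(alerts, key=len, reverse=True)
    # re.escape keeps any regex metacharacters in a name literal.
    alternation = "|".join(re.escape(name) for name in ordered)
    return re.compile(r'<.*\|(.*)>.*?\[(.*?)\].*?(' + alternation + r')')


pattern = build_pattern(ALERTS)
m = pattern.search(
    "<https://myurl:6001/alerts|xvf381h1> [Open] Mongo Disk Alert - [Causing] Disk is full"
)
print(m.groups())
```

This only works while the ID and status sit in the same places for every alert; if some alerts are laid out differently, separate tests are probably the simpler route.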
