Python: finding the sentences between "|" characters with a regex

python regex

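I am looking for anything that sits between "|" characters in data I scraped from a website. I noticed that "|" separates all of the things I am interested in.

[{somethingsomething | title=hello there!\n | subtitle=how are you\n | subsubtitle=I'm good, thanks\n}]

I want to print:

title=hello there!
subtitle=how are you
subsubtitle=I'm good, thanks

I think I should use lookbehind and lookahead, but my attempt does not work when the text is between "|" characters.

My guess was something like this: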

One possible solution:

import re

# Capture everything between '["{' and '}"]', then split on the separator.
regex = re.compile(r'\["\{([^}]+)\}"\]')
match = regex.match('["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]')
match.group(1).split('|')

-> ['somethingsomething', 'title=hello there!\n', 'subtitle=how are you\n', "subsubtitle=I'm good, thanks\n"]

You may want to strip() the strings afterwards.
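
As a minimal sketch of that last step, the pieces can be stripped and filtered so that only the key=value entries remain. The helper name extract_fields is hypothetical, and the input is assumed to have exactly the shape shown in the question.

```python
import re

raw = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'


def extract_fields(text):
    """Return the stripped 'key=value' pieces found between the braces.

    Hypothetical helper: assumes the text looks like '["{a|k=v|...}"]'.
    """
    match = re.match(r'\["\{([^}]+)\}"\]', text)
    if match is None:
        return []
    pieces = match.group(1).split('|')
    # Keep only pieces that contain '=' and trim the trailing newlines.
    return [piece.strip() for piece in pieces if '=' in piece]


fields = extract_fields(raw)
for field in fields:
    print(field)
```

Filtering on '=' is only one way to drop the leading somethingsomething entry; slicing with pieces[1:] would work as well if the first piece is always the one to discard.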

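Regular expressions are only needed when you are dealing with complex strings. A simple string like this one can be handled with string methods alone:

a = "[\"{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}\"]"
# lstrip/rstrip remove any leading/trailing run of the listed characters
b = a.lstrip('["{')
c = b.rstrip('}"]')
c.split('|')
# ['somethingsomething',
# 'title=hello there!\n',
# 'subtitle=how are you\n',
# "subsubtitle=I'm good, thanks\n"]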
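
One caveat with this approach: lstrip and rstrip treat their argument as a set of characters, not as a literal prefix or suffix, so they can remove more than intended if the payload itself starts or ends with one of those characters. The sketch below assumes Python 3.9 or later, where str.removeprefix and str.removesuffix are available; the variable names are illustrative only.

```python
a = "[\"{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}\"]"

# lstrip removes every leading character that appears in the given set,
# so repeated or reordered brackets are all consumed.
stripped = "{{[nested".lstrip('["{')

# removeprefix/removesuffix (Python 3.9+) remove one exact substring instead.
inner = a.removeprefix('["{').removesuffix('}"]')

# Split on the separator and trim the trailing newlines from each piece.
parts = [piece.strip() for piece in inner.split('|')]
print(parts)
```

For the sample string both approaches give the same pieces; the difference only matters when the data can begin or end with a bracket, brace, or quote of its own.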

I think you could do:

string = '["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I\'m good, thanks\n}"]'
# crop the first three and the last three characters from the string
string = string[3:-3]
sentences = string.split('|')
title = sentences[1]
...

This will include title= in the result.

If you really must use a regex for this, don't overcomplicate it with unnecessary lookbehinds and lookaheads. Those bits are part of the pattern you are trying to match, so just use them as such:

title=(.*?)[|]subtitle=(.*?)[|]subsubtitle=(.*?)}

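Note that I also included the | in each prefix, because otherwise the | character would end up as part of each group. I also made each of your greedy groups non-greedy by adding ?. That is not strictly necessary if you are matching all of the groups, but in your original example greediness is the reason title ended up including everything before the last sub, and subsubtitle ended up as the subtitle. Finally, I put } at the end so that you don't get the rest of the outer grouping as part of the subsubtitle.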
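
Here is a small sketch of that pattern applied to the sample string. It assumes the \n sequences are real newline characters, in which case . will not match them by default, so the pattern is compiled with re.DOTALL. The second search is only there to illustrate the greedy behaviour described above; it is not the asker's original pattern, which is not shown.

```python
import re

s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

# Non-greedy groups, with the '|' separators and the closing '}' as anchors.
# re.DOTALL lets '.' match the newline that ends each value.
pattern = re.compile(r'title=(.*?)[|]subtitle=(.*?)[|]subsubtitle=(.*?)}', re.DOTALL)
m = pattern.search(s)
title, subtitle, subsubtitle = (group.strip() for group in m.groups())
print(title, subtitle, subsubtitle, sep='\n')

# For contrast: a greedy group with no '|' anchor runs on to the LAST
# 'subtitle=', which is the one inside 'subsubtitle='.
greedy = re.search(r'title=(.*)subtitle=', s, re.DOTALL)
print(repr(greedy.group(1)))
```

The greedy group ends in '|sub', which is the symptom the explanation refers to: the title swallows the real subtitle, and the text after subsubtitle= is what gets matched next.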

If you want to solve this with a regex, here is one way:

import re

s = ["{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"]

# '.' does not match a newline, so each group stops at the end of the value
match = re.search(r'title=(.*)\n', s[0])
if match:
    print("title={0}".format(match.group(1)))

match = re.search(r'subtitle=(.*)\n', s[0])
if match:
    print("subtitle={0}".format(match.group(1)))

match = re.search(r'subsubtitle=(.*)\n', s[0])
if match:
    print("subsubtitle={0}".format(match.group(1)))
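
The three searches work here because title= happens to appear before subtitle= in the string; searching for a bare title= would otherwise also match inside subtitle=. A possible variation, sketched below with a hypothetical helper named find_value, anchors each key on the preceding "|" and loops over the keys.

```python
import re

s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"


def find_value(key, text):
    """Return the value for `key`, or None if it is absent.

    Hypothetical helper: the key must directly follow a '|', so 'title'
    cannot match inside 'subtitle'. '.' stops at the newline by default.
    """
    match = re.search(r'\|' + re.escape(key) + r'=(.*)', text)
    return match.group(1) if match else None


values = {key: find_value(key, s) for key in ("title", "subtitle", "subsubtitle")}
for key, value in values.items():
    print("{0}={1}".format(key, value))
```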

If you do want a regex with lookbehind and lookahead, you can try the following:

In [1]: import re

In [2]: s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

In [3]: m = re.findall(r"""(?<=\|)(?P<foo>.*?)(?:\=)(?P<bar>.*?(?=\n))""", s)

In [4]: for i, j in m:
   ...:     print("{} = {}".format(i, j))
   ...:     
title = hello there!
subtitle = how are you
subsubtitle = I'm good, thanks
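
A variation on the same idea, sketched under the assumption that keys contain neither "=" nor "|" and that values end at a newline or a "|": negated character classes replace the lazy quantifiers, and finditer collects the named groups into a dict. The group names key and value are illustrative, not taken from the answer above.

```python
import re

s = "{somethingsomething|title=hello there!\n|subtitle=how are you\n|subsubtitle=I'm good, thanks\n}"

# The lookbehind requires a '|' just before the key; the character classes
# stop the key at '=' and the value at a newline or the next '|'.
pattern = re.compile(r"(?<=\|)(?P<key>[^=|]+)=(?P<value>[^\n|]*)")

fields = {m.group("key"): m.group("value") for m in pattern.finditer(s)}
for key, value in fields.items():
    print("{} = {}".format(key, value))
```

A dict makes later lookups such as fields["subtitle"] straightforward, at the cost of silently keeping only the last value if a key were ever repeated.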

subsubtitle=I'm good, thanks: how does this one qualify? It is not between two | characters.

Your pattern has two problems. First, .* is a greedy match. Second, you didn't put the | anywhere in the pattern. Combine the two and title will match all the way up to the last subtitle=, which happens to be the one in the middle of subsubtitle=. You could fix that with .*?, or with (?=\|subtitle=). More simply, though, don't use those lookbehinds and lookaheads in the first place; what's wrong with a simple title=.*?\|subtitle=.*?\|subsubtitle=.*?

sentences[0] is somethingsomething, not the title. And just using sentences[1] doesn't help; you still need to split on =.

Oh, oops, I thought he wanted to print the "labels" as well.