Python: finding complex substrings using wildcards

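I am trying to locate the position of an expression in a long string. The expression works as follows: it is any element of list1, followed by a wildcard of 1 to 5 words (separated by spaces), followed by any element of list2. For example: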

list1=["a","b"], list2=["c","d"]
text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"

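This should return 37, because that is where the expression ("bli blubb d") is located. I have looked into regex wildcards, but I am struggling to combine them with the different elements of the lists and with the variable length of the wildcard.

Thanks for any suggestions.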

You can construct a regular expression:

import re

pref=["a","b"]
suff=["c","d"]

# the pattern is dynamically constructed from your pref and suff lists.
patt = r"(?:\W|^)((?:" + '|'.join(pref) + r")(?: +[^ ]+){1,5} +(?:" + '|'.join(suff) + r"))(?:\W|$)"

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"

print(patt)

for k in re.findall(patt,text):
    print(k, "\n", text.index(k))

Output:

(?:\W|^)((?:a|b)(?: +[^ ]+){1,5} +(?:c|d))(?:\W|$)  # pattern
a  bli blubb d                                      # found text
33                                                  # position (your 37 is wrong btw.)

Fair warning: this is not a very robust approach.
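One way to address part of that caveat is sketched below. It is not the answerer's code: `find_spans` is a hypothetical helper, and it makes three assumptions of its own. List elements are passed through `re.escape` so they are treated as literal text, the `\W` delimiters are swapped for whitespace lookarounds (so a suffix followed by punctuation such as "d." would no longer match), and the position is taken from the match object rather than from `text.index`.

```python
import re

def find_spans(text, prefixes, suffixes, max_words=5):
    """Return (start, matched_text) pairs; hypothetical helper, not from the answer."""
    # re.escape keeps list elements such as "a+" or "c." from being read as regex syntax.
    pre = "|".join(re.escape(p) for p in prefixes)
    suf = "|".join(re.escape(s) for s in suffixes)
    # 1..max_words whitespace-separated "words" between prefix and suffix.
    words = r"(?:\s+\S+){1,%d}" % max_words
    # Lookarounds require whitespace (or the string boundary) around the match
    # without consuming it; this replaces the (?:\W|^) and (?:\W|$) delimiters.
    pattern = re.compile(
        r"(?<!\S)((?:" + pre + r")" + words + r"\s+(?:" + suf + r"))(?!\S)"
    )
    # m.start(1) is the offset of this particular match, whereas text.index(k)
    # reports the first place the same substring occurs anywhere in the text.
    return [(m.start(1), m.group(1)) for m in pattern.finditer(text)]

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"
print(find_spans(text, ["a", "b"], ["c", "d"]))
```

On the example text this reports the same single match at offset 33 as the code above.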

The constructed regex (patt) reads roughly as follows:

Either the start of the string or a non-word character (not captured), followed by
one of your prefs, followed by 1 to 5 non-space things, each preceded by 1-n spaces,
followed by 1-n spaces, followed by something from suff, followed
by a non-word character or the end of the string (not captured)
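The same breakdown can be written directly into the pattern with `re.VERBOSE`, which ignores whitespace and allows comments inside the regex. This is only an annotated restatement of the pattern for the example lists, with the alternatives `a|b` and `c|d` written out by hand; literal spaces are written as `[ ]` because verbose mode would otherwise drop them. The name `annotated` is an assumption for illustration.

```python
import re

annotated = re.compile(r"""
    (?:\W|^)              # non-word character or start of the string (not captured)
    (                     # group 1: the text that gets reported
      (?:a|b)             #   one element of pref
      (?:[ ]+[^ ]+){1,5}  #   1 to 5 words: spaces followed by non-space characters
      [ ]+                #   spaces before the suffix
      (?:c|d)             #   one element of suff
    )
    (?:\W|$)              # non-word character or end of the string (not captured)
""", re.VERBOSE)

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"
match = annotated.search(text)
print(match.group(1), match.start(1))
```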

For a demo and a more complete description of the assembled regex, see

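Is "a blaa a bli blubb d" also a valid result?

Yes, I see your point. The example was poorly chosen, so I edited it. This kind of pattern will not occur naturally in the text…

Wow, you beat me to it. I had an almost identical solution! There are some subtle differences. My final pattern looks like

(\W+|^)((a|b)\W+(\w+\W+){1,5}(c|d))(\W+|$)

to allow multiple non-word characters at the start or end of the line, and I used

for m in pattern.finditer(text): print(m.start(2))

instead of non-capturing groups, simply grabbing the second group from the iterator.

@Davos post it as a second answer :)

No, I don't think it is different enough to warrant more than a comment :)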
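The variant described in that comment can be sketched as follows. The pattern is a reconstruction from the comment text, so the exact grouping and the variable names `pattern` and `starts` are assumptions; the point it illustrates is that with plain capturing groups, the wanted span is group 2 and its offset comes from `m.start(2)`.

```python
import re

text = "bla a tx fg hg gfgf tzt zt blaa  a  bli blubb d  muh meh  muh d"

# Group 1 is the leading delimiter, group 2 is prefix + 1 to 5 words + suffix.
pattern = re.compile(r"(\W+|^)((a|b)\W+(\w+\W+){1,5}(c|d))(\W+|$)")

# finditer yields match objects, so the offset of group 2 is available directly.
starts = [m.start(2) for m in pattern.finditer(text)]
print(starts)
```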