Python regular expression to match "ab" or "ba" words

python regex

I am trying to match words that contain the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:

r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"

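But it also includes words that begin or end with non-alphanumeric characters such as ", (, ), /. How can I remove them? I only want to match a list of words.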
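A side note on the pattern above: each lookahead is immediately followed by the very character class it checks, so it should add nothing to the match. The sketch below compares the original with a shorter pattern. The name simplified and the sample words are illustrative assumptions, not part of the question.

```python
import re

# The pattern from the question, lookaheads included.
original = re.compile(r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]")
# Hypothetical simplification: same letters, no lookaheads, case handled by a flag.
simplified = re.compile(r"ab|ba", re.IGNORECASE)

# Made-up sample words, some with leading or trailing punctuation.
samples = ["abolition", "fabrics", "probable", "(halfback", "test", "ABle", "Bank,"]
for word in samples:
    # Both patterns should agree on whether the word contains "ab" or "ba".
    assert bool(original.search(word)) == bool(simplified.search(word))
    print(word, bool(simplified.search(word)))
```

Neither pattern says anything about word boundaries or punctuation, which is why words such as "(halfback" still count as hits.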

import sys
import re

word=[]

dict={}

f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')

data = f.read()
word = data.split() # word is list

f.close()

for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1

for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())

Here, I don't know how to combine this with the "re.compile~~~" approach mentioned in the first comment...
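One way the compiled-pattern idea could be combined with the counting loop is sketched below for Python 3. The function name count_ab_ba_words, the sample text, and the decision to lowercase words before counting are assumptions made for illustration; the question's script reads brown_half.txt and counts case-sensitively.

```python
import re
from collections import Counter

# Compile once and reuse. [a-z]* only matches letters, so punctuation and
# digits around a word are left out of the match.
pattern = re.compile(r"[a-z]*(?:ab|ba)[a-z]*", re.IGNORECASE)

def count_ab_ba_words(text):
    """Count each letter-only word in text that contains 'ab' or 'ba'."""
    # Lowercasing merges 'Bank' and 'bank' (an assumption, see the lead-in).
    return Counter(m.group().lower() for m in pattern.finditer(text))

# Made-up sample standing in for the contents of the input file.
sample = 'The "abolition" of (halfback) bank, Bank; fabrics/test.'
counts = count_ab_ba_words(sample)
for word in sorted(counts):
    print("%s: %s" % (word, counts[word]))
print(len(counts))
```

Because the pattern is applied to the whole text rather than to whitespace-separated tokens, a token such as "2-baser," would contribute only "baser".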

To match all words containing "ab" or "ba" (case-insensitively):

string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)

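Regular expressions are not the best tool in this case; for something this simple they overcomplicate things. You can use Python's built-in in operator (works in both Python 2 and 3).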

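As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is in word, and to False otherwise. For example, 'ba' in 'probable' is True, and 'ab' in 'discreation' is False. The second line splits the sentence into words and strips any punctuation. word = word.lower() converts word to lowercase before the comparison, so that an uppercase 'AB' or 'Ab' in word is found as well.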
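The steps described above (split, strip punctuation, lowercase, test with in) might look like the following sketch. The helper name contains_ab_or_ba and the sample sentence are assumptions for illustration, not the answerer's original code.

```python
import string

def contains_ab_or_ba(word):
    """Return True if word contains 'ab' or 'ba', ignoring case."""
    lowered = word.lower()
    return 'ab' in lowered or 'ba' in lowered

# Made-up sample sentence with punctuation around some words.
sentence = 'Abolition, (fabrics), "probable" test/case bank;'
# Split on whitespace, then strip punctuation from both ends of each token.
words = [token.strip(string.punctuation) for token in sentence.split()]
matches = [w for w in words if contains_ab_or_ba(w)]
print(matches)  # ['Abolition', 'fabrics', 'probable', 'bank']
```

Note that str.strip only removes punctuation at the ends of a token, so 'test/case' stays a single token here.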
Try this:
[(),/]*([a-z]|(ba|ab))+[(),/]*
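A quick check of this suggestion, as a sketch: because [a-z] is one of the alternatives inside the repeated group, the pattern appears to accept any run of lowercase letters, with or without "ab"/"ba", and it still consumes the surrounding ( ) , / characters. The sample words are made up for illustration.

```python
import re

# The suggested pattern, unchanged.
suggested = re.compile(r"[(),/]*([a-z]|(ba|ab))+[(),/]*")

for word in ["probable", "test", "(halfback)", "Bank,"]:
    # fullmatch requires the whole token to fit the pattern.
    print(word, bool(suggested.fullmatch(word)))
# probable True
# test True        (no "ab" or "ba", yet it matches)
# (halfback) True  (parentheses are part of the match)
# Bank, False      (no re.IGNORECASE, so the capital B fails)
```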

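I would do it like this:
Strip the unwanted characters using either of the following two techniques, your choice:
a- By building a translation dictionary and using the translate method: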
>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
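In Python 3, the same deletion table can also be built with str.maketrans, which may read a little more directly. This is an alternative sketch rather than the answerer's code; the names table and cleaned are assumptions.

```python
import string

s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
# With three arguments, str.maketrans maps every character of the third
# argument to None, which str.translate treats as "delete".
table = str.maketrans('', '', string.punctuation)
cleaned = s.translate(table)
print(cleaned)  # abolition fabrics probable test case bank halfback 1ablution
```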


b- Using the re.sub method:
>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]'%string.punctuation, '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution



Next, find the words that contain "ab" or "ba":
a- Split on whitespace and look for the desired substrings, which is what I recommend:
>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']

b- Using the re.finditer method:

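>>> import re
>>> pat = re.compile(r'\b.*?(ab|ba).*?\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
...     print(m.group())
...
abolition
 fabrics
 probable
 test case bank
 halfback
 1ablution

Note that .*? also matches spaces, so every match after the first starts with a space and 'test case bank' is returned as a single match.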
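If the regex route is preferred, one possible refinement is to replace .*? with \w*, which cannot cross whitespace. The sketch below assumes the punctuation has already been stripped as in the earlier steps; the name word_pat is hypothetical.

```python
import re

s = 'abolition fabrics probable test case bank halfback 1ablution'
# \w* only matches word characters, so each match stays inside one word.
# (?:...) is non-capturing, so findall returns the whole match.
word_pat = re.compile(r'\b\w*(?:ab|ba)\w*\b', re.IGNORECASE)
print(word_pat.findall(s))
# ['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
```

This gives the same list as the split-and-in approach above, without 'test' and 'case' being pulled in.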


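Teachers should stop saying that regular expressions can solve every problem known to man…
@KemyLand: This should be the accepted answer :)
Your word is a list of characters; perhaps you meant to use .split() on the sentence in your list comprehension instead?
@IronFist: I tested the code before posting, but forgot that when writing up the answer. Thanks for noticing!
Is this case-insensitive or case-sensitive? … It doesn't match 'Ablotion'! …
To make it case-insensitive, add the re.IGNORECASE flag.
I tried yours, but it still matches like this. Punctuation and special characters are still included, e.g. "abandon: 1 "indispensable: 1 "probable: 1 "unable: 1 (halfback: 1 2-baser,: 1 bank: 1 banker: 2 banker,: 1 banker:
Is there a way to print group 1 with the re.search method?
re.search only returns the first result. Is that what you want?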