How to extract patterns from a file using regular expressions in Python

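I have an input file like the one below. I need to extract the entries that start with nsubj, rcmod, ccomp, or acomp and write them to two output files as shown below. I'm new to Python and don't know how to use a regex here.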

Input file:

nsubj(believe-4, i-1)
aux(believe-4, ca-2)
neg(believe-4, n't-3)
root(ROOT-0, believe-4)
acomp(believe-4, @mistamau-5)
aux(know-8, does-6)
neg(know-8, n't-7)
ccomp(@mistamau-5, know-8)
dobj(is-12, who-9)
amod(tatum-11, channing-10)
nsubj(is-12, tatum-11)
ccomp(know-8, is-12)
root(ROOT-0, What-1)
cop(What-1, is-2)
amod(people-4, worse-3)
xsubj(hear-9, I-5)
aux(talking-7, am-6)
rcmod(people-4, talking-7)
xcomp(talking-7, hear-9)
dobj(hear-9, me-10)
advmod(poorly-12, very-11)

Output file 1:

nsubj(believe-4, i-1)
nsubj(is-12, tatum-11)
acomp(believe-4, @mistamau-5)
rcmod(people-4, talking-7)
ccomp(know-8, is-12)
ccomp(@mistamau-5, know-8)

Output file 2:

believe, i
is, tatum
believe, @mistamau
people, talking
know, is
@mistamau, know

Here is a program that reads lines from stdin and prints "matched" or "not matched", depending on whether the line starts with "Big" or "Daddy":

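import re
import sys

# Lines that begin with "Big" or "Daddy", optionally followed by lowercase letters
prog = re.compile(r'(Big|Daddy)[a-z]*')

for line in sys.stdin:
    # re.match only tries to match at the start of the string
    if prog.match(line):
        print('matched')
    else:
        print('not matched')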

Just replace the regex with your own pattern and read the input from your file instead of from stdin, and you should be all set.
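A minimal sketch of that adaptation, assuming the relation names from the question as the pattern. The function name `filter_lines` and the file names are hypothetical. Only lines that begin with one of the four relation names immediately followed by `(` are copied, which produces the first output file (in input order).

```python
import re

# Wanted relation names; the trailing "\(" keeps e.g. "nsubjpass(" from matching "nsubj"
prog = re.compile(r'(?:nsubj|rcmod|ccomp|acomp)\(')


def filter_lines(in_path, out_path):
    """Copy every line of in_path that starts with a wanted relation to out_path."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            # re.match only tries to match at the beginning of the line
            if prog.match(line):
                dst.write(line)
```

Usage, with hypothetical file names: `filter_lines('input.txt', 'output_1.txt')`. Iterating over the file object reads one line at a time, just like the stdin loop.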

import re

regex = re.compile(r"""
    ^          # Start of line (re.M modifier set!)
    (          # Start of capturing group 1:
     (?:nsubj|rcmod|ccomp|acomp) # Match one of these
     \(        # Match (
     ([^-]*)   # Match and capture in group 2 any no. of non-dash characters
     -\d+,[ ]  # Match a dash and a number, a comma and a space
     ([^-]*)   # Match and capture in group 3 any no. of non-dash characters
     -\d+      # Match a dash and a number
     \)        # Match )
    )          # End of group 1""", re.M|re.X)

If I understand your requirements correctly, this should work.

Applied to the whole file (s = myfile.read()), it gives the following result:

>>> regex.findall(s)
[('nsubj(believe-4, i-1)', 'believe', 'i'), 
 ('acomp(believe-4, @mistamau-5)', 'believe', '@mistamau'), 
 ('ccomp(@mistamau-5, know-8)', '@mistamau', 'know'), 
 ('nsubj(is-12, tatum-11)', 'is', 'tatum'), 
 ('ccomp(know-8, is-12)', 'know', 'is'), 
 ('rcmod(people-4, talking-7)', 'people', 'talking')]

This question appears to be off-topic because this is not a code-writing service.