How to extract patterns from a file using regular expressions in Python
I have an input file like the one shown below. I need to extract the entries that start with nsubj, rcmod, ccomp, or acomp and print them to two output files, as shown below. I am new to Python and don't know how to use regex here.
Input file
nsubj(believe-4, i-1)
aux(believe-4, ca-2)
neg(believe-4, n't-3)
root(ROOT-0, believe-4)
acomp(believe-4, @mistamau-5)
aux(know-8, does-6)
neg(know-8, n't-7)
ccomp(@mistamau-5, know-8)
dobj(is-12, who-9)
amod(tatum-11, channing-10)
nsubj(is-12, tatum-11)
ccomp(know-8, is-12)
root(ROOT-0, What-1)
cop(What-1, is-2)
amod(people-4, worse-3)
xsubj(hear-9, I-5)
aux(talking-7, am-6)
rcmod(people-4, talking-7)
xcomp(talking-7, hear-9)
dobj(hear-9, me-10)
advmod(poorly-12, very-11)
Output file 1
nsubj(believe-4, i-1)
nsubj(is-12, tatum-11)
acomp(believe-4, @mistamau-5)
rcmod(people-4, talking-7)
ccomp(know-8, is-12)
ccomp(@mistamau-5, know-8)
Output file 2
believe, i
is, tatum
believe, @mistamau
people, talking
know, is
@mistamau, know
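Here is a program that reads lines from stdin and prints "matched" or "not matched", depending on whether the line starts with "Big" or "Daddy":
import re
import sys

# re.match anchors at the start of the line, so only a leading
# "Big" or "Daddy" counts as a match.
prog = re.compile('((Big)|(Daddy))[a-z]*')

while True:
    line = sys.stdin.readline()
    if not line:
        break  # an empty string means end of input
    if prog.match(line):
        print('matched')
    else:
        print('not matched')
Just replace the regex pattern with your own, read the input from a file instead of stdin, and you should be all set.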
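As a rough sketch of that adaptation, the snippet below swaps in the four relation names from the question and filters any iterable of lines. The function name `matching_lines` and the file name `input.txt` mentioned afterwards are assumptions made for illustration, not part of the original answer.

```python
import re

# Relation names taken from the question; the trailing "\(" makes sure
# we match the whole name and not a longer one that merely starts with it.
PREFIX = re.compile(r"(?:nsubj|rcmod|ccomp|acomp)\(")


def matching_lines(lines):
    """Return the lines that start with one of the wanted relation names."""
    # re.match only matches at the beginning of each line.
    return [line.rstrip("\n") for line in lines if PREFIX.match(line)]


# A few lines copied from the question's input file.
sample = [
    "nsubj(believe-4, i-1)\n",
    "aux(believe-4, ca-2)\n",
    "rcmod(people-4, talking-7)\n",
]
print(matching_lines(sample))
```

Because an open file is itself an iterable of lines, the same function should work as `matching_lines(f)` inside `with open("input.txt") as f:`.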
import re

regex = re.compile(r"""
    ^                            # Start of line (re.M modifier set!)
    (                            # Start of capturing group 1:
    (?:nsubj|rcmod|ccomp|acomp)  # Match one of these
    \(                           # Match (
    ([^-]*)                      # Match and capture in group 2 any no. of non-dash characters
    -\d+,[ ]                     # Match a dash and a number, a comma and a space
    ([^-]*)                      # Match and capture in group 3 any no. of non-dash characters
    -\d+                         # Match a dash and a number
    \)                           # Match )
    )                            # End of group 1""", re.M|re.X)
This should work if I have understood your requirements correctly.
When applied to the contents of the whole file (s = myfile.read()), it gives the following result:
>>> regex.findall(s)
[('nsubj(believe-4, i-1)', 'believe', 'i'),
('acomp(believe-4, @mistamau-5)', 'believe', '@mistamau'),
('ccomp(@mistamau-5, know-8)', '@mistamau', 'know'),
('nsubj(is-12, tatum-11)', 'is', 'tatum'),
('ccomp(know-8, is-12)', 'know', 'is'),
('rcmod(people-4, talking-7)', 'people', 'talking')]
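To get from these tuples to the two output files, one possible approach is to take group 1 for the first file and join groups 2 and 3 for the second. The sketch below uses a compact, non-verbose form of the same pattern; the helper names `split_matches` and `write_outputs` and the file names `output_1.txt` and `output_2.txt` are assumptions for illustration. Note that results come out in input-file order, which differs from the ordering shown in the question's sample output.

```python
import re

# Compact (non-verbose) form of the pattern above.
regex = re.compile(
    r"^((?:nsubj|rcmod|ccomp|acomp)\(([^-]*)-\d+, ([^-]*)-\d+\))",
    re.M,
)


def split_matches(text):
    """Return (full entries, 'word, word' pairs) for every match in text."""
    matches = regex.findall(text)
    full = [m[0] for m in matches]                         # group 1
    pairs = ["{}, {}".format(m[1], m[2]) for m in matches]  # groups 2 and 3
    return full, pairs


def write_outputs(text, path_full="output_1.txt", path_pairs="output_2.txt"):
    """Write one result per line to each output file (hypothetical names)."""
    full, pairs = split_matches(text)
    with open(path_full, "w") as f:
        f.write("\n".join(full) + "\n")
    with open(path_pairs, "w") as f:
        f.write("\n".join(pairs) + "\n")


# Three lines copied from the question's input file.
sample = (
    "nsubj(believe-4, i-1)\n"
    "aux(believe-4, ca-2)\n"
    "acomp(believe-4, @mistamau-5)\n"
)
full, pairs = split_matches(sample)
print(full)
print(pairs)
```

With the real data, something like `write_outputs(open("input.txt").read())` would produce both files in one pass, assuming the input file is small enough to read into memory.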
This question appears to be off-topic because this is not a code-writing service.