Python - regex relation extraction

As part of a school assignment, we were given the following code:

>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                corpus='ieer', pattern = IN):
...         print(nltk.sem.rtuple(rel))
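
To see what the IN pattern does on its own, here is a minimal sketch using only the standard re module. It assumes, as the extract_rels documentation describes, that the pattern is matched from the start of the filler text between the two entities. The filler strings and the accepts helper are made up for illustration and are not part of the assignment.

```python
import re

# Same pattern as in the assignment code.
IN = re.compile(r'.*\bin\b(?!\b.+ing)')

# Hypothetical filler strings, i.e. the words sitting between two entities.
fillers = [
    ", based in",                                # standalone "in", nothing after it
    "success in supervising the transition of",  # "in" followed by an "-ing" word
    "near",                                      # no standalone "in" at all
]

def accepts(filler):
    """Return True if the filler passes the IN pattern (anchored at the start)."""
    return IN.match(filler) is not None

for filler in fillers:
    print(repr(filler), accepts(filler))
```

Only the first filler passes: the negative lookahead rejects a standalone "in" that is followed later in the string by "ing", and the third string contains no standalone "in".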
We were asked to try it with some sentences of our own to see the output, so I decided to define a function for that:

def extract(sentence):
    import re
    import nltk

    IN = re.compile(r'.*\bin\b(?!\b.+ing)')
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
        print(nltk.sem.rtuple(rel))
When I try to run this code:

>>> from extract import extract
>>> extract("The Whitehouse in Washington")
I get the following error:

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    extract("The Whitehouse in Washington")
  File "C:/Python34/My Scripts\extract.py", line 6, in extract
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
  File "C:\Python34\lib\site-packages\nltk\sem\relextract.py", line 216, in extract_rels
    pairs = tree2semi_rel(doc.text) + tree2semi_rel(doc.headline)
AttributeError: 'str' object has no attribute 'text'

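If you look at the method definition of extract_rels, it expects a parsed document as its third argument.
Here you are passing the sentence itself, which is a plain string. To overcome this error, you can do the following: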

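import re
import nltk

# Assumed setup: tokenize your own text into a list of word lists.
tokens = [nltk.word_tokenize(s) for s in nltk.sent_tokenize("The Whitehouse in Washington")]
tagged_sentences = [nltk.pos_tag(token) for token in tokens]

class doc:
    """Bare stand-in for a parsed document; extract_rels reads .text and .headline."""

IN = re.compile(r'.*\bin\b(?!\b.+ing)')
doc.headline = ["test headline for sentence"]
for sent in tagged_sentences:
    doc.text = nltk.ne_chunk(sent)
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))  # adapt the output as needed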

Give it a try.
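
The reason a stand-in object is enough can be sketched without NLTK. The traceback shows extract_rels reading doc.text and doc.headline, so any object that carries those two attributes gets past that line, while a plain string does not. The read_doc_fields helper below is hypothetical and only mimics that attribute access; it is not NLTK code.

```python
from types import SimpleNamespace

def read_doc_fields(doc):
    """Hypothetical helper: reads the two attributes the traceback shows being accessed."""
    return doc.text, doc.headline

# A plain string has no .text attribute, which reproduces the error from the question.
try:
    read_doc_fields("The Whitehouse in Washington")
    error_message = None
except AttributeError as err:
    error_message = str(err)
print(error_message)

# Any object with .text and .headline works, e.g. a SimpleNamespace.
fake_doc = SimpleNamespace(
    text=["The", "Whitehouse", "in", "Washington"],
    headline=["test headline for sentence"],
)
print(read_doc_fields(fake_doc))
```

In the real call, doc.text would hold the chunk tree produced by nltk.ne_chunk rather than a list of plain words.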

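What does nltk.corpus.ieer.parsed_docs('NYT_19980315') return? Should it be? It looks (based on your code) like it returns a list of objects with a text attribute, whereas you are just using a string. Can you create those objects (whatever they are)?

@KSFT >>> c = nltk.corpus.ieer.parsed_docs('NYT_19980315') >>> c[1] returns = Other than that, I know as much as you do; that is all the information we were given.

Your extract function is missing the line for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'): that the original code has. Could that be the problem?

@Stribizev I believe that line pulls a number of documents (strings) from the corpus and then checks those strings for relations. However, I want to use my own custom string rather than the strings from the corpus.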