Python: how to pass an object from the main module to another module

The following code runs without errors:

import spacy
from spacy.matcher import PhraseMatcher
nlp = spacy.load('en_core_web_lg')

test = nlp(' FWCA rate of pay')

phrase_pattern = [r'Rate of Pay']
pattern_name = 'RATES'
patterns = [nlp.make_doc(name) for name in phrase_pattern]

matcher = PhraseMatcher(nlp.vocab, attr='LOWER')
matcher.add(pattern_name, None, *patterns)

matches = matcher(test)
for match_id, start, end in matches:
    matched_span = test[start:end]
    print(matched_span.text)     
    print('- ', matched_span.sent.text)

# Returned:
rate of pay
-   FWCA rate of pay

I then moved part of the code into a separate module so that I could reuse it in another project:

# my_module.py

def find_matches(pattern_name, phrase_pattern, doc, attr="LOWER"):
    import spacy
    from spacy.matcher import PhraseMatcher
    nlp = spacy.load('en_core_web_lg')

    patterns = [nlp.make_doc(name) for name in phrase_pattern]
    matcher = PhraseMatcher(nlp.vocab, attr='LOWER')
    matcher.add(pattern_name, None, *patterns)

    matches = matcher(doc)
    for match_id, start, end in matches:
        matched_span = doc[start:end]
        print(matched_span.text)     
        print('- ', matched_span.sent.text)

But when I run this code, I get an error:

import spacy
from spacy.matcher import PhraseMatcher
from my_module import find_matches

nlp1 = spacy.load('en_core_web_lg')
test = nlp1(' FWCA rate of pay')

find_matches(pattern_name, phrase_pattern, test, attr="LOWER")

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-63-1bc18aa51d61> in <module>()
     10 matcher.add(pattern_name, None, *patterns)
     11 
---> 12 matches = matcher(test)
     13 for match_id, start, end in matches:
     14     matched_span = test[start:end]

phrasematcher.pyx in spacy.matcher.phrasematcher.PhraseMatcher.__call__()

phrasematcher.pyx in spacy.matcher.phrasematcher.PhraseMatcher.get_lex_value()

strings.pyx in spacy.strings.StringStore.__getitem__()

KeyError: "[E018] Can't retrieve string for hash '12488114723688465754'."

Is this efficient?

To answer your questions:

1. How can I pass the nlp1 object into the function find_matches, so that I do not have to declare nlp = spacy.load('en_core_web_lg') inside the function? Is that possible?

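You can pass nlp as an argument, because it is an ordinary object. When you call spacy.load, you build a pipeline object from the embeddings, configuration, and machine-learning models of the package whose name you pass in, for example en_core_web_lg.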
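As a rough illustration of passing the pipeline in, the module function could take nlp as its first parameter. This is only a sketch under a few assumptions: the extra nlp parameter and the returned list are not in the original code, matcher.add uses the list-style signature of newer spaCy releases (the question uses the older add(key, None, *patterns) form), and spacy.blank("en") stands in for en_core_web_lg so the example runs without a model download. The KeyError in the question is consistent with the Doc and the matcher being built on two separately loaded vocabularies, which sharing one nlp object avoids.

```python
import spacy
from spacy.matcher import PhraseMatcher


def find_matches(nlp, pattern_name, phrase_pattern, doc, attr="LOWER"):
    """Return (matched text, start, end) for each phrase match in doc.

    nlp should be the same pipeline object that produced doc, so that the
    matcher and the doc share a single vocabulary.
    """
    patterns = [nlp.make_doc(name) for name in phrase_pattern]
    matcher = PhraseMatcher(nlp.vocab, attr=attr)
    # List-style add(); older spaCy releases expect add(key, None, *patterns).
    matcher.add(pattern_name, patterns)
    return [(doc[start:end].text, start, end) for _, start, end in matcher(doc)]


if __name__ == "__main__":
    # Assumption: a blank English pipeline stands in for en_core_web_lg.
    nlp1 = spacy.blank("en")
    test = nlp1(" FWCA rate of pay")
    for text, start, end in find_matches(nlp1, "RATES", ["Rate of Pay"], test):
        print(text, start, end)
```

The function returns the matches instead of printing them, and it leaves out matched_span.sent because a blank pipeline sets no sentence boundaries. With a full model loaded, the sentence lookup from the question can be added back.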

2. If the function cannot inherit the nlp1 object, how can I work around the problem?

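You can pass the object as an argument, as described above. But why would that be a problem? In fact, if you are building something for production deployment, I would suggest making nlp a class variable that is instantiated at initialization.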

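Unless you are using different pipelines, there is no reason to load spaCy more than once, especially since loading is a slow process that reads from disk.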
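One possible reading of the class-based suggestion is sketched below. The class name PhraseFinder and its methods are hypothetical, the pipeline is stored as an instance attribute set in __init__, and spacy.blank("en") again stands in for the full model. It is meant to show the shape of the idea (load once, reuse everywhere), not a definitive design.

```python
import spacy
from spacy.matcher import PhraseMatcher


class PhraseFinder:
    """Hypothetical wrapper that keeps one pipeline and one matcher."""

    def __init__(self, nlp, attr="LOWER"):
        # The pipeline is received once and reused for every text.
        self.nlp = nlp
        self.matcher = PhraseMatcher(nlp.vocab, attr=attr)

    def add_patterns(self, pattern_name, phrases):
        # List-style add(); older spaCy releases expect add(key, None, *patterns).
        self.matcher.add(pattern_name, [self.nlp.make_doc(p) for p in phrases])

    def find(self, text):
        # The doc and the matcher share self.nlp.vocab.
        doc = self.nlp(text)
        return [doc[start:end].text for _, start, end in self.matcher(doc)]


if __name__ == "__main__":
    # Assumption: in a real project this would be spacy.load("en_core_web_lg").
    finder = PhraseFinder(spacy.blank("en"))
    finder.add_patterns("RATES", ["Rate of Pay"])
    print(finder.find(" FWCA rate of pay"))
```

Because the expensive load happens outside the class and only once, any number of calls to find reuse the same pipeline and vocabulary.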

Finally:

def find_matches(pattern_name, phrase_pattern, text, attr="LOWER"):
    doc = nlp(text)
    ....
    ....

Is it efficient?

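Yes, this is a very efficient approach. When you call doc = nlp(text), you use the global nlp object to produce the result of the pipeline processing. That result is separate for each text, because it carries the text's tokens, spans, and so on.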

- Added later: the asker's own solution -

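Another valid solution is to pass the doc object by reference, particularly when it will be used by several different functions in a functional programming style. The Doc object contains all the relevant results of processing the text.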
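A small sketch of that style follows: the text is processed once and the resulting Doc is handed to several independent functions. The helper names build_matcher, matched_texts and uppercase_tokens are made up for illustration, and spacy.blank("en") is assumed in place of the full model.

```python
import spacy
from spacy.matcher import PhraseMatcher


def build_matcher(nlp, pattern_name, phrases, attr="LOWER"):
    """Build a PhraseMatcher on the vocabulary of the given pipeline."""
    matcher = PhraseMatcher(nlp.vocab, attr=attr)
    # List-style add(); older spaCy releases expect add(key, None, *patterns).
    matcher.add(pattern_name, [nlp.make_doc(p) for p in phrases])
    return matcher


def matched_texts(doc, matcher):
    """Return the text of every phrase match in an already processed doc."""
    return [doc[start:end].text for _, start, end in matcher(doc)]


def uppercase_tokens(doc):
    """Return all-uppercase tokens, e.g. acronyms, from the same doc."""
    return [token.text for token in doc if token.is_upper]


if __name__ == "__main__":
    nlp = spacy.blank("en")  # assumption: stand-in for en_core_web_lg
    matcher = build_matcher(nlp, "RATES", ["Rate of Pay"])
    doc = nlp(" FWCA rate of pay")  # processed once, reused below
    print(matched_texts(doc, matcher))
    print(uppercase_tokens(doc))
```

Each function only reads from the Doc, so the pipeline runs once per text no matter how many functions consume the result.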

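Why use spacy.load(...) twice?

Because in this example one call is for my_module and one is for main (the Jupyter notebook); otherwise it raises an error saying that nlp is not defined.

Thanks, Tiago. I posted this question last night before going to bed, and by the morning I had arrived at the same idea you suggested, which is to pass nlp as an argument to find_matches. So I think passing the doc object into the function, rather than the text, would be more efficient, because the global doc object is already used in other, earlier steps of the pipeline.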
