Python: matching clichés

Tags: python, regex, nltk, spacy

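I am trying to keep a list of clichés in a pandas DataFrame, and I want to run it against a text file and find exact matches. Can spaCy be used for this?

Sample list in pandas:

Abandon ship
About face
Above board
All ears

Example sentences:

This is a sample sentence containing a cliche abandon ship. He was all ears for the problem.

Expected output:

abandon ship
all ears

The match has to be case-insensitive, because the capitalization in the list differs from the capitalization in the sentences.

At the moment I am using this approach for single-word matching.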


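You are looking for spaCy's Matcher, which the spaCy documentation covers in more detail. It can find arbitrarily long or complex token sequences for you, and it is easy to parallelize (see pipe() in the Matcher documentation). By default it returns the positions of the matches in the text, though you can do whatever you like with the matched tokens, and you can also attach a callback function that runs on each match.

That said, I think your use case is fairly simple. I have included an example to get you started.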

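import spacy
from spacy.matcher import Matcher

# 'en' is a spaCy 2.x shortcut link; spaCy 3.x needs a full package name such as 'en_core_web_sm'.
nlp = spacy.load('en')

cliches = ['Abandon ship',
           'About face',
           'Above board',
           'All ears']

# One pattern per cliché: a list of token specs that compare lowercased text,
# so matching ignores case.
cliche_patterns = [[{'LOWER': token.text.lower()} for token in nlp(cliche)] for cliche in cliches]

matcher = Matcher(nlp.vocab)
for counter, pattern in enumerate(cliche_patterns):
    # spaCy 2.x signature: add(key, on_match, *patterns).
    # In spaCy 3.x this call becomes matcher.add(key, [pattern]).
    matcher.add("Cliche " + str(counter), None, pattern)

example_1 = nlp("Turn about face!")
example_2 = nlp("We must abandon ship! It's the only way to stay above board.")

matches_1 = matcher(example_1)
matches_2 = matcher(example_2)

# Each match is a (match_id, start, end) tuple of token indices.
for match_id, start, end in matches_1:
    print(example_1[start:end])

print("--------")
for match_id, start, end in matches_2:
    print(example_2[start:end])

Output:

about face
--------
abandon ship
above board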
Just make sure you have an up-to-date version of spaCy (2.0.0+), because the Matcher API changed recently.
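
Since the question is also tagged regex, here is a minimal sketch of the same case-insensitive phrase lookup using only the standard library `re` module. This is an alternative to the Matcher approach, not a replacement for it: it works on raw characters rather than spaCy tokens. The helper name `find_cliches` is made up for illustration, and the sketch assumes the clichés have already been pulled out of the DataFrame into a plain Python list (for example with `.tolist()` on whichever column holds them).

```python
import re


def find_cliches(text, cliches):
    """Return the clichés found in `text`, lowercased, in order of appearance."""
    if not cliches:
        return []
    # Try longer phrases first so a long cliché wins over a shorter one
    # that starts with the same words.
    ordered = sorted(cliches, key=len, reverse=True)
    # Escape every word and allow any run of whitespace between words.
    alternatives = [
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in ordered
    ]
    # \b keeps the match on word boundaries; IGNORECASE handles the
    # capitalization difference between the list and the text.
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    # Normalize each hit to lowercase with single spaces.
    return [" ".join(m.group(0).lower().split()) for m in pattern.finditer(text)]


cliches = ["Abandon ship", "About face", "Above board", "All ears"]
text = ("This is a sample sentence containing a cliche abandon ship. "
        "He was all ears for the problem.")

for hit in find_cliches(text, cliches):
    print(hit)
```

On the sample sentence from the question this prints abandon ship and all ears. A regex like this is probably enough for a fixed list of literal phrases; the Matcher is the better fit once patterns need token attributes such as lemmas or part-of-speech tags.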
