Python: finding the longest "and" chain with spaCy

How can I match the longest "and chain" available in some text?

For example, consider

"The forum had jam and berry and wine along with bread and butter and cheese and milk, even chocolate and pista!"

How can I match 'jam and berry and wine' and 'bread and butter and cheese and milk' without knowing the number of "and"-separated items?

This is what I tried:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

pattern = [{'IS_ASCII': True}, {'LOWER': 'and'}, {'IS_ASCII': True}]
# spaCy v3 signature; spaCy v2 used matcher.add("AND_PAT", None, pattern)
matcher.add("AND_PAT", [pattern])

doc = nlp("The forum had jam and berry and wine along with bread and butter and cheese and milk, even chocolate and pista!")

for match_id, start, end in matcher(doc):
    print(doc[start: end].text)

But this gives a "lazy" match, which is not what I need.

I had a look, and it mentions the OP key for building rules, but that seems useful only when the same token is repeated consecutively.

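Also, the matching should be greedy and should not return a result as soon as an acceptable pattern is found. In the example above, the desired result is not 'jam and berry' and 'berry and wine' (as my program produces) but 'jam and berry and wine'.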
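This problem could probably be solved with regular expressions, but I would like to solve it with spaCy's rule-based matching, preferably without using the REGEX operator either.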
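One way to get the greedy result from the 3-token pattern above is to merge the overlapping matches afterwards: 'jam and berry' at offsets (3, 6) and 'berry and wine' at (5, 8) share 'berry', so they combine into 'jam and berry and wine'. The sketch below is plain Python so it runs without spaCy; the hand-built `tokens` list and the hypothetical `merge_overlapping` helper stand in for `doc` and the `(start, end)` offsets that `matcher(doc)` yields. This is a post-processing step, not a Matcher feature; `spacy.util.filter_spans` only keeps the longest of overlapping spans rather than joining them.

```python
from typing import List, Tuple

# (start, end) token offsets with end exclusive, the same shape spaCy's Matcher yields
Span = Tuple[int, int]


def merge_overlapping(spans: List[Span]) -> List[Span]:
    """Merge spans that share at least one token into a single covering span."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span (e.g. "jam and berry" + "berry and wine"): extend it
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical token list mirroring the example sentence, punctuation split off
tokens = ["The", "forum", "had", "jam", "and", "berry", "and", "wine", "along", "with",
          "bread", "and", "butter", "and", "cheese", "and", "milk", ",", "even",
          "chocolate", "and", "pista", "!"]

# One "X and Y" match per inner "and", like the AND_PAT pattern would report
triples = [(i - 1, i + 2) for i, t in enumerate(tokens)
           if t.lower() == "and" and 0 < i < len(tokens) - 1]

chains = [" ".join(tokens[s:e]) for s, e in merge_overlapping(triples)]
print(chains)
```

With spaCy, `triples` would instead be `[(start, end) for _, start, end in matcher(doc)]`, and each merged pair can be turned back into text with `doc[start:end].text`.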
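Try the following:

# Indices of every "and" token together with its left and right neighbours
l = [{t.nbor(-1).i, t.i, t.nbor().i} for t in doc if t.text == 'and']
bag = set().union(*l)  # The * operator unpacks an argument list
# Keep tokens in the bag; replace every other token with a line break
st = " ".join([t.text if t.i in bag else '\n' for t in doc])
# Each non-empty piece between line breaks is one "and" chain
result = [part.strip() for part in st.split('\n') if part.strip()]
# result = ['jam and berry and wine',
#           'bread and butter and cheese and milk',
#           'chocolate and pista']

Note that this assumes the first and last tokens are not "and" tokens (otherwise t.nbor() would step outside the document).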
Thanks for the answer. Still, I was hoping for a less complicated, more direct way... Oh well.
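The answer's idea can also be written so that it does not assume the first and last tokens differ from "and": an "and" at either edge is simply skipped. The sketch below works on a plain list of token strings (with spaCy, `[t.text for t in doc]`); `and_chains` is a hypothetical helper name. Like the answer, it compares the text case-sensitively and joins runs that merely touch, so 'a and b c and d' comes out as one run.

```python
from typing import List, Set


def and_chains(tokens: List[str]) -> List[str]:
    """Return runs of consecutive tokens that are an "and" or a neighbour of one."""
    # Collect each "and" with its left and right neighbours, skipping an
    # "and" at the very start or end instead of indexing past the edge
    bag: Set[int] = set()
    for i, tok in enumerate(tokens):
        if tok == "and" and 0 < i < len(tokens) - 1:
            bag.update((i - 1, i, i + 1))

    # Walk the tokens once, closing a run whenever a token outside the bag appears
    chains: List[str] = []
    run: List[str] = []
    for i, tok in enumerate(tokens):
        if i in bag:
            run.append(tok)
        elif run:
            chains.append(" ".join(run))
            run = []
    if run:
        chains.append(" ".join(run))
    return chains


# Hypothetical token list mirroring the example sentence, punctuation split off
tokens = ["The", "forum", "had", "jam", "and", "berry", "and", "wine", "along", "with",
          "bread", "and", "butter", "and", "cheese", "and", "milk", ",", "even",
          "chocolate", "and", "pista", "!"]

print(and_chains(tokens))
print(and_chains(["and", "jam", "and", "berry", "and"]))  # edge "and" tokens are ignored
```

Walking the tokens once replaces the answer's join-on-newline and split step, which also avoids any clash with tokens that themselves contain line breaks.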