Python spaCy: match only the first instance of multiple patterns


I want to use the following code to find the patterns "iphone x" or "iphone":

import spacy
from spacy.matcher import Matcher

TEXTS = ['How to preorder the iPhone X', 'iPhone X is coming', 'Should I pay $1,000 for the iPhone X?', 'The iPhone 8 reviews are here', 'Your iPhone goes up to 11 today', 'I need a new phone! Any tips?']

# Create pattern to match 'iphone' and 'x', or 'iphone' and optional number
pattern1 = [{'LOWER': 'iphone'}, {'LOWER': 'x'}]
pattern2 = [{'LOWER': 'iphone'}, {'IS_DIGIT': True, 'OP': '?'}]

# Add patterns to the matcher
nlp = spacy.load('en')
matcher = Matcher(nlp.vocab)
matcher.add('GADGET', None, pattern1, pattern2)

TRAINING_DATA = []

for doc in nlp.pipe(TEXTS):
    # Match on the doc and create a list of matched spans
    spans = [doc[start:end] for match_id, start, end in matcher(doc)]
    # Get (start character, end character, label) tuples of matches
    entities = [(span.start_char, span.end_char, 'GADGET') for span in spans]    
    # Format the matches as a (doc.text, entities) tuple
    training_example = (doc.text, {'entities': entities})
    # Append the example to the training data
    TRAINING_DATA.append(training_example)

print(*TRAINING_DATA, sep='\n')  
The output is:

('How to preorder the iPhone X', {'entities': [(20, 28, 'GADGET'), (20, 26, 'GADGET')]})
('iPhone X is coming', {'entities': [(0, 8, 'GADGET'), (0, 6, 'GADGET')]})
('Should I pay $1,000 for the iPhone X?', {'entities': [(28, 36, 'GADGET'), (28, 34, 'GADGET')]})
('The iPhone 8 reviews are here', {'entities': [(4, 12, 'GADGET')]})
('Your iPhone goes up to 11 today', {'entities': [(5, 11, 'GADGET')]})
('I need a new phone! Any tips?', {'entities': []})
Can you tell me how to modify the patterns so that I get this result instead?

('How to preorder the iPhone X', {'entities': [(20, 28, 'GADGET')]})
('iPhone X is coming', {'entities': [(0, 8, 'GADGET')]})
('Should I pay $1,000 for the iPhone X?', {'entities': [(28, 36, 'GADGET')]})
('The iPhone 8 reviews are here', {'entities': [(4, 12, 'GADGET')]})
('Your iPhone goes up to 11 today', {'entities': [(5, 11, 'GADGET')]})
('I need a new phone! Any tips?', {'entities': []})

Thanks in advance.

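One solution is to keep only the first item from the list stored under the 'entities' key of the dictionary. That is, the loop should be: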

TRAINING_DATA = []

for doc in nlp.pipe(TEXTS):
    # Match on the doc and create a list of matched spans
    spans = [doc[start:end] for match_id, start, end in matcher(doc)]
    # Get (start character, end character, label) tuples of matches
    entities = [(span.start_char, span.end_char, 'GADGET') for span in spans]    
    # Format the matches as a (doc.text, entities) tuple
    training_example = (doc.text, {'entities': entities})
    # Append the example to the training data
    # Keep only the first match, still wrapped in a list so every example has the same shape
    if len(entities) > 1:
        TRAINING_DATA.append((doc.text, {'entities': [entities[0]]}))
    else:
        TRAINING_DATA.append(training_example)
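
Taking the first entity only works as long as the matcher happens to return the longer match first. As a minimal sketch, assuming the entities are (start_char, end_char, label) tuples as in the output above, overlapping matches could instead be filtered by length. The helper name keep_longest is hypothetical and not part of spaCy.

```python
def keep_longest(entities):
    """Drop entities that overlap a longer (or equally long but earlier) entity.

    `entities` is a list of (start_char, end_char, label) tuples.
    """
    # Longest spans first; ties are broken by the earlier start offset
    ordered = sorted(entities, key=lambda e: (-(e[1] - e[0]), e[0]))
    kept = []
    for start, end, label in ordered:
        # Two half-open ranges overlap when each starts before the other ends
        overlaps = any(start < k_end and k_start < end for k_start, k_end, _ in kept)
        if not overlaps:
            kept.append((start, end, label))
    # Return the survivors in document order
    return sorted(kept)


entities = [(20, 28, 'GADGET'), (20, 26, 'GADGET')]
print(keep_longest(entities))  # [(20, 28, 'GADGET')]
```

Inside the loop this would replace the if/else with TRAINING_DATA.append((doc.text, {'entities': keep_longest(entities)})). For Span objects rather than tuples, spaCy's own spacy.util.filter_spans applies a similar longest-span-wins rule.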

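Deleted my answer, it was incomplete: I have been trying to install spaCy to test it, but it pulls in a lot of other dependencies, which is a bit annoying. If I were you, I would try to combine everything into a single pattern. I think that may be what you want, but I can't test it yet, so bear that in mind.

Thanks, I will test your idea first thing tomorrow morning (it is almost midnight here).

Shouldn't the second-to-last value also be an empty match? 11 is not a digit, it is a number!

I tried various combinations, such as
[{'LOWER':'iphone'},{'LOWER':'x'},{'LOWER':'iphone','IS_DIGIT':True,'OP':'?'}]
[{'LOWER':'iphone'},{'LOWER x':'OP':'x',{'iphone IS_DIGIT':True,'OP':'IS(DIGIT':True,
or
[{'LOWER LOWER LOWER OP':'iphone OP':'s':'s','
but none of them gave me the output I wanted.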