How can I logically split a sentence using spaCy?


I'm new to spaCy and I'm trying to split a sentence logically so that I can process each part separately. For example:

"If the country selected is 'US', then the zip code should be numeric"

This needs to be split into:

If the country selected is 'US',
then the zip code should be numeric

The allowed states are NY, NJ and CT




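#Importing spacy and Matcher to merge matched patterns
#Note: the calls below (spacy.load('en'), matcher.add(name, None, pattern),
#nlp.add_pipe(function)) follow the spaCy v2 API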
import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en')

#Defining the pattern: a single alphabetic token wrapped in single quotes (e.g. 'US') should be merged into one token
matcher = Matcher(nlp.vocab)
pattern = [{'ORTH': "'"},
           {'IS_ALPHA': True},
           {'ORTH': "'"}]

#Adding pattern to the matcher
matcher.add('special_merger', None, pattern)

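#Method to merge matched patterns
def special_merger(doc):
    matched_spans = []
    matches = matcher(doc)
    #Collect the matched spans first, then merge them in one pass
    for match_id, start, end in matches:
        span = doc[start:end]
        matched_spans.append(span)
    #Merge each span into a single token (Doc.retokenize needs spaCy v2.1+)
    with doc.retokenize() as retokenizer:
        for span in matched_spans:
            retokenizer.merge(span)
    return doc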

#Pipeline component that marks the tokens that should start a new sentence.
def should_sentence_start(doc):
    for token in doc:
        if should_be_sentence_start(token):
            token.is_sent_start = True
    return doc

#Defining the rule: if the previous token is "," and the token before that is "'US'",
#then the current token should be the start of a sentence.
def should_be_sentence_start(token):
    if token.i >= 2 and token.nbor(-1).text == "," and token.nbor(-2).text == "'US'":
        return True
    return False

#Adding matcher and sentence tokenizing to nlp pipeline.
nlp.add_pipe(special_merger, first=True)
nlp.add_pipe(should_sentence_start, before='parser')

#Applying NLP on the required text
sent_texts = "If the country selected is 'US', then the zip code should be numeric"
doc = nlp(sent_texts)
#Print each sentence produced by the custom boundary rule
for sent in doc.sents:
    print(sent)
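
To sanity-check the boundary rule without loading a spaCy model, the same idea can be sketched with the standard library alone. This is not the spaCy approach above but a swapped-in regex technique. The helper name `split_after_quoted_clause` is hypothetical, and it assumes a slightly more general rule than the code above: a new segment starts after any closing single quote followed by a comma, not only after `'US',`.

```python
import re

# Assumption: a segment boundary is the whitespace that follows a closing
# single quote plus comma (for example the space after "'US',").
QUOTE_COMMA_BOUNDARY = re.compile(r"(?<=',)\s+")


def split_after_quoted_clause(text):
    """Split text at whitespace preceded by a quote-comma pair (hypothetical helper)."""
    return [part for part in QUOTE_COMMA_BOUNDARY.split(text) if part]


if __name__ == "__main__":
    sentence = "If the country selected is 'US', then the zip code should be numeric"
    for part in split_after_quoted_clause(sentence):
        print(part)
```

A sentence with ordinary commas, such as "The allowed states are NY, NJ and CT", contains no quote-comma pair and comes back as a single segment. Unlike the spaCy pipeline, this sketch has no token or parse information, so it only suits quick checks of the rule itself.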