
Splitting a paragraph into sentences in Python with a regex when it contains abbreviations

Tags: python, regex, split

I am trying to use this function on a paragraph that contains three sentences and some abbreviations:


#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import re


def splitParagraphIntoSentences(paragraph):
    ''' break a paragraph into sentences
        and return a list '''
    # to split by multiple characters,
    # regular expressions are easiest (and fastest)
    sentenceEnders = re.compile(r'[.!?][\s]{1,2}[A-Z]')
    sentenceList = sentenceEnders.split(paragraph)
    return sentenceList

if __name__ == '__main__':
    p = "While other species (e.g. horse mango, M. foetida) are also grown ,Mangifera indica – the common mango or Indian mango – is the only mango tree. Commonly cultivated in many tropical and subtropical regions, and its fruit is distributed essentially worldwide.In several cultures, its fruit and leaves are ritually used as floral decorations at weddings, public celebrations and religious "

    sentences = splitParagraphIntoSentences(p)
    for s in sentences:
        print(s.strip())
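Separately from the abbreviation problem, `re.split()` discards whatever the pattern matched, so `[.!?][\s]{1,2}[A-Z]` also removes each sentence's final punctuation mark and the first capital letter of the sentence after it. A minimal sketch that avoids this by putting the punctuation in a lookbehind and the capital letter in a lookahead, so only the whitespace between sentences is consumed. The function name `split_on_boundaries` is illustrative and not from the original code, and this version does not handle abbreviations yet: a capitalized word after `M.` would still start a new sentence.

```python
import re

# Consume only the whitespace: the lookbehind keeps the ".", "!" or "?"
# with the previous sentence, and the lookahead leaves the capital letter
# at the start of the next one.
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s{1,2}(?=[A-Z])')


def split_on_boundaries(paragraph):
    # Hypothetical helper: strip each piece and drop empty ones.
    return [s.strip() for s in SENTENCE_BOUNDARY.split(paragraph) if s.strip()]


print(split_on_boundaries("One sentence. Two sentences!  Three?"))
# ['One sentence.', 'Two sentences!', 'Three?']
```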

Output received: While other Mangifera species (e.g. horse mango, M. foetida) are also grown on a more localized basis, Mangifera indica – the common mango or Indian mango – is the only mango tree ommonly cultivated in many tropical and subtropical regions, and its fruit is distributed essentially worldwide.In several cultures, its fruit and leaves are ritually used as floral decorations at weddings, public celebrations and religious.

So even the abbreviations get split.
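One common workaround, offered here as a heuristic sketch rather than a complete solution, is to find candidate boundaries first and then reject any candidate whose preceding word is a known abbreviation or a single-letter initial such as the `M.` in `M. foetida`. The `ABBREVIATIONS` set below is a hypothetical starter list, and the function name is illustrative. The pattern uses `\s+` instead of `\s{1,2}`, and a lookahead so the capital letter is kept. Known limitations: a sentence that genuinely ends with a listed abbreviation (for example `... etc. The next`) will not be split, and a missing space, as in `worldwide.In`, is still not treated as a boundary.

```python
import re

# Hypothetical starter list of abbreviations that should not end a sentence.
# Entries are lowercase; extend the set for your own text.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "mr.", "mrs.", "dr.", "vs."}

# Candidate boundary: ".", "!" or "?" followed by whitespace, then an
# uppercase letter. The lookahead leaves that letter for the next sentence.
CANDIDATE = re.compile(r'[.!?]\s+(?=[A-Z])')

# A single letter followed by a period, e.g. the initial in "M. foetida".
INITIAL = re.compile(r'[a-z]\.')


def split_sentences_with_abbreviations(paragraph, abbreviations=ABBREVIATIONS):
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(paragraph):
        end = match.start() + 1  # keep the punctuation mark with its sentence
        # Last whitespace-separated token before the boundary, minus any
        # leading bracket or quote, e.g. "(e.g." -> "e.g."
        last_word = paragraph[start:end].split()[-1].lstrip('("\'').lower()
        if last_word in abbreviations or INITIAL.fullmatch(last_word):
            continue  # not a real sentence end; keep scanning
        sentences.append(paragraph[start:end].strip())
        start = match.end()
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = ("Species such as M. Indica are grown. Dr. Smith studies them, "
        "e.g. Mango trees. It is common.")
for sentence in split_sentences_with_abbreviations(text):
    print(sentence)
# Species such as M. Indica are grown.
# Dr. Smith studies them, e.g. Mango trees.
# It is common.
```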



