Python: selecting sentences that contain selected words

Suppose I have the following paragraph:


text = '''Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]'''
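Say I enter a word (Favored); how can I delete the entire sentence that contains that word? The method I used before was tedious: I would use sent_tokenize to break up the passage (more than 13,000 words), and because I have more than 1,000 words to check, I would run a loop that checks every word in every sentence. This takes a lot of time, since there are more than 400 sentences.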



text = 'whatever....'
# Split on periods, then keep only the sentences that do not contain the word.
sentences = text.split('.')
good_sentences = [e for e in sentences if 'my_word' not in e]
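
Note that `'my_word' not in e` is a substring test, so a short word such as "to" would also match "today". A minimal sketch of whole-word, case-insensitive matching follows. It assumes a regular expression with word boundaries is acceptable; the helper name `contains_word` is hypothetical and not part of the answer above.

```python
import re

def contains_word(sentence, word):
    # \b anchors the match at word boundaries; re.escape protects against
    # regex metacharacters in the word; IGNORECASE ignores letter case.
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None

sentences = ['Where do you want to go today', 'See you today']
good_sentences = [s for s in sentences if not contains_word(s, 'to')]
print(good_sentences)
# prints: ['See you today']
```

The second sentence survives because "to" only occurs inside "today", which a plain substring test would have flagged.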



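def remove_sentence(text, word):
    # Rebuild the text from the period-delimited pieces that lack the word.
    # (The parameter is named text to avoid shadowing the built-in input.)
    return ".".join(sentence for sentence in text.split(".")
                    if word not in sentence)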

>>> remove_sentence(text, "published")
"[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"
>>> remove_sentence(text, "favoured")
"Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"

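Splitting on "." only recognises periods, so sentences ending in "?" or "!" are not separated, and citation markers such as [4][5] end up attached to the following sentence. The sketch below is one possible alternative, not the answerer's method: the pattern `SENTENCE_RE` and the function name `remove_sentence_keep_punct` are assumptions made for illustration, and the pattern treats any bracketed numbers right after the terminator as part of the preceding sentence.

```python
import re

# Assumed pattern: a run of non-terminator characters, followed either by
# the terminator(s) plus trailing citation markers like [4] and whitespace,
# or by the end of the text.
SENTENCE_RE = re.compile(r'[^.!?]+(?:[.!?]+(?:\[\d+\])*\s*|$)')

def remove_sentence_keep_punct(text, word):
    # Each piece keeps its own punctuation and spacing, so the survivors
    # can be concatenated directly.
    sentences = SENTENCE_RE.findall(text)
    return ''.join(s for s in sentences if word not in s).rstrip()

sample = 'Is it a fact?[1] Many favoured other views. It was accepted![2]'
print(remove_sentence_keep_punct(sample, 'favoured'))
# prints: Is it a fact?[1] It was accepted![2]
```

This still does not handle abbreviations or decimal numbers; a trained tokenizer such as the sent_tokenize mentioned in the question is likely the better choice for real prose.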

import re

SENTENCES = ('This is a sentence.',
             'Hello, world!',
             'Where do you want to go today?',
             'The apple does not fall far from the tree.',
             'Sally sells sea shells by the sea shore.',
             'The Jungle Book has several stories in it.',
             'Have you ever been up to the moon?',
             'Thank you for helping with my problem!')

BAD_WORDS = frozenset(map(str.lower, ('to', 'sea')))

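def words(sentence):
    # Yield every run of word characters found in the sentence.
    return (m.group() for m in re.finditer(r'\w+', sentence))

def main():
    for sentence in SENTENCES:
        # A non-empty intersection means the sentence contains a bad word.
        if frozenset(words(sentence.lower())) & BAD_WORDS:
            print('Delete:', repr(sentence))

if __name__ == '__main__':
    main()
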
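  • You start with the sentences you want to filter and the words you want to look for.
  • You compare the set of words in each sentence with the set of words you are looking for.
  • If the two sets intersect, the sentence is one you want to delete.
  • Output: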
    Delete: 'Where do you want to go today?'
    Delete: 'Sally sells sea shells by the sea shore.'
    Delete: 'Have you ever been up to the moon?'
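
The same set comparison can return the surviving sentences rather than printing the ones to delete, which is closer to what the question asks for. The following is a minimal sketch under stated assumptions: the function name `drop_sentences` is hypothetical, and the sentence list is hard-coded here, although it could just as well come from sent_tokenize as in the question. Each sentence is tokenised once and compared against the whole word set, so there is no need to loop over the 1,000+ words for every sentence.

```python
import re

def drop_sentences(sentences, bad_words):
    # Lower-case the lookup set once, up front.
    bad = frozenset(w.lower() for w in bad_words)
    kept = []
    for sentence in sentences:
        tokens = set(re.findall(r'\w+', sentence.lower()))
        # isdisjoint is True when the sentence shares no word with the set.
        if tokens.isdisjoint(bad):
            kept.append(sentence)
    return kept

sentences = ['Hello, world!',
             'Where do you want to go today?',
             'Sally sells sea shells.']
print(drop_sentences(sentences, ['To', 'sea']))
# prints: ['Hello, world!']
```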