Python: selecting sentences that contain chosen words


Suppose I have a paragraph:

text = '''Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]'''
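Say I enter a word (Favored): how do I delete the entire sentence that contains it? The method I used before is tedious: I use sent_tokenize to break up the paragraph (more than 13,000 words), and because I have more than 1,000 words to check, I run a loop that checks every word against every sentence. This takes a long time, because there are more than 400 sentences.

Instead, I would like to search the paragraph for the 1,000 words and, whenever one is found, select everything before it back to the previous period and everything after it up to the next period.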
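
The idea described above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the name sentences_containing is made up for illustration, sentences are assumed to end with a plain '.', and abbreviations or decimal numbers are not handled. It compiles all the words into one regular expression, scans the text once, and widens each hit to the nearest period on either side.

```python
import re

def sentences_containing(text, words):
    """Return each period-delimited sentence that contains any of the words."""
    if not words:
        return []
    # One alternation for all words; \b keeps 'sea' from matching 'search'.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    found = []
    pos = 0
    while True:
        match = pattern.search(text, pos)
        if match is None:
            break
        # Walk back to the previous period (rfind gives -1 if none, so start is 0).
        start = text.rfind(".", 0, match.start()) + 1
        # Walk forward to the next period, or to the end of the text.
        end = text.find(".", match.end())
        end = len(text) if end == -1 else end + 1
        found.append(text[start:end].strip())
        pos = end  # resume after this sentence so it is reported only once
    return found

sample = "Alpha likes tea. Beta favoured coffee. Gamma slept."
print(sentences_containing(sample, ["favoured"]))  # ['Beta favoured coffee.']
```

Because the search resumes after each matched sentence, the text is scanned once regardless of how many words are in the list.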

I'm not sure I understand your question, but you could do something like this:

text = 'whatever....'
sentences = text.split('.')
good_sentences = [e for e in sentences if 'my_word' not in e]

Is this what you are looking for?

This removes every sentence (delimited by '.') that contains the word:

def remove_sentence(input, word):
    return ".".join((sentence for sentence in input.split(".")
                    if word not in sentence))

>>> remove_sentence(text, "published")
"[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"
>>>
>>> remove_sentence(text, "favoured")
"Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"

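Note that `word not in sentence` is a substring test, so "sea" would also match "search". For a long word list, one possible variation (a sketch only; the name remove_sentences is hypothetical) is to compile all the words into a single whole-word, case-insensitive pattern and test each sentence once:

```python
import re

def remove_sentences(text, words):
    """Drop every '.'-delimited sentence that contains any of the words."""
    if not words:
        return text
    # Single compiled alternation; \b restricts matches to whole words.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    return ".".join(
        sentence for sentence in text.split(".") if not pattern.search(sentence)
    )

sample = "I like the sea. I search for shells. Go to town."
print(repr(remove_sentences(sample, ["sea", "to"])))  # ' I search for shells.'
```

Each sentence is checked once against the combined pattern instead of once per word.
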
You may be interested in trying something like the following program:

import re

SENTENCES = ('This is a sentence.',
             'Hello, world!',
             'Where do you want to go today?',
             'The apple does not fall far from the tree.',
             'Sally sells sea shells by the sea shore.',
             'The Jungle Book has several stories in it.',
             'Have you ever been up to the moon?',
             'Thank you for helping with my problem!')

BAD_WORDS = frozenset(map(str.lower, ('to', 'sea')))

def main():
    for index, sentence in enumerate(SENTENCES):
        if frozenset(words(sentence.lower())) & BAD_WORDS:
            print('Delete:', repr(sentence))

words = lambda sentence: (m.group() for m in re.finditer(r'\w+', sentence))

if __name__ == '__main__':
    main()
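
Rationale
  • You start with the sentences you want to filter and the words you want to look for.
  • You compare each sentence's set of words with the set of words you are looking for.
  • If there is an intersection, the sentence you are looking at is one to delete.

Output
    Delete: 'Where do you want to go today?'
    Delete: 'Sally sells sea shells by the sea shore.'
    Delete: 'Have you ever been up to the moon?'

Comments
  • What about a sentence that does not end with a period? Do you want to delete only the first sentence containing the word, or every sentence containing it? Are you using NLTK? If so, you should add it as a tag…
  • "".join([sentence for sentence in text.split(".") if "favored" not in sentence])
  • As I said in my question, the dictionary I pick words from has about 1000 words, so your method will take forever.
  • Yes, I did that, but you see, my 'whatever…' is large and I have many words. With one word and 1000 sentences I iterate 1000 times, but with 10 words and 1000 sentences I iterate 10000 times. That takes time.
  • You still have to read through all of your sentences. A list comprehension makes it faster. Or try using … to make it faster.
  • My example uses a generator. That may be faster than a list comprehension because it does not need to allocate as much memory. Test it.
  • Yes, that might be a bit faster, thanks.
  • This is a great solution!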
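
On the "test it" suggestion in the comments, a minimal timing sketch with the standard timeit module might look like the following. The sample text and the names with_list and with_generator are made up for illustration, and the timings will vary by machine. As far as I know, CPython's str.join collects its argument into a sequence before joining, so the gap between the two forms is often small; measuring is the only way to be sure.

```python
import timeit

# Hypothetical sample data: 600 short sentences, a third of which match.
text = "One sentence here. Another favoured sentence. A third one. " * 200

def with_list(text, word):
    # List comprehension: builds the full list before joining.
    return ".".join([s for s in text.split(".") if word not in s])

def with_generator(text, word):
    # Generator expression: yields the kept sentences lazily to join().
    return ".".join(s for s in text.split(".") if word not in s)

list_time = timeit.timeit(lambda: with_list(text, "favoured"), number=200)
gen_time = timeit.timeit(lambda: with_generator(text, "favoured"), number=200)
print(f"list comprehension: {list_time:.4f}s")
print(f"generator:          {gen_time:.4f}s")
```

Both functions return the same string, so whichever is faster on your data can be used interchangeably.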