Python: how do I check for a group of consecutive entries in a list?


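I'm working through a Codecademy challenge in which I have to define a function that compares an email against a list of "negative words and phrases". Once two words/phrases from the list have appeared in the email, every further occurrence should be censored.

This is where I've got to: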

import re

# Censors a list of negative words after the second appearance of a word from the list
def censor3(email):
    negative_words = [
        'concerned', 'behind', 'dangerous',
        'alarming', 'alarmed', 'out of control',
        'help', 'unhappy', 'bad',
        'upset', 'awful', 'broken',
        'damage', 'damaging', 'dismal',
        'distressed', 'distressing', 'concerning',
        'horrible', 'horribly', 'questionable',
        'danger'
    ]
    # split on non-word characters, keeping the separators so the text can be rejoined
    ecensored3s = re.split(r'(\W)', email)
    negative_count = 0
    for i in range(len(ecensored3s)):
        word = ecensored3s[i]
        if word in negative_words or word.lower() in negative_words:
            negative_count += 1
            if negative_count > 2:
                ecensored3s[i] = 'REDACTED'
    ecensored3 = "".join(ecensored3s)
    return ecensored3
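This works for all the single words in negative_words, but it does not find phrases such as 'out of control'. Is there a way to check for a phrase in a list where each word is its own entry?

Example input email: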

Board of Investors,

Things have taken a concerning turn down in the Lab.  Helena (she has insisted on being called  Helena, we're unsure how she came to that moniker) is still 
progressing at a rapid rate. Every day we see new developments in her thought patterns, but recently those developments have been more alarming than 
exciting.

Let me give you one of the more distressing examples of this.  We had begun testing hypothetical humanitarian crises to observe how Helena determines 
best solutions. One scenario involved a famine plaguing an unresourced country.

Horribly, Helena quickly recommended a course of action involving culling more than 60% of the local population. When pressed on reasoning, she stated 
that this method would maximize "reduction in human suffering."

This dangerous line of thinking has led many of us to think that we must have taken some wrong turns when developing some of the initial learning 
algorithms. We are considering taking Helena offline for the time being before the situation can spiral out of control.

More updates soon,
Francine, Head Scientist

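That's an interesting exercise.

To find phrases longer than a single word, don't use `split`. You can, but you would have to jump through hoops to make it work. Use `re.findall` instead, so that you can add word boundaries before and after each phrase (to prevent matching `bad` inside `Barbados`), and use a flag to make the search case-insensitive.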
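As a small illustration of the word-boundary point, the sketch below compares a bare pattern with one wrapped in `\b`. The sample sentence is made up for this example.

```python
import re

sample = "Bad news from Barbados: the bad weather is not so bad."

# Without word boundaries, the "bad" inside "Barbados" is matched too.
loose = re.findall('bad', sample, re.IGNORECASE)

# \b on both sides only accepts "bad" as a whole word.
strict = re.findall(r'\bbad\b', sample, re.IGNORECASE)

# A multi-word phrase works the same way, because the pattern may contain spaces.
phrase = re.findall(r'\bout of control\b', "It may spiral Out Of Control.", re.IGNORECASE)

print(loose)   # ['Bad', 'bad', 'bad', 'bad']
print(strict)  # ['Bad', 'bad', 'bad']
print(phrase)  # ['Out Of Control']
```

`re.IGNORECASE` is what lets the capitalised "Bad" match the lowercase entry in the list.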

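That gives you a list of the words that occur. You can use `re.finditer` to find the exact position, from start to end, of every phrase. Store these in a list (I also stored the word itself for debugging; you don't need it). Then sort the list on the `start` item, drop the first two items so that those two occurrences are kept in the text, and loop over the remaining list, replacing each `start:end` slice with `REDACTED`. Because you are changing the actual text here, you need to work from the end back to the start; otherwise the positions of all later phrases will be off.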
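To see why the replacement order matters, here is a minimal sketch on a toy string. The string and the span positions are invented for this example.

```python
text = "aa bb cc"
spans = [(0, 2), (6, 8)]  # (start, end) of "aa" and "cc"

# Front to back: the first replacement lengthens the string,
# so the stored position of "cc" no longer points at "cc".
forward = text
for start, end in spans:
    forward = forward[:start] + 'REDACTED' + forward[end:]

# Back to front: earlier positions are untouched by later replacements.
backward = text
for start, end in reversed(spans):
    backward = backward[:start] + 'REDACTED' + backward[end:]

print(forward)   # REDACTREDACTED bb cc
print(backward)  # REDACTED bb REDACTED
```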

import re, pprint

text = '''
Board of Investors,

Things have taken a concerning turn down in the Lab.  Helena (she has insisted on being called  Helena, we're unsure how she came to that moniker) is still 
progressing at a rapid rate. Every day we see new developments in her thought patterns, but recently those developments have been more alarming than 
exciting.

Let me give you one of the more distressing examples of this.  We had begun testing hypothetical humanitarian crises to observe how Helena determines 
best solutions. One scenario involved a famine plaguing an unresourced country.

Horribly, Helena quickly recommended a course of action involving culling more than 60% of the local population. When pressed on reasoning, she stated 
that this method would maximize "reduction in human suffering."

This dangerous line of thinking has led many of us to think that we must have taken some wrong turns when developing some of the initial learning 
algorithms. We are considering taking Helena offline for the time being before the situation can spiral out of control.

More updates soon,
Francine, Head Scientist
'''

negative_phrases = [
    'concerned', 'behind', 'dangerous',
    'alarming', 'alarmed', 'out of control',
    'help', 'unhappy', 'bad',
    'upset', 'awful', 'broken',
    'damage', 'damaging', 'dismal',
    'distressed', 'distressing', 'concerning',
    'horrible', 'horribly', 'questionable',
    'danger'
]

# mark occurrences of all negative phrases as (matched text, start, end)
occurrences = []
for phrase in negative_phrases:
    # re.escape keeps any regex metacharacters in a phrase literal
    pattern = r'\b' + re.escape(phrase) + r'\b'
    occurrences += [(m.group(0), m.start(0), m.end(0))
                    for m in re.finditer(pattern, text, re.IGNORECASE)]

# order the matches by their start position in the text
occurrences.sort(key=lambda match: match[1])
pprint.pprint(occurrences)

# skip the first two
occurrences = occurrences[2:]

# remove the remaining phrases from the text -- this must be done in reverse!
for _, start, end in reversed(occurrences):
    text = text[:start] + 'REDACTED' + text[end:]

print(text)
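... and the result seems to be what you want. The first two negative words, 'concerning' and 'alarming', are left in place, and the rest are replaced with REDACTED: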


[('concerning', 42, 52),
 ('alarming', 314, 322),
 ('distressing', 372, 383),
 ('Horribly', 572, 580),
 ('dangerous', 794, 803),
 ('out of control', 1040, 1054)]

Board of Investors,

Things have taken a concerning turn down in the Lab.  Helena (she has insisted on being called  Helena, we're unsure how she came to that moniker) is still 
progressing at a rapid rate. Every day we see new developments in her thought patterns, but recently those developments have been more alarming than 
exciting.

Let me give you one of the more REDACTED examples of this.  We had begun testing hypothetical humanitarian crises to observe how Helena determines 
best solutions. One scenario involved a famine plaguing an unresourced country.

REDACTED, Helena quickly recommended a course of action involving culling more than 60% of the local population. When pressed on reasoning, she stated 
that this method would maximize "reduction in human suffering."

This REDACTED line of thinking has led many of us to think that we must have taken some wrong turns when developing some of the initial learning 
algorithms. We are considering taking Helena offline for the time being before the situation can spiral REDACTED.

More updates soon,
Francine, Head Scientist
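
The same idea can also be folded into one function. The sketch below is an alternative and not the code from the answer: it swaps in a single combined pattern and `re.sub` with a callback that counts matches as it goes, so no positions need to be stored. The name `censor_after_first_two` and its parameters are hypothetical.

```python
import re

def censor_after_first_two(text, phrases, keep=2, replacement='REDACTED'):
    """Leave the first `keep` matches untouched and replace all later ones."""
    # One alternation for all phrases, longest first so a longer phrase is tried first.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(p) for p in ordered) + r')\b',
        re.IGNORECASE,
    )
    seen = 0

    def replace(match):
        # re.sub calls this once per match, in left-to-right order.
        nonlocal seen
        seen += 1
        return match.group(0) if seen <= keep else replacement

    return pattern.sub(replace, text)

sample = "A bad day. Things look dangerous and out of control, which is alarming."
words = ['bad', 'dangerous', 'out of control', 'alarming']
print(censor_after_first_two(sample, words))
# A bad day. Things look dangerous and REDACTED, which is REDACTED.
```

Because `re.sub` builds a new string in one pass, the front-to-back offset problem does not arise here.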


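Can you provide an example of `email`?

For the same reason that none of the elements of the list ['out', 'of', 'control'] is equal to the string 'out of control'. You are comparing a list of words (`ecensored3s`) with a list of strings that can each consist of one or more words (`negative_words`).

Why split the string and iterate over it? Why not iterate over all the negative words and then call `email.replace(foo, 'REDACTED')`?

Unrelated, but replace `for i in range(len(ecensored3s)):` and `word = ecensored3s[i]` with `for i, word in enumerate(ecensored3s):`.

Email example:

Thanks, this is very interesting and I'm starting to dig into it, but would you have a suggestion for how to find a phrase in the split list instead? Your method is clearly superior and uses fewer loops, but I'd like my answer to stay as close as possible to the methods I've already learned. Your code uses enough methods I've never seen that I would only be copying your answer.

@JoaaNeMm: If you want to keep using `split`, consider replacing the spaces in 'out of control' (and in any other multi-word negative phrases) with another symbol, both in the list and in the text itself. Then `split` keeps them together, and you can compare them against your list.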
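
The last suggestion (glue multi-word phrases together before splitting) could look roughly like the sketch below. It assumes that the underscore never appears in the email itself, and the function name `censor_with_split` is hypothetical. It also uses the `enumerate` loop suggested in the comments.

```python
import re

def censor_with_split(email, negative_words, keep=2):
    # Assumption: '_' does not otherwise occur in the email, so it is safe as glue.
    glued_words = {w.replace(' ', '_') for w in negative_words}

    # Glue each multi-word phrase in the text: "out of control" -> "out_of_control".
    for phrase in negative_words:
        if ' ' in phrase:
            email = re.sub(r'\b' + re.escape(phrase) + r'\b',
                           lambda m: m.group(0).replace(' ', '_'),
                           email, flags=re.IGNORECASE)

    # '_' counts as a word character, so a glued phrase stays one token.
    tokens = re.split(r'(\W)', email)
    count = 0
    for i, word in enumerate(tokens):
        if word.lower() in glued_words:
            count += 1
            # Censor from the third hit on; otherwise put the spaces back.
            tokens[i] = 'REDACTED' if count > keep else word.replace('_', ' ')
    return ''.join(tokens)

sample = "This is bad and alarming. Help, it is out of control!"
print(censor_with_split(sample, ['bad', 'alarming', 'help', 'out of control']))
# This is bad and alarming. REDACTED, it is REDACTED!
```

This stays close to the original `split` loop; the only new step is the gluing pass before the split.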
为i,枚举中的单词(ecensered3s):。电子邮件示例:谢谢,这是非常有趣的,我开始倾诉它,但你会不会有一个建议,如何找到一个短语,而不是在分裂列表?你的方法显然是优越的,而且“环”更少,但我希望我的答案尽可能接近我已经学会的方法。你的代码有足够的方法,我从来没有见过,我只是想你的答案。@ JoaaNeMm:如果你想继续使用<代码>分裂< /代码>,考虑替换“失控”的空间(和其他多字危险字)与另一个符号。无论是在列表中还是在文本本身中。然后split将它们放在一起,您可以将它们与您的列表进行比较。