Implementing stopword removal with Python's replace() method


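I am trying to strip substrings from each element in a list of strings. I am having trouble working out how to handle the case where one string contains several of the substrings (stopwords) to remove.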

wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")
result = []

for wine in wines:
    for stop in stop_words:
        if stop in wine:
            x = wine.replace(stop, "")
            result.append(x)

print(result)

Changing the if statement to a for or while either returns garbage or hangs. Any suggestions?

A little indentation and a change of variable will solve your problem:

for wine in wines:
    glass = wine  # Let's pour your wine into a glass
    for stop in stop_words:
        if stop in glass:  # Is stop in your glass?
            # Replace stop in the glass and pour it back into the glass
            glass = glass.replace(stop, "")
    result.append(glass)  # Finally, pour the contents of your glass into result


result
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
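
The results above still carry the spaces that surrounded each removed stopword. Here is a minimal sketch of one way to tidy that up, assuming you want every run of whitespace collapsed to a single space. The function name `remove_stop_words` is made up for illustration and does not come from the original post.

```python
def remove_stop_words(text, stop_words):
    """Remove every stop word from text, then normalize whitespace."""
    for stop in stop_words:
        # str.replace leaves the string unchanged when stop is absent,
        # so no separate `if stop in text` check is needed.
        text = text.replace(stop, "")
    # split() with no arguments splits on any whitespace run and drops
    # empty pieces, so joining with " " collapses and trims the spaces.
    return " ".join(text.split())


wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")

cleaned = [remove_stop_words(wine, stop_words) for wine in wines]
print(cleaned)  # ['Chardonnay', 'Cabernet Sauvignon', 'Bordeaux']
```

Because the accumulated string is reassigned on every pass, a wine containing several stopwords has all of them removed before it is appended once.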

If you are feeling adventurous, you can use a regular expression. I believe that in this case a regex may be faster than the simple loop:

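>>> import re
>>> result = []  # start from an empty list so earlier results are not repeated
>>> for wine in wines:
...     result.append(re.sub('('+'|'.join(stop_words)+')','',wine))
...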
>>> result
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
>>>

Or write it as a list comprehension:

>>> [re.sub('('+'|'.join(stop_words)+')','',wine) for wine in wines]
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
>>>
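
The claim that the regex may be faster is easy to check rather than take on trust. The following is a rough sketch using the standard `timeit` module. The helper names `with_loop` and `with_regex` and the iteration count are arbitrary choices for illustration, and the timings will vary by machine and Python version, so no result is shown here.

```python
import re
import timeit

wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")

# Compile once so the benchmark measures substitution, not pattern building.
pattern = re.compile("|".join(stop_words))


def with_loop():
    """Nested-loop version: one str.replace call per stop word."""
    out = []
    for wine in wines:
        for stop in stop_words:
            wine = wine.replace(stop, "")
        out.append(wine)
    return out


def with_regex():
    """Regex version: one substitution pass per wine."""
    return [pattern.sub("", wine) for wine in wines]


# Both approaches should agree before their speed is compared.
assert with_loop() == with_regex()

loop_time = timeit.timeit(with_loop, number=10000)
regex_time = timeit.timeit(with_regex, number=10000)
print("loop:  %.4f s" % loop_time)
print("regex: %.4f s" % regex_time)
```

With only three short strings and four stopwords the difference is likely to be small either way; a larger, more realistic data set would give a more meaningful comparison.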


Using regex would be better:

>>> wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
>>> stop_words = ("2005", "2008", "2009", "Cotes du Rhone")
>>> import re
>>> [re.sub('|'.join(stop_words),'',wine) for wine in wines]
[' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']
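
Joining the stopwords directly works here because none of them contains regex metacharacters. As a hedged sketch for the more general case, the pattern can be built with `re.escape`, with longer stopwords placed first so that a short stopword does not shadow a longer one that starts the same way. The helper name `build_stop_pattern` and the "(Reserve)" stopword are hypothetical, chosen only to illustrate the point.

```python
import re


def build_stop_pattern(stop_words):
    """Compile one alternation that matches any stop word literally."""
    # Longest first: alternation tries branches left to right, so a shorter
    # word that is a prefix of a longer one would otherwise win.
    ordered = sorted(stop_words, key=len, reverse=True)
    # re.escape makes characters such as ( ) . + match themselves.
    return re.compile("|".join(re.escape(word) for word in ordered))


wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")

pattern = build_stop_pattern(stop_words)
print([pattern.sub("", wine) for wine in wines])
# [' Chardonnay', 'Cabernet Sauvignon ', 'Bordeaux  ']

# Hypothetical stop word containing regex metacharacters.
special = build_stop_pattern(["(Reserve)"])
print(repr(special.sub("", "Merlot (Reserve)")))  # 'Merlot '
```

Without the escaping step, "(Reserve)" would be read as a capture group matching only "Reserve", and the parentheses would be left behind in the output.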


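Taking jamylak's suggestion to use strip() into account, as a one-liner:

[reduce(lambda x,y: x.replace(y, "").strip(), stop_words, wine) for wine in wines]

Note that this works as written in Python 2.x but not in Python 3, because reduce() has been moved out of the builtins into the functools module. If you are using Python 3, do the following instead: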

import functools as ft
[ft.reduce(lambda x,y: x.replace(y, "").strip(), stop_words, wine) for wine in wines]
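
For readers unfamiliar with `reduce`, the sketch below spells out what the one-liner does: the wine is the starting value, and each stopword is applied to the running result in turn. The function name `strip_stops_loop` is invented for this comparison and is not part of the original answer.

```python
from functools import reduce


def strip_stops_loop(wine, stop_words):
    """Explicit-loop equivalent of the reduce() call."""
    acc = wine  # reduce's initial value
    for stop in stop_words:
        # Same step as the lambda: remove one stop word, trim both ends.
        acc = acc.replace(stop, "").strip()
    return acc


wines = ("2008 Chardonnay", "Cabernet Sauvignon 2009", "Bordeaux 2005 Cotes du Rhone")
stop_words = ("2005", "2008", "2009", "Cotes du Rhone")

with_reduce = [
    reduce(lambda x, y: x.replace(y, "").strip(), stop_words, wine)
    for wine in wines
]
with_loop = [strip_stops_loop(wine, stop_words) for wine in wines]

print(with_reduce)  # ['Chardonnay', 'Cabernet Sauvignon', 'Bordeaux']
print(with_reduce == with_loop)  # True
```

Note that `strip()` only trims the ends of the string, so a stopword removed from the middle of a longer name would still leave a double space behind.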


@jamylak: I think you were sitting right next to me, peeking at my laptop :-)

Haha, thanks everyone. I went with the first post :) I also prefer the regex implementation.

You may want to call x.strip() on each of the strings to remove whitespace.

@jamylak I am already doing that elsewhere in my script, along with punctuation; this is just a small part of a larger project.

I don't think "substring" is the right term here. You could call it a "tuple of strings" specifically, or more generally a "sequence" of strings, but not a "string of substrings".

Keep in mind that everyone warns against using reduce, and that it is not available as a builtin in Python 3.

This is one of the cases where using reduce() is appropriate; I will miss it in Python 3...