Removing characters from Python strings
I have several Python strings from which I want to remove unwanted characters. Examples:
"This is '-' a test"
should be "This is a test"
"This is a test L)[_U_O-Y OH : l’J1.l'}/"
should be "This is a test"
"> FOO < BAR"
should be "FOO BAR"
"I<<W5§!‘1“¢!°\" I"
should be ""
(because if only words are extracted then it returns I W I and none of them form words)
"l‘?£§l%nbia ;‘\\~siI.ve_rswinq m"
should be ""
"2|'J]B"
should be ""
Here is what I have tried so far:
>>> import re
>>> line = re.sub(r"\W+", "", "This is '-' a test")
>>> line
'Thisisatest'
>>> line = re.sub(r"\W+", "", "This is a test L)[_U_O-Y OH : l’J1.l'}/")
>>> line
'ThisisatestL_U_OYOHlJ1l'
# ideally I would like this to be "This is a test"
>>> line = re.sub(r"\W+", "", "> FOO < BAR")
>>> line
'FOOBAR'
>>> line = re.sub(r"\W+", "", "I<<W5§!‘1“¢!°\" I")
>>> line
'IW51I'
>>> line = re.sub(r"\W+", "", "l‘?£§l%nbia ;‘\\~siI.ve_rswinq m")
>>> line
'llnbiasiIve_rswinqm'
>>> line = re.sub(r"\W+", "", "2|'J]B")
>>> line
'2JB'
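Later, I will filter the regex-cleaned words against a predefined word list. I will use split and filter, like this:
' '.join(word for word in line.split() if word.isalpha() and word.lower() in list)
This removes every token that is not purely alphabetic, as well as every alphabetic word that is not in the list.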
def myfilter(line):
    # Keep only tokens that are purely alphabetic and present in the word set.
    words = {'this', 'test', 'i', 'a', 'foo', 'bar'}
    return ' '.join(word for word in line.split() if word.isalpha() and word.lower() in words)

>>> myfilter("This is '-' a test")
'This a test'
>>> myfilter("This is a test L)[_U_O-Y OH : l’J1.l'}/")
'This a test'
>>> myfilter("> FOO < BAR")
'FOO BAR'
>>> myfilter("I<<W5§!‘1“¢!°\" I")
'I'
>>> myfilter("l‘?£§l%nbia ;‘\\~siI.ve_rswinq m")
''
>>> myfilter("2|'J]B")
''
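This option removes any group of non-space characters that contains at least one non-letter character. However, it leaves some unwanted groups of letters behind:
re.sub(r"\w*[^a-zA-Z ]+\w*","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
gives:
'This is a test  OH  '
It also leaves runs of multiple spaces:
re.sub(r"[^a-zA-Z ]+\w*","","This is '-' a test")
'This is  a test' # two spaces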
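The leftover runs of spaces can be collapsed in a second step. The following is only a minimal sketch, not part of the original answer: the helper name `strip_noisy_groups` is hypothetical, and it assumes that collapsing all whitespace to single spaces is acceptable.

```python
import re

def strip_noisy_groups(text):
    # Remove every run of non-letter, non-space characters together with
    # any word characters attached to it (same pattern as above).
    cleaned = re.sub(r"\w*[^a-zA-Z ]+\w*", "", text)
    # split() with no argument splits on any run of whitespace and drops
    # empty strings, so re-joining collapses repeated spaces and trims ends.
    return " ".join(cleaned.split())

print(strip_noisy_groups("This is '-' a test"))
print(strip_noisy_groups("This is a test L)[_U_O-Y OH : l’J1.l'}/"))
```

Note that this only tidies the whitespace; the stray "OH" group is still left in the second string, so a word-list filter is still needed afterwards.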
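There are one-letter words, though: "I" and "a"/"A".
Updated. In this case I won't be matching dictionary words but a predefined word list. So yes, if "I" is in the predefined list, then it is fine...
Deleting my answer, because I did not follow the word-extraction requirement carefully. Still, for the string "l‘?£§l%nbia ;‘\\~siI.ve_rswinq m", should any words be extracted at all?
r'[^\w\s]+' will match all non-word, non-space characters...
Is it correct to describe your filter as "split the string on whitespace, drop every element that contains non-letter characters, and join the rest with spaces"?
This is a good answer because it scales well, but it needs two tweaks. 1) list should be a set, for O(1) lookups. 2) Don't shadow the built-in type with a local variable (list).
@roippi: Thanks for the suggestions, I will incorporate them into my answer.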
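The two tweaks suggested in the comments could look roughly like this. It is a sketch under assumptions: the names `filter_words`, `allowed_words` and `WORDS` are made up for illustration, and the word list is just a sample, not the asker's real list.

```python
def filter_words(line, allowed_words):
    """Keep whitespace-separated tokens that are purely alphabetic
    and whose lowercase form is in allowed_words."""
    # A frozenset gives average O(1) membership tests, and the name
    # avoids shadowing the built-in `list`.
    allowed = frozenset(word.lower() for word in allowed_words)
    return " ".join(
        token for token in line.split()
        if token.isalpha() and token.lower() in allowed
    )

# Sample word list (an assumption made for this example only).
WORDS = ["this", "is", "test", "a", "foo", "bar"]

print(filter_words("This is '-' a test", WORDS))
print(filter_words("> FOO < BAR", WORDS))
```

Because "i" is deliberately left out of this sample list, a string such as "I<<W5§!‘1“¢!°\" I" comes back empty, which is the result asked for in the question.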