Removing characters from a Python string

Tags: python, regex

I have several Python strings from which I want to remove unwanted characters.

Examples:

"This is '-' a test" 
     should be "This is a test"
"This is a test L)[_U_O-Y OH : l’J1.l'}/"
     should be "This is a test"
"> FOO < BAR" 
     should be "FOO BAR"
"I<<W5§!‘1“¢!°\" I" 
     should be "" 
     (because if only words are extracted then it returns I W I and none of them form words)
"l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m"
     should be ""
"2|'J]B"
     should be ""

Here is what I get with re.sub:

>>> import re
>>> line = re.sub(r"\W+", "", "This is '-' a test")
>>> line
'Thisisatest'
>>> line = re.sub(r"\W+", "", "This is a test L)[_U_O-Y OH : l’J1.l'}/")
>>> line
'ThisisatestL_U_OYOHlJ1l'
# Ideally this would be "This is a test"; failing that, I would
# prefer "This is a test …"
>>> line = re.sub(r"\W+", "", "> FOO < BAR")
>>> line
'FOOBAR'
>>> line = re.sub(r"\W+", "", "I<<W5§!‘1“¢!°\" I")
>>> line
'IW51I'
>>> line = re.sub(r"\W+", "", "l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m")
>>> line
'llnbiasiIve_rswinqm'
>>> line = re.sub(r"\W+", "", "2|'J]B")
>>> line
'2JB'

Later on, I will filter the regex-cleaned words against a list of predefined words.

I would go with split and filter, like this:

' '.join(word for word in line.split() if word.isalpha() and word.lower() in list)
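This removes every word that contains a non-alphabetic character, as well as every alphabetic word that is not in the list.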

Example:

def myfilter(line):
    # Predefined words to keep; a set gives O(1) membership tests.
    words = {'this', 'test', 'i', 'a', 'foo', 'bar'}
    return ' '.join(word for word in line.split() if word.isalpha() and word.lower() in words)

>>> myfilter("This is '-' a test")
'This a test'
>>> myfilter("This is a test L)[_U_O-Y OH : l’J1.l'}/")
'This a test'
>>> myfilter("> FOO < BAR")
'FOO BAR'
>>> myfilter("I<<W5§!‘1“¢!°\" I")
'I'
>>> myfilter("l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m")
''
>>> myfilter("2|'J]B")
''

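This removes any group of non-space characters that contains at least one non-letter character. It does, however, leave some unwanted groups of letters behind: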

re.sub(r"\w*[^a-zA-Z ]+\w*","","This is a test L)[_U_O-Y OH : l’J1.l'}/")
gives:

'This is a test  OH  '
It also leaves runs of multiple spaces:

re.sub(r"[^a-zA-Z ]+\w*","","This is '-' a test")
'This is  a test'  # two spaces
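
One way to tidy up the leftover double spaces is to split the result on whitespace and join it again. The following is only a sketch built on the pattern from this answer; the helper name strip_noisy_groups is made up for illustration and is not part of the original answer.

```python
import re

def strip_noisy_groups(text):
    # Hypothetical helper: remove every run that contains at least one
    # non-letter, non-space character, plus the word characters attached to it.
    stripped = re.sub(r"\w*[^a-zA-Z ]+\w*", "", text)
    # split() with no arguments splits on whitespace runs and drops empty
    # pieces, so joining with one space collapses the gaps left behind.
    return " ".join(stripped.split())

print(strip_noisy_groups("This is '-' a test"))
print(strip_noisy_groups("This is a test L)[_U_O-Y OH : l’J1.l'}/"))
```

Note that this only fixes the spacing. A purely alphabetic leftover such as "OH" still survives, so a word-list filter is still needed afterwards.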

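There are one-letter words, though: "I" and "a"/"A".

Updated. In this case I won't be matching against dictionary words but against a predefined list of words. So yes, if "I" is in the predefined list, then it's fine.

Deleting my answer, as I didn't closely heed the word-extraction requirement. Still, for the string "l‘?£§l%nbia  ;‘\\~siI.ve_rswinq m", should any words be extracted at all?

r'[^\w\s]+' will match all non-word, non-space characters.

Is it correct to describe your filter as "split the string on spaces, remove all elements containing non-alphabetic characters, join the rest with spaces"?

This is a good answer since it scales well, but it needs two tweaks. 1) list should be a set, for O(1) lookup. 2) Don't shadow the built-in type with a local variable (list).

@roippi: Thanks for the suggestions, I'll incorporate them into my answer.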
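
For what it's worth, the two tweaks from the comments could be applied roughly as below. This is a sketch rather than the answerer's final code: the function name filter_known_words, the parameter allowed_words, and the sample list predefined are assumptions introduced for illustration.

```python
def filter_known_words(line, allowed_words):
    # Build the set once so each membership test is O(1) on average,
    # and avoid naming anything "list" so the built-in is not shadowed.
    allowed = {word.lower() for word in allowed_words}
    return " ".join(
        token
        for token in line.split()
        # Keep a token only if it is purely alphabetic and in the allowed set.
        if token.isalpha() and token.lower() in allowed
    )

# Hypothetical predefined word list, for demonstration only.
predefined = ["this", "is", "a", "test", "foo", "bar"]
print(filter_known_words("This is a test L)[_U_O-Y OH : l’J1.l'}/", predefined))
print(filter_known_words("> FOO < BAR", predefined))
```

Passing the word list as an argument keeps the filter reusable when the predefined list changes.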