Python: How do I remove entire sentences that contain numbers, special characters, website URLs, or emails?

How can I remove entire sentences that contain numbers, special characters, website URLs, or emails?

Sample input, option A:

['Hi my name is blank.', 'Do it 3 times.', 'Check out this website: https://blah.com', 'I like pie.', 'My email is asdf@jkl@gmail.com.']
Sample input, option B:

['Hi my name is blank. Do it 3 times. Check out this website: https://blah.com', 'I like pie. My email is asdf@jkl@gmail.com.']
Sample output:

['Hi my name is blank.','I like pie']
Current code:

def remove_emails(self, dataframe):
    self.log.info('Removing emails from text data')
    # Raw string avoids invalid-escape warnings; regex=True is required on pandas >= 2.0
    no_emails = dataframe.str.replace(r'\S*@\S*\s?', '', regex=True)
    return no_emails

def remove_website_links(self, dataframe):
    self.log.info('Removing website links from text data')
    no_website_links = dataframe.str.replace(r'http\S+', '', regex=True)
    return no_website_links

def remove_special_characters(self, dataframe):
    self.log.info('Removing special characters from text data')
    no_special_characters = dataframe.replace(r'[^A-Za-z0-9 ]+', '', regex=True)
    return no_special_characters

def remove_numbers(self, dataframe):
    self.log.info('Removing numbers from text data')
    no_numbers = dataframe.str.replace(r'\d+', '', regex=True)
    return no_numbers

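The problem is that the code above replaces the unwanted substrings with an empty string, but I don't know how to drop the whole list element when it matches any of the regexes above. I also don't want to traverse the list several times for every sentence. Overall, I am removing bad sentences from a corpus.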

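You can use this regex to check for the various cases and reject any string that matches it. It matches a URL scheme (http: or https:), an @ followed by word characters, or any digit:

https?:|@\w+|\d
Python code: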

import re

arr = ['Hi my name is blank.', 'Do it 3 times.', 'Check out this website: https://blah.com', 'I like pie', 'My email is asdf@jkl@gmail.com']

for s in arr:
    # Print only the strings with no URL scheme, "@word" fragment, or digit
    if not re.search(r'https?:|@\w+|\d', s):
        print(s)
This prints only the sentences you want:

Hi my name is blank.
I like pie
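
The loop above prints matches for input option A only. As a rough sketch, the same regex can also return a filtered list in a single pass and cope with option B, where several sentences share one list element. The helper names `split_sentences` and `keep_clean` are hypothetical, and the splitter is a naive assumption (it breaks after '.', '!' or '?' followed by whitespace), not a general sentence tokenizer.

```python
import re

# Same alternation as in the answer: a URL scheme, an "@word" fragment, or any digit.
BAD = re.compile(r'https?:|@\w+|\d')

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

def keep_clean(items):
    """Split each item into sentences and keep those the regex does not match."""
    return [s for item in items for s in split_sentences(item) if not BAD.search(s)]

option_a = ['Hi my name is blank.', 'Do it 3 times.',
            'Check out this website: https://blah.com', 'I like pie.',
            'My email is asdf@jkl@gmail.com.']
option_b = ['Hi my name is blank. Do it 3 times. Check out this website: https://blah.com',
            'I like pie. My email is asdf@jkl@gmail.com.']

print(keep_clean(option_a))  # ['Hi my name is blank.', 'I like pie.']
print(keep_clean(option_b))  # ['Hi my name is blank.', 'I like pie.']
```

Each sentence is tested once against one compiled pattern, so the list is not traversed separately per rule. Note that this pattern does not reject other special characters; that would need an extra alternative chosen to suit your corpus.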

So what is the problem?
@Alderven I have added a clarification to the question.