Python regex to clean all ' chars

python regex

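I have an RSS feed parser, and I'm using regexes to clean up the markup. I'm having trouble with reg4, which is supposed to clean all ' characters, and I'd like to know what pattern to use for it: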

import re

import feedparser

reg1 = re.compile(r'<br />')  # Regex to replace <br /> with \n (see reg1.sub)
reg2 = re.compile(r'(<!--.*?-->|<[^>]*>)')  # Regex to clean all HTML comments and tags (anything with <something>)
reg3 = re.compile(r'&nbsp')  # Regex to clean all &nbsp
reg4 = re.compile(r'')  # Regex to clean all ' chars (this is causing me issues for some reason)

def parseFeeds(source):
    d = feedparser.parse(source)
    print("There are", len(d['items']), "items in", source)
    FILE_INPUT = open("outputNewsFeed.txt", "w")
    for item in d['items']:
        # Apply the four filters in order to each item's description
        first_filter = reg1.sub('\n', item.description)
        second_filter = reg2.sub('', first_filter)
        third_filter = reg3.sub(' ', second_filter)
        item_description = reg4.sub('', third_filter)
        try:
            FILE_INPUT.write(item_description)
        except IOError:
            print("Error: can't find file or read data")
    FILE_INPUT.close()

If you only need to remove single quotes, you can escape the quote like this:

reg4 = re.compile(r'\'')

Or, if you don't mind changing how you write the string, you can use:

reg4 = re.compile(r"'")
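As a quick check that the two spellings behave the same, here is a minimal sketch. The sample text is made up for illustration. Note that in the raw string r'\'' the backslash is kept, so the pattern text is \' (two characters), which the regex engine reads as an escaped literal quote.

```python
import re

# Two spellings of a pattern that matches one literal single quote.
escaped = re.compile(r'\'')        # pattern text is \' (backslash kept by the raw string)
double_quoted = re.compile(r"'")   # pattern text is just '

sample = "It's the feed's 'latest' item"  # hypothetical description text

cleaned_escaped = escaped.sub('', sample)
cleaned_double = double_quoted.sub('', sample)

print(cleaned_escaped)                    # Its the feeds latest item
print(cleaned_escaped == cleaned_double)  # True
```

For a fixed one-character target like this, sample.replace("'", '') would give the same result without a regex at all.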

Regexes for cleaning markup? Using BeautifulSoup would be simpler and more robust.
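To give a rough idea of what a parser-based cleanup looks like, here is a sketch that uses the standard library's html.parser as a stand-in, not BeautifulSoup itself, so it runs without extra packages. The TextExtractor class, the strip_markup function and the sample fragment are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops tags and comments."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode character references
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Mirror reg1 from the question: a <br> becomes a newline
        if tag == 'br':
            self.parts.append('\n')

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(fragment):
    """Return the text content of an HTML fragment."""
    extractor = TextExtractor()
    extractor.feed(fragment)
    extractor.close()  # flush any buffered text
    return ''.join(extractor.parts)


sample = '<p>Hello<br />world <!-- note --><b>again</b></p>'  # made-up input
text = strip_markup(sample)
print(repr(text))  # 'Hello\nworld again'
```

With a parser, the comment and the tags never reach the output, so reg1 and reg2 from the question are no longer needed.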

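r'' is just an empty string.

I've used BeautifulSoup and xml.etree, but I'm using regexes here for practice. Yes, reg4 = re.compile(r'') is empty; I'm struggling with what to put in it to remove the ' char.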

reg4 = re.compile(r"'")
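To see why the empty pattern seemed to do nothing, here is a small sketch with a made-up input. An empty regex matches the empty string at every position, so substituting '' for each match leaves the text unchanged. Substituting a visible marker shows where those matches are.

```python
import re

empty = re.compile(r'')   # the pattern from the question
quote = re.compile(r"'")  # the suggested fix

sample = "can't"  # hypothetical input

unchanged = empty.sub('', sample)  # every empty match is replaced by nothing
marked = empty.sub('-', sample)    # a marker reveals each empty match
fixed = quote.sub('', sample)      # only the quote character is removed

print(unchanged)  # can't
print(marked)     # -c-a-n-'-t-
print(fixed)      # cant
```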