Replacing HTML codes in Python (tags: python, html, regex, replace, tkinter)

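I am using regular expressions to parse a website's source code and display news headlines in a Tkinter window. I have been told that parsing HTML with regular expressions is not the best idea, but unfortunately I don't have time to change that now.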

I can't seem to replace the HTML codes for special characters such as the apostrophe (&#8217;).

Currently I have the following:

union_url = 'http://www.news.com.au/sport/rugby'

def union():
    union_string = urlopen(union_url).read()
    union_string.replace("&#8217;", "'")
    union_headline = re.findall('(?:sport/rugby/.*) >(.*)<', union_string)
    union_headline_label= Label(union_window, text = union_headline[0], font=('Times',20,'bold'),  bg = 'White', width = 85, height = 3, wraplength = 500)

I have tried to find an answer but had no luck. Any help would be greatly appreciated.

You can use the "callable" feature of re.sub() to unescape (or remove) anything that is escaped:

>>> import re
>>> def htmlUnescape(m):
...     # &#NNNN; references are decimal code points, so parse in base 10
...     return unichr(int(m.group(1)))
...
>>> re.sub(r'&#(\d+);', htmlUnescape, "This is something &#8217; with an HTML-escaped character in it.")
u'This is something \u2019 with an HTML-escaped character in it.'
>>>
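For readers on Python 3, the same idea can be sketched as follows. This is an illustration and not the answerer's code: `unescape_numeric` is a hypothetical helper name, and the sample strings are made up. It extends the callable so that hexadecimal references (`&#x2019;`) are handled alongside decimal ones, and compares the result with the standard library's `html.unescape`, which also decodes named references such as `&amp;`. The last lines show why the `replace` call in the question appears to do nothing: `str.replace` returns a new string, so the result has to be assigned.

```python
import html
import re


def unescape_numeric(text):
    """Decode decimal (&#8217;) and hexadecimal (&#x2019;) character references."""
    def _sub(match):
        digits = match.group(1)
        if digits[0] in "xX":
            return chr(int(digits[1:], 16))  # hex form: &#x2019;
        return chr(int(digits))              # decimal form: &#8217;
    return re.sub(r"&#(\d+|[xX][0-9a-fA-F]+);", _sub, text)


# Made-up sample text, not taken from the real page.
raw = "It&#8217;s a try &#x2019; &amp; more"

numeric_only = unescape_numeric(raw)  # named references such as &amp; are left alone
everything = html.unescape(raw)       # stdlib: numeric and named references

# str.replace does not modify the string in place; rebind the name to keep the result.
page = "Wallabies&#8217; win"
page = page.replace("&#8217;", "'")

print(numeric_only)
print(everything)
print(page)
```

If only display text is needed, `html.unescape` is probably the simpler choice; the regex version is mainly useful when you want control over which references are touched.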

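Are you trying to fetch data from the HTML source, or to parse it?
Sorry, fetching data to display on a Tkinter widget.
Have you heard of …? Your life would be much better. Parsing HTML can be hard.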
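As a rough illustration of the parser-based route the comments hint at, here is a minimal Python 3 sketch that uses only the standard library's `html.parser.HTMLParser`. The class name `HeadlineParser`, the sample markup, and the assumption that headlines are the text of `<a>` tags whose `href` contains `sport/rugby/` are all hypothetical; the real page structure may differ. With the default `convert_charrefs=True`, character references such as `&#8217;` are decoded before the text reaches `handle_data`, so no separate replace step is needed.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of every <a> whose href contains a given fragment."""

    def __init__(self, href_fragment):
        super().__init__()  # convert_charrefs defaults to True: entities get decoded
        self.href_fragment = href_fragment
        self.headlines = []
        self._buffer = None  # list of text pieces while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.href_fragment in href:
                self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.headlines.append(text)
            self._buffer = None


# Hypothetical markup standing in for the downloaded page.
sample = (
    '<h2><a href="/sport/rugby/story-1">Wallabies&#8217; big win</a></h2>'
    '<a href="/sport/cricket/story-2">Not rugby</a>'
)
parser = HeadlineParser("sport/rugby/")
parser.feed(sample)
parser.close()
print(parser.headlines)
```

The first entry of `parser.headlines` could then be passed as the `text` argument of the Tkinter `Label`, in place of `union_headline[0]`.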