Decoding an HTML-encoded string in Python


I have the following string:

"Scam, hoax, or the real deal, he&#8217;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
I need to turn it into this string:

"Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."

This is pretty standard HTML encoding, and I can't for the life of me figure out how to convert it in Python.

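I found a script on GitHub that comes very close to working, but instead of the apostrophe it outputs some non-Unicode character.

Here is a sample of the output from the GitHub script: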

"Scam, hoax, or the real deal, he�s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."


What you're trying to do is called "HTML entity decoding", and it has been covered in many past Stack Overflow questions.

Here is a code snippet that decodes your example using an HTML parsing library:

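#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Written for Python 2 and BeautifulSoup 3 (the old "BeautifulSoup" package, not bs4).
from BeautifulSoup import BeautifulSoup

# The apostrophe in the input is the numeric entity &#8217;
string = "Scam, hoax, or the real deal, he&#8217;s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process."
# convertEntities asks the parser to turn HTML entities into Unicode characters.
s = BeautifulSoup(string, convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0]
print s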
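Here is the output:

Scam, hoax, or the real deal, he’s gonna work his way to the bottom of the sordid tale, and hopefully end up with an arcade game in the process.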
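
For anyone on Python 3, here is a minimal sketch of the same entity decoding using only the standard library's html.unescape, with no parsing library involved. It assumes the apostrophe in the input is the numeric entity &#8217;, and decode_entities is a hypothetical helper name used only for illustration.

```python
import html


def decode_entities(text):
    """Return text with HTML character references replaced by their characters."""
    return html.unescape(text)


# Assumption: the apostrophe was encoded as the numeric reference &#8217;
encoded = (
    "Scam, hoax, or the real deal, he&#8217;s gonna work his way to the "
    "bottom of the sordid tale, and hopefully end up with an arcade game "
    "in the process."
)
decoded = decode_entities(encoded)
print(decoded)

# Named references such as &rsquo; and &amp; are decoded the same way.
print(decode_entities("&rsquo; &amp; &lt;"))
```

This only decodes entities in a plain string. If the input is a full HTML document with tags to strip, a parser is probably still the better tool.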


I had tried the BeautifulSoup code before, but I was still getting an exception. It turned out that something elsewhere in my code was having trouble with the decoded Unicode characters.
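
The comment doesn't say which part of the code raised the exception. One common cause, assumed here purely for illustration, is that the decoded text later gets encoded as ASCII, which cannot represent the curly apostrophe (U+2019). A rough Python 3 sketch of that failure and two ways around it:

```python
# Decoded text containing a right single quotation mark (U+2019).
text = "he\u2019s"

# Encoding to ASCII fails because U+2019 has no ASCII representation.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Option 1: encode explicitly as UTF-8, which can represent every code point.
utf8_bytes = text.encode("utf-8")

# Option 2: if the destination really is ASCII-only, substitute a placeholder.
ascii_safe = text.encode("ascii", errors="replace")

print(ascii_failed, utf8_bytes, ascii_safe)
```

Which option fits depends on where the text ends up (a file, a database, a terminal), so treat this as a starting point rather than a diagnosis.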