Python encoding problem with BeautifulSoup


Hi. I have an encoding problem.

When I pass a string to BeautifulSoup, all the national characters are lost:

import urllib2
from BeautifulSoup import BeautifulSoup

addr = "http://zjazdowa.com.pl/index.php/aktualne-ceny-warzyw-i-owocow-.html"
content = urllib2.urlopen(addr).read()
html_pag = BeautifulSoup(content)  # <- this is where I lose all national letters
table_html = html_pag.find("div", id="808")

According to the documentation, BeautifulSoup converts all input to Unicode internally:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("Hello")
soup.contents[0]
# u'Hello'
soup.originalEncoding
# 'ascii'
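If your input does not declare an encoding (for example, in a meta tag), BeautifulSoup guesses. You can turn off the guessing by passing the input's encoding in the fromEncoding parameter: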

soup = BeautifulSoup("hello", fromEncoding="UTF-8")
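
To see why the declared or guessed encoding matters, here is a minimal sketch that uses only the standard library. It is written in Python 3 syntax, unlike the Python 2 snippets in this thread, and the sample string and the helper name decode_as are made up for illustration. It decodes the same UTF-8 bytes once with the right codec and once with the wrong one.

```python
# Illustration only: SAMPLE and decode_as() are hypothetical and are not
# part of BeautifulSoup.
SAMPLE = "Kapusta w\u0142oska"   # \u0142 is the Polish letter "l with stroke"
raw = SAMPLE.encode("utf-8")     # the bytes a server might send


def decode_as(data, encoding):
    """Decode bytes using an explicitly chosen encoding."""
    return data.decode(encoding)


good = decode_as(raw, "utf-8")       # correct codec: the text round-trips
bad = decode_as(raw, "iso-8859-2")   # wrong codec: the two bytes of the
                                     # national letter become two characters

print(good == SAMPLE)
print(bad == SAMPLE)
```

A parser that is told (or guesses) the wrong encoding ends up in the second situation, which is one way national characters can appear to be lost.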

Or is your real problem that the result "breaks" when you print it to the console?
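
If the console is the culprit, the parsing is fine and only the final encoding step fails. The sketch below reproduces that without BeautifulSoup, in Python 3 syntax and assuming an ASCII-only console; to_console_bytes is a hypothetical helper, not a library function.

```python
def to_console_bytes(text, console_encoding):
    """Encode text for a console, replacing characters it cannot represent."""
    return text.encode(console_encoding, errors="replace")


text = "Kapusta w\u0142oska"   # \u0142 is the Polish letter "l with stroke"

# A strict encode fails as soon as a character is outside the target charset.
try:
    text.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace" the unsupported character becomes "?" instead.
safe = to_console_bytes(text, "ascii")

print(strict_ok)
print(safe)
```

In this situation the data inside the parsed document is intact; only its display is lossy.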

Also, your code works fine:

>>> addr = "http://zjazdowa.com.pl/index.php/aktualne-ceny-warzyw-i-owocow-.html"
>>> content = urllib2.urlopen(addr).read()
>>> html_pag = BeautifulSoup(content)  # <- this is where I lose all national letters
>>> table_html = html_pag.find("div", id="808")
>>> print table_html.findAll('td')[8].string
Kapusta włoska

reload() reloads a module. I'm not sure what you hope to achieve by reloading sys, but it won't do you any good.

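FYI: his web page correctly specifies the encoding, both in the Content-Type header and in a meta tag.

I guess your "real problem" is guessing what the actual problem is... Note that in BeautifulSoup 4, fromEncoding was renamed to from_encoding.

The code you posted works and preserves all the "national" characters.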
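
As a rough illustration of the Content-Type point, the charset parameter of such a header can be read with the standard library alone. This is a sketch in Python 3 syntax; the helper name charset_from_content_type and the header values are invented for the example and were not fetched from the page in the question.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns `default` when no charset is declared.
    return msg.get_content_charset(default)


# Hypothetical header values, for illustration only.
declared = charset_from_content_type("text/html; charset=UTF-8")
missing = charset_from_content_type("text/html", default="unknown")

print(declared)
print(missing)
```

When the header declares a charset like this, a parser has no need to guess.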

For reference, the script header that the remark about reloading sys refers to:

#!/usr/bin/python2.7
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup
import urllib2, string, re, sys
reload(sys)
sys.setdefaultencoding("utf-8")