Python encoding issue with BeautifulSoup

python encoding utf-8

Hello. I have a problem with encoding.

When I pass the string to BeautifulSoup, all the national characters are lost:

addr = "http://zjazdowa.com.pl/index.php/aktualne-ceny-warzyw-i-owocow-.html"
content = urllib2.urlopen(addr).read()
html_pag = BeautifulSoup(content)  # <- here I lose all national letters
table_html = html_pag.find("div", id="808")

According to the documentation, all input is converted to Unicode internally:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("Hello")
soup.contents[0]
# u'Hello'
soup.originalEncoding
# 'ascii'

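If your input does not declare an encoding (for example, in a meta tag), BeautifulSoup guesses. You can disable the guessing by specifying the input's encoding with the fromEncoding parameter: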

soup = BeautifulSoup("hello", fromEncoding="UTF-8")
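To illustrate what supplying an explicit input encoding amounts to, here is a minimal sketch in Python 3 (the thread itself uses Python 2) that relies only on the standard library and does not call BeautifulSoup. The helper name `decode_page` and the sample bytes are made up for illustration.

```python
# Minimal sketch: decoding raw bytes with a known encoding instead of guessing.
# `decode_page` and the sample bytes are hypothetical, not from the thread.

def decode_page(raw, encoding):
    """Decode raw page bytes to text using an explicitly chosen encoding."""
    return raw.decode(encoding)

# "Kapusta włoska" as it might arrive over the wire in two different encodings.
utf8_bytes = "Kapusta w\u0142oska".encode("utf-8")
latin2_bytes = "Kapusta w\u0142oska".encode("iso-8859-2")

right = decode_page(utf8_bytes, "utf-8")
also_right = decode_page(latin2_bytes, "iso-8859-2")
# Decoding UTF-8 bytes with the wrong codec does not fail; it garbles the text.
wrong = decode_page(utf8_bytes, "iso-8859-1")

print(right == also_right)  # both decode to the same text
print(ascii(wrong))         # escaped view of the garbled result
```

The point is that the same bytes yield different text depending on the codec, which is why naming the encoding removes the need for guessing.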

Or is your real problem that the results come out "broken" when printed to the console?
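If the console is the culprit, the text inside the program is intact and only the encoding step at output loses characters. A small Python 3 sketch of that failure mode follows; it simulates an ASCII-only terminal by encoding to ASCII, which is an assumption about the environment rather than something the thread confirms.

```python
# Sketch: the string is intact; encoding it for an ASCII-only console is what
# damages it. The sample text is the cell value printed in the thread.
text = "Kapusta w\u0142oska"  # "Kapusta włoska"

# Encoding with a codec that can represent "ł" round-trips without loss.
utf8_roundtrip = text.encode("utf-8").decode("utf-8")

# Encoding for an ASCII-only target replaces the unsupported letter with "?".
ascii_lossy = text.encode("ascii", errors="replace").decode("ascii")

# With the default strict error handling the same step raises instead.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(utf8_roundtrip == text)
print(ascii_lossy)
print(strict_failed)
```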

And your code works fine:

>>> addr = "http://zjazdowa.com.pl/index.php/aktualne-ceny-warzyw-i-owocow-.html"                                
>>> content = urllib2.urlopen(addr) .read()
>>> html_pag = BeautifulSoup(content) #<- there i lost all national letters 
>>> table_html= html_pag.find("div",  id="808")
>>> print table_html.findAll('td')[8].string
Kapusta włoska

reload reloads a module. I'm not sure what you hope to achieve by reloading sys, but it won't do you any good.
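In Python 2, the reload(sys) call in the question's script is typically there to re-expose sys.setdefaultencoding. A commonly suggested alternative is to choose the encoding explicitly where the bytes leave the program. The following is a minimal Python 3 sketch of that idea, in which an in-memory buffer stands in for the console or a file (an assumption made to keep the example self-contained).

```python
import io

# Sketch: pick the encoding at the output boundary rather than altering a
# process-wide default. `buffer` stands in for a console or file.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n")

writer.write("Kapusta w\u0142oska\n")
writer.flush()  # push the encoded bytes through to the underlying buffer

written = buffer.getvalue()
print(written)
```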

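FYI: the page correctly declares its encoding in both the Content-Type header and a meta tag.

I guess your "real problem" is guessing what the actual problem is...

Note that in BeautifulSoup 4, fromEncoding was renamed to from_encoding.

The code you posted works and preserves all the "national" characters.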
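As a rough illustration of the two declarations mentioned in the comments above (the Content-Type header and a meta tag), the Python 3 sketch below reads the charset from each. The helper names `charset_from_header` and `charset_from_meta` and the sample values are hypothetical; they were not captured from the real site, and the regular expression is a simplification rather than a full HTML parser.

```python
import re
from email.message import Message

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def charset_from_meta(raw_html):
    """Return the charset declared in a <meta> tag of raw HTML bytes, or None."""
    match = re.search(rb'<meta[^>]*charset=["\']?([\w-]+)', raw_html, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None

# Hypothetical sample values, not taken from the page in the question.
header = "text/html; charset=utf-8"
html = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=UTF-8"></head></html>')

print(charset_from_header(header))
print(charset_from_meta(html))
```

When both sources agree, as the comment says they do for this page, a parser has no need to guess.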

#!/usr/bin/python2.7
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup
import urllib2, string, re, sys
reload(sys)
sys.setdefaultencoding("utf-8")