Python 2.7.x unicode issue
I'm scraping the website www.soundkartell.de and I'm running into some unicode problems:
    # soup is a BeautifulSoup object for the page
    results = []
    for article in soup.find_all('article'):
        if article.select('a[href*="alternative"]'):
            artist = article.h2.text
            results.append(artist.encode('latin1').decode("utf-8"))
            print artist   # Din vän Skuggan
    print results  # [u'Din v\xe4n Skuggan']
I have this at the top of my file:

    # -*- coding: utf-8 -*-

I'm using Python 2.7.x.
You probably don't actually have a problem. What you're seeing is a side effect of how Python prints things.

Example code:

    # -*- coding: utf-8 -*-
    artist = 'Din vän Skuggan'
    artists = [artist]
    print 'artist:', artist
    print 'artists:', artists
    print 'str:', str(artist)
    print 'repr:', repr(artist)

produces:

    artist: Din vän Skuggan
    artists: ['Din v\xc3\xa4n Skuggan']
    str: Din vän Skuggan
    repr: 'Din v\xc3\xa4n Skuggan'
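As shown above, print artist writes the string itself, but when Python prints a list it formats each item with repr(), which escapes the non-ASCII characters. In both cases you have the same content; Python just displays it differently.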
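A minimal sketch of the same point, using the unicode value the question prints ([u'Din v\xe4n Skuggan']). It assumes a from __future__ import unicode_literals line so it behaves the same on Python 2.7 and Python 3, and format_artists is a hypothetical helper, not part of BeautifulSoup. The element itself is intact; only the list's repr() shows the escape, so formatting the items yourself gives a readable line.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def format_artists(names):
    """Join artist names into one display line (hypothetical helper).

    Formatting the items yourself avoids the repr() that print applies
    to each element when it prints a whole list.
    """
    return ', '.join(names)


# Same value the question prints as [u'Din v\xe4n Skuggan'];
# \xe4 is the code point of the character ä, so the text is intact.
results = ['Din v\xe4n Skuggan']

line = format_artists(results)  # same text as results[0]
listing = repr(results)         # what "print results" shows

# Usage (on Python 2.7 this needs a UTF-8 capable terminal):
# print(line)     shows: Din vän Skuggan
# print(listing)  shows: [u'Din v\xe4n Skuggan'] on Python 2.7
```

Joining (or looping and printing each item) is the usual way to show the names readably; the stored values do not need any extra encode/decode step for that.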
Side note: putting

    # -*- coding: utf-8 -*-

at the top of the script is needed whenever string literals in your code contain non-ASCII text; without it, Python 2 rejects the file with a SyntaxError.
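The two outputs in this thread are the same text in two forms: the answer's repr() shows UTF-8 bytes ('Din v\xc3\xa4n Skuggan'), while the question's shows a unicode string (u'Din v\xe4n Skuggan'). This sketch assumes the file is saved as UTF-8 and again uses unicode_literals so it runs on both 2.7 and 3; it checks that the literal written with ä, the escaped form and the UTF-8 bytes all describe the same text.

```python
# -*- coding: utf-8 -*-
# The line above tells Python 2 that this file's bytes are UTF-8, so the
# non-ASCII literal below is legal (without it Python 2 raises SyntaxError).
from __future__ import unicode_literals

# Written directly with ä; unicode_literals makes this a unicode string on 2.7.
literal = 'Din vän Skuggan'

# The same text spelled with an escape: U+00E4 is ä.
escaped = 'Din v\u00e4n Skuggan'

# Without unicode_literals, Python 2 would store the literal as these UTF-8
# bytes, which is what the answer's repr() output shows.
utf8_bytes = b'Din v\xc3\xa4n Skuggan'

same_text = literal == escaped                      # True
same_bytes = literal.encode('utf-8') == utf8_bytes  # True
```

ä is one code point but two bytes (0xC3 0xA4) in UTF-8, which is why the byte-string repr shows two escapes where the unicode repr shows one.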
Also, de is the country code for Germany (Deutschland). Denmark's is dk.