Python 2.7.x: Unicode problem

I am scraping the website www.soundkartell.de and have run into some Unicode problems:

# soup: the parsed page (presumably a BeautifulSoup object)
results = []
for article in soup.find_all('article'):
    if article.select('a[href*="alternative"]'):
        artist = article.h2.text
        results.append(artist.encode('latin1').decode("utf-8"))

print artist   # Din vän Skuggan
print results  # [u'Din v\xe4n Skuggan']

At the top of my file I have
# -*- coding: utf-8 -*-

  • Why does Python print the scraped data correctly, but not the appended data?

  • How can I fix the Unicode problem?

  • I am using Python 2.7.x

You may not actually have a problem. What you are seeing is a side effect of how Python prints things:

Example code:

    # -*- coding: utf-8 -*-
    artist = 'Din vän Skuggan'
    artists = [artist]
    print 'artist:', artist
    print 'artists:', artists
    print 'str:', str(artist)
    print 'repr:', repr(artist)
    
Produces:

    artist: Din vän Skuggan
    artists: ['Din v\xc3\xa4n Skuggan']
    str: Din vän Skuggan
    repr: 'Din v\xc3\xa4n Skuggan'
    
As shown above, when Python prints a list, it uses repr() for the items in the list. In both cases you have the same content; Python is just displaying it differently.
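To make the repr() point concrete, here is a minimal sketch. The names (artists, as_readable_text, and so on) and the second list entry are made up for illustration and are not from the question. It uses __future__ imports so that it should run on Python 3 as well as on 2.7. It rebuilds the text of a printed list from repr() of each item, then joins the items directly to get readable output.

```python
# -*- coding: utf-8 -*-
# Sketch only: the names and sample data below are illustrative.
from __future__ import print_function, unicode_literals

artists = ['Din vän Skuggan', 'Mø']


def as_readable_text(items):
    """Join the items themselves, so no repr() escaping is involved."""
    return ', '.join(items)


# Printing a list shows str(list), which is built from repr() of each item.
list_text = str(artists)
rebuilt_from_repr = '[' + ', '.join(repr(item) for item in artists) + ']'

print(list_text == rebuilt_from_repr)   # True
print(as_readable_text(artists))        # Din vän Skuggan, Mø
```

So if the goal is only readable output, printing each item (or a joined string) instead of the list itself is probably enough; the stored data does not need to change.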

Side note:

    # -*- coding: utf-8 -*-

at the top of your script is useful for string literals with Unicode text in your code.
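As a rough illustration of why the escapes differ between the question (\xe4) and the example output (\xc3\xa4): the first is the single code point of 'ä' in a unicode string, and the second is the two UTF-8 bytes of 'ä' in a byte string. The sketch below uses made-up variable names and assumes the file is saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# Sketch only: variable names are illustrative.
from __future__ import print_function, unicode_literals

text = 'Din vän Skuggan'       # Unicode text (what u'...' gives you in Python 2)
raw = text.encode('utf-8')     # the UTF-8 bytes that a plain Python 2 '...' literal holds

print(len(text))                    # 15 characters
print(len(raw))                     # 16 bytes, because 'ä' takes two bytes in UTF-8
print(raw.decode('utf-8') == text)  # True: same content, different representation
```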

Also, de is the country code for Germany (i.e. Deutschland). Denmark is dk.