Unicode error with an already-decoded string in Python?
I'm having trouble concatenating strings that I decoded earlier in my code:
```python
import json
import requests
import jsonobject

for i in range(0, 3): #for loop to feed parameter to url params
    if i == 0:
        var = "0"
        var2 = "Home"
    elif i == 1:
        var = "1"
        var2 = "Away"
    elif i == 2:
        var = "2"
        var2 = "Overall"

    url = 'http://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics'
    params = {
        'category': 'tackles',
        'subcategory': 'success',
        'statsAccumulationType': '0',
        'isCurrent': 'true',
        'playerId': '',
        'teamIds': '',
        'matchId': '',
        'stageId': '9155',
        'tournamentOptions': '2',
        'sortBy': 'Rating',
        'sortAscending': '',
        'age': '',
        'ageComparisonType': '',
        'appearances': '',
        'appearancesComparisonType': '0',
        'field': var2, #from for loop
        'nationality': '',
        'positionOptions': "'FW','AML','AMC','AMR','ML','MC','MR','DMC','DL','DC','DR','GK','Sub'",
        'timeOfTheGameEnd': '5',
        'timeOfTheGameStart': '0',
        'isMinApp': '',
        'page': '1',
        'includeZeroValues': '',
        'numberOfPlayersToPick': '10'
    }
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
               'X-Requested-With': 'XMLHttpRequest',
               'Host': 'www.whoscored.com',
               'Referer': 'http://www.whoscored.com/'}
    responser = requests.get(url, params=params, headers=headers)
    responser = responser.json()
    playerTableStats = responser[u'playerTableStats']
    for statDict in playerTableStats:
        mylookup = ("{name},{firstName},{lastName},{positionText},{tournamentId},{tournamentShortName},{regionCode}"
                    "{tournamentRegionId},{seasonId},{seasonName},{teamName},{teamId},{playerId}"
                    "{minsPlayed},{ranking},{rating:.2f},{apps},{weight:.2f},{height:.2f},{playedPositions}"
                    "{isManOfTheMatch},{isOpta},{subOn},".decode('cp1252').format(**statDict)) #generates none match data about players
        print mylookup
        mykey2 = (var2)
        print mykey2
        mykey3 = {}
        #create dynamic variables and join match and none match data together
        mykey3[mykey2] = ("{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},".decode('cp1252').format(**statDict))
        print mykey3[mykey2]
        mykey3[mykey2] = mykey3[mykey2],'*,'
        mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
        mykey3[mykey2] = mylookup,mykey3[mykey2]
        mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
        print mykey3[mykey2]
        mykey3[mykey2] = mykey3[mykey2],'*,'
        mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
```
I get an error that says:
```
Traceback (most recent call last):
  File "C:\Python27\counter.py", line 72, in <module>
    mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 6: ordinal not in range(128)
```
…I also tried this instead:

```python
mykey3[mykey2] = mykey3[mykey2],'*,'
mykey3[mykey2] = str(''.join(mykey3[mykey2][0:2])).decode('cp1252')
```

…but it still produces the same error.
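Does anyone know what I'm doing wrong?

You are trying to join two values with a comma in a very roundabout way: you build a tuple, then turn the tuple back into a string. Don't do that; just use string formatting. You need to use Unicode literals rather than decoding strings: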
```python
mykey3[mykey2] = u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},".format(**statDict)
```
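Note the `u` prefix on the string. You aren't actually using any non-ASCII characters in the string literal, so you don't even need to declare a source encoding for it.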
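As a minimal sketch of the point above, here is the Unicode-literal template applied to a hypothetical `statDict` (made-up values for the three tackle fields, plus a name containing `à`). It runs unchanged on Python 2.7 and Python 3; nothing is ever decoded or passed through `str()`.

```python
# Hypothetical sample row; the real dict comes from the whoscored.com JSON response.
statDict = {
    u'name': u'Cesc F\xe0bregas',
    u'challengeLost': 2.83,
    u'tackleWonTotal': 1.17,
    u'tackleTotalAttempted': 4.0,
}

# The template is a Unicode literal, so no .decode() call is needed.
# str.format ignores keyword arguments the template does not use (here: name).
stats = u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},".format(**statDict)

# Non-ASCII values format into a Unicode template without any error.
label = u"{name}: ".format(**statDict)

# Unicode + Unicode stays Unicode; no str() round trip involved.
line = label + stats
print(stats)
```

The printed value is ASCII-only; `line` holds the full Unicode text including the accented name.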
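It is using tuples and then calling `str()` on them that causes the exception. In Python 2, `str()` on a `unicode` value encodes it with the default ASCII codec, so any non-ASCII character (such as the `à` in Fàbregas) raises `UnicodeEncodeError`. Just don't use `str()` here at all: you convert the joined Unicode string back into a byte string, then join that byte string with a Unicode value again, and then try to convert the result to a byte string once more, which fails: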
```python
>>> mylookup = ("{name},{firstName},{lastName},{positionText},{tournamentId},{tournamentShortName},{regionCode}"
...             "{tournamentRegionId},{seasonId},{seasonName},{teamName},{teamId},{playerId}"
...             "{minsPlayed},{ranking},{rating:.2f},{apps},{weight:.2f},{height:.2f},{playedPositions}"
...             "{isManOfTheMatch},{isOpta},{subOn},".decode('cp1252').format(**statDict))
>>> ''.join(mykey3[mykey2][0:2])
u'Cesc F\xe0bregas,Cesc,F\xe0bregas,Midfielder,2,EPL,es252,4311,2014/2015,Chelsea,15,8040532,5,8.09,6,74.00,175.00,-FW-MC-ML-MR-False,True,0,2.83,1.17,4.00,*,*,2.83,1.17,4.00,*,'
>>> str(''.join(mykey3[mykey2][0:2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 6: ordinal not in range(128)
```
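The failing step can be reproduced without the web request. Because Python 2's `str()` on a `unicode` value implicitly encodes with ASCII, this sketch calls `.encode('ascii')` explicitly so it fails the same way on Python 2.7 and 3. If a byte string really is needed (for a file or a socket, say), build the text as Unicode and encode it once, explicitly; UTF-8 is an assumption here, so use whatever your output target expects. The row values are made up.

```python
name = u'Cesc F\xe0bregas'

# Python 2's str(name) implicitly does name.encode('ascii'); doing it explicitly
# reproduces the same failure on Python 2 and Python 3.
error = None
try:
    name.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't encode character ... in position 6 ...

# Build the whole row as Unicode, then encode once when bytes are required.
# UTF-8 is an assumed output encoding, not something the question specifies.
row = u','.join([name, u'2.83', u'1.17', u'4.00', u'*'])
data = row.encode('utf-8')
print(repr(data))
```

Position 6 is the `à`, matching the traceback in the question.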
Or just append to the existing string:

```python
mykey3[mykey2] += u'*,'  # the formatted stats already end with a comma
```
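Applied to the question's whole sequence of steps, plain concatenation replaces each tuple-then-`str(''.join(...))` round trip. This is a sketch with made-up values and a shortened, hypothetical `mylookup` prefix; `mykey2` and `mykey3` are the question's own names.

```python
# Made-up inputs; in the question these come from the JSON response.
statDict = {u'challengeLost': 2.83, u'tackleWonTotal': 1.17, u'tackleTotalAttempted': 4.0}
mylookup = u'Cesc F\xe0bregas,Cesc,F\xe0bregas,'  # shortened, hypothetical prefix
mykey2 = u'Home'
mykey3 = {}

mykey3[mykey2] = u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},".format(**statDict)
mykey3[mykey2] += u'*,'                     # replaces the first tuple + str(''.join(...))
mykey3[mykey2] = mylookup + mykey3[mykey2]  # replaces the second one
mykey3[mykey2] += u'*,'                     # replaces the third one
```

Every intermediate value stays Unicode, so there is no implicit ASCII encoding anywhere to fail.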
Or just put all the data into a single string with one formatting operation:

```python
# Adjacent literals are joined before .format() runs; each line ends with a
# comma so neighbouring fields don't run together.
mylookup = (
    u"{name},{firstName},{lastName},{positionText},{tournamentId},{tournamentShortName},{regionCode},"
    u"{tournamentRegionId},{seasonId},{seasonName},{teamName},{teamId},{playerId},"
    u"{minsPlayed},{ranking},{rating:.2f},{apps},{weight:.2f},{height:.2f},{playedPositions},"
    u"{isManOfTheMatch},{isOpta},{subOn},"
    u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},"
    u"*,*,".format(**statDict)
)
```
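A shortened, runnable version of the single-format approach, assuming a hypothetical `statDict` with only a few of the real keys and made-up values. Because adjacent string literals are concatenated by the parser, `.format()` applies to the whole template and returns one Unicode string.

```python
# Hypothetical, shortened row; the real dict has many more keys.
statDict = {
    u'name': u'Cesc F\xe0bregas',
    u'teamName': u'Chelsea',
    u'rating': 8.09,
    u'challengeLost': 2.83,
    u'tackleWonTotal': 1.17,
    u'tackleTotalAttempted': 4.0,
}

# One format call covers the player fields, the stats and the trailing markers.
row = (
    u"{name},{teamName},{rating:.2f},"
    u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},"
    u"*,*,".format(**statDict)
)
```

There is no intermediate tuple or byte string, so the non-ASCII name never has to be encoded.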
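Rather than decoding string literals, use `u'…'` literals, and add a source-code encoding declaration if any of your strings contain anything other than ASCII.

@MartijnPieters I have now corrected the indentation. As for your suggestion, this is an area I'm not familiar with… is there a way to encode/decode somewhere in the string `join` statement to make the code work?

Nowhere in your code do you actually extract the names. You appear to only be feeding in a comma-separated list of statistics (floats).

@MartijnPieters If you run the code above, it prints a series of strings to the log… when it reaches the string for Cesc Fàbregas, it throws an error because of the non-ASCII character in the name.

I'm not sure why you are using `str()` on the two-value tuples you create. Why not format your statistics so they simply include `*,` at the end: `u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},*,".format(**statDict)`?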
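A minimal sketch tying the last two comments together, using a made-up JSON payload in place of the live whoscored.com response (the payload and the `rows` list are hypothetical). The JSON decoder, like `requests`' `Response.json()`, already returns the player name as a Unicode string, so there is nothing left to decode; the stats are formatted in one call with the trailing `*,` included.

```python
import json

# Made-up payload shaped like the whoscored.com response (only a few keys shown).
payload = ('{"playerTableStats": [{"name": "Cesc F\\u00e0bregas", '
           '"challengeLost": 2.83, "tackleWonTotal": 1.17, "tackleTotalAttempted": 4.0}]}')

# json.loads returns all strings in the payload as Unicode.
responser = json.loads(payload)

rows = []
for statDict in responser[u'playerTableStats']:
    # One format call, with the trailing '*,' already in the template.
    stats = u"{challengeLost:.2f},{tackleWonTotal:.2f},{tackleTotalAttempted:.2f},*,".format(**statDict)
    # The name is already Unicode, so plain concatenation is safe.
    rows.append(statDict[u'name'] + u',' + stats)
```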