Unicode decode error when retrieving Twitter data with Python

python python-2.7 twitter unicode


I am retrieving Twitter data for a specific Arabic keyword, as follows:

#imports
from tweepy import Stream
from tweepy import OAuthHandler 
from tweepy.streaming import StreamListener

#setting up the keys
consumer_key = '………….' 
consumer_secret = '…………….'
access_token = '…………..'
access_secret = '……...'

class TweetListener(StreamListener):
    # A listener handles tweets that are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.

    def on_data(self, data):
        print (data)
        return True

    def on_error(self, status):
        print (status)

# print all the tweets to standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())
stream.filter(track=['سوريا'])

I get the following error message:

Traceback (most recent call last):
  File "/Users/Mona/Desktop/twitter.py", line 29, in <module>
    stream.filter(track=['سوريا'])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 303, in filter
    encoded_track = [s.encode(encoding) for s in track]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)


Please help.

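I looked at the tweepy source code and found the line in Stream that raises the error. The line is in the filter method. When you call

stream.filter(track=['سوريا'])

in your code, Stream calls

s.encode('utf-8')

where s = 'سوريا' (if you look at the source of filter, you will see that utf-8 is the default encoding). This is the point at which the code throws the exception. In Python 2, 'سوريا' is a byte string, and calling encode on a byte string first decodes it implicitly with the ascii codec, which is why an encode call ends in a UnicodeDecodeError.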
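The byte reported in the traceback can be reproduced without tweepy or Twitter credentials. The following is a minimal sketch, assuming the script was saved as UTF-8: it builds the bytes that a plain Python 2 string literal would hold and then performs explicitly the ASCII decode that Python 2 attempts behind the scenes. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch: reproduce the failing step outside of tweepy.
# Assumption: the original script file was saved as UTF-8.

keyword = u"سوريا"

# These are the bytes a plain (non-u) Python 2 literal would contain.
raw = keyword.encode("utf-8")
first_byte = bytearray(raw)[0]  # first UTF-8 byte of the keyword

# Python 2 runs this decode implicitly when .encode() is called on a
# byte string; here it is done explicitly so the failure is visible.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

print(hex(first_byte))
print(error)
```

The first byte is 0xd8, the same byte named in the traceback, and the ASCII decode fails at position 0 because every byte of an Arabic character in UTF-8 lies outside the 0 to 127 range.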

To fix this, we need to use a Unicode string:

t = u"سوريا"
stream.filter(track=[t])

(I put the string into the variable t for clarity.)
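When the keywords do not come from a literal in the script (for example, they are read from a file or from the command line), they may arrive as byte strings, and a u prefix is not an option. One possible approach is to decode them before handing them to filter. The sketch below assumes the input bytes are UTF-8; to_text is a hypothetical helper name and is not part of tweepy.

```python
# -*- coding: utf-8 -*-
# Sketch: normalise keywords to Unicode text before passing them on.
# Assumptions: to_text is a hypothetical helper (not a tweepy function)
# and any byte-string input is UTF-8 encoded.

def to_text(value, encoding="utf-8"):
    """Return value as Unicode text, decoding it if it is a byte string."""
    if isinstance(value, bytes):  # bytes is an alias of str in Python 2
        return value.decode(encoding)
    return value

# A mix of a byte string and a Unicode string, as might occur in practice.
keywords = [b"\xd8\xb3\xd9\x88\xd8\xb1\xd9\x8a\xd8\xa7", u"سوريا"]
track = [to_text(k) for k in keywords]

# With a configured stream, the call would then look like:
# stream.filter(track=track)
```

Both entries in track end up as the same Unicode string, so the encode call inside filter no longer has to fall back on the ascii codec.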