Encoding error with TwitterSearch in Python


python encoding utf-8


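I want to use TwitterSearch to import tweets into a CSV file. However, the script does not capture special characters (for example, accents in French). I have tried several things, such as adding .encode('utf-8'), without success.

If I try writing: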

tweet_text = tweet['text'].strip().encode('utf-8', 'ignore')

then I get:

Traceback (most recent call last):
  File "/Users/usr/Documents/Python/twitter_search2.py", line 56, in <module>
    get_tweets(query, max_tweets)
  File "/Users/usr/Documents/Python/twitter_search2.py", line 44, in get_tweets
    print('@%s: %s' % (user, tweet_text))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)

Thank you very much for your help.

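You are interpolating the encoded tweet together with the user name in the print call shown in your traceback, print('@%s: %s' % (user, tweet_text)). If the user object is a Unicode string, this fails: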

>>> user = u'Héllo'
>>> tweet_text = u'Héllo'.encode('utf8')
>>> '@%s: %s' % (user, tweet_text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
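
What happens in the session above is that Python 2 finds a Unicode string and a byte string in the same formatting operation and silently decodes the byte string as ASCII before combining them. A minimal sketch of that implicit step made explicit follows; the variable names are illustrative, and the snippet is written so that it should behave the same on Python 3:

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for u'Héllo'; the accented letter becomes 0xc3 0xa9.
encoded = u'Héllo'.encode('utf8')

try:
    # This is the decode Python 2 performs implicitly when mixing types.
    encoded.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    position = exc.start
    print(message)

# Decoding with the codec that produced the bytes restores the text.
decoded = encoded.decode('utf8')
print(decoded == u'Héllo')
```

The failing byte 0xc3 is the first byte of the UTF-8 sequence for é, which is why the error points at the position of the first accented character.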

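So what error do you get? Please include the full traceback.

With this code the script runs fine, but the special characters are ignored and do not appear in the sentences. I am trying to find a way to include them.

Yes, because you are encoding the text to ASCII and ignoring everything that does not fit. There is a lot that does not fit.

I am confused :/ If I try writing tweet_text = tweet['text'].strip().encode('utf-8', 'ignore'), then I get: Traceback (most recent call last): File "/Users/usr/Documents/Python/twitter_search2.py", line 56, in <module> get_tweets(query, max_tweets) File "/Users/usr/Documents/Python/twitter_search2.py", line 44, in get_tweets print('@%s: %s' % (user, tweet_text)) UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)

It looks like some of your tweets are not Unicode but byte strings. An implicit decode then takes place before the encoding. Can you include this in your question?

Wow! It works great, and I understand the logic behind it. Thank you very much. I will also send a comment to the library's author in case it helps others. Thanks.

Oh, ps :) Done! Thanks again.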
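
The remark in the comments about ASCII and 'ignore' can be illustrated with a small sketch. The sample text is made up for the example and is not taken from the asker's script: encoding to ASCII with the 'ignore' error handler silently drops every character that does not fit, while UTF-8 keeps them all.

```python
# -*- coding: utf-8 -*-
text = u'Héllo à tous'

# 'ignore' discards characters that ASCII cannot represent.
ascii_bytes = text.encode('ascii', 'ignore')
# UTF-8 can represent every character, so nothing is lost.
utf8_bytes = text.encode('utf-8')

print(ascii_bytes.decode('ascii'))
print(utf8_bytes.decode('utf-8') == text)
```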


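The corrected code keeps tweet_text as a Unicode string and prints it with a Unicode format string:

tweet_text = tweet['text'].strip()
# Join the lines of a multi-line tweet into a single Unicode string.
tweet_text = u''.join(tweet_text.splitlines())
print i, time,
if tweet['geo'] and tweet['geo']['coordinates'][0]:
    lat, long = tweet['geo']['coordinates'][:2]
    # Unicode format string, so tweet_text is not implicitly decoded as ASCII.
    print u'@%s: %s' % (user, tweet_text), lat, long
else:
    print u'@%s: %s' % (user, tweet_text)

Encode only when writing the row to the CSV file:

writer.writerow([user.encode('utf8'), time.encode('utf8'),
                 tweet_text.encode('utf8'), lat, long])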
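
The code in this thread is Python 2. As a hedged sketch of the same idea on Python 3 (keep text as text, and encode only at the file boundary), the following assumes a hypothetical list of tweet dictionaries in place of real TwitterSearch results; write_tweets and the sample data are illustrative names, not part of the library.

```python
import csv
import os
import tempfile

# Hypothetical sample data standing in for TwitterSearch results.
tweets = [
    {"user": "élodie", "time": "2014-05-01 10:00", "text": "Déjà vu\nà Paris"},
    {"user": "bob", "time": "2014-05-01 10:05", "text": "plain ascii"},
]


def write_tweets(path, rows):
    """Write tweets to a UTF-8 CSV; the file object does the encoding."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle)
        for tweet in rows:
            # Collapse line breaks the same way the thread's code does.
            text = "".join(tweet["text"].strip().splitlines())
            writer.writerow([tweet["user"], tweet["time"], text])


path = os.path.join(tempfile.mkdtemp(), "tweets.csv")
write_tweets(path, tweets)

# Read the file back to check that the accented characters survived.
with open(path, encoding="utf-8", newline="") as handle:
    rows_read = list(csv.reader(handle))
print(rows_read[0])
```

On Python 3 the csv module works with text rather than bytes, so the explicit .encode('utf8') calls on each field are replaced by the encoding argument passed to open.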