Python: converting raw ASCII text that contains UTF-8 characters represented by backslash escapes


I collect Persian tweets by running the following Python code:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import sys
import tweepy
import json
import os

consumer_key ="xxxx"
consumer_secret ="xxxx"
access_key = "xxxx"
access_secret = "xxxx"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

save_file = open("Out.json", 'a')

t1 = u""

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(CustomStreamListener, self).__init__(api)

        # self.list_of_tweets = []

    def on_data(self, tweet):
        print tweet
        save_file.write(str(tweet))

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

def start_stream():
    while True:
        try:
            sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
            sapi.filter(track=[t1])
        except Exception:
            # Restart the stream on errors, but let Ctrl-C (KeyboardInterrupt) stop the loop.
            continue

start_stream()
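It returns the tweet text as raw ASCII in which the UTF-8 encoded characters are represented by backslash escapes. I would like to change the code so that the retrieved tweets are saved directly to "Out.json" in UTF-8 encoded form.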

{
    "created_at": "Tue Feb 07 08:04:17 +0000 2017",
    "id": 828877025049972737,
    "id_str": "828877025049972737",
    "text": "\u0644\u0637\u0641\u0627 \u0628\u0647 \u062d\u06cc\u0648\u0627\u0646\u0627\u062a \u063a\u06cc\u0631\u062e\u0627\u0646\u06af\u06cc \u063a\u0630\u0627\u00a0\u0646\u062f\u0647\u06cc\u062f https:\/\/t.co\/gFi5XCVQww https:\/\/t.co\/pQWPqbvJVF",
    "display_text_range": [0, 58],
    "source": "\u003ca href=\"http:\/\/publicize.wp.com\/\" rel=\"nofollow\"\u003eWordPress.com\u003c\/a\u003e",
    "truncated": false,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    ...
    "lang": "fa",
    "timestamp_ms": "1486454657219"
}

Remove the str() call:

# ...

def on_data(self, tweet):
    print tweet
    save_file.write(tweet)

# ...
If that doesn't help, open your file with the codecs module (codecs.open()).


The StreamListener.on_data() method is passed the raw JSON data received from Twitter. It is this data that contains the escape sequences, and they are valid JSON.
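The point that the escapes are only a serialization detail can be checked with the standard json module alone. This is a minimal sketch, not part of the original answer; the sample string is the first word of the tweet text shown in the question, and the variable names are illustrative.

```python
import json

# Raw JSON as it might arrive in on_data(): pure ASCII, with \uXXXX escapes.
raw = '{"text": "\\u0644\\u0637\\u0641\\u0627", "lang": "fa"}'

# Every character of the raw document is ASCII.
is_ascii = all(ord(ch) < 128 for ch in raw)

# Decoding turns each escape into the Unicode character it stands for.
tweet = json.loads(raw)
text = tweet["text"]

text_length = len(text)                  # number of characters
utf8_length = len(text.encode("utf-8"))  # number of bytes once encoded as UTF-8
```

After json.loads() the escapes are gone: the value is an ordinary Unicode string, and how it is written out again is decided entirely by the serializer.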

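If you want to save UTF-8 data directly, with the escape sequences replaced by the actual Unicode code points, you have to re-encode the tweet: decode it with json.loads(), then serialize it again with json.dumps() and ensure_ascii=False before saving it.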

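Note that writing multiple JSON objects to one file makes the file as a whole invalid JSON. You can produce newline-delimited JSON instead by writing a newline after each object (standard json.dumps() output does not contain newlines), and then read the file back one line at a time, parsing each line separately.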

So the important part of your code should look like this:

import json

save_file = open("Out.json", 'a')

class CustomStreamListener(tweepy.StreamListener):
    # ...

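    def on_data(self, tweet):
        # Decode the raw JSON text, then serialize it again without \uXXXX escapes.
        tweet = json.loads(tweet)
        json_doc = json.dumps(tweet, ensure_ascii=False)
        # Write one JSON document per line, encoded as UTF-8 bytes (Python 2).
        save_file.write(json_doc.encode('utf8') + '\n')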
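Reading such a newline-delimited file back might look like the sketch below. It is not part of the original answer: read_json_lines() is a hypothetical helper, the temporary demo.json file only stands in for Out.json, and the code is written for Python 3 (on Python 2 the json.dumps() result may need converting to unicode before it is written to a file opened with io.open()).

```python
import io
import json
import os
import tempfile


def read_json_lines(path):
    """Hypothetical helper: return the objects stored one per line in a UTF-8 file."""
    objects = []
    with io.open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                objects.append(json.loads(line))
    return objects


# Throwaway file standing in for Out.json.
path = os.path.join(tempfile.mkdtemp(), "demo.json")
samples = [
    {"id": 1, "text": u"\u0644\u0637\u0641\u0627"},
    {"id": 2, "text": u"ok"},
]

# Write one JSON document per line, without \uXXXX escapes.
with io.open(path, "a", encoding="utf-8") as handle:
    for sample in samples:
        handle.write(json.dumps(sample, ensure_ascii=False) + u"\n")

loaded = read_json_lines(path)

# Inspect the bytes on disk to confirm no escape sequences were written.
with io.open(path, "rb") as handle:
    raw_bytes = handle.read()
```

Each line is parsed on its own, so the file never has to be valid JSON as a whole.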

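Don't use codecs.open(); use io.open() instead. That is the Python 3 I/O framework, and it is more robust than the codecs framework.

Thanks for your answer. I removed "str", but it didn't work.

@farlay: That is because your data is already JSON-encoded and is already a string. It is valid JSON data. I'm not sure what exactly you expected. What did you want to see? Note that UTF-8 is a superset of ASCII, so every character in there is already valid UTF-8. If you want to produce JSON without the escapes (so that the original characters are saved as UTF-8 bytes rather than as JSON escape sequences), you have to decode and re-encode. See the code in the answer.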
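The io.open() suggestion can be sketched as follows. This is an illustration rather than code from the thread: append_tweet() is a hypothetical stand-in for the body of on_data(), the file is created in a temporary directory, and the code targets Python 3, where json.dumps() always returns text.

```python
import io
import json
import os
import tempfile


def append_tweet(handle, raw_json):
    """Hypothetical helper: decode one raw JSON document and append it unescaped."""
    tweet = json.loads(raw_json)
    handle.write(json.dumps(tweet, ensure_ascii=False) + u"\n")


path = os.path.join(tempfile.mkdtemp(), "Out.json")
raw = '{"text": "\\u063a\\u0630\\u0627", "lang": "fa"}'

# io.open() encodes text to UTF-8 on write, so no manual .encode() call is needed.
with io.open(path, "a", encoding="utf-8") as save_file:
    append_tweet(save_file, raw)

# Read the bytes back to see what actually landed on disk.
with io.open(path, "rb") as check:
    stored = check.read()
```

Because the file object does the encoding, the listener only ever handles text, which avoids mixing byte strings and Unicode strings in the write call.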