Trouble parsing a large JSON file into a DataFrame in Python because of a memory error

python json parsing memory twitter

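I am using the following Python 3 code in PyCharm to parse user IDs and tweet IDs out of JSON files containing Twitter data. With it I have successfully built the lists, created the DataFrame, and exported a CSV for 13 files, the largest of which is 57 MB.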

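import json

import pandas as pd

# `file` and `outfile` are the input and output paths, defined elsewhere in the script.

# The file holds one JSON object after another, separated by newlines.
# Insert commas between the objects and wrap them in [ ] to form one JSON array.
with open(file, 'r', encoding='utf8', errors='ignore') as json_file:
    data = json.loads("[" + json_file.read().replace("}\n{", "},\n{") + "]")

# Collect the user ID of every tweet that has one.
user_ids = []
for tweet in data:
    if 'user' in tweet:
        if 'id_str' in tweet["user"]:
            user_ids.append(tweet["user"]["id_str"])

# Collect the tweet ID of every tweet that has one.
tweet_ids = []
for tweet in data:
    if 'id_str' in tweet:
        tweet_ids.append(tweet['id_str'])

# zip() pairs the two lists by position, so this assumes each tweet has both fields.
data_tuples = list(zip(user_ids, tweet_ids))

df = pd.DataFrame(data_tuples, columns=['User ID', 'Tweet ID'])

print(df)
print('\nLength is ' + str(len(df)))
df.to_csv(outfile, encoding='utf-8', index=False)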

However, when I try to apply this code to a 26 GB JSON file with the same structure, I get the following memory error:

Traceback (most recent call last):
  File "C:/Users/taylo/PycharmProjects/TwitterTest/JSON_Flatten.py", line 23, in <module>
    data = json.loads("[" + json_file.read().replace("}\n{", "},\n{") +  "]")
MemoryError

Is there a way to read this file in parts and then append each part in turn to the output file?

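Hmm... are you sure you are reading a valid JSON file? You shouldn't have to go to the trouble of replacing newlines.

I have my doubts about how clean the files are, but adjusting the code this way worked for the other 13 files. Simply trying to read the file without the adjustment does produce an error.

@thb5018 Is there any way you could share the beginning and end of the file, so that we can at least check its structure?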
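
One possible way to read the file in parts is sketched below. It is not a tested fix for the 26 GB file. It assumes the file is a sequence of JSON objects separated only by whitespace such as newlines, which is what the replace("}\n{", ...) workaround in the question suggests. The sketch reads the text in fixed-size chunks, decodes one object at a time with json.JSONDecoder.raw_decode, and writes each row to the CSV immediately, so neither the whole file nor a full list of tweets is held in memory. The names iter_json_objects, export_ids and chunk_size are hypothetical. Unlike the original code, a row is written only when a tweet has both IDs.

```python
import csv
import json


def iter_json_objects(path, chunk_size=1 << 20):
    """Yield JSON values one at a time from a file of concatenated objects.

    Assumption: the values are objects separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    with open(path, "r", encoding="utf8", errors="ignore") as handle:
        # iter(callable, sentinel) keeps calling read() until it returns "".
        for chunk in iter(lambda: handle.read(chunk_size), ""):
            buffer += chunk
            pos = 0
            while True:
                # raw_decode does not skip leading whitespace, so do it here.
                while pos < len(buffer) and buffer[pos].isspace():
                    pos += 1
                if pos == len(buffer):
                    break
                try:
                    obj, pos = decoder.raw_decode(buffer, pos)
                except json.JSONDecodeError:
                    # Most likely an object cut off by the chunk boundary:
                    # keep it in the buffer and read more text.
                    break
                yield obj
            # Drop everything already decoded; keep only the unfinished tail.
            buffer = buffer[pos:]
    if buffer.strip():
        raise ValueError("trailing data could not be parsed as JSON")


def export_ids(in_path, out_path, chunk_size=1 << 20):
    """Stream user and tweet IDs into a CSV file; return the row count."""
    count = 0
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["User ID", "Tweet ID"])
        for tweet in iter_json_objects(in_path, chunk_size):
            if not isinstance(tweet, dict):
                continue
            user = tweet.get("user")
            if isinstance(user, dict) and "id_str" in user and "id_str" in tweet:
                writer.writerow([user["id_str"], tweet["id_str"]])
                count += 1
    return count
```

Calling export_ids(file, outfile) would stand in for the json.loads and DataFrame steps. One limitation: a malformed object in the middle of the file is treated as incomplete, so the buffer keeps growing until the end of the file and only then raises an error. If every object is known to sit on a single line, iterating over the file line by line and calling json.loads on each line is simpler still.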