Serializing data to JSON with Python


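I have 70 GB of files, mostly TXT, CSV, and logs, sourced from publicly disclosed information, for research, training neural networks, and so on. I want to serialize each line of the files to JSON and push it to Elasticsearch so I can make use of it. Lines may contain special characters that the JSON encoder should escape, such as Russian letters, Korean, etc. Because of Apache Lucene's file size limits, I cannot encode a 10 GB file as a single object and push it to Elastic.

Most entries contain: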

9:username:someemail@gstuff:eafff17afbef485a894][;'.f6d39c56b79:
254:Starcius:someemail@gstuff:09160da290bcd1f83fssf0bd260e13d4f:
2:username:someemail@gstuff:104b77708bb7c19b9f913449c923a898:8
2:username:someemail@gstuff:efc38fca88d8e58089adccce3e05f93
254:username:someemail@gstuff:880896502dd68b546258\][;.'54cca34
2:username:someemail@gstuff:647b61ba8f0965e762c579e5b3da9eca:hUr
2:username:someemail@gstuff::3e9478fcecb4e90266art87g8fiuba90c6ed5473c:\^c
2:username:someemail@gstuff:9df5783228asdasddas796e18cb12e44da:,M|
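I want to take each line of the file (delimited by newlines) and produce something like the following (with illegal JSON characters escaped):

What is the best way to solve this?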

import json

# Open the file; the example lines from the question were pasted into my_file.txt
with open("my_file.txt", "r") as read_my_file:
    lines = read_my_file.readlines()  # read every line into a list

my_list = []  # list of items to serialize

for line in lines:  # loop over every line
    my_list.append({"data": line})  # wrap each line in a dictionary and append it

print(my_list)  # print the list to check that it is correct

my_json = json.dumps(my_list)  # convert the list to JSON
print(my_json)  # print the JSON

Let me know if you need more details ;)
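
On the escaping concern raised in the question, here is a minimal sketch of how `json.dumps` treats backslashes and non-ASCII text. The helper name `encode_line` and the sample strings are made up for illustration; `ensure_ascii` is a real parameter of `json.dumps` and defaults to `True`.

```python
import json


def encode_line(text, ensure_ascii=True):
    """Wrap one line of text in a {"data": ...} object and serialize it.

    encode_line is a hypothetical helper, not part of the answer above.
    """
    return json.dumps({"data": text}, ensure_ascii=ensure_ascii)


# Hypothetical sample lines (not taken from the original data set).
samples = ["abc\\][;.'def", "привет", "한국어"]

for text in samples:
    escaped = encode_line(text)                        # non-ASCII becomes \uXXXX escapes
    readable = encode_line(text, ensure_ascii=False)   # non-ASCII characters are kept as-is
    # Both forms decode back to exactly the original string.
    assert json.loads(escaped)["data"] == text
    assert json.loads(readable)["data"] == text
    print(escaped)
    print(readable)
```

Either form is valid JSON, so the choice mostly affects file size and readability rather than correctness.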

The code below does not read everything into memory. Since you are talking about 10 GB files, that may matter. I would do it like this:

#!/usr/bin/env python3

import json


def convert2json(filename):
    with open(filename) as I:
        for line in I:
            d = {"data": line}
            print(json.dumps(d))

if __name__ == "__main__":
    import sys

    convert2json(sys.argv[1])

% python scriptname.py yourfile
{"data": "9:username:someemail@gstuff:eafff17afbef485a894][;'.f6d39c56b79:\n"}
{"data": "254:Starcius:someemail@gstuff:09160da290bcd1f83fssf0bd260e13d4f:\n"}
{"data": "2:username:someemail@gstuff:104b77708bb7c19b9f913449c923a898:8\n"}
{"data": "2:username:someemail@gstuff:efc38fca88d8e58089adccce3e05f93\n"}
{"data": "254:username:someemail@gstuff:880896502dd68b546258\\][;.'54cca34\n"}
{"data": "2:username:someemail@gstuff:647b61ba8f0965e762c579e5b3da9eca:hUr\n"}
{"data": "2:username:someemail@gstuff::3e9478fcecb4e90266art87g8fiuba90c6ed5473c:\\^c\n"}
{"data": "2:username:someemail@gstuff:9df5783228asdasddas796e18cb12e44da:,M|\n"}
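
Since the goal is to load the lines into Elasticsearch, one possible next step is to group the streamed lines into batches in the newline-delimited format that the Elasticsearch bulk API accepts (an action line followed by a document line). This is only a sketch under assumptions: the helper name `bulk_batches`, the index name `lines`, and the batch size are all hypothetical, and actually sending each batch to the `_bulk` endpoint is left out.

```python
import json
from itertools import islice


def bulk_batches(lines, index_name="lines", batch_size=1000):
    """Yield NDJSON strings, each holding up to batch_size documents.

    index_name and batch_size are illustrative defaults, not values
    taken from the original question.
    """
    it = iter(lines)
    while True:
        # Pull at most batch_size lines without loading the whole input.
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        parts = []
        for line in chunk:
            # Action line telling Elasticsearch to index into index_name.
            parts.append(json.dumps({"index": {"_index": index_name}}))
            # Document line; rstrip drops the trailing newline read from the file.
            parts.append(json.dumps({"data": line.rstrip("\n")}))
        # The bulk format requires a newline after the last line as well.
        yield "\n".join(parts) + "\n"
```

The generator accepts any iterable of strings, so it can be handed an open file object directly and the file is still read one line at a time.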