Python: convert a flat tab-delimited file to a nested JSON structure
I need to convert a flat file in the following format to JSON. The input and the desired output are shown below. I came across a similar question; however, I have an additional piece of information, the field level, which determines the nesting structure in the JSON output. Python pandas does have df.to_json, but I couldn't find a way to write the desired output format with it. Any help would be much appreciated.
Input:
name level children size
aaa 7 aaab 2952
aaa 7 aaac 251
aaa 7 aaad 222
aaab 8 xxx 45
aaab 8 xxy 29
aaab 8 xxz 28
aaab 8 xxa 4
aaac 8 ddd 7
aaac 8 xxt 4
aaac 8 xxu 1
aaac 8 xxv 1
ddd 9 ppp 4
ddd 9 qqq 2
Output:
{
  "name": "aaa",
  "size": 5000,
  "children": [
    {
      "name": "aaab",
      "size": 2952,
      "children": [
        {"name": "xxx", "size": 45},
        {"name": "xxy", "size": 29},
        {"name": "xxz", "size": 28},
        {"name": "xxa", "size": 4}
      ]
    },
    {
      "name": "aaac",
      "size": 251,
      "children": [
        {
          "name": "ddd",
          "size": 7,
          "children": [
            {"name": "ppp", "size": 4},
            {"name": "qqq", "size": 2}
          ]
        },
        {"name": "xxt", "size": 4},
        {"name": "xxu", "size": 1},
        {"name": "xxv", "size": 1}
      ]
    },
    {"name": "aaad", "size": 222}
  ]
}
This is fairly simple with a two-pass approach: first, construct a node for each line. Then, connect each node to its children.
import pprint

with open("data.txt") as file:
    # Skip blank lines (such as a trailing newline) so the unpacking below never fails.
    lines = [line for line in file.read().splitlines() if line.strip()]

# Remove the header line.
lines = lines[1:]

entries = {}

# Create an entry for each child node.
for line in lines:
    name, level, child, size = line.split()
    entries[child] = {"name": child, "size": int(size), "children": []}

# We now have an entry for every node that is a child of another node,
# but not for the topmost parent node, so we'll make one for it now.
parents = set(line.split()[0] for line in lines)
children = set(line.split()[2] for line in lines)
top_parent = (parents - children).pop()

# (Just guess the size, since it isn't supplied in the file.)
entries[top_parent] = {"name": top_parent, "size": 5000, "children": []}

# Hook up each entry to its children.
for line in lines:
    name, level, child, size = line.split()
    entries[name]["children"].append(entries[child])

# The nested structure is ready to use!
structure = entries[top_parent]

# Display the result.
pprint.pprint(structure)
Result:
{'children': [{'children': [{'children': [], 'name': 'xxx', 'size': 45},
                            {'children': [], 'name': 'xxy', 'size': 29},
                            {'children': [], 'name': 'xxz', 'size': 28},
                            {'children': [], 'name': 'xxa', 'size': 4}],
               'name': 'aaab',
               'size': 2952},
              {'children': [{'children': [{'children': [],
                                           'name': 'ppp',
                                           'size': 4},
                                          {'children': [],
                                           'name': 'qqq',
                                           'size': 2}],
                             'name': 'ddd',
                             'size': 7},
                            {'children': [], 'name': 'xxt', 'size': 4},
                            {'children': [], 'name': 'xxu', 'size': 1},
                            {'children': [], 'name': 'xxv', 'size': 1}],
               'name': 'aaac',
               'size': 251},
              {'children': [], 'name': 'aaad', 'size': 222}],
 'name': 'aaa',
 'size': 5000}
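Edit: you can use the del statement to remove the "children" attribute from leaf nodes.
# Execute this after the "hook up each entry to its children" section.
# Remove "children" from leaf nodes.
# (entries.values() works on Python 3; the original Python 2 code used itervalues().)
for entry in entries.values():
    if not entry["children"]:
        del entry["children"]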
Result:
{'children': [{'children': [{'name': 'xxx', 'size': 45},
                            {'name': 'xxy', 'size': 29},
                            {'name': 'xxz', 'size': 28},
                            {'name': 'xxa', 'size': 4}],
               'name': 'aaab',
               'size': 2952},
              {'children': [{'children': [{'name': 'ppp', 'size': 4},
                                          {'name': 'qqq', 'size': 2}],
                             'name': 'ddd',
                             'size': 7},
                            {'name': 'xxt', 'size': 4},
                            {'name': 'xxu', 'size': 1},
                            {'name': 'xxv', 'size': 1}],
               'name': 'aaac',
               'size': 251},
              {'name': 'aaad', 'size': 222}],
 'name': 'aaa',
 'size': 5000}
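Since the question asks for JSON rather than a pprint dump, here is a minimal sketch of the same two-pass idea that ends with json.dumps. It is not the answer's original code: the names ROWS, build_tree and root_size are made up for illustration, the sample rows are inlined instead of read from data.txt, and the root size is still a placeholder because the file does not supply one. Leaf nodes never get a "children" key because the key is only created when the first child is attached.

```python
import json

# Sample rows copied from the question: name, level, child, size.
ROWS = """\
aaa 7 aaab 2952
aaa 7 aaac 251
aaa 7 aaad 222
aaab 8 xxx 45
aaab 8 xxy 29
aaab 8 xxz 28
aaab 8 xxa 4
aaac 8 ddd 7
aaac 8 xxt 4
aaac 8 xxu 1
aaac 8 xxv 1
ddd 9 ppp 4
ddd 9 qqq 2
"""


def build_tree(text, root_size=5000):
    """Two-pass build. root_size is a placeholder: the file has no size for the root."""
    rows = [line.split() for line in text.splitlines() if line.strip()]

    # Pass 1: one node per child; "children" is deliberately not created yet.
    nodes = {child: {"name": child, "size": int(size)} for _, _, child, size in rows}

    # The root is the only parent that never appears as a child.
    parents = {name for name, _, _, _ in rows}
    (root_name,) = parents - set(nodes)
    nodes[root_name] = {"name": root_name, "size": root_size}

    # Pass 2: attach children; setdefault adds the key only for non-leaf nodes.
    for name, _, child, _ in rows:
        nodes[name].setdefault("children", []).append(nodes[child])

    return nodes[root_name]


tree = build_tree(ROWS)
json_text = json.dumps(tree, indent=2)
print(json_text)
```

On Python 3.7 or later the keys come out in insertion order (name, size, children), and children appear in the order of the input rows. The tuple unpacking of the root raises ValueError if the input has zero or several root candidates, which seemed safer than silently picking one.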
How did you determine that the size of "aaa" is 5000?
Thanks @Kevin. The output is not in the desired format. Two things are undesirable here: 1. items with children: [] should not appear in the output, and 2. the ordering has changed.
Do you mean the ordering of the attributes name/size/children? Dicts in Python are inherently unordered, so it's just a quirk of my interpreter that it prints them this way. I couldn't change the order if I tried.
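A note on the ordering point for readers on current Python: since Python 3.7, dicts keep insertion order, json.dumps writes keys in that order unless sort_keys=True is passed, and pprint sorts keys by default but accepts sort_dicts=False from Python 3.8 on. The small sketch below only illustrates that behaviour; the node value is a made-up sample, not output from the answer's code.

```python
import json
import pprint

# A tiny sample node, keys inserted in the order the question wants.
node = {"name": "aaa", "size": 5000, "children": [{"name": "aaad", "size": 222}]}

# json.dumps keeps insertion order unless sort_keys=True is passed.
as_json = json.dumps(node)

# pprint sorts keys by default; sort_dicts=False (Python 3.8+) turns that off.
sorted_repr = pprint.pformat(node)
unsorted_repr = pprint.pformat(node, sort_dicts=False)

print(as_json)
print(sorted_repr)
print(unsorted_repr)
```

So the children/name/size order in the printed results above comes from pprint's key sorting, and serializing with json.dumps would keep whatever order the keys were inserted in.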