Create a JSON hierarchy tree from a two-column DataFrame for a d3 collapsible tree (Python 3)

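I have a DataFrame with two columns: Employee and Reports_to. Every employee reports to someone, all the way up to the CEO. I want to convert it into a JSON file that a collapsible d3 tree can consume (as per this great link:). That would make a nice org chart that stays up to date with little or no manual work.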

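I have been able to get the df into the right JSON format, as shown in the simple example below. However, I did it very painfully in Excel and then copied and pasted the .append strings into Jupyter (!). So here is my question: is there an elegant way in Python 3 to convert the two-column df into the required dict?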

import numpy as np
import pandas as pd
import json

#_Lx refers to the level in the organisation, where Jackie_L1 is the CEO
df = pd.DataFrame(np.array([
['Jo_L3','Jane_L2'],
['Jon_L3','Jane_L2'],
['James_L3','Jerry_L2'],
['Joan_L3','Jerry_L2'],
['Jane_L2','Jackie_L1'],
['Jerry_L2','Jackie_L1'],
['Jill_L2','Jackie_L1']]))
df.columns = ['Employee','Reports_to']
df 

Employee    Reports_to
Jo_L3       Jane_L2
Jon_L3      Jane_L2
James_L3    Jerry_L2
Joan_L3     Jerry_L2
Jane_L2     Jackie_L1
Jerry_L2    Jackie_L1
Jill_L2     Jackie_L1

#start with the root node and work over to the right (down the organisation) to provide the required json:
tree = {'parent': 'null', 'name': 'Jackie_L1', 'edge_name': 'Jackie_L1', 'children': []}

tree['children'].append({'parent': 'Jackie_L1', 'name': 'Jane_L2', 'edge_name': 'Jane_L2', 'children': []})
tree['children'].append({'parent': 'Jackie_L1', 'name': 'Jerry_L2', 'edge_name': 'Jerry_L2', 'children': []})
tree['children'].append({'parent': 'Jackie_L1', 'name': 'Jill_L2', 'edge_name': 'Jill_L2', 'children': []})

tree['children'][0]['children'].append({'parent': 'Jane_L2', 'name': 'Jo_L3', 'edge_name': 'Jo_L3', 'children': []})
tree['children'][0]['children'].append({'parent': 'Jane_L2', 'name': 'Jon_L3', 'edge_name': 'Jon_L3', 'children': []})
tree['children'][1]['children'].append({'parent': 'Jerry_L2', 'name': 'James_L3', 'edge_name': 'James_L3', 'children': []})
tree['children'][1]['children'].append({'parent': 'Jerry_L2', 'name': 'Joan_L3', 'edge_name': 'Joan_L3', 'children': []})
Here is the resulting dict that the d3 tree needs:

{'parent': 'null',
 'name': 'Jackie_L1',
 'edge_name': 'Jackie_L1',
 'children': [{'parent': 'Jackie_L1',
   'name': 'Jane_L2',
   'edge_name': 'Jane_L2',
   'children': [{'parent': 'Jane_L2',
     'name': 'Jo_L3',
     'edge_name': 'Jo_L3',
     'children': []},
    {'parent': 'Jane_L2',
     'name': 'Jon_L3',
     'edge_name': 'Jon_L3',
     'children': []}]},
  {'parent': 'Jackie_L1',
   'name': 'Jerry_L2',
   'edge_name': 'Jerry_L2',
   'children': [{'parent': 'Jerry_L2',
     'name': 'James_L3',
     'edge_name': 'James_L3',
     'children': []},
    {'parent': 'Jerry_L2',
     'name': 'Joan_L3',
     'edge_name': 'Joan_L3',
     'children': []}]},
  {'parent': 'Jackie_L1',
   'name': 'Jill_L2',
   'edge_name': 'Jill_L2',
   'children': []}]}
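
To make the shape of that dict easier to check, here is a minimal sketch of a walker that visits every node and reports its depth, name and parent. The helper name iter_nodes and the cut-down sample dict are assumptions made for illustration; they are not part of the original post.

```python
def iter_nodes(node, depth=0):
    """Yield (depth, name, parent) for a node and all of its descendants."""
    yield depth, node['name'], node['parent']
    # Leaves may carry an empty 'children' list or omit the key altogether.
    for child in node.get('children', []):
        yield from iter_nodes(child, depth + 1)


# A cut-down copy of the structure above (hypothetical sample data).
sample = {
    'parent': 'null', 'name': 'Jackie_L1', 'edge_name': 'Jackie_L1',
    'children': [
        {'parent': 'Jackie_L1', 'name': 'Jane_L2', 'edge_name': 'Jane_L2',
         'children': [
             {'parent': 'Jane_L2', 'name': 'Jo_L3', 'edge_name': 'Jo_L3',
              'children': []},
         ]},
        {'parent': 'Jackie_L1', 'name': 'Jill_L2', 'edge_name': 'Jill_L2',
         'children': []},
    ],
}

# Print an indented outline of the organisation.
for depth, name, parent in iter_nodes(sample):
    print('  ' * depth + name, '<-', parent)
```

Counting the yielded nodes and comparing the total with the number of people in the df is a quick way to confirm nobody was dropped.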
I convert tree to a JSON file as follows:

with open('C:/Python37/input_graph_tree.json', 'w') as outfile:
    json.dump(tree, outfile)

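The link above gives instructions for running the collapsible tree on your desktop, although under Python 3 you need to start it with python -m http.server 8080 rather than python -m SimpleHTTPServer 8080.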
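
If you would rather start the server from a script or notebook, the same standard-library module can be used directly. This is only a sketch: the function name make_server is an assumption, and unlike the command line above it binds to 127.0.0.1 only. It needs Python 3.7 or later for ThreadingHTTPServer and the directory argument.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory='.', port=8080):
    """Build (but do not start) a static file server rooted at `directory`."""
    # `directory` tells the handler which folder to serve files from.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(('127.0.0.1', port), handler)


# Usage (blocks until interrupted with Ctrl+C):
#   server = make_server('.', 8080)
#   server.serve_forever()
```

Point directory at the folder that holds the HTML page and input_graph_tree.json, then open http://127.0.0.1:8080 in the browser.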

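Thanks to Jonathan Eunice, I found a way; the code follows. Once it has run, follow the instructions in the OP to display the d3 tree in a browser. This gives you the tree shown in the OP. It will fail if the tree is too large, but you can build one main branch at a time and merge them into a single tree at the end (though that is not easy).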

#using the df example above, add a row for the top person:
lastrow = len(df)
df.loc[lastrow] = np.nan
df.loc[lastrow, 'Employee'] = 'Jackie_L1'
df.loc[lastrow, 'Reports_to'] = '' #top person does not report to anyone

#create a new column called eid that is a copy of Employee (to make the 'buildtree' function below work):
df['eid'] = df['Employee']
df = df[['eid', 'Employee', 'Reports_to']] #get the order right

#then run this
from pprint import pprint
from collections import defaultdict

def show_val(title, val):
    sep = '-' * len(title)
    print("\n{0}\n{1}\n{2}\n".format(sep, title, sep))
    pprint(val)

def buildtree(t=None, parent_eid=''):
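    """
    Recursively build the nested hierarchy below parent_eid.

    Relies on the module-level `parents` lookup defined further down,
    which maps a manager's eid to the list of (eid, name, manager)
    tuples for that manager's direct reports.
    """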
    parent = parents.get(parent_eid, None)
    if parent is None:
        return t
    for eid, name, mid in parent:
        # The top person has an empty manager id; d3 gets the string 'null' there.
        report = {'parent': 'null' if mid == '' else mid,
                  'name': name,
                  'edge_name': name}
        if t is None:
            t = report
        else:
            reports = t.setdefault('children', [])
            reports.append(report)
        buildtree(report, eid)
    return t

people = list(df.itertuples(index=False, name=None))

parents = defaultdict(list)
for p in people:
    parents[p[2]].append(p)
tree = buildtree()
show_val("data", tree)

#which gives you:
----
data
----

{'children': [{'children': [{'edge_name': 'Jo_L3',
                             'name': 'Jo_L3',
                             'parent': 'Jane_L2'},
                            {'edge_name': 'Jon_L3',
                             'name': 'Jon_L3',
                             'parent': 'Jane_L2'}],
               'edge_name': 'Jane_L2',
               'name': 'Jane_L2',
               'parent': 'Jackie_L1'},
              {'children': [{'edge_name': 'James_L3',
                             'name': 'James_L3',
                             'parent': 'Jerry_L2'},
                            {'edge_name': 'Joan_L3',
                             'name': 'Joan_L3',
                             'parent': 'Jerry_L2'}],
               'edge_name': 'Jerry_L2',
               'name': 'Jerry_L2',
               'parent': 'Jackie_L1'},
              {'edge_name': 'Jill_L2',
               'name': 'Jill_L2',
               'parent': 'Jackie_L1'}],
 'edge_name': 'Jackie_L1',
 'name': 'Jackie_L1',
 'parent': 'null'}

#then write to json:
with open('C:/Python37/input_graph_tree.json', 'w') as outfile:
    json.dump(tree, outfile)
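
As an alternative to try when the recursive version struggles, here is a sketch that builds the same kind of nested dict in a single pass over the rows, with no recursion in the build step. It is not Jonathan Eunice's method: the function name build_tree is an assumption, and it detects the top person automatically (whoever has no manager) instead of needing the extra row added above. Leaves keep an empty 'children' list, as in the dict from the question.

```python
import json


def build_tree(pairs):
    """Build a nested d3-style dict from (employee, manager) pairs."""
    nodes = {}

    def node_for(name):
        # Create each person's node once and reuse it afterwards.
        if name not in nodes:
            nodes[name] = {'parent': 'null', 'name': name,
                           'edge_name': name, 'children': []}
        return nodes[name]

    has_manager = set()
    for employee, manager in pairs:
        node = node_for(employee)
        # An empty or missing (None/NaN) manager marks the top person.
        if not isinstance(manager, str) or manager == '':
            continue
        node['parent'] = manager
        node_for(manager)['children'].append(node)
        has_manager.add(employee)

    roots = [n for name, n in nodes.items() if name not in has_manager]
    if len(roots) != 1:
        raise ValueError('expected exactly one root, found %d' % len(roots))
    return roots[0]


# Same data as the df in the question, written as plain tuples.
pairs = [
    ('Jo_L3', 'Jane_L2'), ('Jon_L3', 'Jane_L2'),
    ('James_L3', 'Jerry_L2'), ('Joan_L3', 'Jerry_L2'),
    ('Jane_L2', 'Jackie_L1'), ('Jerry_L2', 'Jackie_L1'),
    ('Jill_L2', 'Jackie_L1'),
]
tree = build_tree(pairs)
print(json.dumps(tree, indent=1))
```

With the DataFrame from the question, zip(df['Employee'], df['Reports_to']) produces the pairs. Note that a group of people who only report to each other in a loop would not be attached to the root, so it is worth comparing the node count with the number of employees.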