Python: converting multiple CSV columns to JSON without duplicate children


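I have CSV columns like the ones below, and I'm trying to convert them into the name/children/size JSON format that D3 needs. The problem is that children are repeated: for example, under name="Type" the child "young" appears again and again, with size=40000.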

L1       L2     L3         L4        L5      L6          Size
Type    cars    young      young     young   young      40000
Type    cars    student    US        US      US         10000
Type    cars    student    UK        UK      UK         20000
Type    cars    Graduates  Young    India    Delhi      20000
Type    cars    Graduates  Old      UK       London     30000
Type    Bike    Undergrads CB       CB       UNC        6000
prime   prime   prime      prime    prime   prime       600
The result I'm getting is:

{
    "name": "Segments",
    "children": [
        {
            "name": "Type",
            "children": [
                {
                    "name": "cars",
                    "children": [
                        {
                            "name": "young",
                            "children": [
                                {
                                    "name": "young",
                                    "children": [
                                        {
                                            "name": "young",
                                            "children": [
                                                {
                                                    "name": "young",
                                                    "size": "40000"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "name": "student",
                            "children": [
                                {
                                    "name": "US",
                                    "children": [
                                        {
                                            "name": "US",
                                            "children": [
                                                {
                                                    "name": "US",
                                                    "size": "10000"
                                                }
                                            ]
                                        }
                                    ]
                                },
                                {
                                    "name": "UK",
                                    "children": [
                                        {
                                            "name": "UK",
                                            "children": [
                                                {
                                                    "name": "UK",
                                                    "size": "20000"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        },
        {
            "name": "prime",
            "children": [
                {
                    "name": "prime",
                    "children": [
                        {
                            "name": "prime",
                            "children": [
                                {
                                    "name": "prime",
                                    "children": [
                                        {
                                            "name": "prime",
                                            "children": [
                                                {
                                                    "name": "prime",
                                                    "size": "600"
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
The expected output is:

{
    "name": "Segments",
    "children": [
        {
            "name": "Type",
            "children": [
                {
                    "name": "cars",
                    "children": [
                        {
                            "name": "young",
                            "size": "40000"
                        }
                    ]
                },
                {
                    "name": "student",
                    "children": [
                        {
                            "name": "US",
                            "size": "10000"
                        },
                        {
                            "name": "UK",
                            "size": "20000"
                        }
                    ]
                }
            ]
        },
        {
            "name": "prime",
            "size": "600"
        }
    ]
}
I'm using the following code:

import json
import csv

class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
        return _child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res


root = Node('Segments')

with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)
    for row in range(1, len(p)):
        grp1, grp2, grp3, grp4, grp5, grp6, size = p[row]
        root.child(grp1).child(grp2).child(grp3).child(grp4).child(grp5).child(grp6, size)

print(json.dumps(root.as_dict(), indent=4))

So the first thing you need to do is remove the duplicates from each row and then create the children accordingly.

Here is what I changed:

with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    p = list(reader)
    for row in range(1, len(p)):
        # Copy the row, keeping only the first occurrence of each value
        temp = []
        for x in p[row]:
            if x not in temp:
                temp.append(x)

        ## Additional code according to your dictionary structure
        #if row != 1:
        #    if 'cars' in temp:
        #        temp.remove('cars')
        #    elif 'Bike' in temp:
        #        temp.remove('Bike')

        # Walk down the tree: every value except the last two is an intermediate
        # node, the second-to-last value is the leaf and the last one is its size
        node = root
        for name in temp[:-2]:
            node = node.child(name)
        node.child(temp[-2], temp[-1])

print(json.dumps(root.as_dict(), indent=2))

Let me know if this works.
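For reference, here is a self-contained sketch of the same approach that you can run without the CSV file. It assumes the rows have already been parsed into lists of strings without the header. The sample rows are copied from the question's table. It reuses the question's Node class unchanged. build_tree is a hypothetical helper name, not something from the original code. With these rows, "student" ends up under "cars" because that is what the CSV says. The question's expected output puts it directly under "Type" instead, and the commented-out lines in the answer are one way to get that.

```python
import json


class Node(object):
    # Same class as in the question
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        # Reuse an existing child with this name, otherwise create one
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
        return _child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res


def build_tree(rows, root_name='Segments'):
    """Hypothetical helper: turn rows like [L1, ..., L6, Size] into a Node tree."""
    root = Node(root_name)
    for row in rows:
        # Keep only the first occurrence of each value, preserving order
        unique = []
        for value in row:
            if value not in unique:
                unique.append(value)
        # Every value except the last two is an intermediate level
        node = root
        for name in unique[:-2]:
            node = node.child(name)
        # Second-to-last value is the leaf, the last value is its size
        node.child(unique[-2], unique[-1])
    return root


rows = [
    ['Type', 'cars', 'young', 'young', 'young', 'young', '40000'],
    ['Type', 'cars', 'student', 'US', 'US', 'US', '10000'],
    ['Type', 'cars', 'student', 'UK', 'UK', 'UK', '20000'],
    ['prime', 'prime', 'prime', 'prime', 'prime', 'prime', '600'],
]
tree = build_tree(rows).as_dict()
print(json.dumps(tree, indent=4))
```

The sizes stay strings ("40000") because csv.reader returns strings. Wrap them in int() in build_tree if you want numbers in the JSON.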

First, you need to remove the duplicates from the row. This can be done as follows:

# Example row
p[row] = ('Type', 'cars', 'young', 'young', 'young', 'young', 'Size')
pp = set()

# pp.add(el) returns None, so each element is kept only the first time it is seen
new_p_row = [el for el in p[row] if not (el in pp or pp.add(el))]
# ['Type', 'cars', 'young', 'Size']
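If you prefer to avoid the side-effect trick, dict.fromkeys gives the same order-preserving de-duplication, because dicts keep insertion order on Python 3.7+. If you only want to collapse consecutive repeats, itertools.groupby is an option. That variant keeps a name that legitimately reappears later in the row. Which behavior is right depends on your data, so treat both as sketches. The second row below is made up purely to show where the two differ.

```python
from itertools import groupby

row = ('Type', 'cars', 'young', 'young', 'young', 'young', '40000')
mixed = ('A', 'B', 'A', 'A', '100')  # made-up row where 'A' reappears later

# dict keys are unique and (Python 3.7+) keep insertion order,
# so this keeps the first occurrence of every value in the row
unique_any = list(dict.fromkeys(row))

# groupby yields one key per run of equal adjacent values,
# so only consecutive repeats are collapsed
unique_adjacent = [key for key, _ in groupby(row)]

print(unique_any)                          # ['Type', 'cars', 'young', '40000']
print(unique_adjacent)                     # ['Type', 'cars', 'young', '40000']
print(list(dict.fromkeys(mixed)))          # ['A', 'B', '100']
print([key for key, _ in groupby(mixed)])  # ['A', 'B', 'A', '100']
```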
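Then walk down from the root, adding every element except the last two as a child of the previous one:

node = root
for r in new_p_row[:-2]:
    node = node.child(r)

Finally, add the second-to-last element as a leaf, with the last element as its size:

node.child(new_p_row[-2], new_p_row[-1])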

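How about removing those extra columns? Then your code would work as expected.

@BcK I can't remove those extra columns, because in other rows every column may hold a unique value. I have edited the CSV, you can take a look now.

I tried this approach, but the problem here is that if some of the children captured in grp1 are unique, then those children are unique themselves.

@CyleySimon OK, I sort of understand what you're trying to do. Please check the updated answer.

How do I print the result here in JSON format?

@CyleySimon print(json.dumps(root.as_dict(), indent=4))

I tried that, but it only prints {"name": "Segments", "children": []}. This is what I used:

with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as p:
    reader = csv.reader(p)
    p = list(reader)
    pp = set()
    new_p_row = [el for el in p if not (el in pp or pp.add(el))]
    for r in new_p_row[:-2]:
        root.child(r)
    root.child(new_p_row[-2], new_p_row[-1])

@CyleySimon Careful: you should write "el for el in p[row]", not "el for el in p". Let me know if this solves your problem.

But how do I get the CSV data into p[row]?

@CyleySimon You already have "for row in range(...): p[row]" in your code, don't you?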
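A minimal sketch of applying the de-duplication to each row, the equivalent of p[row], instead of to the whole list p. It reads from an in-memory string via io.StringIO so it runs on its own. With the real file you would use open(path, newline='') instead. The data is assumed to be comma-separated, since the question reads it with the default csv.reader settings. Here row is the list of strings itself rather than an index into p, and next(reader) skips the header.

```python
import csv
import io

# Stand-in for the contents of Book3.csv (header + two sample rows)
data = """L1,L2,L3,L4,L5,L6,Size
Type,cars,young,young,young,young,40000
prime,prime,prime,prime,prime,prime,600
"""

reader = csv.reader(io.StringIO(data))
header = next(reader)  # skip the L1..L6,Size header row
deduped_rows = []
for row in reader:  # row is one list of strings, i.e. what p[row] holds
    pp = set()
    new_p_row = [el for el in row if not (el in pp or pp.add(el))]
    deduped_rows.append(new_p_row)

print(deduped_rows)
# [['Type', 'cars', 'young', '40000'], ['prime', '600']]
```

Each new_p_row can then be fed into the "walk down from the root" steps from the answer.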