Python: multiple CSV rows to JSON without duplicate children
I have CSV columns as shown below, and I am trying to convert them into the name/children/size JSON format that D3 requires. Some child values repeat within a row; for example, under name="Type" there is children="young" with size=40000.
L1 L2 L3 L4 L5 L6 Size
Type cars young young young young 40000
Type cars student US US US 10000
Type cars student UK UK UK 20000
Type cars Graduates Young India Delhi 20000
Type cars Graduates Old UK London 30000
Type Bike Undergrads CB CB UNC 6000
prime prime prime prime prime prime 600
The result I am getting is:
{
  "name": "Segments",
  "children": [
    {
      "name": "Type",
      "children": [
        {
          "name": "cars",
          "children": [
            {
              "name": "young",
              "children": [
                {
                  "name": "young",
                  "children": [
                    {
                      "name": "young",
                      "children": [
                        {
                          "name": "young",
                          "size": "40000"
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "name": "student",
              "children": [
                {
                  "name": "US",
                  "children": [
                    {
                      "name": "US",
                      "children": [
                        {
                          "name": "US",
                          "size": "10000"
                        }
                      ]
                    }
                  ]
                },
                {
                  "name": "UK",
                  "children": [
                    {
                      "name": "UK",
                      "children": [
                        {
                          "name": "UK",
                          "size": "20000"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "name": "prime",
      "children": [
        {
          "name": "prime",
          "children": [
            {
              "name": "prime",
              "children": [
                {
                  "name": "prime",
                  "children": [
                    {
                      "name": "prime",
                      "children": [
                        {
                          "name": "prime",
                          "size": "600"
                        }
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
The expected output is:
{
  "name": "Segments",
  "children": [
    {
      "name": "Type",
      "children": [
        {
          "name": "cars",
          "children": [
            {
              "name": "young",
              "size": "40000"
            }
          ]
        },
        {
          "name": "student",
          "children": [
            {
              "name": "US",
              "size": "10000"
            },
            {
              "name": "UK",
              "size": "20000"
            }
          ]
        }
      ]
    },
    {
      "name": "prime",
      "size": "600"
    }
  ]
}
I am using the following code:
import json
import csv


class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        # Return the existing child with this name, or create it.
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
        return _child

    def as_dict(self):
        # Leaves carry a size; inner nodes carry children.
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res


root = Node('Segments')

with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)
    for row in range(1, len(p)):
        grp1, grp2, grp3, grp4, grp5, grp6, size = p[row]
        root.child(grp1).child(grp2).child(grp3).child(grp4).child(grp5).child(grp6, size)

print(json.dumps(root.as_dict(), indent=4))
So the first thing to do is remove the duplicates from each row and create the children accordingly. Here is what I changed:
with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as f:
    reader = csv.reader(f)
    p = list(reader)
    for row in range(1, len(p)):
        # Create a temporary list of the row but keep only unique elements
        temp = []
        for x in p[row]:
            if x not in temp:
                temp.append(x)

        ## Additional code according to your dictionary structure
        #if row != 1:
        #    if 'cars' in temp:
        #        temp.remove('cars')
        #    elif 'Bike' in temp:
        #        temp.remove('Bike')

        # Create a string which will look similar to root.child(grp1)...
        evalStr = 'root'
        # Stop before the last element: it is the size, which is attached
        # to the final label rather than added as a child of its own.
        for i in range(len(temp) - 1):
            if i == len(temp) - 2:
                evalStr += '.child("' + temp[i] + '","' + temp[-1] + '")'
            else:
                evalStr += '.child("' + temp[i] + '")'
        # eval(string) will evaluate the string as python code
        eval(evalStr)

print(json.dumps(root.as_dict(), indent=2))
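The same idea can also be written without building a string for eval, by walking down the tree one label at a time. What follows is only a minimal sketch, not the answerer's code: the helper name add_row and the in-memory SAMPLE (a stand-in for Book3.csv) are assumptions made for illustration, and unlike the snippet above it splits the size off before removing duplicates. It also keeps student nested under cars, because that is what the rows say.

```python
import csv
import io
import json


class Node:
    """Tree node in the name/children/size shape used in the question."""

    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        # Reuse an existing child with this name, otherwise create one.
        for c in self.children:
            if c.name == cname:
                return c
        new_child = Node(cname, size)
        self.children.append(new_child)
        return new_child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res


def add_row(root, row):
    """Hypothetical helper: insert one CSV row, skipping repeated labels."""
    *labels, size = row
    path = []
    for label in labels:
        if label not in path:
            path.append(label)
    node = root
    for label in path[:-1]:
        node = node.child(label)   # descend, creating nodes as needed
    node.child(path[-1], size)     # the last label becomes the leaf


# Hypothetical in-memory sample standing in for Book3.csv.
SAMPLE = """L1,L2,L3,L4,L5,L6,Size
Type,cars,young,young,young,young,40000
Type,cars,student,US,US,US,10000
Type,cars,student,UK,UK,UK,20000
prime,prime,prime,prime,prime,prime,600
"""

root = Node('Segments')
rows = list(csv.reader(io.StringIO(SAMPLE)))
for row in rows[1:]:               # rows[0] is the header
    add_row(root, row)

tree = root.as_dict()
print(json.dumps(tree, indent=2))
```

Avoiding eval also means that a quote character inside a CSV cell cannot break or alter the generated expression.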
Let me know if this works.

First, you need to remove the duplicates from the row. This can be done as follows:
p[row] = ('Type', 'cars', 'young', 'young', 'young', 'young', 'Size')
pp = set()
new_p_row = [el for el in p[row] if not (el in pp or pp.add(el))]
# ['Type', 'cars', 'young', 'Size']
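Then add the children, down to the second-to-last element, descending one level at a time:
node = root
for r in new_p_row[:-2]:
    # Continue from the child just returned so that each label nests under the previous one.
    node = node.child(r)
Finally, add the last child together with its size:
node.child(new_p_row[-2], new_p_row[-1])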
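The de-duplication step in this answer treats the size like any other cell, so a row whose label happens to equal its size would lose one of the two. A small sketch of one way around that, assuming the size is always the last column: split it off first, then drop repeated labels with dict.fromkeys, which keeps first-seen order on Python 3.7 and later. The name split_row is hypothetical.

```python
def split_row(row):
    """Return (unique labels in first-seen order, size) for one CSV row."""
    *labels, size = row
    # dict keys keep insertion order, so this removes repeats
    # without reordering the remaining labels.
    return list(dict.fromkeys(labels)), size


labels, size = split_row(
    ['Type', 'cars', 'young', 'young', 'young', 'young', '40000'])
print(labels, size)

# A row whose label equals its size still yields both values.
same = split_row(['600', '600', '600', '600', '600', '600', '600'])
print(same)
```

The returned labels can then be chained through child() exactly as in the steps above, with the size passed to the final call.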
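How about removing those extra columns? Then your code would work as expected.
@BcK I can't remove those extra columns, because in other rows every column may hold a unique value. I have edited the CSV, so you can take a look now.
I tried this approach, but the problem is that when some of the children are unique, they are captured in grp1 itself.
@CyleySimon OK, I kind of understand what you are trying to do. Please check the updated answer.
How can I print the result in JSON format here?
@CyleySimon print(json.dumps(root.as_dict(), indent=4))
I tried that, but it just prints {"name": "Segments", "children": []}. This is what I used: with open('C:\\Users\\G01172472\\Desktop\\Book3.csv', 'r') as p: reader = csv.reader(p) p = list(reader) pp = set() new_p_row = [el for el in p if not (el in pp or pp.add(el))] for r in new_p_row[:-2]: root.child(r) root.child(new_p_row[-2], new_p_row[-1])
@CyleySimon Careful, you should write el for el in p[row], not el for el in p. Let me know if this solves your problem.
But how do I get the CSV data into p[row]?
@CyleySimon You already have for row in range(...): p[row] in your code, don't you?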
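To make the point from the comments concrete: p is a list of rows, so iterating over p yields whole rows rather than cells, and the de-duplication has to run once per row on p[row]. A minimal sketch follows, assuming an in-memory SAMPLE in place of Book3.csv; the name deduped_rows is also only illustrative.

```python
import csv
import io

# Hypothetical in-memory sample standing in for Book3.csv.
SAMPLE = """L1,L2,L3,L4,L5,L6,Size
Type,cars,young,young,young,young,40000
prime,prime,prime,prime,prime,prime,600
"""

p = list(csv.reader(io.StringIO(SAMPLE)))

deduped_rows = []
for row in range(1, len(p)):   # index 0 is the header row
    pp = set()                 # fresh "seen" set for every row
    new_p_row = [el for el in p[row] if not (el in pp or pp.add(el))]
    deduped_rows.append(new_p_row)

print(deduped_rows)
```

Each entry of deduped_rows then has the shape the second answer expects: the labels followed by the size as the last element.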