Python: Structuring plain text into JSON


python ruby json d3.js


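I am trying to take a collection of strings, tokenize each string into individual characters, and restructure the result as JSON so that I can build a cluster dendrogram visualization (somewhat like this, except for strings rather than sentences). As a result, character sequences are sometimes shared (or re-shared) across the data.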

For example, say I have a text file that looks like this:

xin_qn2
x_qing4n3
x_qing4nian_

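That is all I expect as input; there are no CSV headings or anything else attached to the data. The JSON object would then look something like this: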

{
    "name": "x",
    "children": [
        {
            "name": "i"
        },
        {
            "name": "_",
            "children": [
                {
                    "name": "q"
                }
            ]
        }
    ]
}

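And so on. I have been trying to build the data structure ahead of time, before sending it to D3.js, using Ruby to split the lines into individual characters, but I am stuck on how to arrange the result as hierarchical JSON.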

file_contents = File.open("single.txt", "r")

file_contents.readlines.each do |line|
  parse = line.scan(/[A-Za-z][^A-Za-z]*/)
  puts parse
end
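For comparison with the Python attempts further down, the same tokenization can be sketched in Python. This is only an illustration that assumes the regular expression from the Ruby snippet; `tokenize` is a hypothetical helper name, and unlike the Ruby version it drops the trailing newline. Note that the pattern groups each letter with any non-letters that follow it, so it does not split a line into single characters.

```python
import re

# Same pattern as the Ruby snippet: one letter followed by any run of non-letters.
TOKEN = re.compile(r"[A-Za-z][^A-Za-z]*")

def tokenize(line):
    """Hypothetical helper: split a line roughly the way line.scan(...) does in Ruby."""
    return TOKEN.findall(line.rstrip("\n"))

print(tokenize("xin_qn2"))  # ['x', 'i', 'n_', 'q', 'n2']
print(list("xin_qn2"))      # ['x', 'i', 'n', '_', 'q', 'n', '2']
```

If the goal is one tree node per character, plain `list(line)` (the second call) is probably the closer fit.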

I could probably do this in the browser with d3.js, but I have not tried that yet.

I am just wondering whether there are any suggestions, pointers, or existing tools/scripts that might help me. Thanks!

Update 2014-10-02

So I spent some time trying this in Python, but I keep getting stuck. I now realize that I am also not handling the "children" element correctly. Any suggestions?

Attempt one

#!/usr/bin/python

from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

file_out = open('out.txt', 'wb')

nested = defaultdict(tree)

with open("single.txt") as f:
    for line in f:
        o = list(line)
        char_lst = []
        for chars in o:
            d = {}
            d['name']=chars
            char_lst.append(d)
        for word in d:
            node = nested
            for char in word:
                node = node[char.lower()]
                print node

print(json.dumps(nested))

Attempt two

#!/usr/bin/python

from collections import defaultdict
import json

def tree():
    return defaultdict(tree)

nested = defaultdict(tree)

words = list(open("single.txt"))
words_output = open("out.json", "wb")

for word in words:
    node = nested
    for char in word:
        node = node[char.lower()]

def print_nested(d, indent=0):
  for k, v in d.iteritems():
    print '{}{!r}:'.format(indent * ' ', k)
    print_nested(v, indent + 1)

print_nested(nested)

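Your second attempt is almost there. Adding

print(json.dumps(nested))

to the end prints the nested dictionary as JSON, with each character as a key and its continuations as the value.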
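As a minimal sketch of what that output looks like, assume a hypothetical two-word input (`"xi"` and `"x_q"`, chosen to match the example object in the question rather than taken from the real file):

```python
from collections import defaultdict
import json

def tree():
    # Missing keys are created on access as new, empty subtrees.
    return defaultdict(tree)

nested = tree()
for word in ["xi", "x_q"]:  # hypothetical sample input
    node = nested
    for char in word:
        node = node[char.lower()]  # descend, creating the node if needed

result = json.dumps(nested)  # defaultdict is a dict subclass, so this works
print(result)  # {"x": {"i": {}, "_": {"q": {}}}}
```

The shared prefix `x` appears only once, but the characters are keys rather than `"name"` values, and there is no `"children"` list.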

That is close, but not quite what you want. By the way, you can also convert the nested defaultdict to a regular dict with the following function:

def convert(d):
    # Recursively replace each defaultdict with a plain dict; other values pass through.
    return dict((key, convert(value)) for (key, value) in d.items()) if isinstance(d, defaultdict) else d
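To illustrate why the conversion can be useful, here is a small sketch under the same hypothetical two-word assumption as before. After converting, a lookup of a missing key no longer creates a node silently:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def convert(d):
    # Same idea as the function above, written with a dict comprehension.
    if isinstance(d, defaultdict):
        return {key: convert(value) for key, value in d.items()}
    return d

nested = tree()
nested["x"]["i"]        # merely accessing a key creates the node
nested["x"]["_"]["q"]

plain = convert(nested)
print(plain)                      # {'x': {'i': {}, '_': {'q': {}}}}
print(type(plain["x"]) is dict)   # True

try:
    plain["z"]                    # a plain dict does not auto-create keys
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)             # True
```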

But we still only have a dict (of dicts of dicts, and so on). Using recursion, we can convert it to your desired format as follows:

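def format(d):
    # Turn {char: subtree} into a list of {"name": ..., "children": [...]} nodes.
    # Note that this name shadows the built-in format().
    children = []
    for (key, value) in d.items():
        children += [{"name": key, "children": format(value)}]
    return children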
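The recursion can be checked on a tiny input. The sketch below assumes a hypothetical plain dict matching the example in the question, and uses the hypothetical name `to_children` instead of `format` only to avoid shadowing the built-in:

```python
import json

def to_children(d):
    # Each key becomes a node; its sub-dict becomes that node's children.
    return [{"name": key, "children": to_children(value)} for key, value in d.items()]

plain = {"x": {"i": {}, "_": {"q": {}}}}  # hypothetical input
result = to_children(plain)

print(json.dumps(result))
# [{"name": "x", "children": [{"name": "i", "children": []}, {"name": "_", "children": [{"name": "q", "children": []}]}]}]
```

Two things differ from the object sketched in the question: the top level is a list (one entry per distinct first character), and leaves carry an empty `"children"` list. When every word starts with the same character, `result[0]` is the single root object.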

Finally, let's print out the JSON:

print(json.dumps(format(convert(nested))))

This prints JSON in the desired shape: a list of nodes, each with a "name" and a "children" list.

Here is the complete code:

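#!/usr/bin/python

from collections import defaultdict
import json

def tree():
    # A dict whose missing keys are created on access as nested trees.
    return defaultdict(tree)

nested = defaultdict(tree)

# Read one word per line, dropping the trailing newlines.
with open("single.txt") as f:
    words = f.read().splitlines()

# Walk each word character by character; shared prefixes reuse the same nodes.
for word in words:
    node = nested
    for char in word:
        node = node[char.lower()]

def convert(d):
    # Recursively replace each defaultdict with a plain dict.
    return dict((key, convert(value)) for (key, value) in d.items()) if isinstance(d, defaultdict) else d

def format(d):
    # Turn {char: subtree} into a list of {"name": ..., "children": [...]} nodes.
    # Note that this name shadows the built-in format().
    children = []
    for (key, value) in d.items():
        children += [{"name": key, "children": format(value)}]
    return children

print(json.dumps(format(convert(nested))))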
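The code above prints the JSON to standard output. If the result should go to a file such as `out.json` instead, one possible arrangement is sketched below. The function names `build_hierarchy` and `write_hierarchy` are hypothetical, and the sample words are the three lines from the question:

```python
import json
from collections import defaultdict

def tree():
    return defaultdict(tree)

def build_hierarchy(words):
    """Return a list of {"name", "children"} nodes, one per distinct first character."""
    nested = tree()
    for word in words:
        node = nested
        for char in word:
            node = node[char.lower()]

    def to_children(d):
        return [{"name": k, "children": to_children(v)} for k, v in d.items()]

    return to_children(nested)

def write_hierarchy(words, path):
    # Text mode ("w"), because json.dump writes str rather than bytes.
    with open(path, "w") as out:
        json.dump(build_hierarchy(words), out, indent=2)

sample = ["xin_qn2", "x_qing4n3", "x_qing4nian_"]
roots = build_hierarchy(sample)
print(len(roots), roots[0]["name"])                    # 1 x
print([child["name"] for child in roots[0]["children"]])  # ['i', '_']
```

Calling `write_hierarchy(sample, "out.json")` would then write the pretty-printed tree; since all three sample lines start with `x`, `roots[0]` is a single root object that could be handed to D3.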

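You need to make a bunch of dictionaries and then store them in lists. I can't speak for Ruby, but Python makes this really simple.

Thanks, OrionMelt! This is perfect. Thank you very much.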
