How to create nested JSON from a DataFrame in Python
I have a DataFrame containing Windows 10 logs, and I want to convert it to JSON. What is an efficient way to do this? I have generated a default df, but it is not nested the way I want. This is what I currently have:
{
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
    }
}
I would like it to look like this:
{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,  # ("counter" value)
            "Excel": 0
        }
    },
    "1": ...
}
As far as I understand, you want to group the objects by "time" and merge the counters from the different processes. If so, here is an example implementation:
input_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "ZXC",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "3": {
        "ProcessName": "QWE",
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    }
}

def group_input_data_by_time(dict_data):
    time_data = {}
    for value_dict in dict_data.values():
        counter = value_dict["counter"]
        process_name = value_dict["ProcessName"]
        time_ = value_dict["time"]
        common_data = {
            "time": time_,
            "timeFloat": value_dict["timeFloat"],
            "internal_time": value_dict["internal_time"],
        }
        # setdefault returns the existing entry for this time,
        # or inserts and returns common_data if the time is new
        common_data = time_data.setdefault(time_, common_data)
        processes = common_data.setdefault("Processes", {})
        processes[process_name] = counter
    # if required to change keys from time to enumerated
    result_dict = {}
    for ind, value in enumerate(time_data.values()):
        result_dict[str(ind)] = value
    return result_dict

print(group_input_data_by_time(input_data))
The result is:
{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,
            "ZXC": 0
        }
    },
    "1": {
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "QWE": 0
        }
    }
}
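One detail worth noting: print on a Python dict emits its repr (single quotes), which is not valid JSON. To serialize the grouped result as actual JSON, the standard-library json module can be used, e.g.:

```python
import json

# The grouped structure produced above (first entry shown for brevity).
grouped = {
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {"Firefox": 0, "ZXC": 0},
    },
}

# json.dumps yields a valid JSON string; indent pretty-prints it.
text = json.dumps(grouped, indent=4)
print(text)
```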
It looks to me as if you want to aggregate the data based on
['time', 'timeFloat', 'internal_time']
To create JSON from the aggregated data, you can use:
df.groupby(['time', 'timeFloat', 'internal_time'])
However, your example suggests that you want to keep the index keys ("0", "1",
and so on), which contradicts the stated intent, since the
aggregated values from a single point in time:
"Firefox": 0
"Excel": 0
seem to correspond to those index keys, and they will be lost during aggregation.
However, if you decide to go with aggregation, the code would look like this:
# reading in the data:
import pandas as pd
import json

json_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
    }
}
df = pd.DataFrame.from_dict(json_data)
df = df.T
# processing: group on the shared time columns, collect each group's
# (ProcessName, counter) pairs into a dict, then drop the raw columns
ddf = df.groupby(['time', 'timeFloat', 'internal_time'], as_index=False).agg(lambda x: list(x))
ddf['Processes'] = ddf.apply(lambda r: dict(zip(r['ProcessName'], r['counter'])), axis=1)
ddf = ddf.drop(['ProcessName', 'counter'], axis=1)
# printing the result:
json2 = json.loads(ddf.to_json(orient="records"))
print(json.dumps(json2, indent=4, sort_keys=True))
The result:
[
    {
        "Processes": {
            "Excel": 0,
            "Firefox": 0
        },
        "internal_time": 0.0,
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0
    },
    {
        "Processes": {
            "Word": 0
        },
        "internal_time": 1.5533333333,
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0
    }
]
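If the enumerated keys ("0", "1", and so on) from the desired output are wanted instead of a list, orient="index" keys the rows by position. A sketch, using a minimal stand-in for the aggregated frame ddf built above:

```python
import json
import pandas as pd

# Minimal stand-in for the aggregated frame ddf from the code above.
ddf = pd.DataFrame({
    "time": ["2019-07-12T00:00:00", "2019-07-12T01:30:00"],
    "timeFloat": [1562882400.0, 1562888000.0],
    "internal_time": [0.0, 1.5533333333],
    "Processes": [{"Excel": 0, "Firefox": 0}, {"Word": 0}],
})

# orient="index" produces {"0": {...}, "1": {...}} instead of a list
keyed = json.loads(ddf.to_json(orient="index"))
print(json.dumps(keyed, indent=4, sort_keys=True))
```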
This does not take advantage of the fact that the data is already in a
dataframe. Compared with the pandas-based
solution, it will scale worse as the amount of data grows.