How to create a DataFrame from a complex dictionary in streaming data (Python, pandas)

All sorts of nested dictionaries and data structures :)

I have a sample dictionary:

stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3]
        }
    }
}

I want to use it to create a DataFrame like this:

  Main_ID SetID  Version  Time  Value
0     1     1041      2.0  5000      1
1     1     1041      2.0  6000      2
2     1     1041      2.0  7000      3

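I have written the code below to produce what I need, but I know it is not a good approach, so any suggestions would be great. I am also fairly sure it will perform very badly when I run it on streaming data: the three DataFrames will be created inside a loop, and the time and value lists each hold between 30,000 and 100,000 items.

Code:

Output: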

  Main_ID SetID  Version  Time  Value
0       1  1041      2.0  5000      1
1     NaN   NaN      NaN  6000      2
2     NaN   NaN      NaN  7000      3
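Because the time and value lists already have matching lengths, one possible sketch (not the asker's missing code) is to pass them straight to the DataFrame constructor and let pandas broadcast the scalar fields to every row, which avoids building and merging several intermediate frames. The helper name record_to_frame is hypothetical, and each record is assumed to have the same shape as the sample stream above.

```python
import pandas as pd

# Sample record with the same layout as in the question
stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3],
        },
    }
}


def record_to_frame(record):
    """Hypothetical helper: turn one record into a long DataFrame."""
    outer = record["Outerclass"]
    nested = outer["nestedData"]
    # The list columns set the row count; the scalar values are
    # broadcast by the DataFrame constructor to every row.
    return pd.DataFrame({
        "Main_ID": outer["Main_ID"],
        "SetID": outer["SetID"],
        "Version": outer["Version"],
        "Time": nested["time"],
        "Value": nested["value"],
    })


df = record_to_frame(stream)
print(df)
```

Here Version stays an integer column, and no NaN rows appear because every scalar is repeated for each element of the lists.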

Use json_normalize to flatten the dict into a DataFrame, then use explode to turn the lists into rows:

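import pandas as pd

stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3]
        }
    }
}
# Flatten the nested dict into a single row with dotted column names
df = pd.json_normalize(stream)
# Explode every column: list columns become one row per element,
# scalar columns are aligned on the repeated index and so repeated
df = df.apply(pd.Series.explode).reset_index(drop=True)
print(df)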


  Outerclass.Main_ID Outerclass.SetID  Outerclass.Version Outerclass.nestedData.time Outerclass.nestedData.value
0                  1             1041                   2                       5000                           1
1                  1             1041                   2                       6000                           2
2                  1             1041                   2                       7000                           3
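A possible alternative to applying pd.Series.explode column by column, assuming pandas 1.3 or later (where DataFrame.explode accepts a list of columns), is to explode only the two list columns in one call. The scalar columns are then repeated for each new row, and ignore_index=True gives a fresh 0..n-1 index.

```python
import pandas as pd

stream = {
    "Outerclass": {
        "Main_ID": "1",
        "SetID": "1041",
        "Version": 2,
        "nestedData": {
            "time": ["5000", "6000", "7000"],
            "value": [1, 2, 3],
        },
    }
}

# Flatten to one row with dotted column names
df = pd.json_normalize(stream)

# Explode the list columns together (they must have equal lengths);
# the scalar columns are repeated for every element
list_cols = ["Outerclass.nestedData.time", "Outerclass.nestedData.value"]
df = df.explode(list_cols, ignore_index=True)
print(df)
```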

We can try:

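import pandas as pd

# Flatten the inner dict; the two list columns stay as lists in a single row
s = pd.json_normalize(stream['Outerclass'])
# Pop each list column, explode it into rows, and join the exploded columns back on the index
s = s.join(pd.concat([s.pop(x).explode() for x in ['nestedData.time', 'nestedData.value']], axis=1))
print(s)

  Main_ID SetID  Version nestedData.time nestedData.value
0       1  1041        2            5000                1
0       1  1041        2            6000                2
0       1  1041        2            7000                3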

Thanks for the explanation :) Could you tell me whether this also works for large data? Sorry if this is a basic question...
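On the large-data question, one hedged approach for a stream of many records, assuming every record has the same shape as stream, is to accumulate plain Python lists across records and build a single DataFrame at the end instead of creating and concatenating a DataFrame per record. The helper name records_to_frame is hypothetical, and no performance figures are claimed here; it is worth measuring on real data.

```python
import pandas as pd


def records_to_frame(records):
    """Hypothetical helper: build one long DataFrame from many records."""
    cols = {"Main_ID": [], "SetID": [], "Version": [], "Time": [], "Value": []}
    for record in records:
        outer = record["Outerclass"]
        nested = outer["nestedData"]
        n = len(nested["time"])
        # Repeat the per-record scalars once per element of the lists
        cols["Main_ID"].extend([outer["Main_ID"]] * n)
        cols["SetID"].extend([outer["SetID"]] * n)
        cols["Version"].extend([outer["Version"]] * n)
        # The list fields are appended as-is
        cols["Time"].extend(nested["time"])
        cols["Value"].extend(nested["value"])
    # A single DataFrame construction after the loop
    return pd.DataFrame(cols)


# Usage with two small records of the same shape as the sample
records = [
    {"Outerclass": {"Main_ID": "1", "SetID": "1041", "Version": 2,
                    "nestedData": {"time": ["5000", "6000", "7000"], "value": [1, 2, 3]}}},
    {"Outerclass": {"Main_ID": "2", "SetID": "1042", "Version": 3,
                    "nestedData": {"time": ["8000"], "value": [4]}}},
]
df = records_to_frame(records)
print(df)
```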