Python: convert lists nested under two dictionaries into a DataFrame

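I'm trying to create a Pandas DataFrame in Python from nested dictionaries and lists. I have looked at other questions about converting nested dictionaries, but couldn't find an adequate answer.

I have a dictionary, say an activity log, that records a school's extracurricular lessons. In this example there are two lessons, each of which is its own dictionary nested under the activity-log dictionary. Each lesson dictionary contains lists of the activities each person did, organized by month. The number of students doing activities each month varies, but the structure of each entry is always student, activity, minutes. For example: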

activity_dict = {

'lesson1' : {  'january' : [['Todd', 'Running', 30],['Christy', 'Studying', 25],['Alex','Soccer', 10]],
               'february' : [['Jim', 'Bobsledding', 5],['Frank', 'Jogging',8]]},

'lesson2' : {'february' : [['Todd', 'Running', 18],['John', 'Studying', 3],['Don','Soccer', 40]],
              'march' : [['Tom', 'Bobsledding', 10],['Sam', 'Yoga', 42]],
              'april' : [['Julie', 'Biking', 20],['Chris', 'Baseball', 10]]}
}  
I'm trying to get one output row per student activity, with ColA = Lesson #, ColB = Month, ColC = Student, ColD = Activity, ColE = Minutes. Sample output:

Lesson #  Month     Student  Activity     Minutes
Lesson 1  February  Jim      Bobsledding  5
Lesson 1  February  Frank    Jogging      8
Lesson 2  February  Todd     Running      18
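I have found a way to create a DataFrame with columns C through E, but I can't work out how to include columns A and B.

My current code is as follows: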

import pandas

activity_log = []

for lesson, all_activities in activity_dict.items():
    for month, month_activities in all_activities.items():
        activity_log.append(pandas.DataFrame(month_activities))
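How can I update it to include the dictionary keys (lesson and month) as columns A and B? I'm not sure whether changing the list of lists into a dictionary would help, but I've kept it as lists because that is how I receive the data.

Use a list comprehension to flatten the dict of dicts of lists of lists into a single list of tuples: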

In [99]: [(lesson, month, name, activity, minutes) 
          for lesson, dct in activity_dict.items() 
          for month, vals in dct.items() 
          for name, activity, minutes in vals]
Out[99]: 
[('lesson2', 'april', 'Julie', 'Biking', 20),
 ('lesson2', 'april', 'Chris', 'Baseball', 10),
 ('lesson2', 'february', 'Todd', 'Running', 18),
 ('lesson2', 'february', 'John', 'Studying', 3),
 ('lesson2', 'february', 'Don', 'Soccer', 40),
 ('lesson2', 'march', 'Tom', 'Bobsledding', 10),
 ('lesson2', 'march', 'Sam', 'Yoga', 42),
 ('lesson1', 'january', 'Todd', 'Running', 30),
 ('lesson1', 'january', 'Christy', 'Studying', 25),
 ('lesson1', 'january', 'Alex', 'Soccer', 10),
 ('lesson1', 'february', 'Jim', 'Bobsledding', 5),
 ('lesson1', 'february', 'Frank', 'Jogging', 8)]
Then use pd.DataFrame to build the DataFrame from that list:

In [98]: pd.DataFrame([(lesson, month, name, activity, minutes)
                       for lesson, dct in activity_dict.items() 
                       for month, vals in dct.items() 
                       for name, activity, minutes in vals], 
             columns=['Lesson', 'Month', 'Name', 'Activity', 'Minutes'])
Out[98]: 
     Lesson     Month     Name     Activity  Minutes
0   lesson2     april    Julie       Biking       20
1   lesson2     april    Chris     Baseball       10
2   lesson2  february     Todd      Running       18
3   lesson2  february     John     Studying        3
4   lesson2  february      Don       Soccer       40
5   lesson2     march      Tom  Bobsledding       10
6   lesson2     march      Sam         Yoga       42
7   lesson1   january     Todd      Running       30
8   lesson1   january  Christy     Studying       25
9   lesson1   january     Alex       Soccer       10
10  lesson1  february      Jim  Bobsledding        5
11  lesson1  february    Frank      Jogging        8
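
For comparison, the loop from the question can also be made to carry the keys along. The following is only a minimal sketch, not part of the original answer: it assumes every entry has exactly three fields, and the names frames and activity_df are made up for illustration. Each per-month frame gets the lesson and month inserted as constant columns, and pd.concat then stacks the frames.

```python
import pandas as pd

# Sample data copied from the question.
activity_dict = {
    'lesson1': {
        'january': [['Todd', 'Running', 30], ['Christy', 'Studying', 25], ['Alex', 'Soccer', 10]],
        'february': [['Jim', 'Bobsledding', 5], ['Frank', 'Jogging', 8]],
    },
    'lesson2': {
        'february': [['Todd', 'Running', 18], ['John', 'Studying', 3], ['Don', 'Soccer', 40]],
        'march': [['Tom', 'Bobsledding', 10], ['Sam', 'Yoga', 42]],
        'april': [['Julie', 'Biking', 20], ['Chris', 'Baseball', 10]],
    },
}

frames = []  # hypothetical name: one small DataFrame per (lesson, month) pair
for lesson, all_activities in activity_dict.items():
    for month, month_activities in all_activities.items():
        # Assumes each entry is [name, activity, minutes].
        frame = pd.DataFrame(month_activities, columns=['Name', 'Activity', 'Minutes'])
        # Insert the two dictionary keys at the front as constant columns.
        frame.insert(0, 'Month', month)
        frame.insert(0, 'Lesson', lesson)
        frames.append(frame)

# Stack the per-month frames and renumber the rows from 0.
activity_df = pd.concat(frames, ignore_index=True)
print(activity_df)
```

On Python 3.7 and later, dictionaries keep insertion order, so the rows here start with lesson1 rather than lesson2 as in the output above. For data of this shape the list comprehension is the shorter route; the loop may be easier to extend if each month needs extra per-frame processing.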

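Excellent! Trying it now, will report back shortly!

This works! Quick question: why is importing pandas as pd so common? Is it just because pd is more convenient?

I think it has two advantages: it saves typing (if you reference a lot of functions or classes), and it makes the code more readable.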