Dictionary 从字典字典创建数据框架

Dictionary 从字典字典创建数据框架,dictionary,pandas,dataframe,Dictionary,Pandas,Dataframe,我有一本字典,其形式如下: {'user':{movie:rating} } 比如说, {Jill': {'Avenger: Age of Ultron': 7.0, 'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0} 'Toby': {'A

我有一本字典,其形式如下:

{'user':{movie:rating} }
比如说,

{Jill': {'Avenger: Age of Ultron': 7.0,
                            'Django Unchained': 6.5,
                            'Gone Girl': 9.0,
                            'Kill the Messenger': 8.0}
'Toby': {'Avenger: Age of Ultron': 8.5,
                                'Django Unchained': 9.0,
                                'Zoolander': 2.0}}
我想将这个dict-of-dict转换成一个pandas数据框,其中第1列是用户名,其他列是电影分级,即

user  Gone_Girl  Horrible_Bosses_2  Django_Unchained  Zoolander etc. \
但是,一些用户没有对电影进行评分,因此这些电影不包括在该用户密钥()的值()中。在这种情况下,只需在条目中填入NaN即可

现在,我迭代键,填充列表,然后使用此列表创建数据帧:

data=[] 
for i,key in enumerate(movie_user_preferences.keys() ):
    try:            
        data.append((key
                    ,movie_user_preferences[key]['Gone Girl']
                    ,movie_user_preferences[key]['Horrible Bosses 2']
                    ,movie_user_preferences[key]['Django Unchained']
                    ,movie_user_preferences[key]['Zoolander']
                    ,movie_user_preferences[key]['Avenger: Age of Ultron']
                    ,movie_user_preferences[key]['Kill the Messenger']))
    # if no entry, skip
    except:
        pass 
df=pd.DataFrame(data=data,columns=['user','Gone_Girl','Horrible_Bosses_2','Django_Unchained','Zoolander','Avenger_Age_of_Ultron','Kill_the_Messenger'])
但这只给了我一个用户的数据框,这些用户对片场中的所有电影都进行了评分


我的目标是通过迭代电影标签(而不是上面所示的暴力方法)来附加到数据列表中,其次,创建一个包含所有用户并在没有电影分级的元素中放置空值的数据帧

您可以将dict的dict传递给数据帧构造函数:

In [11]: d = {'Jill': {'Django Unchained': 6.5, 'Gone Girl': 9.0, 'Kill the Messenger': 8.0, 'Avenger: Age of Ultron': 7.0}, 'Toby': {'Django Unchained': 9.0, 'Zoolander': 2.0, 'Avenger: Age of Ultron': 8.5}}

In [12]: pd.DataFrame(d)
Out[12]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0
或者使用来自dict的
方法:

In [13]: pd.DataFrame.from_dict(d)
Out[13]:
                        Jill  Toby
Avenger: Age of Ultron   7.0   8.5
Django Unchained         6.5   9.0
Gone Girl                9.0   NaN
Kill the Messenger       8.0   NaN
Zoolander                NaN   2.0

In [14]: pd.DataFrame.from_dict(d, orient='index')
Out[14]:
      Django Unchained  Gone Girl  Kill the Messenger  Avenger: Age of Ultron  Zoolander
Jill               6.5          9                   8                     7.0        NaN
Toby               9.0        NaN                 NaN                     8.5          2

这种暴力手段似乎也能奏效,但在我看来,在电影标签上进行迭代将更加稳健

data=[] 
for i,key in enumerate(movie_user_preferences.keys() ):
    try:            
        data.append((key
                    ,movie_user_preferences[key]['Gone Girl'] if 'Gone Girl' in movie_user_preferences[key] else 'NaN'
                    ,movie_user_preferences[key]['Horrible Bosses 2'] if 'Horrible Bosses 2' in movie_user_preferences[key] else 'NaN'
                    ,movie_user_preferences[key]['Django Unchained'] if 'Django Unchained' in movie_user_preferences[key] else 'NaN'
                    ,movie_user_preferences[key]['Zoolander'] if 'Zoolander' in movie_user_preferences[key] else 'NaN'
                    ,movie_user_preferences[key]['Avenger: Age of Ultron'] if 'Avenger: Age of Ultron' in movie_user_preferences[key] else 'NaN'
                    ,movie_user_preferences[key]['Kill the Messenger'] if 'Kill the Messenger' in movie_user_preferences[key] else 'NaN' ))

    # if no entry, skip
    except:
        pass


 user Gone_Girl Horrible_Bosses_2  Django_Unchained Zoolander  \
 0      Sam         6                 3               7.5         7   
 1      Max        10                 6               7.0        10   
 2   Robert       NaN                 5               7.0         9   
 3     Toby       NaN               NaN               9.0         2   
 4    Julia       6.5               NaN               6.0       6.5   
 5  William         7                 4               8.0         4   
 6     Jill         9               NaN               6.5       NaN   

 Avenger_Age_of_Ultron Kill_the_Messenger  
 0                   10.0                5.5  
 1                    7.0                  5  
 2                    8.0                  9  
 3                    8.5                NaN  
 4                   10.0                  6  
 5                    6.0                6.5  
 6                    7.0                  8  

有没有办法使用户名成为一个单独的列而不是索引?pd.DataFrame.from_dict(d,orient='index').reset_index()