Pandas groupby using a dictionary (List, Pandas, Dictionary, Dataframe, Pandas Groupby)

New to pandas, so apologies if the solution is obvious.

I have a DataFrame (see below) that contains different movie scenes and the environment of each scene:

import pandas as pd
data = [{'movie' : 'movie_X', 'scene' : '1', 'environment' : 'home'}, 
        {'movie' : 'movie_X', 'scene' : '2', 'environment' : 'car'}, 
        {'movie' : 'movie_X', 'scene' : '3', 'environment' : 'home'}, 
        {'movie' : 'movie_Y', 'scene' : '1', 'environment' : 'home'}, 
        {'movie' : 'movie_Y', 'scene' : '2', 'environment' : 'office'}, 
        {'movie' : 'movie_Z', 'scene' : '1', 'environment' : 'boat'}, 
        {'movie' : 'movie_Z', 'scene' : '2', 'environment' : 'beach'}, 
        {'movie' : 'movie_Z', 'scene' : '3', 'environment' : 'home' }]
myDF = pd.DataFrame(data)

Each movie can belong to several genres. I have a dictionary (below) describing which genres each movie belongs to:

genreDict = {'movie_X' : ['romance', 'action'],
           'movie_Y' : ['comedy', 'romance', 'action'],
           'movie_Z' : ['horror', 'thriller', 'romance']}

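I want to group myDF according to this dictionary, specifically to be able to tell how many times a given environment appears within a given genre (e.g. in the horror genre, "boat" is counted once, "beach" once, and "home" once). What is the best and most efficient way to do this? I have tried mapping the dictionary onto the DataFrame and then grouping by the lists: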

myDF['genres'] = myDF['movie'].map(genreDict)

   movie    scene    environment               genres
0  movie_X     1        home            [romance, action]
1  movie_X     2         car            [romance, action]
2  movie_X     3        home            [romance, action]
3  movie_Y     1        home    [comedy, romance, action]
4  movie_Y     2      office    [comedy, romance, action]
5  movie_Z     1        boat  [horror, thriller, romance]
6  movie_Z     2       beach  [horror, thriller, romance]
7  movie_Z     3        home  [horror, thriller, romance]

However, I get an error saying that a list is unhashable. Hope you can all help :)
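A minimal sketch of why the attempt fails, assuming the `myDF` and `genreDict` shown above: pandas needs hashable group keys, and Python lists are mutable and therefore unhashable. Converting each list to a tuple (a workaround not taken from the question) makes the key hashable, but then each whole genre combination becomes one group instead of each genre being counted on its own.

```python
import pandas as pd

data = [
    {'movie': 'movie_X', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_X', 'scene': '2', 'environment': 'car'},
    {'movie': 'movie_X', 'scene': '3', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '2', 'environment': 'office'},
    {'movie': 'movie_Z', 'scene': '1', 'environment': 'boat'},
    {'movie': 'movie_Z', 'scene': '2', 'environment': 'beach'},
    {'movie': 'movie_Z', 'scene': '3', 'environment': 'home'},
]
myDF = pd.DataFrame(data)
genreDict = {'movie_X': ['romance', 'action'],
             'movie_Y': ['comedy', 'romance', 'action'],
             'movie_Z': ['horror', 'thriller', 'romance']}

# Lists are mutable, so Python refuses to hash them; grouping needs hashable keys
try:
    hash(genreDict['movie_X'])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)  # False

# Tuples are hashable, so grouping now runs, but each distinct genre
# *combination* is a single group (one per movie here), not one group per genre
myDF['genres'] = myDF['movie'].map(lambda m: tuple(genreDict[m]))
n_groups = myDF.groupby('genres', sort=False).ngroups
print(n_groups)  # 3
```

Three groups instead of five genres is exactly why the answers below reshape the data so that each genre gets its own row.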

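Non-scalar objects (such as lists) in cells generally cause problems in pandas. Beyond that, you need to tidy the data so that the next steps are simpler (the main operations on tabular structures are generally defined on tidy datasets). You want a dataset where, instead of listing all of a movie's genres in one row, each genre has its own row.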
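As a point of comparison, a minimal sketch of the same reshaping using `DataFrame.explode`, a different technique from the one used in this answer, which assumes pandas 0.25 or later. `explode` gives each list element its own row and repeats the original row label:

```python
import pandas as pd

data = [
    {'movie': 'movie_X', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_X', 'scene': '2', 'environment': 'car'},
    {'movie': 'movie_X', 'scene': '3', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '2', 'environment': 'office'},
    {'movie': 'movie_Z', 'scene': '1', 'environment': 'boat'},
    {'movie': 'movie_Z', 'scene': '2', 'environment': 'beach'},
    {'movie': 'movie_Z', 'scene': '3', 'environment': 'home'},
]
myDF = pd.DataFrame(data)
genreDict = {'movie_X': ['romance', 'action'],
             'movie_Y': ['comedy', 'romance', 'action'],
             'movie_Z': ['horror', 'thriller', 'romance']}

# Attach each movie's genre list, then give every list element its own row
tidy = myDF.assign(genre=myDF['movie'].map(genreDict)).explode('genre')

# One row per (scene, genre) pair
print(len(tidy))  # 21

# Per-genre counts, the same numbers as the groupby output in this answer
counts = tidy.groupby('genre').size()
print(counts)
```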

Here is one possible way to do this:

genre_df = pd.DataFrame(myDF['movie'].map(genreDict).tolist())

df = myDF.join(genre_df.stack().rename('genre').reset_index(level=1, drop=True))
df
Out: 
  environment    movie scene     genre
0        home  movie_X     1   romance
0        home  movie_X     1    action
1         car  movie_X     2   romance
1         car  movie_X     2    action
2        home  movie_X     3   romance
2        home  movie_X     3    action
3        home  movie_Y     1    comedy
3        home  movie_Y     1   romance
3        home  movie_Y     1    action
4      office  movie_Y     2    comedy
4      office  movie_Y     2   romance
4      office  movie_Y     2    action
5        boat  movie_Z     1    horror
5        boat  movie_Z     1  thriller
5        boat  movie_Z     1   romance
6       beach  movie_Z     2    horror
6       beach  movie_Z     2  thriller
6       beach  movie_Z     2   romance
7        home  movie_Z     3    horror
7        home  movie_Z     3  thriller
7        home  movie_Z     3   romance
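The one-liner above can be unpacked step by step. A sketch with the intermediate result given its own name (`stacked` is illustrative, not from the answer): the mapped lists become a wide frame padded with missing values, `stack` turns it into one long Series indexed by (original row, list position), and dropping the position level lets `join` align each genre back to its original row. The explicit `.dropna()` is a defensive addition so the sketch does not depend on whether the installed pandas version's `stack` drops missing values by default.

```python
import pandas as pd

data = [
    {'movie': 'movie_X', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_X', 'scene': '2', 'environment': 'car'},
    {'movie': 'movie_X', 'scene': '3', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '2', 'environment': 'office'},
    {'movie': 'movie_Z', 'scene': '1', 'environment': 'boat'},
    {'movie': 'movie_Z', 'scene': '2', 'environment': 'beach'},
    {'movie': 'movie_Z', 'scene': '3', 'environment': 'home'},
]
myDF = pd.DataFrame(data)
genreDict = {'movie_X': ['romance', 'action'],
             'movie_Y': ['comedy', 'romance', 'action'],
             'movie_Z': ['horror', 'thriller', 'romance']}

# Wide frame: one column per list position, NaN where a movie has fewer genres
genre_df = pd.DataFrame(myDF['movie'].map(genreDict).tolist())
print(genre_df.shape)  # (8, 3)

# Long Series indexed by (original row label, list position), padding removed
stacked = genre_df.stack().dropna().rename('genre')

# Drop the position level so the index matches myDF's row labels, then join;
# a row with several genres is repeated once per genre
df = myDF.join(stacked.reset_index(level=1, drop=True))
print(len(df))  # 21
```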

Once you have this structure, grouping or cross-tabulating the data is much easier:

df.groupby('genre').size()
Out: 
genre
action      5
comedy      2
horror      3
romance     8
thriller    3
dtype: int64

pd.crosstab(df['genre'], df['environment'])
Out: 
environment  beach  boat  car  home  office
genre                                      
action           0     0    1     3       1
comedy           0     0    0     1       1
horror           1     1    0     1       0
romance          1     1    1     4       1
thriller         1     1    0     1       0
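The question's example (environment counts within the horror genre) can be read straight off one row of the cross-tabulation, or from a two-key `groupby`. A sketch that rebuilds the tidy frame with `explode` (an assumption: pandas 0.25 or later) so it runs on its own; `table` and `pairs` are illustrative names:

```python
import pandas as pd

data = [
    {'movie': 'movie_X', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_X', 'scene': '2', 'environment': 'car'},
    {'movie': 'movie_X', 'scene': '3', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '2', 'environment': 'office'},
    {'movie': 'movie_Z', 'scene': '1', 'environment': 'boat'},
    {'movie': 'movie_Z', 'scene': '2', 'environment': 'beach'},
    {'movie': 'movie_Z', 'scene': '3', 'environment': 'home'},
]
myDF = pd.DataFrame(data)
genreDict = {'movie_X': ['romance', 'action'],
             'movie_Y': ['comedy', 'romance', 'action'],
             'movie_Z': ['horror', 'thriller', 'romance']}

# Tidy frame, one row per (scene, genre); a fresh index avoids duplicate labels
df = (myDF.assign(genre=myDF['movie'].map(genreDict))
          .explode('genre')
          .reset_index(drop=True))

# Full genre x environment table, then the row for a single genre
table = pd.crosstab(df['genre'], df['environment'])
print(table.loc['horror'])

# Long form: one count per (genre, environment) pair that actually occurs
pairs = df.groupby(['genre', 'environment']).size()
print(pairs.loc['horror'])
```

For horror, both give one each for boat, beach and home, matching the counts the question asks for; the crosstab row additionally lists the environments that never occur in that genre with a count of 0.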

There is a great read on this by Hadley Wickham: .

For a larger DataFrame, use `numpy` to repeat the rows by list, and:

Then use and aggregate:
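A minimal sketch of the approach described above. The specific calls are assumptions rather than part of the answer: `np.repeat` for repeating rows, `itertools.chain` for flattening the genre lists, and `groupby` + `size` + `unstack` for the aggregation. The idea is that repetition happens on a NumPy array in one call instead of row by row in Python:

```python
from itertools import chain

import numpy as np
import pandas as pd

data = [
    {'movie': 'movie_X', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_X', 'scene': '2', 'environment': 'car'},
    {'movie': 'movie_X', 'scene': '3', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '1', 'environment': 'home'},
    {'movie': 'movie_Y', 'scene': '2', 'environment': 'office'},
    {'movie': 'movie_Z', 'scene': '1', 'environment': 'boat'},
    {'movie': 'movie_Z', 'scene': '2', 'environment': 'beach'},
    {'movie': 'movie_Z', 'scene': '3', 'environment': 'home'},
]
myDF = pd.DataFrame(data)
genreDict = {'movie_X': ['romance', 'action'],
             'movie_Y': ['comedy', 'romance', 'action'],
             'movie_Z': ['horror', 'thriller', 'romance']}

genre_lists = myDF['movie'].map(genreDict)
# The number of genres in each row decides how often that row is repeated
lens = genre_lists.map(len).to_numpy()

# Repeat whole rows in a single NumPy call instead of looping in Python
repeated = pd.DataFrame(np.repeat(myDF.to_numpy(), lens, axis=0),
                        columns=myDF.columns)
# Flatten the lists in the same row order so each repeated row gets one genre
repeated['genre'] = list(chain.from_iterable(genre_lists))

# Count each (genre, environment) pair and pivot environments into columns
out = repeated.groupby(['genre', 'environment']).size().unstack(fill_value=0)
print(out)
```

The resulting table has the same counts as the `pd.crosstab` output in the previous answer.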

Could you post your desired dataset?