Python pandas: value counts across multiple columns

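I'm currently working on a project where I need to calculate the popularity of each genre per year. The dataset gives me movies, but each movie can have multiple genres, as shown in the sample df below (the genres also come in a messy format, separated by "|").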

My first step was to split the genres into separate columns with str.split, so that I'd have clean data to work with:

df[['Genre_1','Genre_2','Genre_3','Genre_4','Genre_5']] = df['genres'].str.split("|",expand=True)

    release_year   Genre_1          Genre_2          Genre_3   Genre_4  Genre_5
0          2015     Action        Adventure  Science Fiction  Thriller    None
1          2015     Action        Adventure  Science Fiction  Thriller    None
2          2015  Adventure  Science Fiction         Thriller      None    None
3          2015     Action        Adventure  Science Fiction   Fantasy    None
4          2015     Action            Crime         Thriller      None    None
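Assigning the split to a fixed list of five Genre_N columns only works when the widest split has exactly five pieces; with any other width pandas raises a ValueError on the assignment. If the end goal is counting, one alternative (a sketch, not the approach used in this thread) is `Series.str.get_dummies`, which creates one 0/1 column per genre regardless of how many genres a movie has. The five rows below are the question's sample rows rejoined with "|"; the real dataset is not shown.

```python
import pandas as pd

# The question's five sample rows, rejoined into the raw '|'-separated form.
df = pd.DataFrame({
    "release_year": [2015, 2015, 2015, 2015, 2015],
    "genres": [
        "Action|Adventure|Science Fiction|Thriller",
        "Action|Adventure|Science Fiction|Thriller",
        "Adventure|Science Fiction|Thriller",
        "Action|Adventure|Science Fiction|Fantasy",
        "Action|Crime|Thriller",
    ],
})

# One 0/1 indicator column per distinct genre (columns come out sorted).
dummies = df["genres"].str.get_dummies(sep="|")

# Summing the indicators within each year counts movies per genre per year.
per_year = dummies.groupby(df["release_year"]).sum()
print(per_year)
```

The result has one row per year and one column per genre, so it already answers "how popular is each genre each year" without a separate long-format step.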

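Since each movie has several genres, how can I use a groupby statement to count the popularity of each genre per year? It seems I want to collapse all the columns I expanded while keeping the year key for each one, ideally resulting in something like this: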

    release_year   All genres
0          2015     Action 
1          2015     Action 
2          2015  Adventure
3          2015     Action
4          2015     Action
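Since the genres are already split into Genre_1 to Genre_5, `DataFrame.melt` is one way to do exactly this collapse: it stacks the listed columns into a single value column and repeats `release_year` on every row. This is a sketch built from the five sample rows shown in the question; `dropna` removes the `None` padding left by movies with fewer than five genres.

```python
import pandas as pd

# The expanded frame from the question (first five rows); None marks
# positions where a movie has fewer than five genres.
wide = pd.DataFrame({
    "release_year": [2015] * 5,
    "Genre_1": ["Action", "Action", "Adventure", "Action", "Action"],
    "Genre_2": ["Adventure", "Adventure", "Science Fiction", "Adventure", "Crime"],
    "Genre_3": ["Science Fiction", "Science Fiction", "Thriller", "Science Fiction", "Thriller"],
    "Genre_4": ["Thriller", "Thriller", None, "Fantasy", None],
    "Genre_5": [None] * 5,
})

genre_cols = ["Genre_1", "Genre_2", "Genre_3", "Genre_4", "Genre_5"]

# Collapse the genre columns into one column, keeping release_year on
# every row, then drop the padding and the helper 'variable' column.
long = (
    wide.melt(id_vars="release_year", value_vars=genre_cols, value_name="All genres")
        .dropna(subset=["All genres"])
        .drop(columns="variable")
        .reset_index(drop=True)
)
print(long)
```

melt walks the columns in order, so the first five rows are the Genre_1 values (Action, Action, Adventure, Action, Action), matching the desired output above.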

I would really appreciate any help with this.

Many thanks

I think the following will give you the desired output:

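import pandas as pd

df = pd.DataFrame(
    [
        [2015, 'Action|Adventure|Science Fiction|Thriller'],
        [2015, 'Action|Adventure|Science Fiction|Thriller'],
        [2015, 'Action|Crime|Thriller']
    ],
    columns=['release_year', 'genres']
)
# Split each genre string into one column per position.
df2 = df['genres'].str.split('|').apply(pd.Series)
# Index the genre columns by release year so the year survives stacking.
df2.index = df.set_index(['release_year']).index
# stack() turns the genre columns into rows; dropna() removes the padding for
# movies with fewer genres; reset_index moves release_year back to a column.
df2.stack().dropna().reset_index(['release_year']).rename(columns={0: 'All Genres'})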

Output:

   release_year       All Genres
0          2015           Action
1          2015        Adventure
2          2015  Science Fiction
3          2015         Thriller
0          2015           Action
1          2015        Adventure
2          2015  Science Fiction
3          2015         Thriller
0          2015           Action
1          2015            Crime
2          2015         Thriller
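On pandas 0.25 or later, `Series.explode` is a shorter route to the same long format: split into lists, then give each list element its own row while the other columns are repeated. This is an alternative sketch, not the answer's method; one difference is that the index below is reset to 0..10 instead of the per-movie genre position shown above.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [2015, "Action|Adventure|Science Fiction|Thriller"],
        [2015, "Action|Adventure|Science Fiction|Thriller"],
        [2015, "Action|Crime|Thriller"],
    ],
    columns=["release_year", "genres"],
)

# Replace each string with a list of genres, then emit one row per list
# element; release_year is repeated for every element.
long = (
    df.assign(genres=df["genres"].str.split("|"))
      .explode("genres")
      .rename(columns={"genres": "All Genres"})
      .reset_index(drop=True)
)
print(long)
```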

Why does your desired output have duplicate rows with the same year and the same genre?

Works like a charm! Thanks!

Glad it helped. Happy coding.
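The long format is an intermediate step; the per-year popularity the question asks for is one more groupby on (year, genre). A minimal sketch, using the 11 rows from the answer's output as input; `unstack` is optional and just reshapes the counts into a year × genre table.

```python
import pandas as pd

# Long format: one row per (movie, genre), as in the answer's output.
long = pd.DataFrame({
    "release_year": [2015] * 11,
    "All Genres": ["Action", "Adventure", "Science Fiction", "Thriller",
                   "Action", "Adventure", "Science Fiction", "Thriller",
                   "Action", "Crime", "Thriller"],
})

# Number of movies per (year, genre), as a Series with a MultiIndex.
counts = long.groupby(["release_year", "All Genres"]).size()

# Optional: one row per year, one column per genre, 0 where a genre is absent.
table = counts.unstack(fill_value=0)
print(counts)
print(table)
```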

   release_year       All Genres
0          2015           Action
1          2015        Adventure
2          2015  Science Fiction
3          2015         Thriller
0          2015           Action
1          2015        Adventure
2          2015  Science Fiction
3          2015         Thriller
0          2015           Action
1          2015            Crime
2          2015         Thriller