Python: get value counts from multiple columns and the average from another column

I have a dataframe with the following columns:

Movie    Rating  Genre_0     Genre_1    Genre_2
MovieA   8.9     Action      Comedy     Family
MovieB   9.1     Horror      NaN        NaN
MovieC   4.4     Comedy      Family     Adventure
MovieD   7.7     Action      Adventure  NaN
MovieE   9.5     Adventure   Comedy     NaN
MovieF   7.5     Horror      NaN        NaN
MovieG   8.6     Horror      NaN        NaN
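I want to get a dataframe that has the value count of each genre, plus the average rating across the movies in which that genre appears: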

Genre     value_count   Average_Rating
Action    2             8.3  
Comedy    3             7.6
Horror    3             8.4
Family    2             6.7
Adventure 3             7.2
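I have tried the following code and was able to get the value counts. However, I cannot work out how to get the average rating of each genre over the movies in which it appears. Any help is much appreciated, thanks.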

#create a list for the genre columns
genre_col = [col for col in df if col.startswith('Genre_')]

#get value counts of genres
genre_counts = df[genre_col].apply(pd.Series.value_counts).sum(1).to_frame(name='Count')
genre_counts.index.name = 'Genre'

genre_counts = genre_counts.reset_index()
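You can melt the dataframe into long format, then group the melted frame on Genre and aggregate it with a dictionary that maps each output column to its source column and aggregation function: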

# filter and melt the dataframe
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# group and aggregate
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)
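For reference, here is a self-contained sketch of the same melt-then-group approach, run against the sample data from the question. The dataframe is rebuilt by hand from the table above, and it assumes the missing genres are real NaN values rather than the string 'NaN'.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; missing genres are assumed to be real NaN
df = pd.DataFrame({
    'Movie': ['MovieA', 'MovieB', 'MovieC', 'MovieD', 'MovieE', 'MovieF', 'MovieG'],
    'Rating': [8.9, 9.1, 4.4, 7.7, 9.5, 7.5, 8.6],
    'Genre_0': ['Action', 'Horror', 'Comedy', 'Action', 'Adventure', 'Horror', 'Horror'],
    'Genre_1': ['Comedy', np.nan, 'Family', 'Adventure', 'Comedy', np.nan, np.nan],
    'Genre_2': ['Family', np.nan, 'Adventure', np.nan, np.nan, np.nan, np.nan],
})

# Long format: one row per (Rating, Genre) pair, so a movie with three
# genres contributes its rating three times, once per genre
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# groupby drops NaN keys by default, so the empty genre slots disappear here
df_out = m.groupby('Genre', as_index=False).agg(
    Value_Count=('Genre', 'count'),
    Average_Rating=('Rating', 'mean'),
)
print(df_out)
```

The result has one row per genre, sorted alphabetically by genre name. The averages are left unrounded; round them at display time if you want the one-decimal figures shown in the question.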


Encoding a genre as its value count is known as frequency encoding, and it can be done with this code:

df_frequency_map = df.Genre_0.value_counts().to_dict()
df['Genre0_frequency_map'] = df.Genre_0.map(df_frequency_map)
To add the mean as a feature of the dataset, I think you can do the same thing, but compute the mean before calling to_dict():

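# mean Rating per Genre_0 value; value_counts().mean() would return a single
# scalar, which has no to_dict(), so group by the genre column instead
df_frequency_map = df.groupby('Genre_0')['Rating'].mean().to_dict()
df['Genre0_mean_frequency_map'] = df.Genre_0.map(df_frequency_map)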
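As a rough illustration of both encodings side by side, the sketch below applies them to the Genre_0 column of the question's sample data. The names freq_map, mean_map, Genre0_frequency and Genre0_mean_rating are made up for this example. Because it only looks at Genre_0, the numbers are per-column features, not the across-all-genre-columns summary the question asks for.

```python
import pandas as pd

# Only the two columns needed here, copied from the question's sample data
df = pd.DataFrame({
    'Rating': [8.9, 9.1, 4.4, 7.7, 9.5, 7.5, 8.6],
    'Genre_0': ['Action', 'Horror', 'Comedy', 'Action', 'Adventure', 'Horror', 'Horror'],
})

# Frequency encoding: replace each genre with how often it occurs in Genre_0
freq_map = df['Genre_0'].value_counts().to_dict()
df['Genre0_frequency'] = df['Genre_0'].map(freq_map)

# Mean encoding: replace each genre with the mean Rating of its rows
mean_map = df.groupby('Genre_0')['Rating'].mean().to_dict()
df['Genre0_mean_rating'] = df['Genre_0'].map(mean_map)

print(df)
```

Both new columns are row-level features aligned with the original dataframe, which is what makes this different from the grouped summary table.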