Python: get value counts from multiple columns and the mean from another column
I have a DataFrame with the following columns:
Movie Rating Genre_0 Genre_1 Genre_2
MovieA 8.9 Action Comedy Family
MovieB 9.1 Horror NaN NaN
MovieC 4.4 Comedy Family Adventure
MovieD 7.7 Action Adventure NaN
MovieE 9.5 Adventure Comedy NaN
MovieF 7.5 Horror NaN NaN
MovieG 8.6 Horror NaN NaN
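I want to get a DataFrame that has the value count of each genre, along with the average rating of the movies in which that genre appears: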
Genre value_count Average_Rating
Action 2 8.3
Comedy 3 7.6
Horror 3 8.4
Family 2 6.7
Adventure 3 7.2
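I tried the following code and was able to get the value counts. However, I could not get the average rating of each genre based on the movies in which it appears. Any help is much appreciated, thanks.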
#create a list for the genre columns
genre_col = [col for col in df if col.startswith('Genre_')]
#get value counts of genres
genre_counts = df[genre_col].apply(pd.Series.value_counts).sum(1).to_frame(name='Count')
genre_counts.index.name = 'Genre'
genre_counts = genre_counts.reset_index()
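You can melt the DataFrame so that all genres end up in a single Genre column, then group by Genre and aggregate using a dictionary that maps each output column to its source column and aggregation function: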
# filter and melt the dataframe
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')
# group and aggregate
dct = {'Value_Count': ('Genre', 'count'), 'Average_Rating': ('Rating', 'mean')}
df_out = m.groupby('Genre', as_index=False).agg(**dct)
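To make the melt-and-aggregate idea easy to try, here is a minimal self-contained sketch that rebuilds the sample table from the question and runs the same steps. The DataFrame construction is an assumption about how the data is loaded, and the count is taken over `Rating` rather than `Genre`, which gives the same numbers here only because no rating is missing.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; NaN marks a missing genre
df = pd.DataFrame({
    'Movie': ['MovieA', 'MovieB', 'MovieC', 'MovieD', 'MovieE', 'MovieF', 'MovieG'],
    'Rating': [8.9, 9.1, 4.4, 7.7, 9.5, 7.5, 8.6],
    'Genre_0': ['Action', 'Horror', 'Comedy', 'Action', 'Adventure', 'Horror', 'Horror'],
    'Genre_1': ['Comedy', np.nan, 'Family', 'Adventure', 'Comedy', np.nan, np.nan],
    'Genre_2': ['Family', np.nan, 'Adventure', np.nan, np.nan, np.nan, np.nan],
})

# Long format: one row per (Rating, Genre) pair, all genres in one column
m = df.filter(regex=r'Rating|Genre').melt('Rating', value_name='Genre')

# groupby drops the rows whose Genre is NaN by default
df_out = m.groupby('Genre', as_index=False).agg(
    Value_Count=('Rating', 'count'),    # rows per genre (Rating has no NaN here)
    Average_Rating=('Rating', 'mean'),  # mean rating of the movies with that genre
)
print(df_out)
```

Each movie contributes its rating once for every genre it lists, which is what the requested Average_Rating column describes.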
Encoding a genre as its value count is called frequency encoding, and it can be done with this code:
df_frequency_map = df.Genre_0.value_counts().to_dict()
df['Genre0_frequency_map'] = df.Genre_0.map(df_frequency_map)
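The snippet above counts occurrences in `Genre_0` only. If the frequency should reflect every genre column, one possible sketch is to pool the columns before counting. The names `genre_cols`, `all_counts` and the `_freq` suffix are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Genre columns copied from the question's sample data
df = pd.DataFrame({
    'Genre_0': ['Action', 'Horror', 'Comedy', 'Action', 'Adventure', 'Horror', 'Horror'],
    'Genre_1': ['Comedy', np.nan, 'Family', 'Adventure', 'Comedy', np.nan, np.nan],
    'Genre_2': ['Family', np.nan, 'Adventure', np.nan, np.nan, np.nan, np.nan],
})

genre_cols = [col for col in df.columns if col.startswith('Genre_')]

# Pool all genre columns into one Series; value_counts skips NaN by default
all_counts = df[genre_cols].melt(value_name='Genre')['Genre'].value_counts()

# Frequency-encode each genre column with the pooled counts (NaN stays NaN)
for col in genre_cols:
    df[col + '_freq'] = df[col].map(all_counts)

print(df)
```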
To add the mean as a feature to the dataset, I think you can do the same thing, but compute the mean rating per genre before calling to_dict():
# mean Rating for each Genre_0 value, as a {genre: mean} dictionary
df_mean_map = df.groupby('Genre_0')['Rating'].mean().to_dict()
df['Genre0_mean_rating'] = df.Genre_0.map(df_mean_map)
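As an alternative to building a dictionary and mapping it, a mean-rating feature for `Genre_0` could also be sketched with `groupby(...).transform('mean')`, which returns a value aligned with every original row. The column name `Genre0_mean_rating` is hypothetical, and only the two columns needed are rebuilt from the question's sample data.

```python
import pandas as pd

# Rating and Genre_0 copied from the question's sample data
df = pd.DataFrame({
    'Rating': [8.9, 9.1, 4.4, 7.7, 9.5, 7.5, 8.6],
    'Genre_0': ['Action', 'Horror', 'Comedy', 'Action', 'Adventure', 'Horror', 'Horror'],
})

# transform broadcasts each group's mean back onto the rows of that group
df['Genre0_mean_rating'] = df.groupby('Genre_0')['Rating'].transform('mean')
print(df)
```

This encodes the primary genre only. A genre listed in `Genre_1` or `Genre_2` does not influence the result.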