Python DataFrame transformation: applying several arithmetic operations at once
I have a DataFrame with the columns Name, Movie, Rating and Votes, in which the same actor/movie pair can appear in more than one row. I want to apply some operations so that, for each actor:
1) Movie holds the number of that actor's movies (Movies.count() per actor);
2) Rating becomes the mean rating over the actor's unique movies;
3) Votes are summed over the actor's unique movies.

Please help me figure out how to do this transformation. Thanks.

First, you can group by Name and Movie to remove the duplicates, then group by Name alone to aggregate the rest:
In [25]: (films.groupby(["Name", "Movie"]).first().reset_index()
    ...:       .groupby("Name")
    ...:       .agg({"Movie": "count", "Rating": "mean", "Votes": "sum"}))
Out[25]:
                   Movie  Rating  Votes
Name
Brad Pitt              3    7.60    250
John Travolta          1    7.90     85
Leonardo DiCaprio      2    8.65    280
Rowan Atkinson         1    9.00    150
Uma Thurman            1    7.90     85
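For anyone who wants to run this end to end, here is a minimal sketch on a small hypothetical films DataFrame. The actor names, movie titles and numbers are made up; only the column names Name, Movie, Rating and Votes come from the answer. Alice has M1 listed twice, so deduplicating first is what keeps her movie count at 2 and her votes at 150 instead of 3 and 250.

```python
import pandas as pd

# Hypothetical sample data (not the asker's table): Alice/M1 appears twice.
films = pd.DataFrame({
    "Name":   ["Alice", "Alice", "Alice", "Bob", "Bob", "Carol"],
    "Movie":  ["M1",    "M1",    "M2",    "M3",  "M4",  "M5"],
    "Rating": [8.0,     8.0,     7.0,     6.5,   7.5,   9.0],
    "Votes":  [100,     100,     50,      20,    30,    70],
})

# Step 1: collapse duplicate (Name, Movie) pairs, keeping the first entry per pair.
unique_pairs = films.groupby(["Name", "Movie"]).first().reset_index()

# Step 2: one row per actor: count movies, average ratings, sum votes.
result = unique_pairs.groupby("Name").agg(
    {"Movie": "count", "Rating": "mean", "Votes": "sum"}
)
print(result)
```

On this data, Alice ends up with Movie 2, Rating 7.5 and Votes 150; Bob with 2, 7.0 and 50; Carol with 1, 9.0 and 70.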
Rather than using a nested groupby, I would drop the duplicates first and then group:
%timeit films.drop_duplicates(['Movie', 'Name']).groupby(['Name']).agg({'Movie' : 'count', 'Rating' : 'mean', 'Votes' : 'sum'})
2.55 ms ± 122 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit films.groupby(["Name", "Movie"]).first().reset_index().groupby("Name").agg({"Movie": "count", "Rating": "mean", "Votes": "sum"})
6.92 ms ± 143 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
                   Movie  Rating  Votes
Name
Brad Pitt              3    7.60    250
John Travolta          1    7.90     85
Leonardo DiCaprio      2    8.65    280
Rowan Atkinson         1    9.00    150
Uma Thurman            1    7.90     85
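A minimal sketch, again on hypothetical data, that checks the drop_duplicates version against the nested groupby from the previous answer; pd.testing.assert_frame_equal raises an AssertionError if the two frames differ. One caveat: GroupBy.first() takes the first non-null value of each column within a group, whereas drop_duplicates keeps the entire first row, so the two approaches can diverge if Rating or Votes contain NaN inside a duplicated actor/movie pair. The timings above will also vary with data size and pandas version.

```python
import pandas as pd

# Hypothetical sample data (not the asker's table), with no missing values.
films = pd.DataFrame({
    "Name":   ["Alice", "Alice", "Alice", "Bob", "Bob", "Carol"],
    "Movie":  ["M1",    "M1",    "M2",    "M3",  "M4",  "M5"],
    "Rating": [8.0,     8.0,     7.0,     6.5,   7.5,   9.0],
    "Votes":  [100,     100,     50,      20,    30,    70],
})

aggs = {"Movie": "count", "Rating": "mean", "Votes": "sum"}

# This answer: drop duplicate (Movie, Name) rows, then a single groupby.
via_drop = films.drop_duplicates(subset=["Movie", "Name"]).groupby("Name").agg(aggs)

# Previous answer: group by both keys, take the first row, then group by Name.
via_nested = (
    films.groupby(["Name", "Movie"]).first().reset_index()
    .groupby("Name")
    .agg(aggs)
)

# Without NaNs the results are identical (raises AssertionError otherwise).
pd.testing.assert_frame_equal(via_drop, via_nested)
print(via_drop)
```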
Amazing! Thanks!