Python pandas: getting a count along with auxiliary information

python pandas

I have the following DataFrame:

import pandas as pd

df = pd.DataFrame([{'file_name': 'my_movie.mov', 'status': 'final'}, {'file_name': 'his_movie.mov', 'status': 'source'}, {'file_name': 'her_movie.mov', 'status': 'source'}])

       file_name  status
0   my_movie.mov   final
1  his_movie.mov  source
2  her_movie.mov  source

I would like to do something like the following:

df.groupby('status')[['status', 'file_name', 'count']]

status         file_name              count
final          my_movie.mov           1
source         his_movie.mov          2

Here file_name can be any one of the file_name values in the group, and count is the number of records in that group, as in the output shown above.

In SQL (using the MySQL dialect), I would do the following:

SELECT status, file_name, COUNT(*) FROM df GROUP BY status

How would I do this in pandas?

The closest I have gotten is the following, but it does not include the file_name value that I want:

>>> df[['new__status', 'file_name']].groupby('new__status').count().sort_values('file_name', ascending=False)

Try this:

df.groupby('status').agg({'file_name': 'first', 'status': 'size'}).rename(columns={'status': 'count'}).reset_index()
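For comparison, here is a self-contained sketch of the same idea written with named aggregation (available in pandas 0.25 and later), which avoids aggregating the grouping column and renaming it afterwards. The variable name result is only illustrative, and taking 'first' as the stand-in for "any file name in the group" is an assumption carried over from the answer above.

```python
import pandas as pd

df = pd.DataFrame([
    {'file_name': 'my_movie.mov', 'status': 'final'},
    {'file_name': 'his_movie.mov', 'status': 'source'},
    {'file_name': 'her_movie.mov', 'status': 'source'},
])

# One row per status: keep the first file_name seen in each group
# and count the rows in that group.
result = (
    df.groupby('status')
      .agg(file_name=('file_name', 'first'), count=('file_name', 'size'))
      .reset_index()
)
print(result)
```

Note that 'size' counts every row in the group, whereas 'count' counts only non-null values of the chosen column; the two agree here because no file_name is missing.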

Alternatively, use named aggregation on the SeriesGroupBy:

df_agg = df.groupby('status').file_name.agg(file_name='first', count='count').reset_index()

Out[393]:
   status      file_name  count
0   final   my_movie.mov      1
1  source  his_movie.mov      2
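A minimal sketch of what the trailing reset_index() changes, and of how the result can then be ordered by count. The intermediate names grouped, flat, and ordered are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    {'file_name': 'my_movie.mov', 'status': 'final'},
    {'file_name': 'his_movie.mov', 'status': 'source'},
    {'file_name': 'her_movie.mov', 'status': 'source'},
])

# Without reset_index(), the grouping key 'status' becomes the index.
grouped = df.groupby('status').file_name.agg(file_name='first', count='count')
print(grouped.index.name)      # the group key lives in the index
print(list(grouped.columns))   # only the aggregated columns are columns

# reset_index() moves 'status' back into an ordinary column.
flat = grouped.reset_index()

# Largest groups first.
ordered = flat.sort_values('count', ascending=False)
print(ordered)
```

Keeping status as a regular column makes the result look like the SQL output, where the grouped field is just another column of the result set.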

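I think your SQL query is incorrect. Are you grouping by both status and file_name?

@Reza: no, I am only grouping by status. Some SQL dialects let you access non-aggregated fields and pull an arbitrary value from them (sometimes called ANY(...) or FIRST(...)).

I see, so you need the pandas equivalent of ANY?

@Reza I added an update to the question with an example of where I am now.

I just posted an answer, does it work for you?

Yes, it works once I add .sort_values('count', ascending=False) at the end, thanks! Two questions, though: (1) Why did you add reset_index() at the end? What is the point of it? (2) Could you add a few links to the methods you are using: rename, agg, and so on?

reset_index() is used to turn the aggregated key 'status' into a regular column instead of the index. Here is the link for agg: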