Python: maximum of aggregated counts over two groupby columns
For each name, I want the hours value with the largest row count, grouping on the names and hours columns below:
import pandas as pd
hours = [8,8,9,9,
8,9,10,10,
8,9,12,12,
10,11,12,12]
names = ['A', 'B', 'C', 'D'] * 4
df = pd.DataFrame({'names': names,
'hours': hours})
My expected output:
names hours count
A 8 3
B 9 2
C 12 2
D 12 2
What I tried:
# This will get me the aggregated count based on names and hours
df.groupby(['names', 'hours']).size().reset_index(name='count')
# result
names hours count
A 8 3
10 1
B 8 1
9 2
11 1
C 9 1
10 1
12 2
D 9 1
10 1
12 2
# Attempt to get the max count for each names & hours group (this failed)
df.groupby(['names', 'hours']).size().reset_index(name='count').\
groupby(['names','hours']).max()
# I get the same result as I got above
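To see why the second groupby changes nothing, here is a minimal sketch on the question's data. After the first aggregation each (names, hours) pair is already a single row, so regrouping on both keys only produces one-row groups and max() returns them unchanged. Grouping on names alone and using idxmax picks the row with the largest count per name. The variable names counts, regrouped and best are just illustrative locals.

```python
import pandas as pd

hours = [8, 8, 9, 9,
         8, 9, 10, 10,
         8, 9, 12, 12,
         10, 11, 12, 12]
names = ['A', 'B', 'C', 'D'] * 4
df = pd.DataFrame({'names': names, 'hours': hours})

# One row per (names, hours) pair, with the number of rows in that pair
counts = df.groupby(['names', 'hours']).size().reset_index(name='count')

# Every (names, hours) pair is unique in `counts`, so grouping by the same
# two keys again gives one-row groups and max() leaves them as they are
regrouped = counts.groupby(['names', 'hours']).max()
assert len(regrouped) == len(counts)

# Grouping by 'names' alone compares counts across hours; idxmax returns the
# row label of the largest count in each group (the first one on ties)
best = counts.loc[counts.groupby('names')['count'].idxmax()]
print(best)
```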
How about this:
grouped = df.groupby(['names', 'hours']).size().reset_index(name='count')
final = grouped.loc[grouped.groupby('names')['count'].transform('max') == grouped['count']]
final
#names hours count
#A 8 3
#B 9 2
#C 12 2
#D 12 2
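The same transform-based filter can also be written as one chain by doing the boolean indexing inside DataFrame.pipe rather than with a separate .loc statement. A minimal sketch on the question's data; note that comparing each count with its group maximum keeps every row tied for the maximum, whereas an idxmax-based selection keeps only one row per name.

```python
import pandas as pd

hours = [8, 8, 9, 9,
         8, 9, 10, 10,
         8, 9, 12, 12,
         10, 11, 12, 12]
names = ['A', 'B', 'C', 'D'] * 4
df = pd.DataFrame({'names': names, 'hours': hours})

final = (
    df.groupby(['names', 'hours'])
      .size()
      .reset_index(name='count')
      # Keep rows whose count equals the maximum count for their name;
      # pipe() passes the intermediate frame in as `d`
      .pipe(lambda d: d[d.groupby('names')['count'].transform('max') == d['count']])
      .reset_index(drop=True)
)
print(final)
```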
Another approach using groupby and value_counts:
(
df.groupby('names')
.apply(lambda x: x.hours.value_counts().nlargest(1))
.reset_index()
.set_axis(['names', 'hours', 'count'], axis=1)
)
Out[249]:
names hours count
0 A 8 3
1 B 9 2
2 C 12 2
3 D 12 2
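A variant of the value_counts idea that avoids apply: SeriesGroupBy.value_counts counts the hours within each name and, with its default sort=True, orders them by frequency (highest first) inside each group. Taking the first row of each name then keeps the most frequent hour. This sketch relies on that default ordering; if a name had a tie for the top count, which hours value comes first is not something I would depend on.

```python
import pandas as pd

hours = [8, 8, 9, 9,
         8, 9, 10, 10,
         8, 9, 12, 12,
         10, 11, 12, 12]
names = ['A', 'B', 'C', 'D'] * 4
df = pd.DataFrame({'names': names, 'hours': hours})

result = (
    df.groupby('names')['hours']
      .value_counts()             # Series indexed by (names, hours), most frequent first per name
      .groupby(level='names')
      .head(1)                    # keep the first, i.e. most frequent, hour for each name
      .reset_index(name='count')  # turn the MultiIndex back into names/hours columns
)
print(result)
```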
Thanks for your answer. This works, but I'd rather not use .loc and just keep chaining methods.
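One way to stay fully chained without .loc, sketched on the question's data: sort the aggregated counts in descending order and keep the first row for each name with drop_duplicates. Sorting on the single count column with kind='mergesort' (a stable sort) keeps tied counts in the groupby output order, so the smallest hours value would win a tie; that tie-breaking rule is an assumption of this sketch, not something the question specifies.

```python
import pandas as pd

hours = [8, 8, 9, 9,
         8, 9, 10, 10,
         8, 9, 12, 12,
         10, 11, 12, 12]
names = ['A', 'B', 'C', 'D'] * 4
df = pd.DataFrame({'names': names, 'hours': hours})

result = (
    df.groupby(['names', 'hours'])
      .size()
      .reset_index(name='count')
      # Largest counts first; the stable sort preserves (names, hours) order among equal counts
      .sort_values('count', ascending=False, kind='mergesort')
      .drop_duplicates('names')   # first row seen for each name is its max-count row
      .sort_values('names')
      .reset_index(drop=True)
)
print(result)
```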