Python: group by two text columns and return the max rows based on count
I am trying to find the max (First_Word, Group) pairs.
import pandas as pd
df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
'apple fell out of the tree', 'partrige in a pear tree']},
columns=['First_Word', 'Group', 'Text'])
  First_Word         Group                        Text
0      apple    apple bins     where to buy apple bins
1      apple   apple trees         i see an apple tree
2     orange  orange juice         i like orange juice
3      apple   apple trees  apple fell out of the tree
4       pear     pear tree     partrige in a pear tree
Then I do a groupby:
grouped = df.groupby(['First_Word', 'Group']).count()
                         Text
First_Word Group
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1
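Now I want to filter it down to only the unique index rows that have the max count of Text. Below, you'll notice that apple bins has been removed because apple trees has the max value.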
                         Text
First_Word Group
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1
This question is very similar, but when I try something like it:
df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])
I get an error: KeyError: ('Text', 'occurred at index Text'). If I add axis=1 to the apply, I get IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)').
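Given grouped, you now want to group by the First_Word index level and find the index label of the row with the maximum count in each group (using idxmax):
grouped.groupby(level='First_Word')['Text'].idxmax()
You can then use grouped.loc to select rows from grouped by those index labels: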
import pandas as pd
df = pd.DataFrame(
{'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
'apple fell out of the tree', 'partrige in a pear tree']},
columns=['First_Word', 'Group', 'Text'])
grouped = df.groupby(['First_Word', 'Group']).count()
result = grouped.loc[grouped.groupby(level='First_Word')['Text'].idxmax()]
print(result)
This yields the filtered table shown in the question, with apple bins dropped.
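One caveat: idxmax returns only the first label when several rows share the maximum, so tied groups would be silently dropped. If ties should be kept, a possible variant is to compare each count against its group's maximum using transform. This is a minimal sketch reusing the question's data; the names max_per_word and result_with_ties are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
    'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
    'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
             'apple fell out of the tree', 'partrige in a pear tree'],
})

grouped = df.groupby(['First_Word', 'Group']).count()

# Largest count within each First_Word, broadcast back onto every row of grouped
# (max_per_word is a hypothetical name used only for this sketch).
max_per_word = grouped.groupby(level='First_Word')['Text'].transform('max')

# Keep every row whose count equals its group's maximum, ties included.
result_with_ties = grouped[grouped['Text'] == max_per_word]
print(result_with_ties)
```

On this data there are no ties, so the selection should match the idxmax approach; the two differ only when two Group values under the same First_Word have equal counts.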