Python: group by two text columns and return the max rows based on count


I'm trying to find the max (First_Word, Group) pairs:

import pandas as pd

df = pd.DataFrame({'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
           'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
           'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
                'apple fell out of the tree', 'partrige in a pear tree']},
          columns=['First_Word', 'Group', 'Text'])

  First_Word         Group                        Text
0      apple    apple bins     where to buy apple bins
1      apple   apple trees         i see an apple tree
2     orange  orange juice         i like orange juice
3      apple   apple trees  apple fell out of the tree
4       pear     pear tree     partrige in a pear tree

Then I do a groupby:

grouped = df.groupby(['First_Word', 'Group']).count()

                         Text
First_Word Group             
apple      apple bins       1
           apple trees      2
orange     orange juice     1
pear       pear tree        1

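Now I want to filter this down to only unique index rows that have the max Text count. Below you'll notice that apple bins has been removed because apple trees has the max value: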

                         Text
First_Word Group             
apple      apple trees      2
orange     orange juice     1
pear       pear tree        1


There is a similar question, but when I try something like this:

df.groupby(["First_Word", "Group"]).count().apply(lambda t: t[t['Text']==t['Text'].max()])

I get an error:

KeyError: ('Text', 'occurred at index Text')

If I add axis=1 to the apply, I get:

IndexError: ('index out of bounds', 'occurred at index (apple, apple bins)')
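A minimal sketch of why the apply call fails, assuming the default DataFrame.apply semantics: with axis=0 the lambda receives one column at a time, and that column is indexed by the (First_Word, Group) MultiIndex, so t['Text'] looks up a row label that does not exist. The names column and row below are illustrative only; the exact IndexError message for axis=1 may differ between pandas versions.

```python
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])
grouped = df.groupby(['First_Word', 'Group']).count()

# apply() with the default axis=0 passes the lambda one column at a time.
# That column is a Series indexed by the (First_Word, Group) MultiIndex,
# so the label 'Text' is not in its index and t['Text'] raises KeyError.
column = grouped['Text']
print(column.name)             # the column is named 'Text' ...
print('Text' in column.index)  # ... but 'Text' is not one of its row labels

# With axis=1 the lambda gets one row at a time, indexed by ['Text'],
# so t['Text'] is a single count, not a column of counts to filter.
row = grouped.iloc[0]
print(row.name, row['Text'])
```

Because each row holds a single number, comparing it with its own max produces one boolean instead of a per-group mask, which is why neither axis gives the filter the question is after.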

Given grouped, you now want to group by the First_Word index level and find the index label of the max row in each group (using idxmax):
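The intermediate idxmax step can be sketched on its own. This assumes SeriesGroupBy.idxmax returns the original (First_Word, Group) labels as tuples, which is what the grouped.loc lookup relies on; labels is just an illustrative name.

```python
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])
grouped = df.groupby(['First_Word', 'Group']).count()

# Group the counts by the First_Word index level only; for each First_Word,
# idxmax returns the full (First_Word, Group) label of the largest count.
labels = grouped.groupby(level='First_Word')['Text'].idxmax()
print(labels)
```

Each value in labels is a complete row label of grouped, so it can be passed straight to grouped.loc.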
You can then use grouped.loc to select those rows from grouped by index label:
import pandas as pd
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])

grouped = df.groupby(['First_Word', 'Group']).count()
result = grouped.loc[grouped.groupby(level='First_Word')['Text'].idxmax()]
print(result)

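yields one row per First_Word: apple trees (count 2), orange juice (1) and pear tree (1), i.e. the filtered table shown in the question.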
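Note that idxmax keeps only the first label when two Group values under the same First_Word tie for the largest count. If all tied rows should be kept, a hedged alternative (not part of the original answer) is the boolean filter the question's apply attempt was aiming for, built with groupby(...).transform('max'); group_max is an illustrative name.

```python
import pandas as pd

# Same toy data as in the question.
df = pd.DataFrame(
    {'First_Word': ['apple', 'apple', 'orange', 'apple', 'pear'],
     'Group': ['apple bins', 'apple trees', 'orange juice', 'apple trees', 'pear tree'],
     'Text': ['where to buy apple bins', 'i see an apple tree', 'i like orange juice',
              'apple fell out of the tree', 'partrige in a pear tree']},
    columns=['First_Word', 'Group', 'Text'])
grouped = df.groupby(['First_Word', 'Group']).count()

# Broadcast each First_Word's maximum count back onto every row of its group,
# then keep the rows whose own count equals that maximum (ties are all kept).
group_max = grouped.groupby(level='First_Word')['Text'].transform('max')
result = grouped[grouped['Text'] == group_max]
print(result)
```

On this data there are no ties, so the result matches the idxmax version.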