Python: List the most common members of a pandas group?

Tags: python, sorting, pandas, dataframe, series

I have a DataFrame with columns like the following:

        id                           lead_sponsor lead_sponsor_class
  02837692    Janssen Research & Development, LLC           Industry
  02837679             Aarhus University Hospital              Other
  02837666  Universidad Autonoma de Ciudad Juarez              Other
  02837653         Universidad Autonoma de Madrid              Other
  02837640         Beirut Eye Specialist Hospital              Other
I want to find the most common lead sponsors. I can list the size of each group with:

df.groupby(['lead_sponsor', 'lead_sponsor_class']).size()
which gives me:

lead_sponsor                              lead_sponsor_class
307 Hospital of PLA                       Other                  1
3E Therapeutics Corporation               Industry               1
3M                                        Industry               4
4SC AG                                    Industry               8
5 Santé                                   Other                  1
But how do I find the 10 most common groups? If I do this:

df.groupby(['lead_sponsor', 'lead_sponsor_class']).size().sort_values(ascending=False).head(10) 
then I get an error:

AttributeError: 'Series' object has no attribute 'sort_values'

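I think you can use Series.nlargest.

The Notes section of its documentation says:

Faster than .sort_values(ascending=False).head(n) for small n relative to the size of the Series object.

Sample: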

import pandas as pd

df = pd.DataFrame({'id': {0: 2837692, 1: 2837679, 2: 2837666, 3: 2837653, 4: 2837640}, 
                   'lead_sponsor': {0: 'a', 1: 'a', 2: 'a', 3: 's', 4: 's'}, 
                   'lead_sponsor_class': {0: 'Industry', 1: 'Other', 2: 'Other', 3: 'Other', 4: 'Other'}})

print (df)
        id lead_sponsor lead_sponsor_class
0  2837692            a           Industry
1  2837679            a              Other
2  2837666            a              Other
3  2837653            s              Other
4  2837640            s              Other

print (df.groupby(['lead_sponsor', 'lead_sponsor_class']).size())
lead_sponsor  lead_sponsor_class
a             Industry              1
              Other                 2
s             Other                 2
dtype: int64

print (df.groupby(['lead_sponsor', 'lead_sponsor_class']).size().sort_values(ascending=False).head(2))
lead_sponsor  lead_sponsor_class
s             Other                 2
a             Other                 2
dtype: int64

print (df.groupby(['lead_sponsor', 'lead_sponsor_class']).size().nlargest(2))
lead_sponsor  lead_sponsor_class
a             Other                 2
s             Other                 2
dtype: int64
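
The AttributeError in the question is consistent with a pandas release older than 0.17.0, the version in which Series.sort_values was introduced. That is an assumption about the asker's environment, not something the thread confirms. A minimal sketch that checks for the method and falls back to nlargest when it is missing (the counts are the ones shown in the question; the variable names are made up for illustration):

```python
import pandas as pd

print(pd.__version__)  # Series.sort_values needs pandas 0.17.0 or newer

# Group sizes copied from the output shown in the question.
sizes = pd.Series(
    [1, 1, 4, 8, 1],
    index=['307 Hospital of PLA', '3E Therapeutics Corporation',
           '3M', '4SC AG', '5 Santé'],
)

# Hypothetical guard: use sort_values when it exists, otherwise nlargest.
if hasattr(pd.Series, 'sort_values'):
    top = sizes.sort_values(ascending=False).head(2)
else:
    top = sizes.nlargest(2)

print(top)
```

Either branch selects the two largest counts here; upgrading pandas is the simpler fix if that is possible.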

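Your solution works fine for me too.
Am I right that calling .size() returns a Series? I think I was confused because the output looks like a DataFrame rather than a Series (the way it prints two columns on the left).
Yes, it is a Series. You can check with print(type(df.groupby(['lead_sponsor', 'lead_sponsor_class']).size())).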
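
To make the point in these comments concrete, here is a small sketch using the sample data from the answer. It shows that the two columns printed on the left are the levels of a MultiIndex on a Series, and that reset_index turns the result into an ordinary DataFrame. The column name 'count' and the variable names are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'lead_sponsor': ['a', 'a', 'a', 's', 's'],
    'lead_sponsor_class': ['Industry', 'Other', 'Other', 'Other', 'Other'],
})

sizes = df.groupby(['lead_sponsor', 'lead_sponsor_class']).size()
print(type(sizes))          # a pandas Series, not a DataFrame
print(sizes.index.nlevels)  # the group keys form a two-level MultiIndex

# Move the index levels into regular columns to get a DataFrame.
top = sizes.nlargest(2).reset_index(name='count')
print(top)
```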