Selecting data based on number of occurrences using Python/Pandas


My dataset is based on the results of food inspections in the City of Chicago.

import pandas as pd
df = pd.read_csv("C:/~/Food_Inspections.csv")

df.head()
Out[1]: 
   Inspection ID                        DBA Name  \
0         1609238  JR'SJAMAICAN TROPICAL CAFE,INC   
1         1609245                     BURGER KING   
2         1609237   DUNKIN DONUTS / BASKIN ROBINS   
3         1609258          CHIPOTLE MEXICAN GRILL   
4         1609244      ATARDECER ACAPULQUENO INC.   

                        AKA Name  License # Facility Type             Risk  \
0                            NaN  2442496.0    Restaurant    Risk 1 (High)   
1                    BURGER KING  2411124.0    Restaurant  Risk 2 (Medium)   
2  DUNKIN DONUTS / BASKIN ROBINS  1717126.0    Restaurant  Risk 2 (Medium)   
3         CHIPOTLE MEXICAN GRILL  1335044.0    Restaurant    Risk 1 (High)   
4     ATARDECER ACAPULQUENO INC.  1910118.0    Restaurant    Risk 1 (High)   
Here is how often each facility type appears in the dataset:

df['Facility Type'].value_counts()
Out[3]: 
Restaurant                          14304
Grocery Store                        2647
School                               1155
Daycare (2 - 6 Years)                 367
Bakery                                316
Children's Services Facility          262
Daycare Above and Under 2 Years       248
Long Term Care                        169
Daycare Combo 1586                    142
Catering                              123
Liquor                                 78
Hospital                               68
Mobile Food Preparer                   67
Golden Diner                           65
Mobile Food Dispenser                  51
Special Event                          25
Shared Kitchen User (Long Term)        22
Daycare (Under 2 Years)                18
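I am trying to create a new dataset that contains only the rows whose facility type occurs more than 50 times in the dataset. How would I go about this?

Note that the full list of facility types is much longer; I cut most of it because it is not relevant to the question at hand (so simply dropping 'Special Event', 'Shared Kitchen User' and 'Daycare' is not what I am looking for).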

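If I understand correctly, you want to group the rows by 'Facility Type' and use filter to keep only the groups that have more than 50 rows.

For example, with a threshold of 1: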

In [9]:
import numpy as np
df = pd.DataFrame({'type': list('aabcddddee'), 'value': np.random.randn(10)})
df

Out[9]:
  type     value
0    a -0.160041
1    a -0.042310
2    b  0.530609
3    c  1.238046
4    d -0.754779
5    d -0.197309
6    d  1.704829
7    d -0.706467
8    e -1.039818
9    e  0.511638

In [10]:
df.groupby('type').filter(lambda x: len(x) > 1)

Out[10]:
  type     value
0    a -0.160041
1    a -0.042310
4    d -0.754779
5    d -0.197309
6    d  1.704829
7    d -0.706467
8    e -1.039818
9    e  0.511638
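
The same pattern can be sketched against the column from the question. The frame below is a small made-up stand-in (only the column name 'Facility Type' comes from the question), and the threshold is lowered to 2 so the effect is visible; with the real data it would be 50.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Facility Type'
# is taken from the question.
sample = pd.DataFrame({
    "Facility Type": ["Restaurant"] * 4 + ["School"] * 3 + ["Liquor"],
    "Inspection ID": range(8),
})

min_rows = 2  # assumption for this toy frame; the question asks for 50

# Keep every row that belongs to a group with more than min_rows rows.
frequent = sample.groupby("Facility Type").filter(lambda g: len(g) > min_rows)

print(frequent["Facility Type"].value_counts())
```

Here the single 'Liquor' row is dropped, while all 'Restaurant' and 'School' rows are kept with their original index.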

Untested, but this should work:

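# Count how many rows each facility type has
FT = df['Facility Type'].value_counts()
# Keep only the rows whose facility type occurs more than 50 times
df[df['Facility Type'].isin(FT.index[FT > 50])]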
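
Since the snippet above is marked as untested, here is a minimal check of the value_counts/isin approach on a made-up frame (the data and the threshold of 2 are assumptions for illustration). It also compares the result with a variant based on groupby plus transform('size'), which is not from the original answers but is a common way to attach each row's group size to the row.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Facility Type'
# is taken from the question.
sample = pd.DataFrame({
    "Facility Type": ["Restaurant"] * 4 + ["School"] * 3 + ["Liquor"],
    "Inspection ID": range(8),
})
min_rows = 2  # assumption for this toy frame; the question asks for 50

# Approach from the answer: count each type, then keep rows whose type is frequent.
counts = sample["Facility Type"].value_counts()
keep = counts[counts > min_rows].index
by_isin = sample[sample["Facility Type"].isin(keep)]

# Alternative sketch: broadcast each group's size back onto its rows.
sizes = sample.groupby("Facility Type")["Facility Type"].transform("size")
by_transform = sample[sizes > min_rows]

# Both selections should contain exactly the same rows.
print(by_isin.equals(by_transform))
```

Both variants avoid calling a Python lambda once per group, which may matter when there are many distinct facility types.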