Selecting data based on number of occurrences using Python/Pandas
My dataset is based on the results of food inspections in the City of Chicago:
import pandas as pd
df = pd.read_csv("C:/~/Food_Inspections.csv")
df.head()
Out[1]:
Inspection ID DBA Name \
0 1609238 JR'SJAMAICAN TROPICAL CAFE,INC
1 1609245 BURGER KING
2 1609237 DUNKIN DONUTS / BASKIN ROBINS
3 1609258 CHIPOTLE MEXICAN GRILL
4 1609244 ATARDECER ACAPULQUENO INC.
AKA Name License # Facility Type Risk \
0 NaN 2442496.0 Restaurant Risk 1 (High)
1 BURGER KING 2411124.0 Restaurant Risk 2 (Medium)
2 DUNKIN DONUTS / BASKIN ROBINS 1717126.0 Restaurant Risk 2 (Medium)
3 CHIPOTLE MEXICAN GRILL 1335044.0 Restaurant Risk 1 (High)
4 ATARDECER ACAPULQUENO INC. 1910118.0 Restaurant Risk 1 (High)
Here is how often each facility type appears in the dataset:
df['Facility Type'].value_counts()
Out[3]:
Restaurant 14304
Grocery Store 2647
School 1155
Daycare (2 - 6 Years) 367
Bakery 316
Children's Services Facility 262
Daycare Above and Under 2 Years 248
Long Term Care 169
Daycare Combo 1586 142
Catering 123
Liquor 78
Hospital 68
Mobile Food Preparer 67
Golden Diner 65
Mobile Food Dispenser 51
Special Event 25
Shared Kitchen User (Long Term) 22
Daycare (Under 2 Years) 18
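I am trying to create a new dataset containing only the rows whose facility type appears more than 50 times in the dataset. How would I go about this?
Note that the full list of facility types is much longer; I cut most of it because it is not relevant to the question at hand (so simply dropping "Special Event", "Shared Kitchen User" and "Daycare" is not what I am looking for).

IIUC, you want to group by the type column and filter on group size.
For example: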
In [9]:
import numpy as np
df = pd.DataFrame({'type':list('aabcddddee'), 'value':np.random.randn(10)})
df
Out[9]:
type value
0 a -0.160041
1 a -0.042310
2 b 0.530609
3 c 1.238046
4 d -0.754779
5 d -0.197309
6 d 1.704829
7 d -0.706467
8 e -1.039818
9 e 0.511638
In [10]:
df.groupby('type').filter(lambda x: len(x) > 1)
Out[10]:
type value
0 a -0.160041
1 a -0.042310
4 d -0.754779
5 d -0.197309
6 d 1.704829
7 d -0.706467
8 e -1.039818
9 e 0.511638
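Applied to the question's column, the same pattern might look like the sketch below. The frame here is a small made-up stand-in for the inspections data (the row counts and the `min_count` name are assumptions for illustration, not values from the real file); only the `Facility Type` column name and the threshold of 50 come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the inspections data; counts are invented.
df = pd.DataFrame({
    "Facility Type": (["Restaurant"] * 60 + ["Grocery Store"] * 51
                      + ["Liquor"] * 50 + ["Special Event"] * 25),
})
df["Inspection ID"] = range(len(df))

min_count = 50  # keep facility types that occur more than this many times

# filter() keeps every row of each group for which the function returns True.
frequent = df.groupby("Facility Type").filter(lambda g: len(g) > min_count)

print(frequent["Facility Type"].value_counts())
```

Because the comparison is strict (`>`), a type that occurs exactly 50 times is dropped, which matches "more than 50 times" in the question.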
Untested, but this should work:
FT=df['Facility Type'].value_counts()
df[df['Facility Type'].isin(FT.index[FT>50])]
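To see what each step of the `value_counts` / `isin` approach produces, here is a minimal sketch on the same toy letters used in the `groupby` example. The column values, the `threshold` of 1, and the names `counts`, `keep` and `result` are illustrative assumptions; with the real data the threshold would be 50.

```python
import pandas as pd

# Toy data reusing the letters from the groupby example above.
df = pd.DataFrame({"Facility Type": list("aabcddddee"), "value": range(10)})

threshold = 1  # would be 50 for the inspections data

# Step 1: count how many rows each facility type has.
counts = df["Facility Type"].value_counts()

# Step 2: keep only the labels whose count exceeds the threshold.
keep = counts.index[counts > threshold]

# Step 3: select the rows whose facility type is one of those labels.
result = df[df["Facility Type"].isin(keep)]

print(sorted(keep))
print(result)
```

On this toy frame the selected rows are the same ones that the `groupby(...).filter(...)` example returns.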