Pandas: select rows based on column values
I have a data frame like this:
import pandas as pd

data = {'ID': [1,2,3,4,5,6,7,8,9],
        'Doc':['Order','Order','Inv','Order','Order','Shp','Order', 'Order','Inv'],
        'Rep':[101,101,101,102,102,102,103,103,103]}
frame = pd.DataFrame(data)
Doc ID Rep
0 Order 1 101
1 Order 2 101
2 Inv 3 101
3 Order 4 102
4 Order 5 102
5 Shp 6 102
6 Order 7 103
7 Order 8 103
8 Inv 9 103
Now I want to select the rows for those reps that have a document of type Inv.
The data frame I want is:
Doc ID Rep
0 Order 1 101
1 Order 2 101
2 Inv 3 101
6 Order 7 103
7 Order 8 103
8 Inv 9 103
All the reps have documents of type Order, so I tried something like this:
frame[frame.Rep == frame.Rep[frame.Doc == 'Inv']]
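But I got an error:
ValueError: Can only compare identically-labeled Series objects
You can use boolean indexing twice: first get all Rep values where Doc is 'Inv', then select all rows whose Rep is among those values with isin.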
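A minimal sketch of the two-step selection described above, using the sample data from the question. The variable names reps_with_inv and result are illustrative only. The original comparison fails because frame.Rep has nine labels while frame.Rep[frame.Doc == 'Inv'] has only two, and == between two Series requires identical labels; isin compares values instead, so it avoids that problem.

```python
import pandas as pd

# Sample data from the question.
data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'Doc': ['Order', 'Order', 'Inv', 'Order', 'Order', 'Shp', 'Order', 'Order', 'Inv'],
        'Rep': [101, 101, 101, 102, 102, 102, 103, 103, 103]}
frame = pd.DataFrame(data)

# Step 1: Rep values of the rows whose Doc is 'Inv'.
reps_with_inv = frame.loc[frame['Doc'] == 'Inv', 'Rep']

# Step 2: keep every row whose Rep is one of those values.
# isin matches on values, so the differing index labels do not matter.
result = frame[frame['Rep'].isin(reps_with_inv)]
print(result)
```

On the sample data this keeps the rows of reps 101 and 103 and drops those of rep 102.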
Solutions with query are also possible.
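A sketch of the query-based variant, written out on the sample data. It mirrors the expression that appears in the timings; the names a and result are just local variables chosen for illustration.

```python
import pandas as pd

# Sample data from the question.
data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'Doc': ['Order', 'Order', 'Inv', 'Order', 'Order', 'Shp', 'Order', 'Order', 'Inv'],
        'Rep': [101, 101, 101, 102, 102, 102, 103, 103, 103]}
frame = pd.DataFrame(data)

# Rep values of the 'Inv' rows, selected with query.
a = frame.query("Doc == 'Inv'")['Rep']

# Inside a query expression, '@a' refers to the local Python variable a.
result = frame.query("Rep in @a")
print(result)
```

This should select the same rows as the isin version; which one reads better is largely a matter of taste.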
Timings:
import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000000
L = ['Order','Shp','Inv']
frame = pd.DataFrame({'Doc': np.random.choice(L, N, p=[0.49, 0.5, 0.01]),
                      'ID':np.arange(1,N+1),
                      'Rep':np.random.randint(1000, size=N)})
print(frame.head())
Doc ID Rep
0 Shp 1 95
1 Order 2 147
2 Order 3 282
3 Shp 4 82
4 Shp 5 746
In [204]: %timeit (frame.groupby('Rep').filter(lambda x: 'Inv' in x['Doc'].values))
1 loop, best of 3: 250 ms per loop
In [205]: %timeit (frame[frame['Rep'].isin(frame.loc[frame['Doc'] == 'Inv', 'Rep'])])
100 loops, best of 3: 17.3 ms per loop
In [206]: %%timeit
...: a = frame.query("Doc == 'Inv'")['Rep']
...: frame.query("Rep in @a")
...:
100 loops, best of 3: 14.5 ms per loop
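The three expressions timed above can be checked against each other on the small sample frame. This is only a sanity check under the assumption that all three are meant to select the same rows; the names by_filter, by_isin and by_query are made up for the example. The groupby/filter version calls the lambda once per Rep group, which is a plausible reason for it being the slowest of the three.

```python
import pandas as pd

# Sample data from the question.
data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'Doc': ['Order', 'Order', 'Inv', 'Order', 'Order', 'Shp', 'Order', 'Order', 'Inv'],
        'Rep': [101, 101, 101, 102, 102, 102, 103, 103, 103]}
frame = pd.DataFrame(data)

# 1) groupby + filter: keep whole groups that contain at least one 'Inv' row.
by_filter = frame.groupby('Rep').filter(lambda x: 'Inv' in x['Doc'].values)

# 2) boolean indexing with isin.
by_isin = frame[frame['Rep'].isin(frame.loc[frame['Doc'] == 'Inv', 'Rep'])]

# 3) query, referencing the local variable a with '@'.
a = frame.query("Doc == 'Inv'")['Rep']
by_query = frame.query("Rep in @a")

# Compare the selected row labels of the three results.
same_rows = (by_filter.index.tolist()
             == by_isin.index.tolist()
             == by_query.index.tolist())
print(same_rows)
```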
EDIT:
Thanks for the suggestion:
df = frame.query("Rep in %s" % frame.query("Doc == 'Inv'")['Rep'].tolist())
print (df)
Doc ID Rep
0 Order 1 101
1 Order 2 101
2 Inv 3 101
6 Order 7 103
7 Order 8 103
8 Inv 9 103
The output I get:
Doc ID Rep
0 Order 1 101
1 Order 2 101
2 Inv 3 101
3 Order 4 102
4 Order 5 102
6 Order 7 103
7 Order 8 103
8 Inv 9 103
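Thanks, but that is not what I want. I only want the reps that have only Inv and Order documents.
It might be less readable, but it also works with query in one step:
frame.query("Rep in %s" % frame.query("Doc == 'Inv'")['Rep'].tolist())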
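If the requirement is read strictly, that is, a rep must have an Inv document and no document types other than Order and Inv, one possible sketch is the following. This reading of the comment is an assumption, and the names allowed, reps_with_other and reps_with_inv are hypothetical helpers, not part of the original answers.

```python
import pandas as pd

# Sample data from the question.
data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'Doc': ['Order', 'Order', 'Inv', 'Order', 'Order', 'Shp', 'Order', 'Order', 'Inv'],
        'Rep': [101, 101, 101, 102, 102, 102, 103, 103, 103]}
frame = pd.DataFrame(data)

# Assumption: these are the only document types a selected rep may have.
allowed = {'Order', 'Inv'}

# Reps that have at least one document outside the allowed set.
reps_with_other = frame.loc[~frame['Doc'].isin(allowed), 'Rep']

# Reps that have at least one 'Inv' document.
reps_with_inv = frame.loc[frame['Doc'] == 'Inv', 'Rep']

# Keep rows of reps that have an 'Inv' and nothing outside the allowed set.
mask = frame['Rep'].isin(reps_with_inv) & ~frame['Rep'].isin(reps_with_other)
result = frame[mask]
print(result)
```

On the sample data this gives the same rows as the isin and query solutions, because rep 102 is the only one with a Shp document and it has no Inv document either.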