Python Pandas: How to extract DataFrame rows that match filter1 OR filter2
I have a pandas DataFrame that looks like this, for example:
label Y88_N diff div fold
0 25273.626713 17348.581851 2.016404 2.016404
1 29139.510491 -4208.868050 0.604304 -0.604304
2 34388.439717 -30147.834699 0.458903 -0.458903
3 69704.254089 -32976.152490 0.116894 -0.116894
4 193717.440783 -71359.494098 0.286045 -0.286045
5 28996.634708 10934.944533 2.031293 2.031293
6 45021.782930 680.437629 1.056383 1.056383
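but with thousands of rows.
I would like to get a new DataFrame containing the rows where the value in the 'fold' column is greater than 2 or less than -0.6.
So in the end, the DataFrame should look like this: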
label Y88_N diff div fold
0 25273.626713 17348.581851 2.016404 2.016404
1 29139.510491 -4208.868050 0.604304 -0.604304
5 28996.634708 10934.944533 2.031293 2.031293
I have tried different approaches, such as:
def ranged(start, end, step):
    x = start
    while x < end:
        yield x
        x += step

df2 = df[~df['fold'].isin(ranged(-0.6, 2, 0.000001))]
or

df2 = df[(df['fold'] >= 2) & (df['fold'] <= -0.6)]

In your second example, you just need to use | (or) instead of & (and):
df2 = df[(df['fold'] >= 2) | (df['fold'] <= -0.6)]
df2
Out[6]:
label Y88_N diff div fold
0 0 25273.626713 17348.581851 2.016404 2.016404
1 1 29139.510491 -4208.868050 0.604304 -0.604304
5 5 28996.634708 10934.944533 2.031293 2.031293
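
To see why the & version comes back empty while the | version works, here is a minimal sketch. It rebuilds the seven sample rows from the question, keeping only the label and fold columns for brevity; the variable names high, low, both and either are illustrative and not from the original post.

```python
import pandas as pd

# Sample rows copied from the question (only two columns kept for brevity).
df = pd.DataFrame({
    "label": range(7),
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

high = df["fold"] >= 2      # boolean Series: True for rows 0 and 5
low = df["fold"] <= -0.6    # boolean Series: True for row 1

both = df[high & low]       # a value cannot be >= 2 and <= -0.6 at once
either = df[high | low]     # rows matching at least one condition

print(len(both))              # 0
print(either.index.tolist())  # [0, 1, 5]
```

The parentheses around each comparison matter when the conditions are written inline, because | and & bind more tightly than >= and <=. The Python keywords and / or do not work element-wise on a Series and raise a ValueError about the ambiguous truth value.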

You can do:
In [276]: df[(df['fold'] >= 2) | (df['fold'] <= -0.6)]
Out[276]:
label Y88_N diff div fold
0 0 25273.626713 17348.581851 2.016404 2.016404
1 1 29139.510491 -4208.868050 0.604304 -0.604304
5 5 28996.634708 10934.944533 2.031293 2.031293
Also, pd.eval() works well for expressions involving large arrays:
In [278]: df[pd.eval('df.fold >=2 | df.fold <=-0.6')]
Out[278]:
label Y88_N diff div fold
0 0 25273.626713 17348.581851 2.016404 2.016404
1 1 29139.510491 -4208.868050 0.604304 -0.604304
5 5 28996.634708 10934.944533 2.031293 2.031293
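
As a quick cross-check, the sketch below runs the plain boolean mask, a pd.eval() expression with explicit parentheses, and DataFrame.query() on the sample rows from the question and confirms they select the same rows. The query() form is an addition that does not appear in the answers above, and only the label and fold columns are rebuilt here.

```python
import pandas as pd

# Sample rows copied from the question (only two columns kept for brevity).
df = pd.DataFrame({
    "label": range(7),
    "fold": [2.016404, -0.604304, -0.458903, -0.116894,
             -0.286045, 2.031293, 1.056383],
})

# Plain boolean mask, as in the answers above.
mask_result = df[(df["fold"] >= 2) | (df["fold"] <= -0.6)]

# pd.eval with explicit parentheses, so the grouping is visible in the string
# and does not rely on the parser's precedence rules.
eval_result = df[pd.eval("(df.fold >= 2) | (df.fold <= -0.6)")]

# DataFrame.query refers to columns by name and accepts `or` / `and`.
query_result = df.query("fold >= 2 or fold <= -0.6")

print(mask_result.equals(eval_result))   # True
print(mask_result.equals(query_result))  # True
print(query_result.index.tolist())       # [0, 1, 5]
```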

This looks good, but what are the pros and cons of each method (speed, memory, etc.)?
Thank you very much for your answer, it is really complete. I don't know why, but I had tried something like df[(df['fold'] >= 2) | (df['fold'] ...
Just a technical point: df2 = df[(df['fold'] >= 2) & (df['fold'] ...
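
On the speed question, one way to find out for your own data is to time the variants directly. The following is only a measurement sketch: the 100,000-row random frame is a made-up stand-in for the real data, the function names are illustrative, and the numbers it prints depend on your machine, your pandas version, and whether numexpr is installed.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the "thousands of rows" mentioned in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({"fold": rng.uniform(-3, 3, size=100_000)})


def with_mask():
    # Plain boolean indexing.
    return df[(df["fold"] >= 2) | (df["fold"] <= -0.6)]


def with_eval():
    # Same condition evaluated through pd.eval.
    return df[pd.eval("(df.fold >= 2) | (df.fold <= -0.6)")]


def with_query():
    # Same condition through DataFrame.query.
    return df.query("fold >= 2 or fold <= -0.6")


# Make sure the variants agree before comparing their speed.
assert with_mask().equals(with_eval())
assert with_mask().equals(with_query())

runs = 20
for fn in (with_mask, with_eval, with_query):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

The pandas documentation notes that eval-style evaluation carries a parsing overhead and mainly pays off on large frames, so for small frames the plain boolean mask is usually the simpler choice.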