Python: Conditionally extract pandas rows based on another pandas DataFrame

I have two DataFrames:

df1:

col1    col2
1       2
1       3
2       4
df2:

col1
2
3
I want to extract all rows in df1 where df1's col2 is not in df2's col1. So in this case, the expected result is:

col1    col2
2       4
I first tried:

df1[df1['col2'] not in df2['col1']]
But it returns:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

Then I tried:

df1[df1['col2'] not in df2['col1'].tolist]
But it returns:

TypeError: argument of type 'instancemethod' is not iterable

You can use isin together with ~ to invert the boolean mask:

print (df1['col2'].isin(df2['col1']))
0     True
1     True
2    False
Name: col2, dtype: bool

print (~df1['col2'].isin(df2['col1']))
0    False
1    False
2     True
Name: col2, dtype: bool

print (df1[~df1['col2'].isin(df2['col1'])])
   col1  col2
2     2     4
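The steps above can be put together as a small self-contained sketch. It rebuilds the two sample frames from the question; the variable names `mask` and `result` are illustrative and not part of the original answer. The idea is that `isin` tests each value of `col2` element-wise, whereas Python's `in` operator asks for a single True/False answer, which is why the attempts in the question fail.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 3, 4]})
df2 = pd.DataFrame({"col1": [2, 3]})

# Element-wise membership test: True where df1.col2 appears in df2.col1.
mask = df1["col2"].isin(df2["col1"])

# ~ inverts the boolean mask, keeping only the rows that are NOT in df2.col1.
result = df1[~mask]
print(result)
```

Note that the surviving row keeps its original index label (2); call `reset_index(drop=True)` on the result if a fresh index is wanted.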
Timings:

In [8]: %timeit (df1.query('col2 not in @df2.col1'))
1000 loops, best of 3: 1.57 ms per loop

In [9]: %timeit (df1[~df1['col2'].isin(df2['col1'])])
1000 loops, best of 3: 466 µs per loop
Alternatively, use the query() method, for example df1.query('col2 not in @df2.col1').
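As a runnable sketch of the query approach, the snippet below applies the same expression used in the timings to the sample frames from the question. The helper name `rows_not_in` is hypothetical and only exists to keep `right` in local scope, so that the `@` prefix can resolve it.

```python
import pandas as pd

# Sample frames from the question.
df1 = pd.DataFrame({"col1": [1, 1, 2], "col2": [2, 3, 4]})
df2 = pd.DataFrame({"col1": [2, 3]})


def rows_not_in(left, right):
    # Hypothetical helper: '@right' refers to the local variable `right`,
    # and 'not in' is evaluated element-wise against its col1 values.
    return left.query("col2 not in @right.col1")


result = rows_not_in(df1, df2)
print(result)
```

The expression string is parsed at call time, which probably explains part of the extra overhead seen on small frames in the timings.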

Timings for larger DataFrames:

In [44]: df1.shape
Out[44]: (30000000, 2)

In [45]: df2.shape
Out[45]: (20000000, 1)

In [46]: %timeit (df1[~df1['col2'].isin(df2['col1'])])
1 loop, best of 3: 5.56 s per loop

In [47]: %timeit (df1.query('col2 not in @df2.col1'))
1 loop, best of 3: 5.96 s per loop
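To repeat a comparison like this outside IPython, something along these lines may work. It is only a sketch: the frames are random, scaled-down stand-ins (the sizes, value ranges, seed, and the helper names `filter_isin` and `filter_query` are all assumptions, not taken from the original benchmark), and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_keys = 30_000, 20_000  # assumed sizes, far smaller than the originals

df1 = pd.DataFrame({
    "col1": rng.integers(0, 100, n_rows),
    "col2": rng.integers(0, 50_000, n_rows),
})
df2 = pd.DataFrame({"col1": rng.integers(0, 50_000, n_keys)})


def filter_isin(left, right):
    # Inverted boolean mask built with isin.
    return left[~left["col2"].isin(right["col1"])]


def filter_query(left, right):
    # Same filter expressed as a query string; @right is the local argument.
    return left.query("col2 not in @right.col1")


# Both approaches should select exactly the same rows.
assert filter_isin(df1, df2).equals(filter_query(df1, df2))

for name, func in [("isin", filter_isin), ("query", filter_query)]:
    seconds = timeit.timeit(lambda: func(df1, df2), number=5) / 5
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

In the timings quoted above, isin was roughly three times faster on the three-row frames (466 µs vs 1.57 ms), while on the 30-million-row frame the two were close (5.56 s vs 5.96 s), so the choice matters mostly for small inputs or for readability.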