Python: finding the difference between two columns with pandas


I want to find the difference between two int-type columns in a DataFrame. I'm using Python 2.7. The columns look like this:

>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED
0                 15               NaN
1                 20               NaN
2                  7               NaN
3                  7               NaN
4                  7               NaN
Now I want to subtract the shipped quantity from the invoiced quantity, so I do the following:

>>> df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED']
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN   NaN
1                 20               NaN   NaN
2                  7               NaN   NaN
3                  7               NaN   NaN
4                  7               NaN   NaN
How do I handle the NaNs? I want NaN to be treated as 0 (zero), so I expect to get the following result:

I don't want to do df.fillna(0). For the sum, I tried the approach below and it works, but it doesn't work for the difference:

>>> df['Sum'] = df[['INVOICED_QUANTITY', 'QUANTITY_SHIPPED']].sum(axis=1)
>>> df
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff  Sum
0                 15               NaN   NaN   15
1                 20               NaN   NaN   20
2                  7               NaN   NaN    7
3                  7               NaN   NaN    7
4                  7               NaN   NaN    7
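A minimal sketch reproducing this behaviour, with the sample frame rebuilt by hand from the values printed above: DataFrame.sum skips NaN by default (skipna=True), whereas the - operator propagates NaN. That is why Sum gets filled in while Diff stays NaN.

```python
import numpy as np
import pandas as pd

# Rebuilt sample data: the shipped column is entirely NaN
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# Row-wise sum skips NaN by default, so the invoiced value survives
row_sum = df[["INVOICED_QUANTITY", "QUANTITY_SHIPPED"]].sum(axis=1)

# Arithmetic operators propagate NaN: any NaN operand yields NaN
diff = df["INVOICED_QUANTITY"] - df["QUANTITY_SHIPPED"]

print(row_sum.tolist())  # [15.0, 20.0, 7.0, 7.0, 7.0]
print(diff.isna().all())  # True
```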

I think simply filling the NaNs with 0 will solve your problem:

df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)

Out[153]: 
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
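A self-contained version of this answer, with the question's sample data rebuilt by hand. It also checks that fillna(0) here only fills a temporary copy of the column, so QUANTITY_SHIPPED itself keeps its NaNs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# fillna(0) returns a filled copy of the column; df is not modified
df["Diff"] = df["INVOICED_QUANTITY"] - df["QUANTITY_SHIPPED"].fillna(0)

print(df["Diff"].tolist())                  # [15.0, 20.0, 7.0, 7.0, 7.0]
print(df["QUANTITY_SHIPPED"].isna().all())  # True: original NaNs are still there
```

Because a column that contains NaN is stored as float64, Diff comes out as float; if every value is finite, .astype(int) converts it back.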

You can use the sub method to perform the subtraction; this method lets you treat NaN values as a specified value:

df['Diff'] = df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
This produces:

   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
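A small sketch of sub with fill_value. The shipped quantity of 5 in the second row is a hypothetical value added to show that non-missing operands are subtracted normally. Per the pandas documentation, fill_value is not applied when both operands are missing at the same position, so such positions still come out as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7],
    "QUANTITY_SHIPPED": [np.nan, 5, np.nan],  # 5 is a hypothetical value
})

# fill_value=0 substitutes 0 for a single missing operand before subtracting
df["Diff"] = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)
print(df["Diff"].tolist())  # [15.0, 15.0, 7.0]

# If BOTH operands are missing at a position, the result is still NaN
a = pd.Series([np.nan, 3.0])
b = pd.Series([np.nan, np.nan])
r = a.sub(b, fill_value=0)
print(r.tolist())  # [nan, 3.0]
```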

Another concise way is to fill in the missing values of the column (which creates a copy of the column) and then subtract as usual.

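The two approaches give the same result here, although sub is slightly more efficient because it doesn't need to create a copy of the column first; it just fills in the missing values "on the fly".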
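One caveat worth checking: the two approaches only agree when the left-hand column has no gaps. fillna is applied to QUANTITY_SHIPPED alone, while sub(..., fill_value=0) fills whichever operand is missing. The data below is hypothetical, chosen to expose the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the left column also has a gap
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15.0, np.nan],
    "QUANTITY_SHIPPED": [np.nan, 4.0],
})

# Approach 1: fill only the right-hand column, then subtract
via_fillna = df["INVOICED_QUANTITY"] - df["QUANTITY_SHIPPED"].fillna(0)

# Approach 2: sub() fills whichever operand is missing
via_sub = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)

print(via_fillna.tolist())  # [15.0, nan]
print(via_sub.tolist())     # [15.0, -4.0]
```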


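@李建勋 - I don't want to do fillna(0). Are there any other options? I've edited my question, please take a look.
fillna() only returns a copy; it doesn't modify the underlying frame. I've modified the code to fit your needs.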
   INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
0                 15               NaN    15
1                 20               NaN    20
2                  7               NaN     7
3                  7               NaN     7
4                  7               NaN     7
In [46]: %timeit df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
10000 loops, best of 3: 144 µs per loop

In [47]: %timeit df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
10000 loops, best of 3: 81.7 µs per loop