The contract of Python pandas.DataFrame.equals

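I have a simple test case for a function that returns a DataFrame which may contain NaN. I am testing whether the output and the expected output are equal: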

>>> output
Out[1]: 
      r   t  ts  tt  ttct
0  2048  30   0  90     1
1  4096  90   1  30     1
2     0  70   2  65     1

[3 rows x 5 columns]
>>> expected
Out[2]: 
      r   t  ts  tt  ttct
0  2048  30   0  90     1
1  4096  90   1  30     1
2     0  70   2  65     1

[3 rows x 5 columns]
>>> output == expected
Out[3]: 
      r     t    ts    tt  ttct
0  True  True  True  True  True
1  True  True  True  True  True
2  True  True  True  True  True

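However, because of NaNs, I cannot simply rely on the == operator. My impression was that the appropriate way to handle this is the equals method, which the documentation describes as treating NaNs in the same location as equal. Nonetheless: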

>>> expected.equals(output)
Out[4]: False

A little digging reveals the difference between the two frames:

>>> output._data
Out[5]: 
BlockManager
Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
Axis 1: Int64Index([0, 1, 2], dtype='int64')
FloatBlock: [r], 1 x 3, dtype: float64
IntBlock: [t, ts, tt, ttct], 4 x 3, dtype: int64
>>> expected._data
Out[6]: 
BlockManager
Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
Axis 1: Int64Index([0, 1, 2], dtype='int64')
IntBlock: [r, t, ts, tt, ttct], 5 x 3, dtype: int64

Coercing the output's float block to int, or coercing the expected frame's int block to float, makes the test pass.

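Obviously, there are different senses of equality, and the kind of test that DataFrame.equals performs can be useful in some cases. Nevertheless, the disparity between == and DataFrame.equals is frustrating to me and looks like an inconsistency. In pseudo-code, I would expect its behavior to match: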

(self.index == other.index).all() \
and (self.columns == other.columns).all() \
and (self.values.fillna(SOME_MAGICAL_VALUE) == other.values.fillna(SOME_MAGICAL_VALUE)).all().all()

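However, it does not. Is my thinking wrong, or is this an inconsistency in the pandas API? Moreover, given that NaN may be present, what test should I be performing?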
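For reference, the behavior described by the pseudo-code can be written as a small runnable sketch. The helper name frames_equivalent is hypothetical and is not part of pandas. Instead of filling NaN with a magic value, it treats two cells as matching when they compare equal or when both are NaN, and it deliberately ignores dtypes.

```python
import numpy as np
import pandas as pd


def frames_equivalent(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: same labels, same values, NaNs matched by position.

    Dtypes are ignored on purpose, so 2048 (int64) matches 2048.0 (float64).
    """
    # Labels must match first; == on differently labeled frames would raise.
    if not left.index.equals(right.index):
        return False
    if not left.columns.equals(right.columns):
        return False
    same_value = left == right                # False wherever either side is NaN
    both_nan = left.isna() & right.isna()     # True where NaN sits in both frames
    return bool((same_value | both_nan).all().all())


# Same values as in the question, but "r" is float64 in one frame and int64 in the other.
output = pd.DataFrame({"r": [2048.0, 4096.0, 0.0], "t": [30, 90, 70]})
expected = pd.DataFrame({"r": [2048, 4096, 0], "t": [30, 90, 70]})
print(frames_equivalent(output, expected))
print(output.equals(expected))
```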

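.equals() does just what it says. It tests for exact equality among elements, the positions of NaNs (and NaTs), dtype equality, and index equality. Think of it as a df is df2 kind of test, except that the two do not actually have to be the same object; in other words, df.equals(df.copy()) is always True.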
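A minimal sketch of that contract, using a made-up frame (the column names and values here are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30, 90, 70]})

# A copy is a different object with identical contents, so equals() is True,
# even though the frame contains a NaN.
same_as_copy = df.equals(df.copy())

# == is element-wise and NaN != NaN, so the NaN cell is reported as a mismatch.
elementwise_all = bool((df == df.copy()).all().all())

# Same values, but column "t" is now float64 instead of int64: equals() is False.
dtype_changed = df.equals(df.astype({"t": "float64"}))

print(same_as_copy, elementwise_all, dtype_changed)
```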

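Your example fails because the differing dtypes are not equal (even though they may be equivalent). So you can use com.array_equivalent for this, or (df == df2).all().all() if you don't have NaNs.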

It is a replacement for np.array_equal, which is broken for NaN positional detection (and for object dtypes).

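It is mostly used internally. That said, if you would like an enhancement for equivalence (e.g. the elements are equivalent in the == sense and the NaN positions match), please open an issue on GitHub. (Even better, submit a PR!)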
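In test code, one option available in more recent pandas versions is pd.testing.assert_frame_equal with check_dtype=False, which treats NaNs in the same location as equal and tolerates an int64 versus float64 difference. The wrapper frames_match below is a hypothetical convenience for getting a boolean back; in a real test you would normally call assert_frame_equal directly and let it raise.

```python
import numpy as np
import pandas as pd


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical wrapper: True if the frames match, ignoring dtype differences."""
    try:
        # Raises AssertionError with a description of the first mismatch it finds.
        pd.testing.assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


# "t" is int64 on one side and float64 on the other; "r" has a NaN in the same row.
output = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30, 90, 70]})
expected = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30.0, 90.0, 70.0]})

print(frames_match(output, expected))
print(output.equals(expected))
```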

I used a workaround that digs into the MagicMock instance:

assert mock_instance.call_count == 1
call_args = mock_instance.call_args[0]
call_kwargs = mock_instance.call_args[1]
pd.testing.assert_frame_equal(call_kwargs['dataframe'], pd.DataFrame())
