Python: Why does the order of comparison matter for this apply/lambda inequality? (Python, Pandas)

Sorry, this isn't a great title. But here is a simple example:

(pandas version 0.16.1)

This works fine:

df.apply( lambda x: x > x.mean() )

       x      y
0  False  False
1  False  False
2   True  False
3   True   True

Shouldn't this be the same?

df.apply( lambda x: x.mean() < x )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-467-6f32d50055ea> in <module>()
----> 1 df.apply( lambda x: x.mean() < x )

C:\Users\ei\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3707                     if reduce is None:
   3708                         reduce = True
-> 3709                     return self._apply_standard(f, axis, reduce=reduce)
   3710             else:
   3711                 return self._apply_broadcast(f, axis)

C:\Users\ei\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3797             try:
   3798                 for i, v in enumerate(series_gen):
-> 3799                     results[i] = func(v)
   3800                     keys.append(v.name)
   3801             except Exception as e:

<ipython-input-467-6f32d50055ea> in <lambda>(x)
----> 1 df.apply( lambda x: x.mean() < x )

C:\Users\ei\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\ops.pyc in wrapper(self, other, axis)
    586             return NotImplemented
    587         elif isinstance(other, (np.ndarray, pd.Index)):
--> 588             if len(self) != len(other):
    589                 raise ValueError('Lengths must match to compare')
    590             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: ('len() of unsized object', u'occurred at index x')

As a counterexample, both of these work:

df.mean() < df

df > df.mean()
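For anyone who wants to reproduce the comparison, here is a minimal sketch. The DataFrame values are an assumption, inferred from the output printed elsewhere in this thread, and the sketch only exercises the forms reported as working. On the 0.16.x versions discussed here the reversed apply form raised the TypeError shown above; that may differ on other releases.

```python
import pandas as pd

# Values inferred from the output printed in this thread (assumption).
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 1, 1, 9]})

# Whole-frame comparisons: the column means are aligned on the column labels.
right = df > df.mean()
left = df.mean() < df

# Column-wise version from the question, with the Series on the left-hand side.
via_apply = df.apply(lambda col: col > col.mean())

print(right)
print(left.equals(right), via_apply.equals(right))
```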

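Edit

Finally found the bug. As stated in the issue report:

"left = 0 > s works (e.g. a Python scalar). So I think this is being treated as a 0-dim array (it's an np.int64) (when the comparison is called). I'll mark it as a bug. Feel free to dig in."

The problem occurs when a value with a NumPy data type (np.int64, np.float64, etc.) is on the left-hand side of a comparison operator. A simple workaround, as @santon points out in his answer, is to convert the number to a Python scalar instead of using a NumPy scalar.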
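A minimal sketch of that workaround, assuming a sample column with the values 1 to 4 (not taken from the original post). It shows two standard ways to turn a NumPy scalar into a built-in Python float before putting it on the left-hand side. The mean is wrapped in np.float64 explicitly so the sketch does not depend on what a given pandas version returns from mean().

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4], name="x")  # sample column, values assumed

# Force a NumPy scalar so the example behaves the same across versions.
m = np.float64(s.mean())
print(type(m))

# Two ways to obtain a built-in Python float from the NumPy scalar.
as_float = float(m)
as_item = m.item()
print(type(as_float), type(as_item))

# Python float on the left-hand side of the comparison.
result = as_float < s
print(result)
```

Either conversion should do here; float() reads a little more clearly, while .item() also works for integer NumPy scalars without changing them to float.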

Old answer:
I tried pandas 0.16.2.
I did the following on your original df:

In [22]: df['z'] = df['x'].mean() < df['x']

In [23]: df
Out[23]:
   x  y      z
0  1  1  False
1  2  1  False
2  3  1   True
3  4  9   True

In [27]: df['z'].mean() < df['z']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-27-afc8a7b869b4> in <module>()
----> 1 df['z'].mean() < df['z']

C:\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
    586             return NotImplemented
    587         elif isinstance(other, (np.ndarray, pd.Index)):
--> 588             if len(self) != len(other):
    589                 raise ValueError('Lengths must match to compare')
    590             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: len() of unsized object

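I think this has to do with how the greater-than operator is overloaded. With overloaded operators, the order matters when the left and right operands have different data types. (Python has a complicated way of determining which overloaded function to use.) You can make your code work by casting the result of mean() (which is a numpy.float64) to a plain float: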

df.apply( lambda x: float(x.mean()) < x )
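To illustrate the dispatch rule mentioned above, here is a toy sketch in plain Python. The classes Scalar and Column are hypothetical stand-ins, not pandas or NumPy types. For `a < b`, Python first calls `a.__lt__(b)`; only if that returns NotImplemented does it fall back to the reflected `b.__gt__(a)`.

```python
class Scalar:
    """Hypothetical left-hand operand that does not know about Column."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        if isinstance(other, Scalar):
            return self.value < other.value
        # Decline, so Python tries the right-hand operand's reflected method.
        return NotImplemented


class Column:
    """Hypothetical container that compares element-wise."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, other):
        # Reached for `scalar < column` only after Scalar.__lt__ declines.
        return [v > other.value for v in self.values]


result = Scalar(2.5) < Column([1, 2, 3, 4])
print(result)
```

If the left operand's method handles the comparison itself instead of returning NotImplemented, the right operand's method is never consulted, which is one way the two orderings can end up on different code paths.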

For some reason, the pandas code seems to treat numpy.float64 as an array, which is probably why it fails.

Try adding parentheses, because Python may be parsing it as ((lambda x: x.mean()) < x) rather than (lambda x: (x.mean() < x)).

That doesn't work for me either. Did it work for you? I did try some parentheses, but they had no effect, and I don't really see how they would change anything.

You don't have to guess what I did; I posted 100% of the code I used. Does my exact code (as posted) work for you? (As noted above, I'm on pandas 0.16.1.)

You posted what you did, df.apply(lambda x: x > x.mean()), which does not reassign the result back to df. Without the reassignment it works fine; df.apply(lambda x: x.mean() < x) did not give me any error. My pandas version is '0.16.2'.

@JohnE Finally found the bug.

In [24]: df['z'] < df['x']
Out[24]:
0    True
1    True
2    True
3    True
dtype: bool

In [25]: df['z'] < df['x'].mean()
Out[25]:
0    True
1    True
2    True
3    True
Name: z, dtype: bool

In [26]: df['x'].mean() < df['z']
Out[26]:
0    False
1    False
2    False
3    False
Name: z, dtype: bool

In [10]: df['x'].mean() < df['x']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-4e5dab1545af> in <module>()
----> 1 df['x'].mean() < df['x']

/opt/anaconda/envs/np18py27-1.9/lib/python2.7/site-packages/pandas/core/ops.pyc in wrapper(self, other, axis)
    586             return NotImplemented
    587         elif isinstance(other, (np.ndarray, pd.Index)):
--> 588             if len(self) != len(other):
    589                 raise ValueError('Lengths must match to compare')
    590             return self._constructor(na_op(self.values, np.asarray(other)),

TypeError: len() of unsized object

In [11]: df['x'] < df['x'].mean()
Out[11]:
0     True
1     True
2    False
3    False
Name: x, dtype: bool

pip install pandas --upgrade