Python Pandas: cannot perform logical operations when the data is NaN
I have a large DataFrame in Pandas in which two columns can either hold values or be NaN (null) when nothing has been assigned. I want to populate a third column based on these two: where the value is not NaN, the third column should take some value. This works as follows:
In [16]: import pandas as pd
In [17]: import numpy as np
In [18]: df = pd.DataFrame([[np.NaN, np.NaN],['John', 'Malone'],[np.NaN, np.NaN]], columns = ['col1', 'col2'])
In [19]: df
Out[19]:
col1 col2
0 NaN NaN
1 John Malone
2 NaN NaN
In [20]: df['col3'] = np.NaN
In [21]: df.loc[df['col1'].notnull(),'col3'] = 'I am ' + df['col1']
In [22]: df
Out[22]:
col1 col2 col3
0 NaN NaN NaN
1 John Malone I am John
2 NaN NaN NaN
This also works:
In [29]: df.loc[df['col1']== 'John','col3'] = 'I am ' + df['col2']
In [30]: df
Out[30]:
col1 col2 col3
0 NaN NaN NaN
1 John Malone I am Malone
2 NaN NaN NaN
But if every value is NaN and I then try that last loc assignment, it gives me an error:
In [31]: df = pd.DataFrame([[np.NaN, np.NaN],[np.NaN, np.NaN],[np.NaN, np.NaN]], columns = ['col1', 'col2'])
In [32]: df
Out[32]:
col1 col2
0 NaN NaN
1 NaN NaN
2 NaN NaN
In [33]: df['col3'] = np.NaN
In [34]: df.loc[df['col1']== 'John','col3'] = 'I am ' + df['col2']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
c:\python33\lib\site-packages\pandas\core\ops.py in na_op(x, y)
552 result = expressions.evaluate(op, str_rep, x, y,
--> 553 raise_on_error=True, **eval_kwargs)
554 except TypeError:
c:\python33\lib\site-packages\pandas\computation\expressions.py in evaluate(op, op_str, a, b, raise_on_error, use_numexpr, **eval_kwargs)
217 return _evaluate(op, op_str, a, b, raise_on_error=raise_on_error,
--> 218 **eval_kwargs)
219 return _evaluate_standard(op, op_str, a, b, raise_on_error=raise_on_error)
c:\python33\lib\site-packages\pandas\computation\expressions.py in _evaluate_standard(op, op_str, a, b, raise_on_error, **eval_kwargs)
70 _store_test_result(False)
---> 71 return op(a, b)
72
c:\python33\lib\site-packages\pandas\core\ops.py in _radd_compat(left, right)
805 try:
--> 806 output = radd(left, right)
807 except TypeError:
c:\python33\lib\site-packages\pandas\core\ops.py in <lambda>(x, y)
802 def _radd_compat(left, right):
--> 803 radd = lambda x, y: y + x
804 # GH #353, NumPy 1.5.1 workaround
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-34-3b2873f8749b> in <module>()
----> 1 df.loc[df['col1']== 'John','col3'] = 'I am ' + df['col2']
c:\python33\lib\site-packages\pandas\core\ops.py in wrapper(left, right, name, na_op)
616 lvalues = lvalues.values
617
--> 618 return left._constructor(wrap_results(na_op(lvalues, rvalues)),
619 index=left.index, name=left.name,
620 dtype=dtype)
c:\python33\lib\site-packages\pandas\core\ops.py in na_op(x, y)
561 result = np.empty(len(x), dtype=x.dtype)
562 mask = notnull(x)
--> 563 result[mask] = op(x[mask], y)
564 else:
565 raise TypeError("{typ} cannot perform the operation {op}".format(typ=type(x).__name__,op=str_rep))
c:\python33\lib\site-packages\pandas\core\ops.py in _radd_compat(left, right)
804 # GH #353, NumPy 1.5.1 workaround
805 try:
--> 806 output = radd(left, right)
807 except TypeError:
808 raise
c:\python33\lib\site-packages\pandas\core\ops.py in <lambda>(x, y)
801
802 def _radd_compat(left, right):
--> 803 radd = lambda x, y: y + x
804 # GH #353, NumPy 1.5.1 workaround
805 try:
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
The problem here is that if the entire column is np.nan, it is probably stored as float rather than object (text).
So you can check for that before running the string operation:
# Run the string operation only when the column is not entirely NaN
if not np.all(pd.isnull(df['mycol'])):
    df = my_string_operation(df)
You can also force the relevant column to object dtype:
df['mycol'] = df['mycol'].astype(object)
df = my_string_operation(df)
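The dtype explanation above can be checked directly. The following is a minimal sketch, assuming a reasonably recent pandas; the variable names are only illustrative. It shows that an all-NaN column is inferred as float64, and that casting to object lets the string concatenation run with the NaN values passed through.

```python
import numpy as np
import pandas as pd

# A frame whose columns are entirely NaN, as in the failing example.
df = pd.DataFrame([[np.nan, np.nan]] * 3, columns=['col1', 'col2'])

# With no strings present, pandas infers a float dtype for the column.
all_nan_dtype = df['col2'].dtype

# On a float column the concatenation is expected to fail with a TypeError,
# as in the traceback in the question.
try:
    'I am ' + df['col2']
    float_concat_failed = False
except TypeError:
    float_concat_failed = True

# After casting to object, the same operation runs and NaN stays NaN.
result = 'I am ' + df['col2'].astype(object)
```

Casting only at the point of use, as here, leaves the original column's dtype untouched; assigning the cast column back to the frame is the alternative if it will be used as text later on.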
What this line actually does is prepend a string to the value in col1 wherever that value is not null:
df.loc[df['col1'].notnull(),'col3'] = 'I am ' + df['col1']
So you only need to check whether any non-null values exist, and perform the operation only if they do:
if df['col1'].notnull().any():
    df['col3'] = 'I am ' + df['col1']
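You also do not need to create the col3 column beforehand when doing it this way.
"Help!" In this case, what would help look like? You could use a try block to catch the error, or check whether all of the column's values are NaN before performing the operation. You have not really asked a question, so it is hard to give a specific answer.
In SQL I can do this easily, because nulls do not make the logic fail. Here, having NaN in every value breaks a line that previously worked. So I am hoping someone can point out the proper way to do what I need.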
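The suggestions in this thread (guard on the mask, and treat the column as object) can be combined into one helper. This is only a sketch under assumptions: `fill_greeting` and its parameters are hypothetical names made up for illustration, not part of pandas, and it has not been tested against the pandas version in the traceback.

```python
import numpy as np
import pandas as pd

def fill_greeting(df, key_col='col1', src_col='col2', out_col='col3', key='John'):
    """Hypothetical helper: set out_col to 'I am <src_col>' where key_col == key."""
    out = df.copy()
    # Create the target column as object so it can hold strings and NaN.
    out[out_col] = pd.Series(np.nan, index=out.index, dtype=object)
    mask = out[key_col] == key
    # Only touch the source column when at least one row matches, and cast
    # the selected rows to object so an all-NaN float column cannot break it.
    if mask.any():
        out.loc[mask, out_col] = 'I am ' + out.loc[mask, src_col].astype(object)
    return out

mixed = pd.DataFrame([[np.nan, np.nan], ['John', 'Malone'], [np.nan, np.nan]],
                     columns=['col1', 'col2'])
empty = pd.DataFrame([[np.nan, np.nan]] * 3, columns=['col1', 'col2'])

mixed_result = fill_greeting(mixed)
empty_result = fill_greeting(empty)
```

With the mixed frame only row 1 is filled; with the all-NaN frame the mask is empty, so the string operation is skipped and col3 stays NaN, which is the null-tolerant behaviour the question asks for.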