Python 2.7: Handling missing data in pandas and NumPy


python-2.7 numpy pandas


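I have the following data sample. I want to:

a) in column C, replace np.NaN with 999
b) in column D, replace '' with np.NaN

Neither of my two attempts worked, and I don't know why.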

import pandas
from pandas import DataFrame
import numpy as np


df = DataFrame({'A' : ['foo', 'foo', 'foo', 'foo',
                        'bar', 'bar', 'bar', 'bar'],
                 'B' : ['one', 'one', 'two', 'three',
                        'two', 'two', 'one', 'three'],
                 'C' : [1, np.NaN, 1, 2, np.NaN, 1, 1, 2], 'D' : [2, '', 1, 1, '', 2, 2, 1]})

print df

df.C.fillna(999)
df.D.replace('', np.NaN)

print df

Output: 

     A      B   C  D
0  foo    one   1  2
1  foo    one NaN   
2  foo    two   1  1
3  foo  three   2  1
4  bar    two NaN   
5  bar    two   1  2
6  bar    one   1  2
7  bar  three   2  1
     A      B   C  D
0  foo    one   1  2
1  foo    one NaN   
2  foo    two   1  1
3  foo  three   2  1
4  bar    two NaN   
5  bar    two   1  2
6  bar    one   1  2
7  bar  three   2  1

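These operations return a copy of the data (most pandas operations behave this way); they do not operate in place unless you explicitly ask for it (the default is inplace=False). See the documentation for these two methods. Either pass inplace=True, as in the IPython session further down, or assign the result back: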

df['C'] = df.C.fillna(999)
df['D'] = df.D.replace('', np.NaN)
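
To make the copy-versus-assignment point concrete, here is a minimal sketch on a cut-down version of the question's frame. It is written so that it should also run on current Python and pandas, which is why it uses `np.nan` and `print(...)`; the variable names `filled` and `missing_before` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
                   'D': [2, '', 1, 1, '', 2, 2, 1]})

# fillna returns a new Series; the column stored in df is not modified.
filled = df['C'].fillna(999)
missing_before = int(df['C'].isnull().sum())  # NaNs still present in df
print(missing_before, int(filled.isnull().sum()))

# Assigning the returned copies back is what actually updates the frame.
df['C'] = df['C'].fillna(999)
df['D'] = df['D'].replace('', np.nan)
print(df)
```

The first print shows that the original column still contains its missing values while the returned copy does not; only after the assignments does `df` itself change.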

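Also, I strongly recommend accessing columns with the subscript operator [] rather than as attributes with the dot operator, to avoid ambiguous behaviour.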
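
One case where attribute access becomes ambiguous is a column whose name matches an existing DataFrame attribute. The sketch below assumes a hypothetical column called `count`, which is not part of the question's data and was chosen only because it collides with the `DataFrame.count` method.

```python
import pandas as pd

# 'count' is a hypothetical column name that shadows a DataFrame method.
df = pd.DataFrame({'count': [1, 2, 3], 'C': [10, 20, 30]})

by_attribute = df.count      # resolves to the DataFrame.count method
by_subscript = df['count']   # always resolves to the column

print(callable(by_attribute))
print(by_subscript.tolist())

# For a name with no collision, both forms return the same data.
print(df.C.tolist() == df['C'].tolist())
```

With the subscript form, the result does not depend on whether the column name happens to match a method or property.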

In [60]:
df = pd.DataFrame({'A' : ['foo', 'foo', 'foo', 'foo',
                        'bar', 'bar', 'bar', 'bar'],
                 'B' : ['one', 'one', 'two', 'three',
                        'two', 'two', 'one', 'three'],
                 'C' : [1, np.NaN, 1, 2, np.NaN, 1, 1, 2], 'D' : [2, '', 1, 1, '', 2, 2, 1]})

df.C.fillna(999, inplace=True)
df.D.replace('', np.NaN, inplace=True)
df

Out[60]:
     A      B    C   D
0  foo    one    1   2
1  foo    one  999 NaN
2  foo    two    1   1
3  foo  three    2   1
4  bar    two  999 NaN
5  bar    two    1   2
6  bar    one    1   2
7  bar  three    2   1
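
The session above reflects the pandas behaviour of the Python 2.7 era. As far as I know, newer pandas releases may warn about, or ignore, an in-place call made on a column selected from a frame, so a variant that works on the whole frame and assigns the result back may be the safer form. This is a sketch under that assumption, using the per-column dict forms of `DataFrame.fillna` and `DataFrame.replace`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'foo',
                         'bar', 'bar', 'bar', 'bar'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': [1, np.nan, 1, 2, np.nan, 1, 1, 2],
                   'D': [2, '', 1, 1, '', 2, 2, 1]})

# Fill missing values in column C only; other columns are left alone.
df = df.fillna({'C': 999})

# Nested dict: in column D, replace the empty string with NaN.
df = df.replace({'D': {'': np.nan}})

print(df)
```

Both calls return new frames, so rebinding `df` plays the same role here as assigning a single column back.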
