Python: How to replace NaN and NaT with None - pandas 0.24.1

python pandas

I need to replace all NaN and NaT values in a pandas.Series with None.

I tried this:

def replaceMissing(ser):
    return ser.where(pd.notna(ser), None)

But it doesn't work:

import pandas as pd

NaN = float('nan')
NaT = pd.NaT

floats1 = pd.Series((NaN, NaN, 2.71828, -2.71828))
floats2 = pd.Series((2.71828, -2.71828, 2.71828, -2.71828))
dates = pd.Series((NaT, NaT, pd.Timestamp("2019-07-09"), pd.Timestamp("2020-07-09")))


def replaceMissing(ser):
    return ser.where(pd.notna(ser), None)


print(pd.__version__)
print(80*"-")
print(replaceMissing(dates))
print(80*"-")
print(replaceMissing(floats1))
print(80*"-")
print(replaceMissing(floats2))

As you can see, NaT is not replaced:

0.24.1
--------------------------------------------------------------------------------
0          NaT
1          NaT
2   2019-07-09
3   2020-07-09
dtype: datetime64[ns]
--------------------------------------------------------------------------------
0       None
1       None
2    2.71828
3   -2.71828
dtype: object
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: float64
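A possible explanation, in line with the comments at the end of this thread: float64 and datetime64 columns cannot store a Python None, so pandas keeps its own missing-value markers (NaN, NaT) unless the dtype is object. The following is a minimal sketch that is not part of the original post; the variable names are illustrative, and the exact behavior of `where` itself may vary between pandas versions.

```python
import math

import pandas as pd

# None handed to a float Series is coerced to NaN; the dtype stays float64.
floats = pd.Series([None, 2.71828])
print(floats.dtype)           # float64
print(math.isnan(floats[0]))  # True

# None handed to a datetime Series is coerced to NaT in the same way.
dates = pd.Series([None, pd.Timestamp("2019-07-09")])
print(dates.dtype)
print(dates[0] is pd.NaT)     # True

# Only an object Series stores the Python None itself.
as_object = pd.Series([None, 2.71828], dtype=object)
print(as_object[0] is None)   # True
```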

Then I tried adding this extra step:

def replaceMissing(ser):
    ser = ser.where(pd.notna(ser), None)
    return ser.replace({pd.NaT: None})

But it still doesn't work. For some reason, it brings back the NaNs:

0.24.1
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
--------------------------------------------------------------------------------
0        NaN
1        NaN
2    2.71828
3   -2.71828
dtype: float64
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: float64

I also tried converting the series to object:

def replaceMissing(ser):
    return ser.astype("object").where(pd.notna(ser), None)

But now the last series is also object, even though it has no missing values:

0.24.1
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
--------------------------------------------------------------------------------
0       None
1       None
2    2.71828
3   -2.71828
dtype: object
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: object

I want it to stay float64, so I added infer_objects:

def replaceMissing(ser):
    return ser.astype("object").where(pd.notna(ser), None).infer_objects()

But that brings back the NaNs again:

0.24.1
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
--------------------------------------------------------------------------------
0        NaN
1        NaN
2    2.71828
3   -2.71828
dtype: float64
--------------------------------------------------------------------------------
0    2.71828
1   -2.71828
2    2.71828
3   -2.71828
dtype: float64
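The float result above can be reproduced in isolation. This sketch is an illustration added here, not code from the post, and the variable names are made up: `infer_objects` looks for a better dtype for an object Series, and when the only non-float entries are None it appears to settle on float64, which turns each None back into NaN.

```python
import pandas as pd

# An object Series that really holds None (not NaN).
with_none = pd.Series([None, None, 2.71828, -2.71828], dtype=object)
print(with_none.dtype)      # object
print(with_none[0] is None) # True

# infer_objects re-infers a numeric dtype, so the None entries become NaN.
inferred = with_none.infer_objects()
print(inferred.dtype)
print(inferred.isna().tolist())
```

So any step that re-infers the dtype after the replacement undoes it for float data.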

I feel there must be a simple way to do this. Does anyone know?

For me, your second solution works with the order changed (tested in 0.24.2), but the dtypes are changed to object because of the mixed types: Nones with floats or timestamps:

def replaceMissing(ser):
    return ser.replace({pd.NaT: None}).where(pd.notna(ser), None)

print(pd.__version__)
print(80*"-")
print(replaceMissing(dates))
print(80*"-")
print(replaceMissing(dates).apply(type))
print(80*"-")
print(replaceMissing(floats1))
print(80*"-")
print(replaceMissing(floats1).apply(type))
print(80*"-")
print(replaceMissing(floats2))

0.24.2
--------------------------------------------------------------------------------
0                   None
1                   None
2    2019-07-09 00:00:00
3    2020-07-09 00:00:00
dtype: object
Sorry, this won't work. None is a Python object, so you would have to cast the column's dtype to object in order to do this. The question is: what are you trying to achieve by doing this?

@EdChum Why not? If a series contains only non-missing floats, it can stay a float64 series.

None is not the same as NaN, so it cannot be represented in a float dtype; as soon as you insert None, the dtype immediately changes to object.

The question still stands: if a series of float64 values has no NaN, it should remain a series of float64 rather than become a series of objects.

No, I'm asking why you want to replace NaN and NaT with None in the first place.

This works for me as well, but I don't understand why it stops working if you switch the order.

@spiderface - I think it is because pandas works best with a single dtype, where all values in a column are floats or datetimes (hence NaN or NaT). This is a bit of a hack ;)

I guess the less hacky way would be to use a conditional:

def replaceMissing(ser):
    if ser.isna().any():
        return ser.astype("object").where(pd.notna(ser), None)
    else:
        return ser

@spiderface - Yes, but the if/else is not necessary:

def replaceMissing(ser):
    return ser.astype("object").where(pd.notna(ser), None)

But I want to preserve the dtype whenever possible.
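For reference, the conditional variant discussed in the comments can be written out as a self-contained sketch. The function name `replace_missing` and the test data layout are illustrative choices, not taken verbatim from the thread, and the behavior was described for pandas 0.24.x, so it is worth re-checking on the version you actually run.

```python
import pandas as pd


def replace_missing(ser):
    """Replace NaN/NaT with None, leaving the Series untouched if nothing is missing."""
    missing = ser.isna()
    if not missing.any():
        # No missing values: keep the original dtype (e.g. float64).
        return ser
    # Cast to object first so the Series can actually hold None.
    return ser.astype(object).where(~missing, None)


nan = float("nan")
floats1 = pd.Series((nan, nan, 2.71828, -2.71828))
floats2 = pd.Series((2.71828, -2.71828, 2.71828, -2.71828))
dates = pd.Series((pd.NaT, pd.NaT, pd.Timestamp("2019-07-09"), pd.Timestamp("2020-07-09")))

for ser in (dates, floats1, floats2):
    result = replace_missing(ser)
    print(result.dtype, result.tolist())
```

Series with missing values come back as object, which is unavoidable if they must contain None; a Series without missing values keeps its dtype.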