Python: How to replace invalid string patterns in a DataFrame with a default string?


python pandas


I have a DataFrame like this:

name   birthdate
-----------------
john   21011990
steve  14021986
bob    
alice  13020198

I want to detect the invalid values in the birthdate column and then change them.

The birthdate column uses the date format DDMMYYYY, but the DataFrame contains an invalid value, "13020198". I want to change invalid data to 31125000.

I want the result to look like this:

name   birthdate
-----------------
john   21011990
steve  14021986
bob    31125000
alice  31125000

Thanks.

This would be my solution, keeping the format you specified:

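import pandas as pd

data = {'name': ['J', 'S', 'B', 'A'], 'birthdate': [21011990, 14021986, '', 13020198]}
df = pd.DataFrame(data)
# Parse as DDMMYYYY; unparseable values become NaT, which astype(str) renders as 'NaT'.
# Valid dates are rendered as ISO text ('1990-01-21'), i.e. in YYYY-MM-DD order.
df['birthdate'] = pd.to_datetime(df['birthdate'], format='%d%m%Y', errors='coerce').astype(str)
# Drop the dashes, swap 'NaT' for the default value, and convert to integers.
df['birthdate'] = df['birthdate'].str.replace('-', '', regex=False).replace('NaT', '31125000').astype(int)
print(df)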

Output:

  name  birthdate
0    J   19900121
1    S   19860214
2    B   31125000
3    A   31125000
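The integers above come out in YYYYMMDD order (19900121), because astype(str) on a datetime column produces ISO YYYY-MM-DD text. If the goal is to keep the original DDMMYYYY order, a minimal sketch, assuming the column holds strings like in the question, is to format the parsed dates back with Series.dt.strftime and fill the failures afterwards. The extra "year before 1900 is invalid" rule is an added assumption, so the result does not depend on which date range pandas can represent:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "steve", "bob", "alice"],
    "birthdate": ["21011990", "14021986", "", "13020198"],
})

# Parse as DDMMYYYY; empty or malformed values become NaT.
parsed = pd.to_datetime(df["birthdate"], format="%d%m%Y", errors="coerce")
# Assumption: years before 1900 also count as invalid (NaT.dt.year is NaN,
# so the comparison is False and those rows stay NaT as well).
parsed = parsed.where(parsed.dt.year >= 1900)

# Write valid dates back in DDMMYYYY order; strftime yields NaN for NaT,
# and fillna puts the default string there.
df["birthdate"] = parsed.dt.strftime("%d%m%Y").fillna("31125000")
print(df)
```

The column stays textual, so "21011990" keeps its leading digits in the same order as the input.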

Of course, it is easier if you keep the datetime format; then you can simply use:

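# Unparseable values become NaT; fillna puts the integer in their place,
# so the column ends up with object dtype, mixing Timestamps and ints.
df['birthdate'] = pd.to_datetime(df['birthdate'], format='%d%m%Y', errors='coerce').fillna(31125000)
print(df)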

You get:

  name            birthdate
0    J  1990-01-21 00:00:00
1    S  1986-02-14 00:00:00
2    B             31125000
3    A             31125000


You can first create a mask of the invalid dates and then update those values:

# True where the value cannot be parsed as a DDMMYYYY date
mask = df.birthdate.apply(lambda x: pd.to_datetime(x, format='%d%m%Y', errors='coerce')).isna()

df.loc[mask, 'birthdate'] = '31125000'

    name    birthdate
0   john    21011990
1   steve   14021986
2   bob     31125000
3   alice   31125000


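Use errors='coerce' in pd.to_datetime to build a mask: values that do not match the format become missing values (NaT), so test for those with isna(), and finally assign the new value to the matching rows.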
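A minimal sketch of those steps (coerce, test for missing values, assign), using Series.mask for the assignment. The sample data is taken from the question, and the "year before 1900 is invalid" rule is an added assumption so that 13020198 is rejected regardless of the date range pandas can represent:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "steve", "bob", "alice"],
    "birthdate": ["21011990", "14021986", "", "13020198"],
})

# Step 1: values that do not match DDMMYYYY become NaT.
parsed = pd.to_datetime(df["birthdate"], format="%d%m%Y", errors="coerce")

# Step 2: the mask is True for missing values; the year check is an
# assumption (NaN years from NaT compare as False, isna() covers them).
invalid = parsed.isna() | (parsed.dt.year < 1900)

# Step 3: Series.mask replaces the values where the mask is True.
df["birthdate"] = df["birthdate"].mask(invalid, "31125000")
print(df)
```

Series.mask returns a new Series, so this avoids chained assignment; df.loc[invalid, "birthdate"] = "31125000" would do the same in place.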

Or use @Chris's solution from the comments below.


Why is 13020198 invalid? It follows the DDMMYYYY format. Do you want to keep the dates as datetime, or keep the original format?
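One likely reason pandas rejects 13020198 even though it has the DDMMYYYY shape: it reads as 13 February of the year 198, and nanosecond-resolution timestamps can only represent dates between 1677 and 2262, so when pd.to_datetime produces such timestamps, errors='coerce' turns the value into NaT. A small check of those bounds:

```python
import pandas as pd

# The window that nanosecond-resolution timestamps can represent.
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# Read as DDMMYYYY, "13020198" is 13 February of the year 198.
value = "13020198"
day, month, year = int(value[:2]), int(value[2:4]), int(value[4:])
print(day, month, year)  # 13 2 198
print(year < pd.Timestamp.min.year)  # True
```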

df.loc[pd.to_datetime(df['birthdate'], format='%d%m%Y', errors='coerce').isna(), 'birthdate'] = '31125000'

Isn't it enough to check whether the last four digits fall within a range, say between 1900 and 2020?
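The range check suggested in the last comment can be sketched without parsing dates at all, assuming the column holds strings and taking 1900 to 2020 (inclusive) as the accepted years:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["john", "steve", "bob", "alice"],
    "birthdate": ["21011990", "14021986", "", "13020198"],
})

s = df["birthdate"].astype(str)
# Exactly eight digits (DDMMYYYY); empty strings and other junk fail here.
eight_digits = s.str.fullmatch(r"\d{8}")
# Year from the last four characters; non-numeric text becomes NaN.
year = pd.to_numeric(s.str[-4:], errors="coerce")
# between() is inclusive on both ends and returns False for NaN.
valid = eight_digits & year.between(1900, 2020)

df.loc[~valid, "birthdate"] = "31125000"
print(df)
```

This only checks the shape and the year, so an impossible date such as 31021990 would pass; the pd.to_datetime approaches above reject that case as well.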