Python pandas: How to drop rows containing invalid month/day column combinations, such as February 30?

python pandas

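My source data uses 31 columns for the day values, with one row per month. I've melted the 31 day columns into a single day column, and now I want to combine the year, month and day columns into a datetime(?) column so that I can sort the rows by year/month/day.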

After melting, my DataFrame looks like this:

       year  month day   prcp
0      1893      1  01    0.0
1      1893      2  01    0.0
2      1893      3  01    0.0
3      1893      4  01    NaN
4      1893      5  01    NaN
5      1893      6  01    NaN
6      1893      7  01    NaN
7      1893      8  01    0.0
8      1893      9  01   10.0
9      1893     10  01    0.0
10     1893     11  01    0.0
11     1893     12  01    NaN
12     1894      1  01    NaN
13     1894      2  01    0.0
14     1894      3  01    NaN
...
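For anyone reproducing this, the melt step described above can be sketched as follows. This is a minimal sketch that assumes the wide frame has year and month columns plus day columns labelled '01' to '31' (the real column names are not shown in the question, and only three day columns are included here).

```python
import pandas as pd

# Hypothetical wide frame: one row per (year, month), one column per day.
# Only three of the 31 day columns are shown to keep the example small.
wide = pd.DataFrame({
    "year": [1893, 1893],
    "month": [1, 2],
    "01": [0.0, 0.0],
    "02": [None, 5.0],
    "31": [1.0, None],  # February has no 31st, but the wide layout still has the column
})

# Melt the day columns into a single 'day' column, keeping year/month as identifiers.
long = wide.melt(id_vars=["year", "month"], var_name="day", value_name="prcp")

# The melted 'day' column holds the old column labels, i.e. strings like '01'.
# Converting it to integers right away avoids object-dtype surprises later.
long["day"] = pd.to_numeric(long["day"])
print(long)
```

Note that the melt produces rows such as February 31 for every month shorter than 31 days, which is exactly why the invalid combinations appear in the first place.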

Next, I'm trying to create a sortable "time" column, using the year, month and day columns as arguments to the datetime constructor. I tried this approach:

from datetime import datetime
import numpy as np

def make_datetime(y, m, d):
    return datetime(year=y, month=m, day=d)

df['time'] = np.vectorize(make_datetime)(df['year'].astype(int), df['month'].astype(int), df['day'].astype(int))

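The above doesn't get me there, because it fails whenever the month/day combination is invalid, e.g. February 29 in a non-leap year, April 31, and so on. What I think I should do next is somehow wrap the datetime() call in a try/except, and drop the row in the except block when it chokes on an incompatible month/day combination. How would I do that without a for loop over all the rows? Or is there a better way to crack this nut?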

Thanks in advance for any advice or insight.

Here's one way, using your suggestion of wrapping the datetime() call in a try/except clause:
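from datetime import datetime

def dater(x):
    try:
        # Cast to int: a row may hold floats or strings (e.g. day '01'),
        # and datetime() requires integers
        return datetime(year=int(x['year']), month=int(x['month']), day=int(x['day']))
    except ValueError:
        # Invalid combinations such as February 30 become missing (NaT)
        return None

df['date'] = df.apply(dater, axis=1)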

#    year  month  day       date
# 0  1890      2   29        NaT
# 1  1891      2   29        NaT
# 2  1892      2   29 1892-02-29
# 3  1893      2   29        NaT
# 4  1894      2   29        NaT
# 5  1895      2   29        NaT
# 6  1896      2   29 1896-02-29
# 7  1897      2   29        NaT
# 8  1898      2   29        NaT

df = df.dropna(subset=['date'])

#    year  month  day       date
# 2  1892      2   29 1892-02-29
# 6  1896      2   29 1896-02-29
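Note that apply with axis=1 still calls the Python function once per row. If that is too slow, one vectorized alternative (not taken from the answers here) is to compare each day against the number of days in its month. This sketch assumes integer year, month and day columns; the data below is hypothetical.

```python
import pandas as pd

# Hypothetical example with integer year/month/day columns.
df = pd.DataFrame({
    "year": [1890, 1892, 1893, 1893],
    "month": [2, 2, 4, 4],
    "day": [29, 29, 30, 31],
})

# Build the first day of each row's month (day 1 always exists for a valid
# year/month), then read off how many days that month has.
month_start = pd.to_datetime(df[["year", "month"]].assign(day=1))
days_in_month = month_start.dt.days_in_month

# Boolean mask computed column-wise, with no Python-level loop over rows.
valid = (df["day"] >= 1) & (df["day"] <= days_in_month)

df = df[valid].copy()
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
print(df)
```

Here 1890-02-29 and 1893-04-31 are dropped, while 1892-02-29 (a leap year) and 1893-04-30 are kept.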

You can pass df directly to pd.to_datetime:

pd.to_datetime(df, errors='coerce')
#          NaT
#          NaT
#   1892-02-29
#          NaT
#          NaT
#          NaT
#   1896-02-29
#          NaT
#          NaT
# dtype: datetime64[ns]

df['New'] = pd.to_datetime(df, errors='coerce')
df = df.dropna(subset=['New'])
#    year  month  day        New
#    1892      2   29 1892-02-29
#    1896      2   29 1896-02-29
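An end-to-end sketch of the to_datetime approach on data shaped like the question's melted frame. It assumes a string day column and an extra prcp column (hypothetical values). When assembling dates from a DataFrame, pd.to_datetime rejects columns other than date components, so only year, month and day are passed, and dropna is limited to the new column so that rows with a missing prcp value are kept.

```python
import pandas as pd

# Hypothetical data in the question's shape: 'day' is a string, plus a 'prcp' column.
df = pd.DataFrame({
    "year": [1893, 1893, 1893, 1892],
    "month": [2, 2, 4, 2],
    "day": ["28", "29", "31", "29"],
    "prcp": [0.0, None, 10.0, None],
})

df["day"] = pd.to_numeric(df["day"])

# Assemble dates from the three component columns; invalid combinations become NaT.
df["date"] = pd.to_datetime(df[["year", "month", "day"]], errors="coerce")

# Drop only rows with an invalid date, so rows with a missing prcp survive.
df = df.dropna(subset=["date"])

# sort_values returns a new frame, so assign the result back; renumber rows 0..n-1.
df = df.sort_values("date").reset_index(drop=True)
print(df)
```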

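Thanks, very useful; it drops the rows as needed. When I then sort the DataFrame with df.sort_values(by=['date']), the rows keep their previous order, i.e. no apparent sorting: row 0 is January 1, row 1 is February 1, and so on. I want them sorted so that row 0 is January 1, row 1 is January 2, etc. Can this be fixed to give a more sortable date column?

df = df.sort_values('date', ascending=True) should do that for you.

Unfortunately I get the same result with the sort_values() call as well. Maybe I made a mistake when melting the original 31 day columns, so that the datetimes produced by pd.to_datetime() are only sortable within a month? When I look at the df.info() output, it shows the day column as object, so maybe I need to convert that column to integers somehow before calling pd.to_datetime(). I've tried what I suggested above, df['day'].astype(int), but it had no effect; the day column is still an object.

Try df['day'] = pd.to_numeric(df['day']).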
+1, very nice. I'd add that if the user's DataFrame has other columns, you can also select just the date columns: df['New'] = pd.to_datetime(df[cols], errors='coerce')
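The symptoms in the comments are consistent with a common pandas pitfall, though the asker's exact code isn't shown, so this is an assumption: astype and sort_values return new objects rather than modifying the frame in place, so calling them without assigning the result leaves df unchanged. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1893, 1893, 1893],
    "month": [2, 1, 1],
    "day": ["01", "02", "01"],
})

# Calling astype without assignment builds a new Series and discards it;
# df["day"] keeps its original string dtype.
df["day"].astype(int)
print(df["day"].dtype)

# Assign the converted column back to actually change the frame.
df["day"] = df["day"].astype(int)  # pd.to_numeric(df["day"]) also works

df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Likewise, sort_values returns a sorted copy: assign it back.
# reset_index(drop=True) renumbers the rows so row 0 is the earliest date.
df = df.sort_values("date").reset_index(drop=True)
print(df)
```

Without reset_index, the sorted rows keep their original index labels, which can make a correctly sorted frame look unsorted if you read the row labels rather than the row order.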