Python DataFrame: comparing dates in two different columns


I want to check whether the dates in two different columns fall on the same day.

df

Expected output

            a                           b                       output
    0   2020-07-17 00:00:01.999    2020-07-17 12:00:01.999       True
    1   2020-06-15 13:14:01.999    2020-02-14 12:00:01.999       False
    2   2020-09-05 16:14:01.999    2020-09-05 11:59:01.999       True
    3   2020-11-17 23:14:01.999    2020-11-17 05:30:01.999       True

Should I convert the dates to strings (strftime) and compare those, or is there another way?

What you have are timestamps. To get the date from a timestamp, use its .date() method. Assuming the DataFrame is named df:

df['output'] = df.apply(lambda row: row['a'].date() == row['b'].date(), axis=1)
If columns "a" and "b" hold strings, use

df['output'] = df.apply(lambda row: pd.Timestamp(row['a']).date() == pd.Timestamp(row['b']).date(), axis=1)
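
To make the row-wise approach above easy to try, here is a minimal sketch that rebuilds the question's sample data as strings. The helper name same_day is an assumption introduced only for this illustration; the values come from the table in the question.

```python
import pandas as pd

# Sample data from the question, stored as plain strings.
df = pd.DataFrame({
    "a": ["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999",
          "2020-09-05 16:14:01.999", "2020-11-17 23:14:01.999"],
    "b": ["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999",
          "2020-09-05 11:59:01.999", "2020-11-17 05:30:01.999"],
})

def same_day(row):
    # Hypothetical helper: parse both strings and compare the calendar dates only.
    return pd.Timestamp(row["a"]).date() == pd.Timestamp(row["b"]).date()

df["output"] = df.apply(same_day, axis=1)
print(df["output"].tolist())
```

Because apply with axis=1 calls the function once per row, this is likely to be slower on large frames than a vectorized comparison through the .dt accessor.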

Use pd.to_datetime, or the parse_dates argument when reading from CSV, to convert the columns to datetime objects. Then compare the dates through the dt.date accessor.

In [22]: df = pd.read_csv("a.csv", parse_dates=["a","b"])

In [23]: df
Out[23]:
                        a                       b
0 2020-07-17 00:00:01.999 2020-07-17 12:00:01.999
1 2020-06-15 13:14:01.999 2020-02-14 12:00:01.999
2 2020-09-05 16:14:01.999 2020-09-05 11:59:01.999
3 2020-11-17 23:14:01.999 2020-11-17 05:30:01.999

In [24]: df["c"] = df["a"].dt.date == df["b"].dt.date

In [25]: df
Out[25]:
                        a                       b      c
0 2020-07-17 00:00:01.999 2020-07-17 12:00:01.999   True
1 2020-06-15 13:14:01.999 2020-02-14 12:00:01.999  False
2 2020-09-05 16:14:01.999 2020-09-05 11:59:01.999   True
3 2020-11-17 23:14:01.999 2020-11-17 05:30:01.999   True
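
As a variation on the session above, the following sketch compares the same sample data with dt.normalize(), which sets the time of day to midnight and keeps the datetime64 dtype, whereas dt.date returns a column of Python date objects. This is an alternative technique, not the one used in the answer, and the column name same_day is an assumption for illustration.

```python
import pandas as pd

# Sample data from the question, stored as plain strings.
df = pd.DataFrame({
    "a": ["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999",
          "2020-09-05 16:14:01.999", "2020-11-17 23:14:01.999"],
    "b": ["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999",
          "2020-09-05 11:59:01.999", "2020-11-17 05:30:01.999"],
})

# Convert both columns to datetime64 first.
a = pd.to_datetime(df["a"])
b = pd.to_datetime(df["b"])

# normalize() truncates each timestamp to midnight, so equality means "same calendar day".
df["same_day"] = a.dt.normalize() == b.dt.normalize()
print(df["same_day"].tolist())
```

Both variants should give the same result for timezone-naive data like this; with timezone-aware columns the notion of "same day" depends on the timezone, which the question does not cover.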

You should first convert the columns to datetime columns, like this:

df['a'] = pd.to_datetime(df['a'])
df['b'] = pd.to_datetime(df['b'])
Now create a new column with np.where, comparing only the dates:

import numpy as np
df['output'] = np.where(df['a'].dt.date == df['b'].dt.date, True, False)
Output:

    a                           b                           output
0   2020-07-17 00:00:01.999    2020-07-17 12:00:01.999       True
1   2020-06-15 13:14:01.999    2020-02-14 12:00:01.999       False
2   2020-09-05 16:14:01.999    2020-09-05 11:59:01.999       True
3   2020-11-17 23:14:01.999    2020-11-17 05:30:01.999       True

np.where looks a bit redundant here.
@MateenUlhaq Why? np.where is well suited to this kind of check, and it creates the new column in the same step.
xs == np.where(xs, True, False) if xs.dtype == bool
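
The point made in the last comment can be checked with a short sketch. The variable names mask and wrapped are assumptions for illustration; the two rows are taken from the question's sample data.

```python
import numpy as np
import pandas as pd

a = pd.to_datetime(pd.Series(["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999"]))
b = pd.to_datetime(pd.Series(["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999"]))

# The comparison already produces a boolean Series.
mask = a.dt.date == b.dt.date

# Wrapping it in np.where(..., True, False) yields the same values as a NumPy array.
wrapped = np.where(mask, True, False)

print(mask.dtype)
print(np.array_equal(mask.to_numpy(), wrapped))
```

So for a plain True/False column, the comparison can be assigned directly. np.where becomes useful when the two outcomes should be something other than True and False.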