How to compare a unicode date in the format u'2006-07-23' with 25-06-15 08:42:43.830000000 PM using Python pandas?
Basically, the unicode format will come from a datepicker, and the format 25-06-15 08:42:43.830000000 PM comes from a column.
My dataframe is:
query,status,received_date
a,closed,25-06-15 08:42:43.830000000 PM
b,pending,27-06-15 08:42:43.830000000 PM
ab,closed,28-06-15 08:42:43.830000000 PM
bb,pending,29-06-15 08:42:43.830000000 PM
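I will get two dates from the datepicker, in the format (u'2015-06-23', u'2015-06-29'). How do I compare these unicode dates with the received_date column?
I have to display the data that falls between these two dates (which come from the datepicker).
Convert them to datetime: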
dates = (u'2015-06-23',u'2015-06-29')
df = df.set_index('received_date')
df.index = pd.DatetimeIndex(df.index)
df[dates[0]:dates[1]]
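For reference, here is a self-contained sketch of the approach above, written for Python 3 and a recent pandas. The explicit `format` string is an assumption about the column layout (day-month-two-digit-year, 12-hour clock with AM/PM); the original answer relies on pandas inferring the format instead. The names `csv_text`, `indexed` and `selected` are only illustrative.

```python
import io

import pandas as pd

# The question's data, loaded from an in-memory CSV for illustration.
csv_text = """query,status,received_date
a,closed,25-06-15 08:42:43.830000000 PM
b,pending,27-06-15 08:42:43.830000000 PM
ab,closed,28-06-15 08:42:43.830000000 PM
bb,pending,29-06-15 08:42:43.830000000 PM
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumed layout: DD-MM-YY, 12-hour clock, fractional seconds, AM/PM.
df["received_date"] = pd.to_datetime(
    df["received_date"], format="%d-%m-%y %I:%M:%S.%f %p"
)

# Dates as they would arrive from the datepicker.
dates = ("2015-06-23", "2015-06-29")

# Slicing a sorted DatetimeIndex with date strings includes both end dates.
indexed = df.set_index("received_date").sort_index()
selected = indexed.loc[dates[0]:dates[1]]
print(selected)
```

Because a date-only string used as a slice bound covers that whole day, the row stamped 29-06-15 in the evening is still selected here.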
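I think you first need to convert dates with to_datetime, then convert the column received_date as well and extract the date part with dt.date. Finally, use boolean indexing with a mask for filtering: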
#datetimes changed for better testing
print df
query status received_date
0 a closed 20-06-15 08:42:43.830000000 PM
1 b pending 27-06-15 08:42:43.830000000 PM
2 ab closed 28-06-15 08:42:43.830000000 PM
3 bb pending 30-06-15 08:42:43.830000000 PM
dates = (u'2015-06-23',u'2015-06-29')
dates = pd.to_datetime(dates).date
print dates
[datetime.date(2015, 6, 23) datetime.date(2015, 6, 29)]
df['received_date'] = pd.to_datetime(df['received_date']).dt.date
print df
query status received_date
0 a closed 2015-06-20
1 b pending 2015-06-27
2 ab closed 2015-06-28
3 bb pending 2015-06-30
print (df['received_date'] > dates[0]) & (df['received_date'] < dates[1])
0 False
1 True
2 True
3 False
Name: received_date, dtype: bool
df = df[(df['received_date'] > dates[0]) & (df['received_date'] < dates[1])]
print df
query status received_date
1 b pending 2015-06-27
2 ab closed 2015-06-28
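Note that the mask above uses strict inequalities, so rows dated exactly on either boundary are dropped, whereas slicing a DatetimeIndex with date strings includes both end dates. If inclusive bounds are wanted, a sketch along the following lines might work. It is written for Python 3; `Series.between` is swapped in for the explicit mask, the `format` string is an assumption about the column layout, and the two boundary dates are hypothetical values chosen so that rows fall exactly on them.

```python
import pandas as pd

# Same rows as the test data in the answer above.
df = pd.DataFrame({
    "query": ["a", "b", "ab", "bb"],
    "status": ["closed", "pending", "closed", "pending"],
    "received_date": [
        "20-06-15 08:42:43.830000000 PM",
        "27-06-15 08:42:43.830000000 PM",
        "28-06-15 08:42:43.830000000 PM",
        "30-06-15 08:42:43.830000000 PM",
    ],
})

# Hypothetical datepicker values that coincide with two of the rows.
start, end = pd.to_datetime(["2015-06-27", "2015-06-30"])

# Assumed layout: DD-MM-YY, 12-hour clock, fractional seconds, AM/PM.
parsed = pd.to_datetime(df["received_date"], format="%d-%m-%y %I:%M:%S.%f %p")

# normalize() resets the time to midnight but keeps the datetime64 dtype,
# so comparisons stay vectorized.
days = parsed.dt.normalize()

inclusive = df[days.between(start, end)]      # both bounds included
strict = df[(days > start) & (days < end)]    # both bounds excluded

print(inclusive["query"].tolist())
print(strict["query"].tolist())
```

With these rows the inclusive filter keeps b, ab and bb, while the strict one keeps only ab.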
The timings were measured with len(df)=40k.
Test code:
#length is 40k
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()

def a(df):
    dates = (u'2015-06-23',u'2015-06-29')
    df = df.set_index('received_date')
    df.index = pd.DatetimeIndex(df.index)
    return df[dates[0]:dates[1]]

def b(df):
    dates = (u'2015-06-23',u'2015-06-29')
    dates = pd.to_datetime(dates).date
    df['received_date'] = pd.to_datetime(df['received_date']).dt.date
    df = df[(df['received_date'] > dates[0]) & (df['received_date'] < dates[1])]
    return df

def c(df):
    dates = (u'2015-06-23',u'2015-06-29')
    df['received_date'] = pd.to_datetime(df['received_date'])
    df = df.set_index('received_date')
    return df[dates[0]:dates[1]]

print a(df)
print b(df1)
print c(df2)
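I have to compare the unicode dates with the received_date column and display the data between those dates. If I select u'2015-06-23', u'2015-06-29', it should show the data between those dates.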
In [569]: %timeit a(df)
1 loops, best of 3: 12.2 s per loop
In [570]: %timeit b(df1)
10 loops, best of 3: 92.3 ms per loop
In [571]: %timeit c(df2)
100 loops, best of 3: 6.57 ms per loop