Python: Improving the performance of pandas datetime comparisons


I have a pandas DataFrame with values like the following:

df['ORDER_RECEIVED_DATE'].head()
Out[91]: 
0   2018-01-01
1   2018-01-01
2   2018-01-01
3   2018-01-01
4   2018-01-01
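I am defining a custom function that creates another column, "Period", based on comparing against the date values in "ORDER_RECEIVED_DATE".

But on roughly 1 million records it is surprisingly slow. How can I speed it up?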
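The asker's function is not reproduced on this page. As a rough illustration only, a hypothetical row-wise version of this kind of comparison might look like the sketch below. The names `THRESHOLDS` and `period_for` are invented, and the threshold values are an assumption (the first three dates used in the answers). Calling a Python function once per row through `apply` is the pattern that usually becomes slow at a million rows.

```python
import pandas as pd

# Hypothetical thresholds; the original function is not shown, so these are assumed.
THRESHOLDS = pd.to_datetime(['2018-01-04', '2018-04-05', '2018-05-31'])

def period_for(date):
    """Count how many thresholds are on or before `date`, using a plain Python loop."""
    period = 0
    for threshold in THRESHOLDS:
        if date >= threshold:
            period += 1
    return period

df = pd.DataFrame({
    'ORDER_RECEIVED_DATE': pd.to_datetime(['2018-01-01', '2018-02-01', '2018-05-31'])
})

# One Python-level function call per row: simple, but slow on large frames.
df['Period'] = df['ORDER_RECEIVED_DATE'].apply(period_for)
print(df)
```

The answers replace this per-row loop with a single vectorized binning call.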

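Try:

import pandas as pd

# Sentinel bounds far in the past and future so every date falls inside some bin
old_date = '01-01-1970'
future_date = '01-01-2050'
# Bin edges, written as MM-DD-YYYY
cuts = pd.to_datetime([old_date, '01-04-2018', '04-05-2018', '05-31-2018',
                '08-02-2018', '09-27-2018', '01-03-2019',
                '02-14-2019', '03-28-2019', future_date])

df = pd.DataFrame({'date': pd.date_range('01-01-2018', '04-05-2019', freq='MS')})
# Integer code of the bin each date falls into
df['ped'] = pd.cut(df['date'], bins=cuts).cat.codes

Output: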

+----+---------------------+-------+
|    | date                |   ped |
|----+---------------------+-------|
|  0 | 2018-01-01 00:00:00 |     0 |
|  1 | 2018-02-01 00:00:00 |     1 |
|  2 | 2018-03-01 00:00:00 |     1 |
|  3 | 2018-04-01 00:00:00 |     1 |
|  4 | 2018-05-01 00:00:00 |     2 |
|  5 | 2018-06-01 00:00:00 |     3 |
|  6 | 2018-07-01 00:00:00 |     3 |
|  7 | 2018-08-01 00:00:00 |     3 |
|  8 | 2018-09-01 00:00:00 |     4 |
|  9 | 2018-10-01 00:00:00 |     5 |
| 10 | 2018-11-01 00:00:00 |     5 |
| 11 | 2018-12-01 00:00:00 |     5 |
| 12 | 2019-01-01 00:00:00 |     5 |
| 13 | 2019-02-01 00:00:00 |     6 |
| 14 | 2019-03-01 00:00:00 |     7 |
| 15 | 2019-04-01 00:00:00 |     8 |
+----+---------------------+-------+

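Edit: there is an issue with dates that fall exactly on a threshold. For example, 2019-03-28 gives 7 in this code rather than the 8 that your code gives, because pd.cut bins are closed on the right by default. This can be fixed by moving each threshold back by 1 day.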
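Besides shifting the thresholds by one day, another option (an assumption on my part, not something the answer proposes) is to pass `right=False` to `pd.cut`, so that each bin includes its left edge instead of its right edge. The sketch below uses a shortened, illustrative list of bin edges to show the difference on dates that sit exactly on a threshold.

```python
import pandas as pd

# Shortened, illustrative bin edges (a subset of the thresholds in the answer).
cuts = pd.to_datetime(['1970-01-01', '2018-01-04', '2018-04-05', '2050-01-01'])
s = pd.Series(pd.to_datetime(['2018-01-03', '2018-01-04', '2018-04-05']))

# Default: bins are (a, b], so a date equal to a threshold stays in the lower bin.
right_closed = pd.cut(s, bins=cuts).cat.codes

# right=False: bins are [a, b), so a date equal to a threshold moves to the upper bin.
left_closed = pd.cut(s, bins=cuts, right=False).cat.codes

print(right_closed.tolist())
print(left_closed.tolist())
```

Which variant is correct depends on whether the original comparison treats a threshold date as belonging to the earlier or the later period.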

Suppose you create an array of dates:

dates = pd.to_datetime([
    '01-04-2018', '04-05-2018', '05-31-2018',
    '08-02-2018', '09-27-2018', '01-03-2019',
    '02-14-2019', '03-28-2019'
]).values
You can use searchsorted, which will bin each value according to dates:

df.assign(Period=dates.searchsorted(df.ORDER_RECEIVED_DATE))

   ORDER_RECEIVED_DATE  Period
0           2018-01-01       0
1           2018-02-01       1
2           2018-03-01       1
3           2018-04-01       1
4           2018-05-01       2
5           2018-06-01       3
6           2018-07-01       3
7           2018-08-01       3
8           2018-09-01       4
9           2018-10-01       5
10          2018-11-01       5
11          2018-12-01       5
12          2019-01-01       5
13          2019-02-01       6
14          2019-03-01       7
15          2019-04-01       8
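Note that NumPy's `searchsorted` defaults to `side='left'`, so a date exactly equal to a threshold is assigned the lower index, the same boundary behavior as the `pd.cut` answer. The sketch below, using a shortened illustrative threshold list, shows how `side='right'` changes that; whether it matches the asker's intended comparison is an assumption.

```python
import pandas as pd

# Shortened, illustrative thresholds (must be sorted for searchsorted to be valid).
dates = pd.to_datetime(['2018-01-04', '2018-04-05', '2018-05-31']).values
s = pd.Series(pd.to_datetime(['2018-01-03', '2018-01-04', '2018-05-31']))

# side='left' (default): counts thresholds strictly before each date.
left = dates.searchsorted(s.values)

# side='right': counts thresholds on or before each date.
right = dates.searchsorted(s.values, side='right')

print(left.tolist())
print(right.tolist())
```

Because the whole column is handled in one binary-search call over a sorted array, this avoids the per-row Python overhead of a custom function.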

Take a look at pd.cut.