
More efficient timedelta calculation in Python 3

python, python-3.x, pandas, numpy, datetime

Sorry, I'm new to Python.

I have a DataFrame of entities, with a value recorded once per month. For each unique entity in the DataFrame, I first find the maximum value and then the month in which that maximum occurs. Using that peak month, I calculate the time difference in days between each of the entity's months and its peak month. This works fine on small DataFrames.

I know my loop performs poorly and won't scale to larger DataFrames (e.g., 3M rows, 156MB+). After a few weeks of research I've determined that my loop is the problem, and I feel there is a numpy solution or something more Pythonic. Can anyone find a more efficient way to calculate this time difference?

I have tried various value.shift(x) calculations inside lambda functions, but the peaks are not consistent. I have also tried calculating additional columns to minimize the work done in the loop.

import pandas as pd

df = pd.DataFrame({'entity':['A','A','A','A','B','B','B','C','C','C','C','C'], 'month': ['10/31/2018','11/30/2018','12/31/2018','1/31/2019','1/31/2009','2/28/2009','3/31/2009','8/31/2011','9/30/2011','10/31/2011','11/30/2011','12/31/2011'], 'value':['80','600','500','400','150','300','100','200','250','300','200','175'], 'month_number': ['1','2','3','4','1','2','3','1','2','3','4','5']})

df['month'] = df['month'].apply(pd.to_datetime)

for entity in set(df['entity']):
    # set peak value (note: 'value' holds strings here, so max() compares lexicographically)
    peak_value = df.loc[df['entity'] == entity, 'value'].max()
    # set peak value date
    peak_date = df.loc[(df['entity'] == entity) & (df['value'] == peak_value), 'month'].min()
    # subtract peak date from current date
    delta = df.loc[df['entity'] == entity, 'month'] - peak_date
    # update days_delta with delta in days
    df.loc[df['entity'] == entity, 'days_delta'] = delta
Result:

entity       month  value  month_number  days_delta
A       2018-10-31     80             1      0 days
A       2018-11-30    600             2     30 days
A       2018-12-31    500             3     61 days
A       2019-01-31    400             4     92 days
B       2009-01-31    150             1    -28 days
B       2009-02-28    300             2      0 days
B       2009-03-31    100             3     31 days
C       2011-08-31    200             1    -61 days
C       2011-09-30    250             2    -31 days
C       2011-10-31    300             3      0 days
C       2011-11-30    200             4     30 days
C       2011-12-31    175             5     61 days
Setup
First, let's also make sure value is numeric:

df = pd.DataFrame({
    'entity':['A','A','A','A','B','B','B','C','C','C','C','C'],
    'month': ['10/31/2018','11/30/2018','12/31/2018','1/31/2019',
              '1/31/2009','2/28/2009','3/31/2009','8/31/2011',
              '9/30/2011','10/31/2011','11/30/2011','12/31/2011'],
    'value':['80','600','500','400','150','300','100','200','250','300','200','175'],
    'month_number': ['1','2','3','4','1','2','3','1','2','3','4','5']
})

df['month'] = df['month'].apply(pd.to_datetime)
df['value'] = pd.to_numeric(df['value'])

transform and idxmax
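
Based on the transform and idxmax hints, a minimal sketch of this approach might look like the following (peak_idx and max_months are illustrative names, not necessarily the original answer's code):

peak_idx = df.groupby('entity')['value'].transform('idxmax')  # index label of each entity's peak-value row, broadcast to every row
max_months = df['month'].loc[peak_idx].to_numpy()             # peak month aligned to every row
df = df.assign(days_delta=df['month'] - max_months)           # vectorized timedelta, no Python loop

This replaces the per-entity Python loop with a single grouped operation over the whole frame.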

df['month'].astype('datetime64[D]') creates an efficient numpy array.

piRSquared's solution ran in about 4 minutes on 3.7M records. I changed df.assign(…) to df['time_delta'] = df.month - max_months so that the column is permanent in my DataFrame.
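
As a rough sketch of that change (months, peak_idx, and max_months are assumed names; the positional lookup assumes a default RangeIndex):

months = df['month'].values.astype('datetime64[D]')                       # compact numpy array of dates
peak_idx = df.groupby('entity')['value'].transform('idxmax').to_numpy()   # peak row per row (labels == positions for a RangeIndex)
max_months = months[peak_idx]                                             # peak month per row
df['time_delta'] = df['month'] - max_months                               # assigned in place, so the column persists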