Python pandas: a faster approach than rollforward?
I am preparing some data for cohort analysis. The information I have is similar to the fake dataset that can be generated with the code below:
import random
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
# prepare some fake data to build frames
subscription_prices = [x - 0.05 for x in range(100, 500, 25)]
companies = ['initech','ingen','weyland','tyrell']
starting_periods = ['2014-12-10','2015-1-15','2014-11-20','2015-2-9']
# use the lists and dict from above to create a fake dataset
pieces = []
for company, period in zip(companies, starting_periods):
    data = {
        'company': company,
        'revenue': random.choice(subscription_prices),
        'invoice_date': pd.date_range(period, periods=12, freq='31D')
    }
    frame = DataFrame(data)
    pieces.append(frame)
df = pd.concat(pieces, ignore_index=True)
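I need to normalize the invoice dates to a monthly period. For a number of reasons, it is best to move all invoice_date values to the end of the month. I used this method: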
from pandas.tseries.offsets import *
df['rev_period'] = df['invoice_date'].apply(lambda x: MonthEnd(normalize=True).rollforward(x))
However, even with only one million rows (the size of my actual dataset), this becomes painfully slow:
In [11]: %time df['invoice_date'].apply(lambda x: MonthEnd(normalize=True).rollforward(x))
CPU times: user 3min 11s, sys: 1.44 s, total: 3min 12s
Wall time: 3min 17s
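The great thing about offsetting dates this way in pandas is that if an invoice_date already falls on the last day of the month, it stays on the last day of that month. Another nice thing is that the dtype stays datetime, whereas df['invoice_date'].apply(lambda x: x.strftime('%Y-%m')) is faster but converts the values to str.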
Is there a vectorized way to do this? I tried MonthEnd(normalize=True).rollforward(df['invoice_date']) but got the error TypeError: Cannot convert input to Timestamp.
Yes, there is:
df['rev_period'] = df['invoice_date'] + pd.offsets.MonthEnd(0)
It should be at least an order of magnitude faster.
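To check that the vectorized form behaves like the original rollforward call, here is a minimal sketch on a handful of hand-picked dates. The sample dates and the variable names (dates, vectorized, elementwise, expected) are made up for illustration and are not part of the question's dataset.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Hypothetical sample: two mid-month dates and two dates already on a month end
dates = pd.Series(pd.to_datetime(
    ['2014-12-10', '2015-01-31', '2015-02-09', '2016-02-29']
))

# Vectorized: an offset with n=0 rolls each date forward to the month end
# and leaves dates that are already on a month end unchanged
vectorized = dates + MonthEnd(0)

# Element-wise version from the question, kept here only for comparison
elementwise = dates.apply(lambda x: MonthEnd(normalize=True).rollforward(x))

# What we expect for the sample above
expected = pd.Series(pd.to_datetime(
    ['2014-12-31', '2015-01-31', '2015-02-28', '2016-02-29']
))

print((vectorized == elementwise).all())
print(vectorized.dtype)
```

The dates in the question's fake dataset are all at midnight, so the two forms agree. If the real timestamps carry a time of day, calling .dt.normalize() on the result should reset them to midnight, which is roughly what normalize=True did in the original call.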