How to shift time series data by one month in Python
I tried using the DataFrame.shift() function with freq='M', but when I shift by 1 month, the dates move to the end of the month rather than to the same day of the next month.
Is there a way to offset by exactly one month? That is, if I have a time series dataframe whose first index value is August 23, then after shifting by one month I want the value indexed at September 23 to sit against the August 23 index.
Please suggest a way to do this. It would save a lot of time; otherwise I will have to use a loop.
I want to create a new column in this dataframe such that the value in the new column for index 20-10-01 10:00:00 and ticker AAPL is the value of column 'c' at time 20-11-01 10:00:00 for ticker AAPL, and likewise for the other rows. Sample data:
Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}}
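That is the dataset.
The expected new column is: 62.35, 63.3099, 63.9725, 64.025, and so on.
I want the values from one month ahead, but using df['new_column'] = df.shift(1, freq='M')['c'] does not do the job.

The problem is fairly simple, but you need to do some specific work on the dates to get n.
Find the number of rows to shift by, which I call n, using pd.DateOffset(months=1).
You then need to shift the column up by n rows.
Note that to get the above output, I used: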
import pandas as pd

df = pd.DataFrame(
{pd.Timestamp('2019-10-01 10:00:00+0000', tz='UTC'): 56.5675,
pd.Timestamp('2019-10-01 16:00:00+0000', tz='UTC'): 56.2725,
pd.Timestamp('2019-10-01 22:00:00+0000', tz='UTC'): 56.2925,
pd.Timestamp('2019-10-02 04:00:00+0000', tz='UTC'): 55.6525,
pd.Timestamp('2019-10-02 10:00:00+0000', tz='UTC'): 54.8025,
pd.Timestamp('2019-10-02 16:00:00+0000', tz='UTC'): 54.625,
pd.Timestamp('2019-10-02 22:00:00+0000', tz='UTC'): 54.625,
pd.Timestamp('2019-10-03 04:00:00+0000', tz='UTC'): 54.825,
pd.Timestamp('2019-10-03 10:00:00+0000', tz='UTC'): 54.7075,
pd.Timestamp('2019-10-03 16:00:00+0000', tz='UTC'): 55.1575,
pd.Timestamp('2019-10-03 22:00:00+0000', tz='UTC'): 55.125,
pd.Timestamp('2019-10-04 04:00:00+0000', tz='UTC'): 55.88,
pd.Timestamp('2019-10-04 10:00:00+0000', tz='UTC'): 56.51,
pd.Timestamp('2019-10-04 16:00:00+0000', tz='UTC'): 56.77,
pd.Timestamp('2019-10-04 22:00:00+0000', tz='UTC'): 56.7375,
pd.Timestamp('2019-10-07 04:00:00+0000', tz='UTC'): 56.5,
pd.Timestamp('2019-10-07 10:00:00+0000', tz='UTC'): 57.3525,
pd.Timestamp('2019-10-07 16:00:00+0000', tz='UTC'): 56.7875,
pd.Timestamp('2019-10-07 22:00:00+0000', tz='UTC'): 56.86,
pd.Timestamp('2019-10-08 04:00:00+0000', tz='UTC'): 56.75,
pd.Timestamp('2019-10-08 10:00:00+0000', tz='UTC'): 56.525,
pd.Timestamp('2019-10-08 16:00:00+0000', tz='UTC'): 55.9775,
pd.Timestamp('2019-10-08 22:00:00+0000', tz='UTC'): 55.925,
pd.Timestamp('2019-10-09 04:00:00+0000', tz='UTC'): 56.75,
pd.Timestamp('2019-10-09 10:00:00+0000', tz='UTC'): 56.6783,
pd.Timestamp('2019-10-09 16:00:00+0000', tz='UTC'): 56.77,
pd.Timestamp('2019-10-09 22:00:00+0000', tz='UTC'): 56.075,
pd.Timestamp('2019-10-10 04:00:00+0000', tz='UTC'): 56.875,
pd.Timestamp('2019-10-10 10:00:00+0000', tz='UTC'): 57.5175,
pd.Timestamp('2019-10-10 16:00:00+0000', tz='UTC'): 57.71,
pd.Timestamp('2019-10-10 22:00:00+0000', tz='UTC'): 57.8125,
pd.Timestamp('2019-10-11 04:00:00+0000', tz='UTC'): 58.235,
pd.Timestamp('2019-10-11 10:00:00+0000', tz='UTC'): 58.62,
pd.Timestamp('2019-10-11 16:00:00+0000', tz='UTC'): 59.1825,
pd.Timestamp('2019-10-11 22:00:00+0000', tz='UTC'): 59.3125,
pd.Timestamp('2019-10-14 04:00:00+0000', tz='UTC'): 58.5925,
pd.Timestamp('2019-10-14 10:00:00+0000', tz='UTC'): 59.25,
pd.Timestamp('2019-10-14 16:00:00+0000', tz='UTC'): 58.975,
pd.Timestamp('2019-10-14 22:00:00+0000', tz='UTC'): 59.1125,
pd.Timestamp('2019-10-15 04:00:00+0000', tz='UTC'): 59.2525,
pd.Timestamp('2019-10-15 10:00:00+0000', tz='UTC'): 58.9238,
pd.Timestamp('2019-10-15 16:00:00+0000', tz='UTC'): 58.9,
pd.Timestamp('2019-10-15 22:00:00+0000', tz='UTC'): 58.75,
pd.Timestamp('2019-10-16 04:00:00+0000', tz='UTC'): 58.565,
pd.Timestamp('2019-10-16 10:00:00+0000', tz='UTC'): 58.59,
pd.Timestamp('2019-10-16 16:00:00+0000', tz='UTC'): 58.6825,
pd.Timestamp('2019-10-16 22:00:00+0000', tz='UTC'): 58.5875,
pd.Timestamp('2019-10-17 04:00:00+0000', tz='UTC'): 58.9375,
pd.Timestamp('2019-10-17 10:00:00+0000', tz='UTC'): 58.48,
pd.Timestamp('2019-10-17 16:00:00+0000', tz='UTC'): 58.8375,
pd.Timestamp('2019-10-17 22:00:00+0000', tz='UTC'): 58.8025,
pd.Timestamp('2019-10-18 04:00:00+0000', tz='UTC'): 58.7275,
pd.Timestamp('2019-10-18 10:00:00+0000', tz='UTC'): 58.7838,
pd.Timestamp('2019-10-18 16:00:00+0000', tz='UTC'): 59.0675,
pd.Timestamp('2019-10-18 22:00:00+0000', tz='UTC'): 59.0525,
pd.Timestamp('2019-10-21 04:00:00+0000', tz='UTC'): 59.3775,
pd.Timestamp('2019-10-21 10:00:00+0000', tz='UTC'): 60.1825,
pd.Timestamp('2019-10-21 16:00:00+0000', tz='UTC'): 60.165,
pd.Timestamp('2019-10-21 22:00:00+0000', tz='UTC'): 60.1725,
pd.Timestamp('2019-10-22 04:00:00+0000', tz='UTC'): 60.1975,
pd.Timestamp('2019-10-22 10:00:00+0000', tz='UTC'): 60.2975,
pd.Timestamp('2019-10-22 16:00:00+0000', tz='UTC'): 59.8025,
pd.Timestamp('2019-10-22 22:00:00+0000', tz='UTC'): 59.755,
pd.Timestamp('2019-10-23 04:00:00+0000', tz='UTC'): 60.3975,
pd.Timestamp('2019-10-23 10:00:00+0000', tz='UTC'): 60.6265,
pd.Timestamp('2019-10-23 16:00:00+0000', tz='UTC'): 60.8875,
pd.Timestamp('2019-10-23 22:00:00+0000', tz='UTC'): 61.0275,
pd.Timestamp('2019-10-24 04:00:00+0000', tz='UTC'): 61.0525,
pd.Timestamp('2019-10-24 10:00:00+0000', tz='UTC'): 60.82,
pd.Timestamp('2019-10-24 16:00:00+0000', tz='UTC'): 60.8125,
pd.Timestamp('2019-10-24 22:00:00+0000', tz='UTC'): 60.8225,
pd.Timestamp('2019-10-25 04:00:00+0000', tz='UTC'): 60.75,
pd.Timestamp('2019-10-25 10:00:00+0000', tz='UTC'): 61.3425,
pd.Timestamp('2019-10-25 16:00:00+0000', tz='UTC'): 61.7,
pd.Timestamp('2019-10-25 22:00:00+0000', tz='UTC'): 61.6875,
pd.Timestamp('2019-10-28 04:00:00+0000', tz='UTC'): 61.8575,
pd.Timestamp('2019-10-28 10:00:00+0000', tz='UTC'): 62.1388,
pd.Timestamp('2019-10-28 16:00:00+0000', tz='UTC'): 62.285,
pd.Timestamp('2019-10-28 22:00:00+0000', tz='UTC'): 62.2875,
pd.Timestamp('2019-10-29 04:00:00+0000', tz='UTC'): 62.15,
pd.Timestamp('2019-10-29 10:00:00+0000', tz='UTC'): 60.7952,
pd.Timestamp('2019-10-29 16:00:00+0000', tz='UTC'): 60.9525,
pd.Timestamp('2019-10-29 22:00:00+0000', tz='UTC'): 60.9575,
pd.Timestamp('2019-10-30 04:00:00+0000', tz='UTC'): 60.9575,
pd.Timestamp('2019-10-30 10:00:00+0000', tz='UTC'): 60.5125,
pd.Timestamp('2019-10-30 16:00:00+0000', tz='UTC'): 62.05,
pd.Timestamp('2019-10-30 22:00:00+0000', tz='UTC'): 62.0475,
pd.Timestamp('2019-10-31 04:00:00+0000', tz='UTC'): 61.76,
pd.Timestamp('2019-10-31 10:00:00+0000', tz='UTC'): 62.0523,
pd.Timestamp('2019-10-31 16:00:00+0000', tz='UTC'): 62.105,
pd.Timestamp('2019-10-31 22:00:00+0000', tz='UTC'): 62.14,
pd.Timestamp('2019-11-01 04:00:00+0000', tz='UTC'): 62.35,
pd.Timestamp('2019-11-01 10:00:00+0000', tz='UTC'): 63.3099,
pd.Timestamp('2019-11-01 16:00:00+0000', tz='UTC'): 63.9725,
pd.Timestamp('2019-11-01 22:00:00+0000', tz='UTC'): 64.025,
pd.Timestamp('2019-11-04 10:00:00+0000', tz='UTC'): 64.2388,
pd.Timestamp('2019-11-04 16:00:00+0000', tz='UTC'): 64.375,
pd.Timestamp('2019-11-04 22:00:00+0000', tz='UTC'): 64.4975,
pd.Timestamp('2019-11-05 04:00:00+0000', tz='UTC'): 64.575}, index=['c']).T
df = df.reset_index().rename({'index': 'Date'}, axis=1)
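# and then my answer:
# Reduce each timestamp to its calendar date (this drops the time of day and the timezone)
df['Date'] = pd.to_datetime(pd.to_datetime(df['Date']).dt.date)
# n is the label of the first row whose date equals some row's date plus one month
n = df[df['Date'].isin(pd.to_datetime(df['Date'] +
                                      pd.DateOffset(months=1)))].index[0]
# Shift column 'c' up by n rows so each row receives the value n rows later
df['new_column'] = df['c'].shift(-n)
df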
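The row-count shift above relies on the rows being evenly spaced. As an alternative, here is a minimal sketch that looks each row up by its calendar date plus one month instead. It assumes the timestamps form a unique DatetimeIndex; the 6-hourly sample data and the integer values in column 'c' are made up for illustration and are not the data from the question.

```python
import pandas as pd

# A month-end offset rolls forward to the end of the month, while
# DateOffset(months=1) keeps the day of the month.
start = pd.Timestamp("2019-08-23")
month_end = start + pd.offsets.MonthEnd(1)      # 2019-08-31
same_day = start + pd.DateOffset(months=1)      # 2019-09-23

# Hypothetical sample data: one observation every 6 hours for 40 days, in UTC.
idx = pd.date_range("2019-10-01 04:00", periods=160,
                    freq=pd.Timedelta(hours=6), tz="UTC")
df = pd.DataFrame({"c": range(len(idx))}, index=idx)

# For every row, compute the timestamp exactly one calendar month later ...
target = df.index + pd.DateOffset(months=1)

# ... and look up 'c' at that timestamp. Timestamps that do not exist in the
# index (weekends, the end of the data) come back as NaN.
df["new_column"] = df["c"].reindex(target).to_numpy()
```

Because the lookup is by timestamp rather than by position, gaps such as missing weekend rows produce NaN instead of silently misaligning the values.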
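Assuming every day has the same set of timestamps and no timestamp values are missing, the approach below may work, since you only need to shift rows by a number of days rather than match each individual timestamp.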
import pandas as pd
# Dummy data
# I assumed you have 4 unique values for a day's timestamp and don't have any missing values
lst1 = list(pd.date_range('2020-08-01 04:00:00', periods=60))
lst2 = list(pd.date_range('2020-08-01 10:00:00', periods=60))
lst3 = list(pd.date_range('2020-08-01 16:00:00', periods=60))
lst4 = list(pd.date_range('2020-08-01 22:00:00', periods=60))
lst1.extend(lst2)
lst1.extend(lst3)
lst1.extend(lst4)
data = {
    'date': lst1,
    'value': list(range(240))
}
# Preprocessing
df = pd.DataFrame(data)
df = df.sort_values(by=['date'])
df.reset_index(drop=True, inplace=True)

def update(row, shifted):
    # Replace this row's value with the value at the same label in the shifted frame
    row['value'] = shifted.loc[row.name, 'value']
    return row

# factor = X days of shift * Y unique timestamps per day
# (31 days because August has 31 days; 4 timestamps per day)
factor = 31 * 4
df.apply(update, axis=1, args=(df.shift(-factor),))
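The row-wise apply above can probably be replaced by a single column shift. The sketch below rebuilds the same dummy data under the same assumptions (4 timestamps per day, no gaps); the column name 'shifted' is a hypothetical addition for illustration.

```python
import pandas as pd

# Same dummy data as above: 60 days, 4 timestamps per day, sorted by time.
times = ["04:00", "10:00", "16:00", "22:00"]
dates = sorted(
    ts for t in times
    for ts in pd.date_range(f"2020-08-01 {t}", periods=60)
)
df = pd.DataFrame({"date": dates, "value": range(240)})

# 31 days in August * 4 timestamps per day
factor = 31 * 4

# Shift the column up by a fixed number of rows; the last `factor` rows
# have no value one month ahead and become NaN.
df["shifted"] = df["value"].shift(-factor)
```

A fixed row count only matches a calendar month when the starting month has the assumed number of days, so the factor would need adjusting for months of other lengths.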
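Can you provide the dataframe as code/text, along with the expected output? No images, please. Provide, for example, the output of df.head(20).to_dict() instead.
@anon01 I have edited the question.
@anon01 Got it? c is the name of a column; it is a multi-index dataframe.
@anon01 Is it clear now?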