Python: how to change when the 'date' rolls over (00:00:00) in a pandas DataFrame index?
I have a DataFrame that looks like this:
Date and Time Close dif
2015/01/01 17:00:00.211 2030.25 0.3
2015/01/01 17:00:02.456 2030.75 0.595137615
2015/01/01 23:55:01.491 2037.25 2.432613592
2015/01/02 00:02:01.955 2036.75 -0.4
2015/01/02 00:04:04.887 2036.5 -0.391144414
2015/01/02 15:14:56.207 2021.5 -4.732676608
2015/01/02 15:14:59.020 2021.5 -4.731171953
2015/01/02 15:30:00.020 2022 -4.228169436
2015/01/02 16:13:18.948 2021.25 -4.96153033
2015/01/02 16:15:00.000 2021 -5.210187988
2015/01/04 17:00:00.105 2020.5 0
2015/01/04 17:00:01.077 2021 0.423093923
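How can I modify the index so that the current date starts at 17:00:00 of the previous day and ends at 15:15:00? (Data between 15:15:00 and 17:00:00 can be dropped.)
The new DataFrame would look like this: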
Date and Time Close dif
2015/01/02 17:00:00.211 2030.25 0.3
2015/01/02 17:00:02.456 2030.75 0.595137615
2015/01/02 23:55:01.491 2037.25 2.432613592
2015/01/02 00:02:01.955 2036.75 -0.4
2015/01/02 00:04:04.887 2036.5 -0.391144414
2015/01/02 15:14:56.207 2021.5 -4.732676608
2015/01/02 15:14:59.020 2021.5 -4.731171953
2015/01/05 17:00:00.105 2020.5 0
2015/01/05 17:00:01.077 2021 0.423093923
Thanks.

Is this what you're looking for?
# read in your dataframe
import pandas as pd

df = pd.read_csv('dt_data.csv', skipinitialspace=True)
df.columns = ['mydt', 'close', 'dif']    # renamed 'Date and Time' to 'mydt'
df['mydt'] = pd.to_datetime(df['mydt'])  # convert to datetime so the .dt accessor works

# keep only rows outside the 15:15-17:00 window
# (15:15:00 itself is dropped, 17:00:00 itself is kept)
minute_of_day = df['mydt'].dt.hour * 60 + df['mydt'].dt.minute
df = df[(minute_of_day < 15 * 60 + 15) | (minute_of_day >= 17 * 60)].copy()

# timestamps at or after 17:00 belong to the next 'day', so shift them forward one day
late = df['mydt'].dt.hour >= 17
df.loc[late, 'mydt'] += pd.Timedelta(days=1)

df.set_index('mydt', inplace=True)
print(df)

                           close       dif
mydt
2015-01-02 17:00:00.211  2030.25  0.300000
2015-01-02 17:00:02.456  2030.75  0.595138
2015-01-02 23:55:01.491  2037.25  2.432614
2015-01-02 00:02:01.955  2036.75 -0.400000
2015-01-02 00:04:04.887  2036.50 -0.391144
2015-01-02 15:14:56.207  2021.50 -4.732677
2015-01-02 15:14:59.020  2021.50 -4.731172
2015-01-05 17:00:00.105  2020.50  0.000000
2015-01-05 17:00:01.077  2021.00  0.423094

Edit: answering the groupby question from the comments. If you only need the date part of the datetime column mydt above, you can do this:

df.reset_index(inplace=True)
print(df.mydt.dt.date)

0    2015-01-02
1    2015-01-02
2    2015-01-02
3    2015-01-02
4    2015-01-02
5    2015-01-02
6    2015-01-02
7    2015-01-05
8    2015-01-05
Name: mydt, dtype: object

You can then run the groupby on the date part alone:

print(df.groupby(df.mydt.dt.date)['dif'].sum())

mydt
2015-01-02   -6.927242
2015-01-05    0.423094
Name: dif, dtype: float64
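
If you would rather leave the original timestamps untouched, one alternative is to derive a separate session-date column. The sketch below is only an illustration, not part of the approach above: the column name `session` and the small inline sample are assumptions made for the example. Adding 7 hours moves 17:00 onto midnight of the next calendar day, and `normalize()` then truncates to the date.

```python
import pandas as pd

# Small sample built from rows in the question (values copied from the post).
df = pd.DataFrame({
    "mydt": pd.to_datetime([
        "2015/01/01 17:00:00.211",
        "2015/01/01 23:55:01.491",
        "2015/01/02 00:02:01.955",
        "2015/01/02 15:14:59.020",
        "2015/01/02 15:30:00.020",
        "2015/01/04 17:00:00.105",
    ]),
    "dif": [0.3, 2.432613592, -0.4, -4.731171953, -4.228169436, 0.0],
})

# Drop rows inside the 15:15-17:00 window.
minute_of_day = df["mydt"].dt.hour * 60 + df["mydt"].dt.minute
df = df[(minute_of_day < 15 * 60 + 15) | (minute_of_day >= 17 * 60)].copy()

# 'session' is a hypothetical column name: shifting by 7 hours maps
# 17:00 to 00:00 of the following day, and normalize() keeps only the date.
df["session"] = (df["mydt"] + pd.Timedelta(hours=7)).dt.normalize()

# Aggregate per session without modifying the real timestamps.
session_totals = df.groupby("session")["dif"].sum()
print(session_totals)
```

This keeps `mydt` chronologically correct, so sorting or resampling on the real timestamps still works, while `session` carries the 17:00-to-15:15 grouping.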
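It works fine. The problem now is that if I try df['dif'] = df.groupby(pd.TimeGrouper('D'))['dif'].cumsum(), I get: "ValueError: cannot reindex from a duplicate axis". It worked before.

@hernanavella See my edit above for one way to group by the 'date' only, if that is what you are trying to do. I don't understand what you want to do with cumsum: a cumsum within each day? Across all days? Not sure. Maybe you could explain further or post an example of the output you want?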
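
Assuming the goal is a running total that restarts on each (shifted) date, here is a minimal sketch. It is a guess at the intent, not a confirmed fix: the column name `cum_dif` and the inline sample are assumptions. It groups by the date part of the `mydt` column on a default integer index, so the assignment does not rely on unique timestamps in the index, which is one possible cause of the reindexing error. Note that `pd.TimeGrouper` has since been deprecated in favor of `pd.Grouper`.

```python
import pandas as pd

# A few rows with the shifted timestamps from the answer's output.
df = pd.DataFrame({
    "mydt": pd.to_datetime([
        "2015-01-02 17:00:00.211",
        "2015-01-02 17:00:02.456",
        "2015-01-02 00:02:01.955",
        "2015-01-05 17:00:00.105",
        "2015-01-05 17:00:01.077",
    ]),
    "dif": [0.3, 0.595137615, -0.4, 0.0, 0.423093923],
})

# Group on the date part of the column (not on the index) and take a
# cumulative sum that restarts for every date. 'cum_dif' is a
# hypothetical column name used only in this sketch.
df["cum_dif"] = df.groupby(df["mydt"].dt.date)["dif"].cumsum()
print(df)
```

The rows are accumulated in their existing order within each date, so sort them first if a different order is wanted.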