Datetime slicing in pandas: junkdf.ix['2015-08-03':'2015-08-06'] doesn't work
junkdf:
rev
dtime
2015-08-03 20.45
2015-08-04 -2.57
2015-08-05 12.53
2015-08-06 -8.16
2015-08-07 -4.41
junkdf.reset_index().to_dict('rec')
Why can't I do datetime slicing as described below:
junkdf['2015-08-03':]
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
1112 def searchsorted(self, key, side='left', sorter=None):
1113 # needs coercion on the key (DatetimeIndex does already)
-> 1114 return self.values.searchsorted(key, side=side, sorter=sorter)
1115
1116 _shared_docs['drop_duplicates'] = (
TypeError: unorderable types: datetime.date() > str()
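The root cause of this error can be reproduced in a few lines (with hypothetical data): an index built from `datetime.date` objects has `object` dtype, so pandas does not coerce the string `'2015-08-03'` for comparison the way it would on a real `DatetimeIndex`:

```python
import datetime as dt
import pandas as pd

# Hypothetical index mirroring junkdf's: datetime.date objects, not Timestamps.
idx = pd.Index([dt.date(2015, 8, 3), dt.date(2015, 8, 4), dt.date(2015, 8, 5)])
print(idx.dtype)  # object — string-based datetime slicing is not supported here
```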
junkdf.ix['2015-08-03':'2015-08-06']
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\base.py in searchsorted(self, key, side, sorter)
1112 def searchsorted(self, key, side='left', sorter=None):
1113 # needs coercion on the key (DatetimeIndex does already)
-> 1114 return self.values.searchsorted(key, side=side, sorter=sorter)
1115
1116 _shared_docs['drop_duplicates'] = (
TypeError: unorderable types: datetime.date() > str()
However, the following works if I use dt.date():
start = junkdf.index.searchsorted(dt.date(2015, 8, 4))
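A sketch of why a `dt.date` key works where the string did not: `searchsorted` compares the key against the index elements, and `datetime.date` is orderable against other `datetime.date` values (the index contents below are assumed to match junkdf's):

```python
import datetime as dt
import pandas as pd

# Assumed object-dtype index of datetime.date values, as the groupby produced.
idx = pd.Index([dt.date(2015, 8, 3), dt.date(2015, 8, 4), dt.date(2015, 8, 5)])
start = idx.searchsorted(dt.date(2015, 8, 4))
print(start)  # a date key compares cleanly with the date index entries
```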
UPDATE:
junkdf = df[['dtime','rev']].groupby((df.dtime).dt.date).sum().copy()
where df[['dtime','rev']] looks like:
dtime rev
0 2015-08-03 07:59:59 -0.18
1 2015-08-03 08:59:59 -0.11
2 2015-08-03 09:59:59 -0.29
3 2015-08-03 10:59:59 -0.08
4 2015-08-03 11:59:59 0.69
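The choice of grouping key determines the dtype of the resulting index. A sketch (sample data invented to mirror the frame above) contrasting `.dt.date` with `pd.Grouper`, the current replacement for the deprecated `pd.TimeGrouper`:

```python
import pandas as pd

# Invented sample mirroring df[['dtime', 'rev']] above.
df = pd.DataFrame({
    'dtime': pd.to_datetime(['2015-08-03 07:59:59',
                             '2015-08-03 08:59:59',
                             '2015-08-04 09:59:59']),
    'rev': [-0.18, -0.11, -0.29],
})

# Grouping by .dt.date yields an object-dtype index of datetime.date values,
# which later breaks string-based datetime slicing:
by_date = df.groupby(df.dtime.dt.date)['rev'].sum()
print(by_date.index.dtype)   # object

# Grouping with pd.Grouper keeps a proper DatetimeIndex:
by_day = df.groupby(pd.Grouper(key='dtime', freq='D'))['rev'].sum()
print(by_day.index.dtype)    # datetime64[ns]
```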
UPDATE 2:
rev
dtime
2015-08-03 20.45
2015-08-04 -2.57
2015-08-05 12.53
2015-08-06 -8.16
2015-08-07 -4.41
I tried:
df[['dtime','rev']].head()
dtime rev
0 2015-08-03 07:59:59 -0.18
1 2015-08-03 08:59:59 -0.11
2 2015-08-03 09:59:59 -0.29
3 2015-08-03 10:59:59 -0.08
4 2015-08-03 11:59:59 0.69
df[['dtime','rev']].groupby(pd.TimeGrouper('D', key=df.dtime)).sum()
C:\Users\blah\Anaconda3\lib\site-packages\pandas\core\generic.py in __hash__(self)
804 def __hash__(self):
805 raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 806 ' hashed'.format(self.__class__.__name__))
807
808 def __iter__(self):
TypeError: 'Series' objects are mutable, thus they cannot be hashed
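This "Series objects are mutable" error comes from passing the Series `df.dtime` as `key`: pandas tries to hash it, and `key` expects the column *name* instead. A minimal sketch with invented data, using `pd.Grouper` (the non-deprecated successor of `pd.TimeGrouper`):

```python
import pandas as pd

# Invented sample with two distinct days.
df = pd.DataFrame({
    'dtime': pd.to_datetime(['2015-08-03 07:59:59', '2015-08-04 08:59:59']),
    'rev': [-0.18, -0.11],
})

# key must be the column name 'dtime', not the Series df.dtime itself:
daily = df.groupby(pd.Grouper(key='dtime', freq='D'))['rev'].sum()
print(daily.index.dtype)  # datetime64[ns] — string slicing now works
```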
Assuming you have the following source DF (which I took from your previous question and changed, so that we have data for multiple days), let's group by day and calculate the sum:
In [89]: rslt = (df.assign(t=df.datetime - pd.Timedelta(hours=1))
....: .groupby(pd.TimeGrouper('D', key='t'))['rev']
....: .sum())
In [90]: rslt
Out[90]:
t
2016-05-01 -0.08
2016-05-02 -0.22
2016-05-03 -0.30
2016-05-04 -0.47
2016-05-05 -0.47
2016-05-06 -0.51
2016-05-07 -0.36
2016-05-08 -0.08
Freq: D, Name: rev, dtype: float64
In [92]: rslt.index.dtype
Out[92]: dtype('<M8[ns]')
This works fine for me.
What is the output of print(junkdf.index.dtype)?
print(junkdf.index.dtype) gives object, so your index is of string/object type. You have to convert it to datetimes first.
Since I've added more information: I arrived at junkdf by doing a groupby on the datetime column. Shouldn't it automatically be of datetime type?
.dt.date converts the datetime dtype to string/object dtype, which leads to the problem described in this question.
IMO there is no such thing as hour==24; you can have hours from 0 to 23, and that is exactly the standard in the power industry: demand metering is done over hours 1 to 24, so billing is done accordingly. I could subtract 1 second from the index and then group with pd.TimeGrouper as you suggested. Let me try... I get: TypeError: 'Series' objects are mutable, thus they cannot be hashed
In [91]: rslt.ix['2016-05-03':'2016-05-06']
Out[91]:
t
2016-05-03 -0.30
2016-05-04 -0.47
2016-05-05 -0.47
2016-05-06 -0.51
Freq: D, Name: rev, dtype: float64
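If re-running the groupby is not an option, an existing object-dtype index of dates can also be converted in place with pd.to_datetime; a sketch with invented values mirroring junkdf above:

```python
import datetime as dt
import pandas as pd

# Invented series mirroring junkdf: object-dtype index of datetime.date values.
junkdf = pd.Series([20.45, -2.57, 12.53],
                   index=[dt.date(2015, 8, 3),
                          dt.date(2015, 8, 4),
                          dt.date(2015, 8, 5)],
                   name='rev')

# Convert the object-dtype index to a proper DatetimeIndex:
junkdf.index = pd.to_datetime(junkdf.index)

# String-based datetime slicing works again:
sliced = junkdf['2015-08-03':'2015-08-04']
print(sliced)
```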