Python: indexing a timeseries by date string
Given a timeseries, s, with a datetime index, I would like to be able to index the timeseries by date string. Am I misunderstanding how this is supposed to work?
import pandas as pd
url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
df = pd.read_csv(url, index_col='Date', parse_dates=True)
s = df['Close']
s['2012-12-04']
Result:
TimeSeriesError Traceback (most recent call last)
<ipython-input-244-e2ccd4ecce94> in <module>()
2 df = pd.read_csv(url, index_col='Date', parse_dates=True)
3 s = df['Close']
----> 4 s['2012-12-04']
G:\Python27-32\lib\site-packages\pandas\core\series.pyc in __getitem__(self, key)
468 def __getitem__(self, key):
469 try:
--> 470 return self.index.get_value(self, key)
471 except InvalidIndexError:
472 pass
G:\Python27-32\lib\site-packages\pandas\tseries\index.pyc in get_value(self, series, key)
1030
1031 try:
-> 1032 loc = self._get_string_slice(key)
1033 return series[loc]
1034 except (TypeError, ValueError, KeyError):
G:\Python27-32\lib\site-packages\pandas\tseries\index.pyc in _get_string_slice(self, key)
1077 asdt, parsed, reso = parse_time_string(key, freq)
1078 key = asdt
-> 1079 loc = self._partial_date_slice(reso, parsed)
1080 return loc
1081
G:\Python27-32\lib\site-packages\pandas\tseries\index.pyc in _partial_date_slice(self, reso, parsed)
992 def _partial_date_slice(self, reso, parsed):
993 if not self.is_monotonic:
--> 994 raise TimeSeriesError('Partial indexing only valid for ordered '
995 'time series.')
996
TimeSeriesError: Partial indexing only valid for ordered time series.
<class 'pandas.core.series.TimeSeries'>
<class 'pandas.lib.Timestamp'>
<class 'pandas.core.series.TimeSeries'>
<class 'pandas.lib.Timestamp'>
-0.608673793503
141.5
-0.608673793503
doesn't work
Try indexing with a Timestamp object:
>>> import pandas as pd
>>> from pandas.lib import Timestamp
>>> url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
>>> df = pd.read_csv(url, index_col='Date', parse_dates=True)
>>> s = df['Close']
>>> s[Timestamp('2012-12-04')]
141.25
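The same lookup can be sketched without the download, since the Yahoo URL may no longer respond. This is a minimal sketch, assuming a recent pandas where `pd.Timestamp` is available at the top level; the three rows and their prices are hypothetical placeholders, not real SPY data.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data: newest row first, like the CSV.
dates = pd.to_datetime(["2012-12-04", "2012-12-03", "2012-11-30"])
close = pd.Series([3.0, 2.0, 1.0], index=dates, name="Close")

# An exact Timestamp key is a plain label lookup, so index order does not matter.
value = close.loc[pd.Timestamp("2012-12-04")]
print(value)
```

Because the key is a full Timestamp rather than a partial date string, no ordered index is needed for this lookup.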
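If the timeseries is not ordered and you give a partial timestamp (e.g. a date rather than a datetime), it is not clear which datetime should be selected. It cannot be assumed that there is only one datetime object per date, although that happens to be true in this example. There are several options here, but throwing an error seems safer than guessing. (We could return a series/list, similar to .ix['2011-01'], but that might be confusing if a number is returned in other cases. We could try to return a "closest match"... but that doesn't really make sense either.)
In the ordered case it is easier: we pick the first datetime with the selected date.
You can see this behaviour in this simple example: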
import pandas as pd
from numpy.random import randn
from random import shuffle
rng = pd.date_range(start='2011-01-01', end='2011-12-31')
rng2 = list(rng)
shuffle(rng2) # not in order
rng3 = list(rng)
del rng3[20] # in order, but no freq
ts = pd.Series(randn(len(rng)), index=rng)
ts2 = pd.Series(randn(len(rng)), index=rng2)
ts3 = pd.Series(randn(len(rng)-1), index=rng3)
ts.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 365, Freq: D, Timezone: None
ts['2011-01-01']
# -1.1454418070543406
ts2.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-04-16 00:00:00, ..., 2011-03-10 00:00:00]
Length: 365, Freq: None, Timezone: None
ts2['2011-01-01']
#...error which you describe
TimeSeriesError: Partial indexing only valid for ordered time series
ts3.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-01-01 00:00:00, ..., 2011-12-31 00:00:00]
Length: 364, Freq: None, Timezone: None
ts3['2011-01-01']
1.7631554507355987
rng4 = pd.date_range(start='2011-01-01', end='2011-01-31', freq='H')
ts4 = pd.Series(randn(len(rng4)), index=rng4)
ts4['2011-01-01'] == ts4[0]
# True # it picks the first element with that date
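For readers on a recent pandas, the same idea can be sketched as "sort first, then index by string". This is a rough sketch rather than the answer's own code: the variable names are illustrative, the values are just a counter, and it assumes a recent pandas where `is_monotonic_increasing` is available. As far as I know, recent versions also return every row matching a date string when the index is finer-grained than the string, rather than only the first one.

```python
import numpy as np
import pandas as pd

days = pd.date_range(start="2011-01-01", end="2011-12-31", freq="D")
values = np.arange(len(days), dtype=float)

# Shuffle the rows so the index is deliberately out of order.
order = np.random.default_rng(0).permutation(len(days))
unsorted = pd.Series(values[order], index=days[order])

# Sorting restores a monotonic index before any string-based lookup.
ordered = unsorted.sort_index()
january = ordered.loc["2011-01"]  # partial string: every row in January 2011

# Hourly data: a date string is coarser than the index, so it selects a range.
hours = pd.date_range("2011-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.arange(48, dtype=float), index=hours)
one_day = hourly.loc["2011-01-01"]  # all rows stamped on that date
```

Checking `unsorted.index.is_monotonic_increasing` before indexing is a cheap way to find out whether a `sort_index()` call is needed.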
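I don't think this is a bug, though I have posted it as one.

While the pandas tutorial is instructive, I think the question as originally asked deserves a direct answer. I ran into the same problem when converting Yahoo chart information into a DataFrame that could be sliced. I found that the only thing required was: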
import pandas as pd
import datetime as dt
def dt_parser(date):
return dt.datetime.strptime(date, '%Y-%m-%d') + dt.timedelta(hours=16)
url = 'http://ichart.finance.yahoo.com/table.csv?s=SPY&d=12&e=4&f=2012&g=d&a=01&b=01&c=2001&ignore=.csv'
df = pd.read_csv(url, index_col=0, parse_dates=True, date_parser=dt_parser)
df.sort_index(inplace=True)
s = df['Close']
s['2012-12-04'] # now should work
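The "trick" was to include my own date parser. I'm guessing there is a better way to do this within read_csv, but this at least produced a DataFrame that is indexed and can be sliced.

Yes, thanks, I can see that this works. My question is why s['2012-12-04'] does not. Is this a bug, or is this how it is supposed to work for a Timeseries with a datetime index?

@user1878647 It has a Timestamp index rather than datetime, although you can also look things up using its datetime. You can't look up using a string in an unknown date format (for one thing, it may not be clear which date it should be interpreted as). I would say it's not a bug.

@hayden ts and s in the examples above both have a Timestamp index, but they behave differently.

I get a 404 error on that URL. It would be easier if you copied some of the table or, better, the output of df.to_dict().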
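Since the URL is reported as unavailable, the read-and-sort step can be tried on an inline sample instead. This is a minimal sketch, assuming a recent pandas; the CSV text is a hypothetical stand-in with placeholder prices in the same newest-first layout, and it skips the custom date parser in favour of `parse_dates=True` plus `sort_index()`.

```python
import io

import pandas as pd

# Hypothetical sample rows, newest first; the prices are placeholders.
csv_text = """Date,Close
2012-12-04,3.0
2012-12-03,2.0
2012-11-30,1.0
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
df = df.sort_index()  # oldest first, so the index is monotonic increasing
s = df["Close"]

december = s.loc["2012-12"]  # partial string selects both December rows
print(december)
```

Sorting the index, rather than the parser itself, appears to be what makes the string lookup work here, which matches the error message about ordered time series.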