Python: selecting a DataFrame slice by date string
I have a DataFrame that is loaded like this:
minData = pd.read_csv(
    currentSymbol["fullpath"],
    header = None,
    names = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Volume', 'Split Factor', 'Earnings', 'Dividends'],
    parse_dates = [["Date", "Time"]],
    date_parser = lambda x : datetime.datetime.strptime(x, '%Y%m%d %H%M'),
    index_col = "Date_Time",
    sep=' ')
The data looks like this:
>>> minData.index
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-01-02 09:30:00, ..., 2013-12-09 16:00:00]
Length: 1373036, Freq: None, Timezone: None
>>>
>>> minData.head(5)
Open High Low Close Volume \
Date_Time
1998-01-02 09:30:00 8.70630 8.70630 8.70630 8.70630 420.73
1998-01-02 09:35:00 8.82514 8.82514 8.82514 8.82514 420.73
1998-01-02 09:42:00 8.79424 8.79424 8.79424 8.79424 420.73
1998-01-02 09:43:00 8.76572 8.76572 8.76572 8.76572 1262.19
1998-01-02 09:44:00 8.76572 8.76572 8.76572 8.76572 420.73
Split Factor Earnings Dividends Active
Date_Time
1998-01-02 09:30:00 4 0 0 NaN
1998-01-02 09:35:00 4 0 0 NaN
1998-01-02 09:42:00 4 0 0 NaN
1998-01-02 09:43:00 4 0 0 NaN
1998-01-02 09:44:00 4 0 0 NaN
[5 rows x 9 columns]
>>> minData["2004-12-20"]
Open High Low Close Volume \
Date_Time
2004-12-20 09:30:00 35.8574 35.9373 35.8025 35.9273 154112.00
2004-12-20 09:31:00 35.8924 35.9174 35.8824 35.8874 17021.50
2004-12-20 09:32:00 35.8874 35.8924 35.8824 35.8824 17079.50
2004-12-20 09:33:00 35.8874 35.9423 35.8724 35.9373 32491.50
2004-12-20 09:34:00 35.9373 36.0023 35.9174 36.0023 40096.40
2004-12-20 09:35:00 35.9923 36.2071 35.9923 36.1471 67088.90
...
I have a date like this (read from a different file).
I want to set the "Active" column to True for every minute of that day.
I can do that with this:
minData.loc['2004-12-20',"Active"] = True
I can do the same thing with my Timestamp date using this crazy code:
minData.loc[str(ts.year) + "-" + str(ts.month) + "-" + str(ts.day),"Active"] = True
Yes, that builds a string from the Timestamp object.
I know there must be a better way to do this.
Here is how I would actually do it:
In [20]: df = DataFrame(np.random.randn(10,1),index=date_range('20130101 23:55:00',periods=10,freq='T'))
In [21]: df['Active'] = False
In [22]: df
Out[22]:
0 Active
2013-01-01 23:55:00 0.273194 False
2013-01-01 23:56:00 2.869795 False
2013-01-01 23:57:00 0.980566 False
2013-01-01 23:58:00 0.176711 False
2013-01-01 23:59:00 -0.354976 False
2013-01-02 00:00:00 0.258194 False
2013-01-02 00:01:00 -1.765781 False
2013-01-02 00:02:00 0.106163 False
2013-01-02 00:03:00 -1.169214 False
2013-01-02 00:04:00 0.224484 False
[10 rows x 2 columns]
In [28]: df['Active'] = False
As @Andy Hayden points out, normalize sets the time to 0, so you can compare the index directly against a Timestamp whose time is 0:
In [34]: df.loc[df.index.normalize() == Timestamp('20130102'),'Active'] = True
In [35]: df
Out[35]:
0 Active
2013-01-01 23:55:00 0.273194 False
2013-01-01 23:56:00 2.869795 False
2013-01-01 23:57:00 0.980566 False
2013-01-01 23:58:00 0.176711 False
2013-01-01 23:59:00 -0.354976 False
2013-01-02 00:00:00 0.258194 True
2013-01-02 00:01:00 -1.765781 True
2013-01-02 00:02:00 0.106163 True
2013-01-02 00:03:00 -1.169214 True
2013-01-02 00:04:00 0.224484 True
[10 rows x 2 columns]
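The same comparison also works when the date comes from a Timestamp variable rather than a literal, which is the situation described in the question. The following is only a minimal sketch: `ts`, `dates`, and the `value` column are hypothetical stand-ins for data read from another file, and `ts` is deliberately given a nonzero time to show that normalizing both sides removes it.

```python
import numpy as np
import pandas as pd

# Ten one-minute rows straddling midnight, mirroring the session above.
# The "value" column is a hypothetical placeholder for the real data.
idx = pd.date_range("2013-01-01 23:55:00", periods=10, freq="min")
df = pd.DataFrame({"value": np.arange(10.0), "Active": False}, index=idx)

# Hypothetical timestamp standing in for a date read from another file.
ts = pd.Timestamp("2013-01-02 10:15:00")

# Normalize both sides so that only the calendar date is compared.
df.loc[df.index.normalize() == ts.normalize(), "Active"] = True

# Several dates at once: normalize them and test membership.
dates = pd.to_datetime(["2013-01-01", "2013-01-02"])  # hypothetical list
both_days = df.index.normalize().isin(dates.normalize())
```

This avoids building a date string from the year, month, and day parts by hand; `both_days` can be passed to `df.loc` in the same way as the single-date mask.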
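For really fine-grained control, do this (indexer_between_time can serve as the indexer if you only care about times of day). You can always combine conditions with and clauses for more complex indexing: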
In [29]: df.loc[df.index.indexer_between_time('20130101 23:59:00','20130102 00:03:00'),'Active'] = True
In [30]: df
Out[30]:
0 Active
2013-01-01 23:55:00 0.273194 False
2013-01-01 23:56:00 2.869795 False
2013-01-01 23:57:00 0.980566 False
2013-01-01 23:58:00 0.176711 False
2013-01-01 23:59:00 -0.354976 True
2013-01-02 00:00:00 0.258194 True
2013-01-02 00:01:00 -1.765781 True
2013-01-02 00:02:00 0.106163 True
2013-01-02 00:03:00 -1.169214 True
2013-01-02 00:04:00 0.224484 False
[10 rows x 2 columns]
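Note that `indexer_between_time` matches on time of day only and returns integer positions rather than labels, so the call above may need adjusting depending on the pandas version. The sketch below shows two hedged alternatives under the same assumed setup (the `value` column is a hypothetical placeholder): a boolean mask built from two comparisons joined with `&`, and a time-of-day mask built from the positions that `indexer_between_time` returns.

```python
import datetime

import numpy as np
import pandas as pd

# Same assumed setup as the session above: ten one-minute rows across midnight.
idx = pd.date_range("2013-01-01 23:55:00", periods=10, freq="min")
df = pd.DataFrame({"value": np.arange(10.0), "Active": False}, index=idx)

# Option 1: an explicit datetime window, two comparisons joined with &.
start = pd.Timestamp("2013-01-01 23:59:00")
end = pd.Timestamp("2013-01-02 00:03:00")
mask = (df.index >= start) & (df.index <= end)
df.loc[mask, "Active"] = True

# Option 2: time of day only, ignoring the date.
# indexer_between_time returns integer positions; turn them into a mask.
# Because the start time is later than the end time, the window wraps midnight.
pos = df.index.indexer_between_time(datetime.time(23, 59), datetime.time(0, 3))
by_time = np.zeros(len(df), dtype=bool)
by_time[pos] = True
```

On this small frame both options select the same five rows. On multi-day data they differ: option 2 would match the 23:59 to 00:03 window on every day.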
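Awesome, thank you @Jeff! I had been reading about normalize but couldn't see how to use it in this example. I hadn't read anything about the indexer_between_time method before. I'll look into it. Thanks again!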