Python: reading CSV date-int columns into datetime
I'm new to both StackOverflow and pandas. I am trying to read a large CSV file of stock market bin data in the following format:
date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,730,49.273,49.273,49.273,49.273,208.39,1,0,0,JPM
20130625,740,49.1866,49.1866,49.1866,49.1866,224.019,1,0,0,JPM
20130625,745,49.321,49.321,49.321,49.321,208.39,1,0,0,JPM
20130625,750,49.3306,49.369,49.3306,49.369,4583.54,1,0,0,JPM
20130625,755,49.369,49.369,49.369,49.369,416.78,1,0,0,JPM
20130625,800,49.369,49.369,49.3594,49.3594,1715.05,1,0,0,JPM
20130625,805,49.369,49.369,49.3306,49.3306,1333.7,1,0,0,JPM
20130625,810,49.3306,49.3786,49.3306,49.3786,1567.09,1,0,0,JPM
I have the following code to read it into a pandas DataFrame:
import numpy as np
import scipy as sp
import pandas as pd
import datetime as dt
fname = 'bindat.csv'
df = pd.read_csv(fname, header=0, sep=',')
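The problem is that the date and time columns are read in as int64. I would like to combine the two into a single timestamp, for example: 2013-06-25 07:15:00
I am even having trouble reading the time correctly with the following: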
df['date'] = pd.to_datetime(df['date'].astype(str))
df['time'] = pd.to_datetime(df['time'].astype(str))
The first command converts the date, but the time column looks odd:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 9999 entries, 0 to 9998
Data columns (total 11 columns):
date 9999 non-null datetime64[ns]
time 9999 non-null object
open 9999 non-null float64
high 9999 non-null float64
low 9999 non-null float64
close 9999 non-null float64
volume 9999 non-null float64
splits 9999 non-null float64
earnings 9999 non-null int64
dividends 9999 non-null float64
sym 9999 non-null object
dtypes: datetime64[ns](1), float64(7), int64(1), object(2)None
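After that, I would like to combine the two into a single DatetimeIndex.
Any suggestions would be much appreciated.
Cheers

There are many ways to do this. One way to do it during read_csv is to use the parse_dates and date_parser parameters: tell parse_dates to combine the date and time columns, and define an inline function to parse the combined string: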
>>> df = pd.read_csv("bindat.csv", parse_dates=[["date", "time"]],
date_parser=lambda x: pd.to_datetime(x, format="%Y%m%d %H%M"),
index_col="date_time")
>>> df
open high low close volume splits earnings dividends sym
date_time
2013-06-25 07:15:00 49.2634 49.2634 49.2634 49.2634 156.293 1 0 0 JPM
2013-06-25 07:30:00 49.2730 49.2730 49.2730 49.2730 208.390 1 0 0 JPM
2013-06-25 07:40:00 49.1866 49.1866 49.1866 49.1866 224.019 1 0 0 JPM
2013-06-25 07:45:00 49.3210 49.3210 49.3210 49.3210 208.390 1 0 0 JPM
2013-06-25 07:50:00 49.3306 49.3690 49.3306 49.3690 4583.540 1 0 0 JPM
2013-06-25 07:55:00 49.3690 49.3690 49.3690 49.3690 416.780 1 0 0 JPM
2013-06-25 08:00:00 49.3690 49.3690 49.3594 49.3594 1715.050 1 0 0 JPM
2013-06-25 08:05:00 49.3690 49.3690 49.3306 49.3306 1333.700 1 0 0 JPM
2013-06-25 08:10:00 49.3306 49.3786 49.3306 49.3786 1567.090 1 0 0 JPM
2013-06-25 16:10:00 49.3306 49.3786 49.3306 49.3786 1567.090 1 0 0 JPM
I added a row at the end (16:10) to make sure the times are parsed correctly.

What does the time data look like? Does 715 mean 0715, or 1155, or …?
Sorry, it means 0715; that is just how the data comes.
Works perfectly! Thank you very much.
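As a follow-up to the answer above: as far as I know, newer pandas releases deprecate both the date_parser argument (since 2.0) and combining columns through a nested parse_dates list (since 2.2), so the one-liner may emit warnings or fail there. Below is a minimal sketch of the same idea done after reading, assuming the time column is always HMM or HHMM as confirmed in the comments. The names `read_bins` and `CSV_TEXT` are made up for illustration, and the sample rows are copied from the question.

```python
import io

import pandas as pd

# Two sample rows copied from the question (hypothetical in-memory stand-in for bindat.csv).
CSV_TEXT = """date,time,open,high,low,close,volume,splits,earnings,dividends,sym
20130625,715,49.2634,49.2634,49.2634,49.2634,156.293,1,0,0,JPM
20130625,730,49.273,49.273,49.273,49.273,208.39,1,0,0,JPM
"""


def read_bins(source):
    """Read the bin CSV and index it by a combined date+time timestamp."""
    # Read date and time as text so they can be concatenated as strings.
    df = pd.read_csv(source, dtype={"date": str, "time": str})
    # Left-pad the time to four digits ("715" -> "0715") so %H%M is unambiguous.
    stamp = df["date"] + df["time"].str.zfill(4)
    df["date_time"] = pd.to_datetime(stamp, format="%Y%m%d%H%M")
    # Drop the raw columns and use the combined timestamp as the index.
    return df.drop(columns=["date", "time"]).set_index("date_time")


df = read_bins(io.StringIO(CSV_TEXT))
print(df.index)
```

Zero-padding with str.zfill(4) is what removes the ambiguity raised in the comments: without it, a three-digit value such as 715 has no fixed split between hours and minutes. To use this on the real file, pass the file name (for example 'bindat.csv') instead of the StringIO object.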