Python: How to keep the data type when converting a DataFrame to a numpy array?
I have a DataFrame that I want to convert to a numpy array in order to plot its values. The DataFrame looks like this:
>>> df_ohlc
open high low close
Date
2018-03-07 03:35:00 62.189999 62.189999 62.169998 62.180000
2018-03-07 03:36:00 62.180000 62.180000 62.160000 62.180000
2018-03-07 03:37:00 62.169998 62.220001 62.169998 62.209999
2018-03-07 03:38:00 62.220001 62.220001 62.189999 62.200001
...
[480 rows x 4 columns]
>>> df_ohlc.index
DatetimeIndex(['2018-03-07 03:35:00', '2018-03-07 03:36:00',
'2018-03-07 03:37:00', '2018-03-07 03:38:00',
'2018-03-07 03:39:00', '2018-03-07 03:40:00',
'2018-03-07 03:41:00', '2018-03-07 03:42:00',
'2018-03-07 03:43:00', '2018-03-07 03:44:00',
...
'2018-03-07 11:25:00', '2018-03-07 11:26:00',
'2018-03-07 11:27:00', '2018-03-07 11:28:00',
'2018-03-07 11:29:00', '2018-03-07 11:30:00',
'2018-03-07 11:31:00', '2018-03-07 11:32:00',
'2018-03-07 11:33:00', '2018-03-07 11:34:00'],
dtype='datetime64[ns]', name='Date', length=480, freq='T')
>>> df_ohlc.index[0]
Timestamp('2018-03-07 03:35:00', freq='T')  # and why is it a Timestamp when the index said dtype='datetime64[ns]' right before?
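On the side question: a pandas `DatetimeIndex` stores its data as a numpy `datetime64[ns]` array, but accessing a single element returns it boxed as a `pandas.Timestamp` (a `datetime.datetime` subclass). The unboxed numpy values are available through `.to_numpy()` (or `.values`). A minimal sketch, using a small hypothetical index built with `pd.date_range` in place of `df_ohlc.index`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc.index: four one-minute timestamps
idx = pd.date_range("2018-03-07 03:35:00", periods=4, freq="min", name="Date")

print(idx.dtype)        # storage dtype of the index: datetime64[ns]
print(type(idx[0]))     # scalar access is boxed as a pandas Timestamp

raw = idx.to_numpy()    # underlying numpy array, no boxing
print(raw.dtype)        # datetime64[ns]
print(type(raw[0]))     # numpy.datetime64
```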
But when I try to convert it, the type of the index (the Date column) changes from datetime64[ns] to Timestamp:
>>> df_ohlc.reset_index().values
array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
62.189998626708984, 62.16999816894531, 62.18000030517578],
[Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
62.18000030517578, 62.15999984741211, 62.18000030517578],
[Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
62.220001220703125, 62.16999816894531, 62.209999084472656],
...,
[Timestamp('2018-03-07 11:32:00'), 61.939998626708984,
61.95000076293945, 61.93000030517578, 61.93000030517578],
[Timestamp('2018-03-07 11:33:00'), 61.93000030517578,
61.939998626708984, 61.900001525878906, 61.90999984741211],
[Timestamp('2018-03-07 11:34:00'), 61.90999984741211,
61.91999816894531, 61.900001525878906, 61.91999816894531]], dtype=object)
Why does this happen? How can I keep the type as datetime64?
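A numpy array has a single dtype for all of its elements. `reset_index()` turns the Date index into an ordinary column next to the float columns, so `.values` has to find one dtype that covers both `datetime64` and `float`; there is no numeric dtype that does, so pandas falls back to `object`, and each date is then stored as a `Timestamp` object. Converting the index and the numeric columns separately keeps both native dtypes. A sketch with a small hypothetical frame standing in for `df_ohlc`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc (same column layout, two rows)
idx = pd.date_range("2018-03-07 03:35:00", periods=2, freq="min", name="Date")
df = pd.DataFrame(
    {"open": [62.19, 62.18], "high": [62.19, 62.18],
     "low": [62.17, 62.16], "close": [62.18, 62.18]},
    index=idx,
)

mixed = df.reset_index().to_numpy()
print(mixed.dtype)         # object: datetime64 and float64 share no numeric dtype
print(type(mixed[0, 0]))   # the dates come back as pandas Timestamp objects

# Keep the two kinds of data in separate arrays instead
dates = df.index.to_numpy()   # datetime64[ns], shape (2,)
prices = df.to_numpy()        # float64, shape (2, 4)
print(dates.dtype, prices.dtype, prices.shape)
```

Most plotting code can take `dates` as the x values and a column of `prices` as the y values, so a single combined array is often not needed.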
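I tried separating the DataFrame's index and then concatenating it with the values, but it raises an error. I'd like to know what I'm doing wrong: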
>>> index_ohlc = np.array([ df_ohlc.index.values.astype('datetime64[s]'), ]).T
>>> index_ohlc.shape
(480, 1)
>>> value_ohlc = df_ohlc.values
>>> value_ohlc.shape
(480, 4)
>>> type(index_ohlc)
<class 'numpy.ndarray'>
>>> type(value_ohlc)
<class 'numpy.ndarray'>
>>> new_array = np.concatenate( (index_ohlc, value_ohlc), axis = 1 )
Traceback (most recent call last):
File "<console>", line 1, in <module>
TypeError: invalid type promotion
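`np.concatenate` needs one result dtype for the combined array, and NumPy has no promotion rule between `datetime64` and `float64`, hence the `TypeError` (the exact message differs between NumPy versions). If a single 2-D numeric array is really required, one option (an assumption here, not the only possible fix) is to encode the timestamps as numbers first, for example as seconds since the Unix epoch, which can then share a `float64` array with the prices. The arrays below are small hypothetical stand-ins with the same shapes as `index_ohlc` and `value_ohlc`:

```python
import numpy as np

# Hypothetical arrays shaped like index_ohlc (n, 1) and value_ohlc (n, 4)
index_ohlc = np.array(["2018-03-07T03:35:00", "2018-03-07T03:36:00"],
                      dtype="datetime64[s]").reshape(-1, 1)
value_ohlc = np.array([[62.19, 62.19, 62.17, 62.18],
                       [62.18, 62.18, 62.16, 62.18]])

try:
    np.concatenate((index_ohlc, value_ohlc), axis=1)
except TypeError as exc:   # datetime64 cannot be promoted together with float64
    print("TypeError:", exc)

# Encode the datetimes as seconds since 1970-01-01 (as float64), then concatenate
secs = index_ohlc.astype("int64").astype("float64")
combined = np.concatenate((secs, value_ohlc), axis=1)   # float64, shape (2, 5)

# The first column can be turned back into datetime64[s] when needed
restored = combined[:, 0].astype("int64").astype("datetime64[s]")
print(restored)
```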
Try this:
Demo:
import numpy as np
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(np.array([[Timestamp('2018-03-07 03:35:00'), 62.189998626708984,
62.189998626708984, 62.16999816894531, 62.18000030517578],
[Timestamp('2018-03-07 03:36:00'), 62.18000030517578,
62.18000030517578, 62.15999984741211, 62.18000030517578],
[Timestamp('2018-03-07 03:37:00'), 62.16999816894531,
62.220001220703125, 62.16999816894531, 62.209999084472656]]))
dt = np.dtype([("Date", 'datetime64[ns]'),
("f1", np.float64),
("f2", np.float64),
("f3", np.float64),
("f4", np.float64)])
arr = np.array([tuple(v) for v in df.values.tolist()], dtype=dt)
arr  # evaluating arr at the prompt shows:
array([('2018-03-07T03:35:00.000000000', 62.18999863, 62.18999863, 62.16999817, 62.18000031),
('2018-03-07T03:36:00.000000000', 62.18000031, 62.18000031, 62.15999985, 62.18000031),
('2018-03-07T03:37:00.000000000', 62.16999817, 62.22000122, 62.16999817, 62.20999908)],
dtype=[('Date', '<M8[ns]'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8')])
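The structured array above keeps one dtype per field, so the dates stay `datetime64[ns]` while the prices stay `float64`, and each field can be pulled out by name. pandas can also build such an array directly with `DataFrame.to_records()`, which stores the index as the first field, named after the index (here `Date`). A sketch assuming a reasonably recent pandas version and a small hypothetical stand-in for `df_ohlc`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_ohlc
idx = pd.date_range("2018-03-07 03:35:00", periods=3, freq="min", name="Date")
df = pd.DataFrame(
    {"open": [62.19, 62.18, 62.17], "close": [62.18, 62.18, 62.21]},
    index=idx,
)

rec = df.to_records()    # numpy record array; the index becomes field "Date"
print(rec.dtype)

dates = rec["Date"]      # datetime64[ns] values, no Timestamp objects
opens = rec["open"]      # float64 values
print(dates.dtype, opens.dtype)
```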
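As long as the array has mixed types (datetime and float), its dtype cannot be anything other than object. I suggest keeping the index separate from the values.
@cᴏʟᴅsᴘᴇᴇᴅ Thanks for the suggestion. I think I already tried what you described and got a TypeError. Do you know what causes it?
No, I don't have the code that produces that error…
I did you a favour and removed the second, unrelated question that followed the first one. You can post it as a separate question (it is actually easier to answer than the first one).
@John Zwinck Thanks!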