Python: inserting timezone-aware datetime64[ns] into MySQL with SQLAlchemy

Assume the following pandas.DataFrame:

In  [108]: import pandas
In  [109]: import numpy as np
In  [110]: import sqlalchemy as sql

In  [111]: df = pandas.DataFrame(np.random.randn(8, 2), columns=['a', 'b'])
In  [112]: df['DateTime'] = pandas.date_range('2015-01-01', '2015-01-08', tz='US/Eastern')
In  [113]: df.dtypes

Out [113]:
a                              float64
b                              float64
DateTime    datetime64[ns, US/Eastern]
dtype: object

# creation of the SQLAlchemy connection (conn) omitted

In  [114]: dtypes_ = {
     ...:    'a': sql.Float(precision=4),
     ...:    'b': sql.Float(precision=4),
     ...:    'DateTime': sql.DateTime(timezone=True)
     ...: }

In  [115]: df.to_sql(
     ...:     MYSQL_TABLE,
     ...:     conn,
     ...:     flavor='mysql',
     ...:     schema=MYSQL_SCHEMA,
     ...:     if_exists='append',
     ...:     index=False,
     ...:     index_label=None,
     ...:     chunksize=None,
     ...:     dtype=dtypes_
     ...: )
This code raises the following exception (including the last traceback):

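I have seen some posts about coercing the datetime64[ns, US/Eastern] column to a string and inserting that. I would rather have the correct field type in the table than use a hack. Besides, it seems like this should just work.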

Note that the datetime64[ns, US/Eastern] column is not the DataFrame's index.


Any suggestions on how to insert timezone-aware datetime values into MySQL with SQLAlchemy?

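I would suggest converting your local-time timestamps to UTC, saving the converted timestamps as regular (timezone-naive) datetime64 values, and converting them back to your local time zone when you read them from the DB.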
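A minimal sketch of those two conversion steps as helper functions. The names `to_utc_naive` and `from_utc_naive` are hypothetical, and the example assumes a tz-aware column like the question's `DateTime` (US/Eastern, which is UTC-05:00 in January).

```python
import pandas as pd


def to_utc_naive(s: pd.Series) -> pd.Series:
    """Hypothetical helper: tz-aware datetimes -> naive UTC datetimes."""
    # tz_convert shifts the wall-clock values to UTC;
    # tz_localize(None) then drops the tz info but keeps those UTC values.
    return s.dt.tz_convert("UTC").dt.tz_localize(None)


def from_utc_naive(s: pd.Series, tz: str) -> pd.Series:
    """Hypothetical helper: naive values known to be UTC -> tz-aware in `tz`."""
    # First declare the naive values as UTC, then convert to the target zone.
    return s.dt.tz_localize("UTC").dt.tz_convert(tz)


aware = pd.Series(pd.date_range("2015-01-01", "2015-01-08", tz="US/Eastern"))
naive_utc = to_utc_naive(aware)   # what you would write to the DB
print(naive_utc.dtype)            # naive datetime64, no tz info
print(naive_utc.iloc[0])          # 2015-01-01 05:00:00
restored = from_utc_naive(naive_utc, "US/Eastern")
print((restored == aware).all())  # True
```

Only the naive UTC values go into the DATETIME column; the time zone lives in application code.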

Demo:

It produces:

In [230]: df.dtypes
Out[230]:
a                  float64
b                  float64
DateTime    datetime64[ns]  # NOTE: there is _no_ TZ info
dtype: object

In [231]: df
Out[231]:
          a         b            DateTime
0  0.050288  0.045425 2014-12-31 23:00:00
1  0.603057 -0.443899 2015-01-01 23:00:00
2 -0.874863 -1.185011 2015-01-02 23:00:00
3  0.446314 -0.301012 2015-01-03 23:00:00
4 -0.267889 -0.819698 2015-01-04 23:00:00
5 -0.888317  0.189641 2015-01-05 23:00:00
6 -0.985719 -0.962523 2015-01-06 23:00:00
7 -0.736928 -0.379683 2015-01-07 23:00:00
Now let's save the DF to the MySQL database:

import pandas as pd
from sqlalchemy import create_engine

db_connection = 'mysql+pymysql://mysql_user:mysql_password@mysql_host/mysql_db'
engine = create_engine(db_connection)
#engine.execute("set time_zone='US/Eastern'")   # this trick didn't work for me

df.to_sql('aaa', engine, if_exists='replace', index=False)
Check in the MySQL database:

mysql> select * from aaa;
+--------------------+--------------------+---------------------+
| a                  | b                  | DateTime            |
+--------------------+--------------------+---------------------+
| 0.0502883957484278 |  0.045424787582407 | 2014-12-31 23:00:00 |
|  0.603057085374334 | -0.443899474872308 | 2015-01-01 23:00:00 |
| -0.874862846879629 |  -1.18501101907713 | 2015-01-02 23:00:00 |
|  0.446314112615487 |   -0.3010118937233 | 2015-01-03 23:00:00 |
| -0.267889181254187 | -0.819698158571756 | 2015-01-04 23:00:00 |
| -0.888316926203869 |     0.189640636565 | 2015-01-05 23:00:00 |
| -0.985719317488699 | -0.962523458724807 | 2015-01-06 23:00:00 |
| -0.736928170623884 |  -0.37968341793291 | 2015-01-07 23:00:00 |
+--------------------+--------------------+---------------------+
8 rows in set (0.00 sec)
Let's read it back from the MySQL database:

# read data back from MySQL
new = pd.read_sql('select * from aaa', engine)
The timestamps are now in UTC:

In [221]: new
Out[221]:
          a         b            DateTime
0  0.050288  0.045425 2014-12-31 23:00:00
1  0.603057 -0.443899 2015-01-01 23:00:00
2 -0.874863 -1.185011 2015-01-02 23:00:00
3  0.446314 -0.301012 2015-01-03 23:00:00
4 -0.267889 -0.819698 2015-01-04 23:00:00
5 -0.888317  0.189641 2015-01-05 23:00:00
6 -0.985719 -0.962523 2015-01-06 23:00:00
7 -0.736928 -0.379683 2015-01-07 23:00:00
Convert the timestamps from UTC to my local time zone:

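mytz = 'Europe/Berlin'  # hypothetical: set to your own local zone; any UTC+01:00 zone matches the output below
new['DateTime'] = new['DateTime'].dt.tz_localize('UTC').dt.tz_convert(mytz)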


In [223]: new
Out[223]:
          a         b                  DateTime
0  0.050288  0.045425 2015-01-01 00:00:00+01:00
1  0.603057 -0.443899 2015-01-02 00:00:00+01:00
2 -0.874863 -1.185011 2015-01-03 00:00:00+01:00
3  0.446314 -0.301012 2015-01-04 00:00:00+01:00
4 -0.267889 -0.819698 2015-01-05 00:00:00+01:00
5 -0.888317  0.189641 2015-01-06 00:00:00+01:00
6 -0.985719 -0.962523 2015-01-07 00:00:00+01:00
7 -0.736928 -0.379683 2015-01-08 00:00:00+01:00
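The whole round trip can be tried without a MySQL server. This is a sketch under stated assumptions: an in-memory SQLite database via the standard-library `sqlite3` module stands in for MySQL, `tz_demo` is a hypothetical table name, and `Europe/Berlin` is assumed as the local zone (any UTC+01:00 zone reproduces the offsets above). SQLite has no native datetime type, so `parse_dates` is needed when reading back.

```python
import sqlite3

import numpy as np
import pandas as pd

mytz = "Europe/Berlin"  # assumption: a UTC+01:00 zone, as in the output above

df = pd.DataFrame(np.random.randn(8, 2), columns=["a", "b"])
df["DateTime"] = pd.date_range("2015-01-01", "2015-01-08", tz=mytz)

# 1. Before writing: convert to UTC and drop the tz info (naive datetime64)
df["DateTime"] = df["DateTime"].dt.tz_convert("UTC").dt.tz_localize(None)

# 2. Write and read back; SQLite stands in for MySQL in this sketch
con = sqlite3.connect(":memory:")
df.to_sql("tz_demo", con, if_exists="replace", index=False)
new = pd.read_sql("SELECT * FROM tz_demo", con, parse_dates=["DateTime"])
con.close()

# 3. After reading: values are naive UTC, so localize to UTC, then convert
new["DateTime"] = new["DateTime"].dt.tz_localize("UTC").dt.tz_convert(mytz)
print(new["DateTime"].iloc[0])  # 2015-01-01 00:00:00+01:00
```

Step 3 has to be `tz_localize("UTC")` followed by `tz_convert(mytz)`. Calling `tz_localize(mytz)` directly would attach the local zone to UTC wall-clock values and shift every instant by the zone's offset.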

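Great answer. I have read a lot about this: MySQL does not support storing a time zone offset in a field, which is why the original code fails. This is one of the few solutions that uses only built-in functionality and works well.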
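That point can be checked without a server: compiling the question's `sql.DateTime(timezone=True)` type against SQLAlchemy's MySQL dialect suggests the `timezone` flag has no effect there, since the DDL type comes out as plain `DATETIME`. A sketch, assuming SQLAlchemy is installed; no database connection is made, and the PostgreSQL line is only for contrast.

```python
import sqlalchemy as sql
from sqlalchemy.dialects import mysql, postgresql

tz_type = sql.DateTime(timezone=True)

# Render the column type as each dialect would emit it in CREATE TABLE
print(tz_type.compile(dialect=mysql.dialect()))       # DATETIME
print(tz_type.compile(dialect=postgresql.dialect()))  # TIMESTAMP WITH TIME ZONE
```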