ch is faster. @RickJames I will test adding time to the index used by the SELECTs, but I doubt I will see a huge performance gain; my SELECT statements already run very quickly. If there is a big improvement, I will definitely let you know.
create table `my_table` (
  `time` int(10) unsigned not null,
  `key1` int(10) unsigned not null,
  `key3` char(3) not null,
  `key2` char(2) not null,
  `value1` float default null,
  `value2` float default null,
  primary key (`key1`, `key2`, `key3`, `time`),
  key (`key3`, `key2`, `key1`, `time`)
) engine=InnoDB default character set ascii
partition by range(`time`) (
  partition start        values less than (0),
  partition from20180101 values less than (unix_timestamp('2018-02-01')),
  partition from20180201 values less than (unix_timestamp('2018-03-01')),
  ...,
  partition future       values less than maxvalue
)
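The boundary value in each `values less than (unix_timestamp('YYYY-MM-01'))` clause can be precomputed when generating this DDL. A minimal sketch of that arithmetic, assuming the server evaluates unix_timestamp in UTC (`partition_clause` is a hypothetical helper, not part of the original schema):

```python
from datetime import datetime, timezone

def partition_clause(year: int, month: int) -> str:
    # Hypothetical helper: the boundary is the first second of the *next*
    # month as a UTC epoch, mirroring unix_timestamp('YYYY-MM-01') on a
    # UTC server (an assumption; a different server time zone shifts it).
    ny, nm = (year + 1, 1) if month == 12 else (year, month + 1)
    boundary = int(datetime(ny, nm, 1, tzinfo=timezone.utc).timestamp())
    return f"partition from{year:04d}{month:02d}01 values less than ({boundary})"

print(partition_clause(2018, 1))
```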
import random
import pandas as pd
key2_values = ["aaa", "bbb", ..., "ttt"]  # 20 distinct values
key3_values = ["aa", "ab", "ac", ..., "az", "bb", "bc", ..., "by"]  # 50 distinct values
df = pd.DataFrame([], columns=["key1", "key2", "key3", "value2", "value1"])
idx = 0
for x in range(500):
    for y in range(20):
        for z in range(50):
            df.loc[idx] = [x, key2_values[y], key3_values[z], random.random(), random.random()]
            idx += 1
df.set_index(["key1", "key2", "key3"], inplace=True)
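As a side note, assigning one row at a time via `df.loc[idx]` regrows the frame on every iteration and is very slow at this size; building all the records first and constructing the frame in a single call is typically much faster. A sketch under that assumption (the elided key lists are replaced with hypothetical stand-ins of the same shapes: 20 and 50 distinct values):

```python
import random
import string
from itertools import product

import pandas as pd

# Hypothetical stand-ins for the elided key lists (20 and 50 distinct values).
key2_values = [c * 3 for c in string.ascii_lowercase[:20]]
key3_values = [a + b for a, b in product("abcde", string.ascii_lowercase[:10])]

# Build every row up front, then construct the DataFrame once.
records = [
    (x, k2, k3, random.random(), random.random())
    for x, k2, k3 in product(range(500), key2_values, key3_values)
]
df = pd.DataFrame.from_records(
    records, columns=["key1", "key2", "key3", "value2", "value1"]
)
df.set_index(["key1", "key2", "key3"], inplace=True)
```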
import time
import MySQLdb
conn = MySQLdb.connect(local_infile=1, **connection_params)
cur = conn.cursor()
# Disable data integrity checks -- I know the data is good
cur.execute("SET foreign_key_checks=0;")
cur.execute("SET unique_checks=0;")
# Append current time to the DataFrame
df["time"] = time.time()
df.set_index(["time"], append=True, inplace=True)
# Sort data in primary key order
df.sort_index(inplace=True)
# Dump the data to a CSV
with open("dump.csv", "w") as csv:
    df.to_csv(csv)
# Load the data
cur.execute(
    """
        load data local infile 'dump.csv'
        into table `my_table`
        fields terminated by ','
        enclosed by '"'
        lines terminated by '\\n'
        ignore 1 lines
        (`key1`, `key2`, `key3`, `time`, `value2`, `value1`)
    """
)
# Clean up
cur.execute("SET foreign_key_checks=1;")
cur.execute("SET unique_checks=1;")
conn.commit()
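Because `load data` maps CSV fields to the column list purely by position, a mismatch between the header `to_csv` writes and that list loads wrong data silently. A small self-contained check of the expected order, using a one-row stand-in frame with the same (`key1`, `key2`, `key3`, `time`) index layout as above:

```python
import io

import pandas as pd

# One-row stand-in with the same index layout as the real frame.
df = pd.DataFrame(
    {"value2": [0.1], "value1": [0.2]},
    index=pd.MultiIndex.from_tuples(
        [(1, "aaa", "aa", 1514764800.0)],
        names=["key1", "key2", "key3", "time"],
    ),
)

# to_csv writes the index levels first, then the data columns.
buf = io.StringIO()
df.to_csv(buf)
header = buf.getvalue().splitlines()[0].split(",")

# This is the order the LOAD DATA column list must mirror.
assert header == ["key1", "key2", "key3", "time", "value2", "value1"]
```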
# Append current time to the DataFrame
df["time"] = time.time()
PRIMARY KEY (`time`, `key1`, `key2`, `key3`),
KEY (`key1`, `key2`, `key3`)
df = df.reindex(columns=["value1", "value2"])  # reindex has no inplace parameter; reassign
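Note that `DataFrame.reindex` does not accept an `inplace` argument; it always returns a new frame, so the result has to be reassigned or the reordering is lost. A minimal illustration:

```python
import pandas as pd

df = pd.DataFrame({"value2": [0.1], "value1": [0.2]})

# reindex returns a new DataFrame; reassign to keep the reordered columns.
df = df.reindex(columns=["value1", "value2"])
assert list(df.columns) == ["value1", "value2"]
```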