Python: loading a large DataFrame into Vertica
I have a fairly large DataFrame (500k+ rows) that I am trying to load into Vertica. I have the code below, but it is very slow:
#convert df to list format
lists = output_final.values.tolist()
#make insert string
insert_qry = " INSERT INTO SCHEMA.TABLE(DATE,ID, SCORE) VALUES (%s,%s,%s) "
# load into database
for i in range(len(lists)):
    cur.execute(insert_qry, lists[i])
conn_info.commit()
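I have seen some posts that discuss using COPY instead of EXECUTE for a load this large, but I have not found a good working example.

After some trial and error, I found that the following worked for me: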
import io

# COPY statement that reads comma-delimited rows from STDIN
copy_str = "COPY SCHEMA.TABLE(DATE, ID, SCORE) FROM STDIN DELIMITER ','"
# turn the df into a csv-like object
stream = io.StringIO()
contact_output_final.to_csv(stream, sep=",", index=False, header=False)
# reset the position of the stream variable
stream.seek(0)
# load the data
with conn_info.cursor() as cursor:
    cursor.copy(copy_str, stream.getvalue())
conn_info.commit()
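If holding the whole CSV text in memory at once is a concern, the same COPY approach can be applied in chunks. The following is only a minimal sketch under stated assumptions: rows_to_csv, copy_in_chunks, and the chunk_size default are hypothetical names and values introduced for illustration, and the cursor is assumed to expose the copy(sql, data) call used above. Values that contain commas would probably also need an ENCLOSED BY clause in the COPY statement, which this sketch does not add.

```python
import csv
import io
from itertools import islice


def rows_to_csv(rows):
    """Serialize an iterable of row tuples to CSV text, one line per row."""
    buffer = io.StringIO()
    # Use a bare newline as the record terminator instead of csv's default CRLF.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def copy_in_chunks(cursor, copy_sql, rows, chunk_size=100_000):
    """Send rows through cursor.copy() in chunks; return the number of rows sent.

    `cursor` is assumed to provide copy(sql, data), as in the answer above.
    """
    iterator = iter(rows)
    total = 0
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            break
        cursor.copy(copy_sql, rows_to_csv(chunk))
        total += len(chunk)
    return total


# Possible usage with the names from the answer above (not executed here):
#
# with conn_info.cursor() as cursor:
#     copy_in_chunks(
#         cursor,
#         copy_str,
#         contact_output_final.itertuples(index=False, name=None),
#     )
# conn_info.commit()
```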