Netezza JDBC batch insert is very slow even in batch execution mode

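I am referring to this document. According to the article, inserts should be faster if we use the executeBatch method (the Netezza JDBC driver may detect a batch insert and, under the covers, convert it into an external table load, which is faster). I have to execute millions of insert statements, and I get at most 500 records/minute per connection. Is there a better way to load data into Netezza faster over a JDBC connection? I am using Spark and a JDBC connection to insert the records. Why does the load not go through an external table even when the statements are executed as a batch? Below is the Spark code I am using.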

// insertQueryDataSet is a Dataset<String>; each element is one complete INSERT statement.
insertQueryDataSet.foreachPartition(partition -> {
    Connection conn = NetezzaConnector.getSingletonConnection(url, userName, pwd);
    conn.setAutoCommit(false);
    int insertBatchCount = 0;
    Statement statement = conn.createStatement();
    while (partition.hasNext()) {
        insertBatchCount++;
        statement.addBatch(partition.next());
        // Flush and commit every 10000 statements
        if (insertBatchCount % 10000 == 0) {
            LOGGER.info("Before executeBatch.");
            int[] execCount = statement.executeBatch();
            LOGGER.info("After execCount." + execCount.length);
            LOGGER.info("Before commit.");
            conn.commit();
            LOGGER.info("After commit.");
        }
    }
    // Execute the remaining statements (once; a second call would only run an empty batch)
    int[] execCount = statement.executeBatch();
    LOGGER.info("After execCount." + execCount.length);
    conn.commit();
    conn.close();
});
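
For comparison, the code above batches complete SQL strings through a plain Statement. A minimal sketch of the other standard JDBC batching style, a single parameterized INSERT reused through a PreparedStatement, could look like the following. The table `my_table`, its two columns, the `Row` type and the `insertAll` helper are all hypothetical, and whether the Netezza driver turns such a batch into an external table load is not confirmed in this thread.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

class PreparedBatchSketch {

    // Hypothetical row type; the real table and columns are not given in the question.
    static final class Row {
        final int id;
        final String name;

        Row(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Hypothetical target table with two columns.
    static final String INSERT_SQL = "INSERT INTO my_table (id, name) VALUES (?, ?)";

    /**
     * Binds each row to the same prepared INSERT and flushes every batchSize rows.
     * Returns the number of executeBatch() calls made. Committing is left to the caller.
     */
    static int insertAll(PreparedStatement stmt, Iterator<Row> rows, int batchSize)
            throws SQLException {
        int pending = 0;
        int flushes = 0;
        while (rows.hasNext()) {
            Row row = rows.next();
            stmt.setInt(1, row.id);
            stmt.setString(2, row.name);
            stmt.addBatch();
            pending++;
            if (pending == batchSize) {
                stmt.executeBatch();
                pending = 0;
                flushes++;
            }
        }
        // Flush whatever is left over, but only if there is something to send.
        if (pending > 0) {
            stmt.executeBatch();
            flushes++;
        }
        return flushes;
    }
}
```

Inside `foreachPartition` this would be called with a statement obtained from `conn.prepareStatement(PreparedBatchSketch.INSERT_SQL)`, followed by `conn.commit()`. It would also require the Dataset to carry the column values rather than pre-built INSERT strings.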

I tried this approach (batch insert) but found it very slow, so I put all the data into CSV files and loaded each CSV through an external table.

InsertReq = "Insert into " + tablename + " select * from external '" + filepath + "' using (maxerrors 0, delimiter ',' unase 2000 encoding 'internal' remotesource 'jdbc' escapechar '\\' )";
Jdbctemplate.execute(InsertReq);

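Because I am using Java, the remotesource is 'jdbc'. Also note that the CSV file path is wrapped in single quotes. Hope this helps. If you find a better approach than this one, don't forget to post it. :)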
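
The CSV-plus-external-table approach described above can be sketched end to end roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, the option list mirrors the one in the answer except for the `unase 2000` entry, which is left out here, and the backslash escaping of delimiters is an assumption about how the file should match `escapechar '\'`. Nothing here has been run against a live Netezza system.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ExternalTableLoadSketch {

    // Escape the escape character first, then the delimiter, using a backslash.
    static String escapeField(String value) {
        return value.replace("\\", "\\\\").replace(",", "\\,");
    }

    // Join the escaped fields of one row with the ',' delimiter.
    static String toCsvLine(List<String> fields) {
        List<String> escaped = new ArrayList<>();
        for (String field : fields) {
            escaped.add(escapeField(field));
        }
        return String.join(",", escaped);
    }

    // Write all rows to a temporary CSV file, one row per line.
    static Path writeCsv(List<List<String>> rows) {
        try {
            Path file = Files.createTempFile("nz_load_", ".csv");
            try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (List<String> row : rows) {
                    writer.write(toCsvLine(row));
                    writer.write('\n');
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build the INSERT ... SELECT FROM EXTERNAL statement; the file path goes in single quotes.
    // In Java source, "\\" is one literal backslash, so the SQL receives ESCAPECHAR '\'.
    static String buildInsertSql(String tableName, String filePath) {
        return "INSERT INTO " + tableName
                + " SELECT * FROM EXTERNAL '" + filePath + "'"
                + " USING (MAXERRORS 0 DELIMITER ',' ENCODING 'internal'"
                + " REMOTESOURCE 'jdbc' ESCAPECHAR '\\')";
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("1", "plain value"),
                List.of("2", "value, with a comma"));
        Path csv = writeCsv(rows);
        // The resulting string would be passed to Statement.execute or JdbcTemplate.execute.
        System.out.println(buildInsertSql("MY_TABLE", csv.toString()));
    }
}
```

With Spark, each partition could write its own file this way and issue one such statement per file, which replaces millions of single-row inserts with a handful of bulk loads.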