Java: very slow JDBC query


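I have a JDBC query that runs 8-20x slower than the same query in Python (using cx_Oracle). I'd expect the usual percentage difference between languages, but being 8x or more slower suggests I'm doing something wrong.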

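I'm not including the actual query because it reveals some business logic (table structure, naming, etc.). I don't think it should matter much, though, because I'm comparing Python and Java execution times rather than trying to analyze the query itself. For context, it is a left join across 7 tables that pulls values from 25 columns of those tables.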

The query runs against tables with millions of rows and, as expected, returns 28705 rows.

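The Python code actually does more work than the Java code: in Python I loop over the results and extract the values into objects. In the Java code I only loop over the ResultSet to fetch the data; beyond that I don't process the data/rows in any way.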
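The Java code in this question only calls rs.next(), so it does less work than the Python version. To make the comparison like-for-like, one option is to copy every column of each row into a map keyed by column label, roughly what DataObject does with setattr. This is a minimal sketch, assuming a LinkedHashMap is an acceptable stand-in for DataObject; RowMapper, toRecord and readAll are hypothetical names, not part of the original code.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only.
class RowMapper {

    // Pairs each column label with the value at the same index, similar to
    // the Python DataObject calling setattr(self, key, value) per column.
    static Map<String, Object> toRecord(List<String> columns, Object[] row) {
        if (columns.size() != row.length) {
            throw new IllegalArgumentException("column/value count mismatch");
        }
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            record.put(columns.get(i), row[i]);
        }
        return record;
    }

    // Reads every remaining row of rs into a list of records.
    // Note that JDBC column indexes are 1-based.
    static List<Map<String, Object>> readAll(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int n = meta.getColumnCount();
        List<String> columns = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            columns.add(meta.getColumnLabel(i));
        }

        List<Map<String, Object>> records = new ArrayList<>();
        while (rs.next()) {
            Object[] row = new Object[n];
            for (int i = 1; i <= n; i++) {
                row[i - 1] = rs.getObject(i);
            }
            records.add(toRecord(columns, row));
        }
        return records;
    }
}
```

As with setattr in Python, a repeated column label overwrites the earlier value, so aliasing duplicate column names across the 7 joined tables would keep all 25 values.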

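I'm hoping there is some obvious (or not-so-obvious) setting I'm missing in the Java code that would bring it more in line with the Python execution time.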

What I've tried

I've already tried what I consider the obvious fixes, which are also the main ones addressed in other posts. These include:

  • Updating the Oracle JDBC driver
  • Trying different fetch size values
  • Running a newer version of Java (12 vs. 8)
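Besides calling Statement.setFetchSize() per statement, the Oracle thin driver can take a default fetch size for the whole connection. The sketch below assumes the Oracle-specific connection property name defaultRowPrefetch (verify it against the documentation for the driver version in use); "user" and "password" are the standard DriverManager property keys. PrefetchConfig, connectionProperties and connect are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper class for illustration only.
class PrefetchConfig {

    // Builds connection properties asking the driver to fetch `prefetch`
    // rows per round trip by default. "defaultRowPrefetch" is assumed to be
    // the Oracle thin driver's property name.
    static Properties connectionProperties(String user, String password, int prefetch) {
        if (prefetch <= 0) {
            throw new IllegalArgumentException("prefetch must be positive");
        }
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("defaultRowPrefetch", Integer.toString(prefetch));
        return props;
    }

    // Opens a connection whose statements start with the given fetch size;
    // a later stmt.setFetchSize(n) still overrides it per statement.
    static Connection connect(String url, String user, String password, int prefetch)
            throws SQLException {
        return DriverManager.getConnection(url, connectionProperties(user, password, prefetch));
    }
}
```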
Code and timings

I have timings for various configurations, and all of them run many times slower than the Python code.

Note: for brevity, the output has been cleaned up slightly to remove the driver version from each line.

Java code:

    public void runJdbc() throws Exception {
        Connection conn = DriverManager.getConnection(url, username, password);

        // Pull driver version info
        OracleDatabaseMetaData meta = (OracleDatabaseMetaData)(conn.getMetaData());
        String name = meta.getDriverName();
        String version = meta.getDriverVersion();
        String driverInfo = name + '.' + version;

        // Run the query for each fetch size and time operations
        int sizing[] = {100, 1000, 5000, 10000, 25000};
        for(int s : sizing) {
            Statement stmt = conn.createStatement();
            stmt.setFetchSize(s);

            // query
            long start = System.currentTimeMillis();
            ResultSet rs = stmt.executeQuery(query);
            long querySeconds = (System.currentTimeMillis() - start) / 1000;

            // iterate over results just to fetch them, no additional processing for now
            int count = 0;
            long start2 = System.currentTimeMillis();
            while ( rs.next() ) {
                count++;
            }
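            // NOTE: this subtracts start, not start2, so it measures query + fetch time (corrected in the update below)
            long rsSeconds = (System.currentTimeMillis() - start) / 1000;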

            System.out.println("Execution completed: driver=" + driverInfo + ", fetchSize=" + s + ", rows=" + count + ", query.seconds=" + querySeconds + ", resultSet.seconds=" + rsSeconds );
        }

        conn.close();
    }
Execution times:

Python:

    Completed query. rows=28705, queryTime=11.48
    --- 13.22 seconds ---
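Expectations

I expect the JDBC execution time to be closer to the Python execution time, perhaps within +/-25%. What I actually see is the JDBC query running 8-20x slower than the Python query.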

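In the worst case, when I extract data from the ResultSet in the Java code, it runs 20x slower than the Python code. If I only loop over the result set in Java, it is roughly 8x slower.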

Update
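Based on the comments, I changed the code and the command to fix the timing and increase the heap. The overall runtime is still very poor compared with Python.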

Code changes:

    long rsSeconds = (System.currentTimeMillis() - start2) / 1000;
    long overall = (System.currentTimeMillis() - start) / 1000;
    stmt.close();

    System.out.println("Execution completed: driver=" + driverInfo + ", fetchSize=" + s + ", rows=" + count + ", query.seconds=" + querySeconds + ", resultSet.seconds=" + rsSeconds + ", overall.seconds=" + overall );
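For reference, here is a sketch of one benchmark iteration with the fixes from the comments applied: each phase gets its own start marker, System.nanoTime() replaces System.currentTimeMillis() (its Javadoc describes it as the clock for measuring elapsed time), durations are kept as fractional seconds instead of being truncated by integer division, and try-with-resources closes the ResultSet and Statement even if an exception is thrown. FetchBenchmark, seconds and timeOnce are hypothetical names; the Connection and query string are assumed to come from the existing code.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class for illustration only.
class FetchBenchmark {

    // Converts a System.nanoTime() interval to fractional seconds, so short
    // phases are not rounded down to 0 as with "/ 1000" on long millis.
    static double seconds(long startNanos, long endNanos) {
        return (endNanos - startNanos) / 1_000_000_000.0;
    }

    // Times executeQuery() and the rs.next() loop separately for one fetch size.
    static String timeOnce(Connection conn, String query, int fetchSize) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize);

            long start = System.nanoTime();
            try (ResultSet rs = stmt.executeQuery(query)) {
                long afterExecute = System.nanoTime();

                int count = 0;
                while (rs.next()) {
                    count++;
                }
                long afterFetch = System.nanoTime();

                return String.format(
                        "fetchSize=%d, rows=%d, query.seconds=%.2f, resultSet.seconds=%.2f, overall.seconds=%.2f",
                        fetchSize, count,
                        seconds(start, afterExecute),
                        seconds(afterExecute, afterFetch),
                        seconds(start, afterFetch));
            }
        }
    }
}
```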
Timings:

    java -Xms2g -Xmx32g -jar target/JdbcTest-1.0-SNAPSHOT-shaded.jar

    Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=128, overall.seconds=129
    Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=1000, rows=28705, query.seconds=5, resultSet.seconds=120, overall.seconds=126
    Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=5000, rows=28705, query.seconds=19, resultSet.seconds=89, overall.seconds=108
    Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=10000, rows=28705, query.seconds=36, resultSet.seconds=69, overall.seconds=105
    Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=25000, rows=28705, query.seconds=92, resultSet.seconds=13, overall.seconds=105
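As a rough model, draining a result set takes about ceil(rows / fetchSize) fetch round trips. This is a simplification: it ignores how the driver splits the work between executeQuery() and the rs.next() loop (the query.seconds column growing with fetchSize in the timings above suggests the first batch is fetched during execution). RoundTrips and roundTrips are hypothetical names.

```java
// Hypothetical helper class for illustration only.
class RoundTrips {

    // Number of fetch round trips needed to drain `rows` rows when the driver
    // returns at most `fetchSize` rows per trip (integer ceiling division).
    static int roundTrips(int rows, int fetchSize) {
        if (rows < 0 || fetchSize <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and fetchSize > 0");
        }
        return (rows + fetchSize - 1) / fetchSize;
    }
}
```

For 28705 rows this gives 288, 29, 6, 3 and 2 round trips for fetch sizes 100, 1000, 5000, 10000 and 25000. Cutting the round trips from 288 to 2 only lowers overall.seconds from 129 to 105 in the timings above, which hints that round-trip count alone does not account for the gap with Python.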
Update - Python code

The Python code is:

    import cx_Oracle
    import time

    class DataObject:
        def __init__(self, columns, row):
            index = 0

            for key in columns:
                value = row[index]
                index += 1

                setattr(self, key, value)

        def __str__(self):
            s = ''
            for k, v in self.__dict__.items():
                s += f'{k}={v}, '
            return s

    class Oracle:
        def __init__(self, args):
            self.args = args

        def run(self):
            start = time.time()

            # connect
            conn = cx_Oracle.connect(self.args.ora_uid, self.args.ora_pwd, self.args.ora_dsn)
            cursor = conn.cursor()

            # run the query
            cursor.execute(self.args.query)
            columns = [column[0] for column in cursor.description]

            # fetch results
            results = cursor.fetchall()
            count = cursor.rowcount
            print(f'Query returned {count} rows')

            # create objects from the data
            dao = []
            for r in results:
                dao.append(DataObject(columns, r))

            # cleanup
            cursor.close()
            elapsed = time.time() - start
            print(f'Query completed. rows={len(dao)}, seconds={elapsed:.2f}')

            # output to validate
            for d in dao:
                print(d)

            return(columns, dao)
The output is:

    Query returned 28705 rows
    Query completed. rows=28705, seconds=11.34

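Comments:

Increase the heap size. Remember to close your statements (this also automatically closes their respective result sets). Also, your second timing calculation should use start2, not start.

Sorry about the timing mistake; I was trying to put together a simple test case to reproduce the problem and didn't get it right. It certainly explains the increase in query time versus result set time, but the overall runtime compared with Python is still a problem. I'll fix that, close the statements, increase the heap and try again. Thanks.

Based on these comments, I changed the code and the command to fix the timing and increase the heap. The overall runtime is still poor compared with Python. Posting this at the top level because of comment size limits.

Could you add the Python code, since that is the baseline you are comparing against?

Driver connection string: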
    private static final String url = "jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=***)(PORT=***))(CONNECT_DATA=(SERVICE_NAME=***)(SERVER=DEDICATED)))";