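Java: Very slow JDBC query
Problem
I have a JDBC query that runs 8-20x slower than the same query in Python (using cx_Oracle). I expect some normal percentage difference between languages, but 8x or more suggests I am doing something wrong.
I am not including the actual query because it reveals business logic (table structure, naming, etc.), and I don't think it is very relevant, since I am comparing Python and Java execution times rather than trying to analyze the query itself. It is, however, a left join across 7 tables that pulls values from 25 columns of those tables.
The query runs against tables with millions of rows and returns 28705 rows, as expected.
The Python code actually does more than the Java code: in Python I loop over the results and extract the values into objects. In the Java code I only loop over the ResultSet to fetch the data; I do not otherwise process the rows in any way.
I am hoping there is some obvious or not-so-obvious setting missing from the Java code that would bring it more in line with the Python execution time.
What I have tried
I have tried what I believe are the obvious solutions, which are also the main points addressed in other posts. These include: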
- Updating the Oracle JDBC driver
- Trying different fetch size values
- Running a newer version of Java (12 vs. 8)
public void runJdbc() throws Exception {
    Connection conn = DriverManager.getConnection(url, username, password);
    // Pull driver version info
    OracleDatabaseMetaData meta = (OracleDatabaseMetaData) (conn.getMetaData());
    String name = meta.getDriverName();
    String version = meta.getDriverVersion();
    String driverInfo = name + '.' + version;
    // Run the query for each fetch size and time operations
    int[] sizing = {100, 1000, 5000, 10000, 25000};
    for (int s : sizing) {
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(s);
        // query
        long start = System.currentTimeMillis();
        ResultSet rs = stmt.executeQuery(query);
        long querySeconds = (System.currentTimeMillis() - start) / 1000;
        // iterate over results just to fetch them, no additional processing for now
        int count = 0;
        long start2 = System.currentTimeMillis();
        while (rs.next()) {
            count++;
        }
        long rsSeconds = (System.currentTimeMillis() - start) / 1000;
        System.out.println("Execution completed: driver=" + driverInfo + ", fetchSize=" + s + ", rows=" + count + ", query.seconds=" + querySeconds + ", resultSet.seconds=" + rsSeconds);
    }
    conn.close();
}
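As a rough way to reason about the fetch sizes tried in the loop above: the driver retrieves rows in batches of at most fetchSize rows, so the number of fetch round trips is approximately ceil(rows / fetchSize). The following is a minimal sketch of that arithmetic using the 28705 rows reported in the question. The class and method names are hypothetical, and a real driver may add a few extra round trips (for the execute call, for example), so treat the numbers as estimates only.

```java
final class FetchRoundTrips {
    /** Approximate number of fetch round trips: ceil(rows / fetchSize). */
    static long roundTrips(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        long rows = 28705; // row count reported in the question
        int[] sizing = {100, 1000, 5000, 10000, 25000}; // fetch sizes from the test loop
        for (int s : sizing) {
            System.out.println("fetchSize=" + s + " -> ~" + roundTrips(rows, s) + " round trips");
        }
    }
}
```

By this estimate even fetchSize=100 needs only a few hundred round trips, which suggests (but does not prove) that fetch size alone may not account for the whole gap.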
Execution times:
Python:
Completed query. rows=28705, queryTime=11.48
--- 13.22 seconds ---
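Expectation
I would expect the JDBC execution to be closer in time to the Python execution, perhaps within +/-25%. What I actually see is the JDBC query running 8-20x slower than the Python query.
In the worst case, where I extract the data from the ResultSet in the Java code, it runs 20x slower than the Python code. If I only loop over the ResultSet in Java, it is about 8x slower.
Update
Based on the comments, I modified the code and the command to fix the timing and increase the heap. Overall runtime is still very poor compared to Python.
Code changes: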
long rsSeconds = (System.currentTimeMillis() - start2) / 1000;
long overall = (System.currentTimeMillis() - start) / 1000;
stmt.close();
System.out.println("Execution completed: driver=" + driverInfo + ", fetchSize=" + s + ", rows=" + count + ", query.seconds=" + querySeconds + ", resultSet.seconds=" + rsSeconds + ", overall.seconds=" + overall );
Timings:
java -Xms2g -Xmx32g -jar target/JdbcTest-1.0-SNAPSHOT-shaded.jar
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=100, rows=28705, query.seconds=1, resultSet.seconds=128, overall.seconds=129
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=1000, rows=28705, query.seconds=5, resultSet.seconds=120, overall.seconds=126
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=5000, rows=28705, query.seconds=19, resultSet.seconds=89, overall.seconds=108
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=10000, rows=28705, query.seconds=36, resultSet.seconds=69, overall.seconds=105
Execution completed: driver=Oracle JDBC driver.19.3.0.0.0, fetchSize=25000, rows=28705, query.seconds=92, resultSet.seconds=13, overall.seconds=105
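To put the timings above next to the Python run, here is a small sketch that computes the slowdown factor for each overall.seconds value. It assumes the 13.22-second total from the Python output earlier in the post as the baseline; the class and method names are hypothetical. It also shows that dividing milliseconds by 1000.0 instead of 1000 keeps the fractional seconds that the integer division in the posted code truncates.

```java
import java.util.Locale;

final class TimingSummary {
    /** How many times slower the Java run is than the Python baseline. */
    static double slowdown(double javaSeconds, double pythonSeconds) {
        return javaSeconds / pythonSeconds;
    }

    /** Converts milliseconds to seconds without truncating the fraction. */
    static double toSeconds(long elapsedMillis) {
        return elapsedMillis / 1000.0;
    }

    public static void main(String[] args) {
        double pythonSeconds = 13.22;                // assumed baseline: total from the Python output
        int[] fetchSizes = {100, 1000, 5000, 10000, 25000};
        int[] overallSeconds = {129, 126, 108, 105, 105}; // overall.seconds from the Java output
        for (int i = 0; i < fetchSizes.length; i++) {
            System.out.println(String.format(Locale.ROOT, "fetchSize=%d slowdown=%.1fx",
                    fetchSizes[i], slowdown(overallSeconds[i], pythonSeconds)));
        }
        // Integer division drops the fraction: 1999 ms is reported as 1 second.
        System.out.println((1999 / 1000) + " vs " + toSeconds(1999));
    }
}
```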
Update - Python code
The Python code is:
import cx_Oracle
import time


class DataObject:
    def __init__(self, columns, row):
        index = 0
        for key in columns:
            value = row[index]
            index += 1
            setattr(self, key, value)

    def __str__(self):
        s = ''
        for k, v in self.__dict__.items():
            s += f'{k}={v}, '
        return s


class Oracle:
    def __init__(self, args):
        self.args = args

    def run(self):
        start = time.time()
        # connect
        conn = cx_Oracle.connect(self.args.ora_uid, self.args.ora_pwd, self.args.ora_dsn)
        cursor = conn.cursor()
        # run the query
        cursor.execute(self.args.query)
        columns = [column[0] for column in cursor.description]
        # fetch results
        results = cursor.fetchall()
        count = cursor.rowcount
        print(f'Query returned {count} rows')
        # create objects from the data
        dao = []
        for r in results:
            dao.append(DataObject(columns, r))
        # cleanup
        cursor.close()
        elapsed = time.time() - start
        print(f'Query completed. rows={len(dao)}, seconds={elapsed:.2f}')
        # output to validate
        for d in dao:
            print(d)
        return (columns, dao)
The output is:
Query returned 28705 rows
Query completed. rows=28705, seconds=11.34
Driver connection string (the url constant at the end of the post):
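Increase the heap size. Remember to close your statements (this also closes their respective ResultSets automatically). Also, your second timing calculation should use start2, not start.
Sorry about that timing error; I was trying to put together a simple test case to reproduce the problem and slipped up. It definitely explains the increase in the query timing versus the ResultSet timing, but the overall runtime compared to Python is still a problem. I will fix that, close the statements, increase the heap, and retry. Thanks.
Based on these comments, I modified the code and the command to fix the timing and increase the heap; overall runtime is still very poor compared to Python.
Posting as a top-level comment due to size limits. Could you add the Python code, since that is the baseline you are comparing against?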
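The two suggestions in the comments (close the statement, and measure the ResultSet loop from its own start time) can be sketched as follows. This is a minimal illustration, not the poster's final code: the TimedQuery and Timing names are hypothetical, and it uses only standard java.sql interfaces plus try-with-resources, which closes the ResultSet and then the Statement even if an exception is thrown.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class TimedQuery {
    /** Row count plus the three timings, all in milliseconds. */
    static final class Timing {
        final int rows;
        final long queryMillis;
        final long fetchMillis;
        final long overallMillis;

        Timing(int rows, long queryMillis, long fetchMillis, long overallMillis) {
            this.rows = rows;
            this.queryMillis = queryMillis;
            this.fetchMillis = fetchMillis;
            this.overallMillis = overallMillis;
        }
    }

    static Timing run(Connection conn, String sql, int fetchSize) throws SQLException {
        // Resources are closed in reverse order: ResultSet first, then Statement.
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize);
            long start = System.nanoTime();
            try (ResultSet rs = stmt.executeQuery(sql)) {
                long afterExecute = System.nanoTime(); // plays the role of start2
                int rows = 0;
                while (rs.next()) {
                    rows++; // iterate only, no per-row processing
                }
                long end = System.nanoTime();
                return new Timing(rows,
                        (afterExecute - start) / 1_000_000,
                        (end - afterExecute) / 1_000_000,
                        (end - start) / 1_000_000);
            }
        }
    }
}
```

With an open connection, a call such as `TimedQuery.Timing t = TimedQuery.run(conn, query, 1000);` would give separate execute, fetch, and overall timings for one fetch size.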
private static final String url = "jdbc:oracle:thin:@(DESCRIPTION=(LOAD_BALANCE=on)(FAILOVER=ON)(ADDRESS=(PROTOCOL=TCP)(HOST=***)(PORT=***))(CONNECT_DATA=(SERVICE_NAME=***)(SERVER=DEDICATED)))";