Oracle: optimizing a query on a large database table with a CLOB field
I've been racking my brain over this one, and admittedly I'm not very strong with Oracle. We have a table with roughly 60 million records that stores values for buildings. I've added indexes where I thought they were appropriate, but performance is still poor. Here is the query, since it should help:
SELECT count(*)
FROM viewBuildings
INNER JOIN tblValues
ON viewBuildings.bldg_id = tblValues.bldg_id
WHERE bldg_deleted = 0
AND (bldg_summary = 1
OR (bldg_root = 0 AND bldg_def = 0)
OR bldg_parent = 1)
AND field_id IN (207)
AND UPPER(dbms_lob.substr(v_value, 2000, 1)) = UPPER('2320')
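The above is just one example of a query that can be constructed. It looks for a match on '2320' in the v_value CLOB field of tblValues. It uses UPPER because the search can be for numeric or text values. tblValues has 60 million records and is indexed by building id and field id.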
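I may need to provide more information, but as far as statistics go, what I have is the consistent gets figure: consistent gets = 74069. Is that a large number?
Any suggestions would be great, mainly on handling CLOB fields on large database tables. A CONTEXT-type index can't be used, because I need an exact match and the data being searched for can be a number or a string.
Edit (more information):
tblBuildings is part of viewBuildings (the view) and has 80,000 records
tblValues holds the values for each building and has 68,000,000 records
tblValues has about 550 fields (field_id) per building
Desired result: the query returns results in < 5 seconds. Is that unreasonable? Sometimes it runs indefinitely, and sometimes it takes about 80 seconds.
Explain plan results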
Plan hash value: 1480138519
-----------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------------------|
| 0 | SELECT STATEMENT | | 1 | 192 | 32 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 192 | | |
| 2 | NESTED LOOPS | | 1 | 192 | 15 (0)| 00:00:01 |
| 3 | NESTED LOOPS | | 1 | 183 | 12 (0)| 00:00:01 |
|* 4 | FILTER | | | | | |
| 5 | NESTED LOOPS OUTER | | 1 | 64 | 10 (0)| 00:00:01 |
|* 6 | TABLE ACCESS BY INDEX ROWID | TBLBUILDINGS | 1 | 60 | 9 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | SAA_4 | 17 | | 3 (0)| 00:00:01 |
| 8 | NESTED LOOPS | | 1 | 21 | 3 (0)| 00:00:01 |
| 9 | TABLE ACCESS BY INDEX ROWID| TBLBUILDINGSTATUSES | 1 | 15 | 2 (0)| 00:00:01 |
|* 10 | INDEX RANGE SCAN | IDX_BUILDINGSTATUS_EXCLUDEQUERY | 1 | | 1 (0)| 00:00:01 |
|* 11 | INDEX RANGE SCAN | IDX_BUILDING_STATUS_ASID_DELETED | 1 | 6 | 1 (0)| 00:00:01 |
| 12 | TABLE ACCESS BY INDEX ROWID | TBLBUILDINGSTATUSES | 1 | 4 | 1 (0)| 00:00:01 |
|* 13 | INDEX UNIQUE SCAN | PK_TBLBUILDINGSTATUS | 1 | | 0 (0)| 00:00:01 |
|* 14 | TABLE ACCESS BY INDEX ROWID | TBLVALUES | 1 | 119 | 2 (0)| 00:00:01 |
|* 15 | INDEX UNIQUE SCAN | PK_SAA_6 | 1 | | 1 (0)| 00:00:01 |
| 16 | INLIST ITERATOR | | | | | |
|* 17 | INDEX RANGE SCAN | SAA_7 | 1 | 9 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
4 - filter("TBLBUILDINGSTATUSES"."BUILDING_STATUS_HIDE_REPORTS" IS NULL OR
"TBLBUILDINGSTATUSES"."BUILDING_STATUS_HIDE_REPORTS"=0)
6 - filter("TBLBUILDINGS"."BLDG_SUMMARY"=1 OR "TBLBUILDINGS"."BLDG_SUB_BUILDING_PARENT"=1 OR
"TBLBUILDINGS"."BLDG_BUILDING_DEF"=0 AND "TBLBUILDINGS"."BLDG_ROOT"=0)
7 - access("TBLBUILDINGS"."BLDG_DELETED"=0)
filter( NOT EXISTS (SELECT 0 FROM "TBLBUILDINGSTATUSES" "TBLBUILDINGSTATUSES","TBLBUILDINGS" "TBLBUILDINGS" WHERE
"TBLBUILDINGS"."BLDG_ID"=:B1 AND "TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"="TBLBUILDINGS"."BUILDING_STATUS_ID" AND
"TBLBUILDINGSTATUSES"."BUILDING_STATUS_EXCLUDE_QUERY"=1))
10 - access("TBLBUILDINGSTATUSES"."BUILDING_STATUS_EXCLUDE_QUERY"=1)
11 - access("TBLBUILDINGS"."BLDG_ID"=:B1 AND "TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"="TBLBUILDINGS"."BUILDING_STATUS_ID")
filter("TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"="TBLBUILDINGS"."BUILDING_STATUS_ID")
13 - access("TBLBUILDINGSTATUSES"."BUILDING_STATUS_ID"(+)="TBLBUILDINGS"."BUILDING_STATUS_ID")
14 - filter(UPPER("DBMS_LOB"."SUBSTR"("TBLVALUES"."V_VALUE",2000,1))=U'2320')
15 - access("TBLVALUES"."FE_ID"=207 AND "TBLBUILDINGS"."BLDG_ID"="TBLVALUES"."BLDG_ID")
17 - access("TBLINSPECTORBUILDINGMAP"."IN_ID"=1 AND ("TBLINSPECTORBUILDINGMAP"."IAM_BUILDING_ACCESS_LEVEL"=0 OR
"TBLINSPECTORBUILDINGMAP"."IAM_BUILDING_ACCESS_LEVEL"=1) AND "TBLBUILDINGS"."BLDG_ID"="TBLINSPECTORBUILDINGMAP"."BLDG_ID")
44 rows selected
Plan hash value: 2137789089
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8168 | 16336 | 29 (0)| 00:00:01 |
| 1 | COLLECTION ITERATOR PICKLER FETCH| DISPLAY | 8168 | 16336 | 29 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
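OK, I gathered statistics as you suggested, and here is the plan table output. It looks like IDX_CURVAL_FE_ID is the problem here? That is the index on the values table for the field id.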
SQL_ID d4aq8nsr1p6uw, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ count(*) FROM
viewAssetsForUser1 INNER JOIN tblCurrentValues ON
viewAssetsForUser1.as_id = tblCurrentValues.as_id WHERE as_deleted =
:"SYS_B_0" AND (as_summary = :"SYS_B_1" OR (as_root =
:"SYS_B_2" AND as_asset_def = :"SYS_B_3") OR
as_sub_asset_parent = :"SYS_B_4") AND fe_id IN (:"SYS_B_5")
AND UPPER(dbms_lob.substr(cv_value, :"SYS_B_6", :"SYS_B_7")) =
UPPER(:"SYS_B_8")
Plan hash value: 4033422776
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:08:43.19 | 56589 | 56084 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:08:43.19 | 56589 | 56084 | | | |
|* 2 | FILTER | | 1 | | 0 |00:08:43.19 | 56589 | 56084 | | | |
| 3 | NESTED LOOPS | | 1 | | 0 |00:08:43.19 | 56589 | 56084 | | | |
| 4 | NESTED LOOPS | | 1 | 115 | 0 |00:08:43.19 | 56589 | 56084 | | | |
|* 5 | FILTER | | 1 | | 0 |00:08:43.19 | 56589 | 56084 | | | |
|* 6 | HASH JOIN RIGHT OUTER | | 1 | 82 | 0 |00:08:43.19 | 56589 | 56084 | 1348K| 1348K| 742K (0)|
| 7 | TABLE ACCESS FULL | TBLASSETSTATUSES | 1 | 4 | 4 |00:00:00.01 | 3 | 0 | | | |
| 8 | NESTED LOOPS | | 1 | | 0 |00:08:43.19 | 56586 | 56084 | | | |
| 9 | NESTED LOOPS | | 1 | 163 | 0 |00:08:43.19 | 56586 | 56084 | | | |
|* 10 | TABLE ACCESS BY INDEX ROWID | TBLCURRENTVALUES | 1 | 163 | 0 |00:08:43.19 | 56586 | 56084 | | | |
|* 11 | INDEX RANGE SCAN | IDX_CURVAL_FE_ID | 1 | 16283 | 61357 |00:00:05.98 | 132 | 132 | | | |
|* 12 | INDEX RANGE SCAN | SAA_1 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 13 | TABLE ACCESS BY INDEX ROWID | TBLASSETS | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 14 | INDEX UNIQUE SCAN | PK_TBLINSPECTORBRIDGEMAP2 | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
|* 15 | TABLE ACCESS BY GLOBAL INDEX ROWID| TBLINSPECTORASSETMAP | 0 | 1 | 0 |00:00:00.01 | 0 | 0 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(:SYS_B_0=0)
5 - filter(("TBLASSETSTATUSES"."ASSET_STATUS_HIDE_REPORTS" IS NULL OR "TBLASSETSTATUSES"."ASSET_STATUS_HIDE_REPORTS"=0))
6 - access("TBLASSETSTATUSES"."ASSET_STATUS_ID"="TBLASSETS"."ASSET_STATUS_ID")
10 - filter(UPPER("DBMS_LOB"."SUBSTR"("TBLCURRENTVALUES"."CV_VALUE",:SYS_B_6,:SYS_B_7))=SYS_OP_C2C(UPPER(:SYS_B_8)))
11 - access("TBLCURRENTVALUES"."FE_ID"=:SYS_B_5)
12 - access("TBLASSETS"."AS_DELETED"=:SYS_B_0 AND "TBLASSETS"."AS_ID"="TBLCURRENTVALUES"."AS_ID")
13 - filter((("TBLASSETS"."AS_ROOT"=:SYS_B_2 AND "TBLASSETS"."AS_ASSET_DEF"=:SYS_B_3) OR "TBLASSETS"."AS_SUMMARY"=:SYS_B_1 OR
"TBLASSETS"."AS_SUB_ASSET_PARENT"=:SYS_B_4))
14 - access("TBLASSETS"."AS_ID"="TBLINSPECTORASSETMAP"."AS_ID" AND "TBLINSPECTORASSETMAP"."IN_ID"=1)
15 - filter(("TBLINSPECTORASSETMAP"."IAM_ASSET_ACCESS_LEVEL"=0 OR "TBLINSPECTORASSETMAP"."IAM_ASSET_ACCESS_LEVEL"=1))
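Bad index cost
If the statistics are fresh and the optimizer has a relatively good cardinality estimate, why would it choose a bad plan? Perhaps a parameter is making indexes look artificially cheap. Take a look at:
select * from v$parameter where name in ('optimizer_index_cost_adj', 'optimizer_index_caching');
Are they significantly different from the default values of 100 and 0?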
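Also, take a look at:
select * from sys.aux_stats$;
Perhaps your system statistics make a full table scan look too expensive. Some versions of Oracle have bugs with workload statistics, where the numbers are wrong by several orders of magnitude.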
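Or perhaps your table is so large that 16K index reads really is the best access path. Check DBA_SEGMENTS.BYTES to find the size of the table and of the LOB segment.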
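The size check above could look something like the following. This is only a sketch: the schema name YOUR_SCHEMA is a placeholder, the table name is taken from the posted plan, and it assumes you can read the DBA_ views (USER_SEGMENTS and USER_LOBS are the unprivileged equivalents, without the owner filter).

```sql
-- Sketch: size in MB of the table segment and of its LOB segment(s).
-- 'YOUR_SCHEMA' is a placeholder; TBLCURRENTVALUES comes from the posted plan.
SELECT s.segment_name,
       s.segment_type,
       ROUND(SUM(s.bytes) / 1024 / 1024) AS size_mb
FROM   dba_segments s
WHERE  s.owner = 'YOUR_SCHEMA'
AND    (s.segment_name = 'TBLCURRENTVALUES'
        OR s.segment_name IN (SELECT l.segment_name
                              FROM   dba_lobs l
                              WHERE  l.owner = 'YOUR_SCHEMA'
                              AND    l.table_name = 'TBLCURRENTVALUES'))
GROUP  BY s.segment_name, s.segment_type
ORDER  BY size_mb DESC;
```

If the LOB segment turns out to be much larger than the table segment, most values are probably being stored out of line, which ties in with the LOB storage question.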
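Even if the table is only moderately sized and the plan changes to a full table scan, that may not bring the run time under 5 seconds. But combined with your partitioning idea, it might be enough.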
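LOB storage
Going by your example, I assume most of the CLOBs are relatively small? Perhaps you have an unusual LOB setting that wastes a lot of space, such as DISABLE STORAGE IN ROW. You may want to check your table DDL, or post all of it here. Or, if you can replace the CLOB with a VARCHAR2, even better.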
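One way to check both points, as a hedged sketch: YOUR_SCHEMA is a placeholder, and the table and column names are the ones shown in the posted plan. The second query samples about 0.1% of the rows so it does not have to touch every LOB.

```sql
-- Sketch: is the CLOB stored in the row, and what are its basic settings?
SELECT column_name, in_row, chunk, cache
FROM   dba_lobs
WHERE  owner = 'YOUR_SCHEMA'
AND    table_name = 'TBLCURRENTVALUES';

-- Rough length profile (in characters) from a 0.1% row sample.
SELECT COUNT(*)                                 AS sampled_rows,
       ROUND(AVG(dbms_lob.getlength(cv_value))) AS avg_chars,
       MAX(dbms_lob.getlength(cv_value))        AS max_chars
FROM   tblCurrentValues SAMPLE (0.1);
```

IN_ROW = 'NO' would mean every value is stored in the LOB segment regardless of size. If the sampled maximum length is small, moving the column to VARCHAR2 looks more feasible.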
FBI
A function-based index on the CLOB may speed things up significantly. But it could be a very large index:
create index TBLCURRENTVALUES_FBI on TBLCURRENTVALUES(UPPER(dbms_lob.substr(cv_value,2000,1)));
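Since the query always filters on the field id as well, a variation worth testing is a composite index that leads with that column. The sketch below assumes the real table and column names from the posted plan; the index name is hypothetical. Depending on block size and character set, a key this long may hit the maximum key length limit (ORA-01450), in which case a shorter substring would be needed.

```sql
-- Sketch: lead with the field id so one index can serve both predicates.
-- The index name is hypothetical.
CREATE INDEX idx_curval_fe_upper_val
  ON tblCurrentValues (fe_id, UPPER(dbms_lob.substr(cv_value, 2000, 1)));

-- Confirm the stored expression; the query predicate has to match it.
SELECT index_name, column_position, column_expression
FROM   user_ind_expressions
WHERE  table_name = 'TBLCURRENTVALUES';
```

Note that in the posted plan the literals 2000 and 1 appear as binds (:SYS_B_6, :SYS_B_7), so under CURSOR_SHARING=FORCE the predicate may no longer match the indexed expression.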
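Cursor sharing
The query keeps changing a bit, which makes tuning difficult. It looks like this latest version has CURSOR_SHARING=FORCE, which is unusual. For expensive queries, using literals can be a good thing: the extra time spent building a query plan may be worth it. If the system parameter can't be changed, look into the hint /*+ cursor_sharing_exact */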
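As a sketch of how that would look, here is the query from the question with the hint added, preceded by a check of the current setting (EXACT is the default). Nothing else in the statement is changed.

```sql
-- Current setting of the parameter.
SELECT name, value FROM v$parameter WHERE name = 'cursor_sharing';

-- Query from the question; the hint keeps the literals for this statement only.
SELECT /*+ cursor_sharing_exact */ count(*)
FROM   viewBuildings
INNER JOIN tblValues
        ON viewBuildings.bldg_id = tblValues.bldg_id
WHERE  bldg_deleted = 0
AND    (bldg_summary = 1
        OR (bldg_root = 0 AND bldg_def = 0)
        OR bldg_parent = 1)
AND    field_id IN (207)
AND    UPPER(dbms_lob.substr(v_value, 2000, 1)) = UPPER('2320');
```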
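You can apply any number of optimizations, but what ultimately causes the problem is the sheer volume of data. If you run the query and follow it on the performance graph in OEM, you will see that most of the time is spent on I/O, that is, on reading the data from storage.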
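So what is the solution? Partition the table. Whenever the data volume is large, you should partition the table so that only the relevant data is processed.
To partition a table you need some key to separate the data by, and looking at your data, that could be the building id.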
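A minimal sketch of what that might look like, assuming hash partitioning on the building id. The table name, index name, column types and partition count are all assumptions for illustration, and partitioning requires the Partitioning option to be licensed.

```sql
-- Sketch only: column types and object names are assumptions.
CREATE TABLE tblValues_part (
  bldg_id   NUMBER NOT NULL,
  field_id  NUMBER NOT NULL,
  v_value   CLOB
)
PARTITION BY HASH (bldg_id) PARTITIONS 16;

-- A local index is partitioned the same way as the table.
CREATE INDEX idx_values_part_bldg_field
  ON tblValues_part (bldg_id, field_id) LOCAL;
```

Partition pruning only helps when the query filters on the partition key. The example query filters on field_id across all buildings, so it may be worth comparing this against partitioning by field_id instead.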
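You can learn more at the following URL:
Partitioning also offers many other features, such as local indexes, which help optimize queries further.
If you are always processing the data of the entire large table, partitioning will not be a solution, but that would put a question mark over the database schema.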
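So yes, query optimization will help, but because the data is large you should also evaluate table partitioning.
74069 consistent gets means the query probably read about 578 MB of data (assuming 8 KB blocks). But that doesn't tell us much; the number could be too high or too low. First, what do you expect from this query? Does it return a small number of rows that you would like to see almost instantly, but it takes X seconds? We also need to see the explain plan. Post the results of explain plan for [your query]; followed by select * from table(dbms_xplan.display);
It can return anywhere between 0 and 17,000 rows, depending on how the user writes the query. I would hope we could get that result within 5 seconds, but given the size I don't know whether that is realistic. There may be 68,000,000 rows, but you are also only searching on a given field, so the results should be narrower.
I've updated the original post with the explain plan results. Let me know if you need more.
Thanks for updating the question. There are a lot of very low cardinality estimates, Rows=1. Before spending too much time on the query, you may want to re-gather statistics to make sure the optimizer has up-to-date information.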
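Re-gathering statistics for the values table could be done roughly as follows. YOUR_SCHEMA is a placeholder and the table name is taken from the posted plan; the related tables would be handled the same way.

```sql
-- Sketch: refresh optimizer statistics for the table and its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'YOUR_SCHEMA',
    tabname          => 'TBLCURRENTVALUES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);   -- also gather index statistics
END;
/

-- After re-running the query with the gather_plan_statistics hint,
-- compare estimated and actual row counts for the last statement.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

Large gaps between the E-Rows and A-Rows columns in that output point to the steps where the optimizer's estimates are still off.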