Designing SQL indexes for count(*) query performance in Oracle


Hi all :) I'm building a tool that does some volume sampling on an Oracle 10g database. Here is the query:

SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.ID
WHERE 
 (    product.CATEGORY = 'some first category criteria'
  AND customer.REGION = 'some first region criteria'
  AND ...)
 OR
 (    product.CATEGORY = 'some second category criteria'
  AND customer.REGION = 'some second region criteria'
  AND ...)
 OR ...

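I only need the count from this query. The problem is the volume: there are about 30 million rows in each table, and I would like this query to be responsive.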

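So far, building a composite index on customer(…, customer_ID) has helped a lot. I assume the JOIN is performed after the index filtering operations.

Each (... AND ... AND ...) block is expected to contain about 50,000 rows. The columns used in the search criteria all have value sets of roughly 1,000 values.

I'd like to know which approaches I could use, given that I'm only doing count(*)s, especially since Oracle has a built-in OLAP module (and a CUBE operation?). I also believe that well-thought-out indexes and hints could improve things a lot.

How would you design this?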
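This looks like a good candidate for a bitmap index:

Bitmap indexes are primarily designed for data warehousing or environments in which queries reference many columns in an ad hoc fashion. Situations that may call for a bitmap index include:
The indexed columns have low cardinality, that is, the number of distinct values is small compared to the number of table rows.
The indexed table is either read-only or not subject to significant modification by DML statements.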
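Specifically, a bitmap join index may be ideal here. The example in the manual even matches your data model. I tried to recreate your model and data below, and the bitmap join index appears to run orders of magnitude faster than the other solutions.

Sample data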
--Create tables
create table customer
(
    customer_id number,
    region      varchar2(100) not null
) nologging;

create table product
(
    product_id  number,
    customer_id number not null,
    category    varchar2(100) not null
) nologging;


--Load 30M rows, 1M rows at a time.  Takes about 6 minutes.
begin
    for i in 1 .. 30 loop
        insert /*+ append */ into customer
        select (1000000*i)+level, 'Region '||trunc(dbms_random.value(1, 1000))
        from dual connect by level <= 1000000;
        commit;

        insert /*+ append */ into product
        select (1000000*i)+level, (1000000*i)+level
            ,'Category '||trunc(dbms_random.value(1, 1000))
        from dual connect by level <= 1000000;
        commit;
    end loop;
end;
/

--Add primary keys and foreign key constraints.
alter table customer add constraint customer_pk primary key (customer_id);
alter table product add constraint product_pk primary key (product_id);
alter table product add constraint product_customer_fk
    foreign key (customer_id) references customer(customer_id);

--Gather stats
begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

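B-tree indexes - still slow
The plan changed, but performance stayed the same. I think this may be because my example is a worst-case scenario for indexes, where the data is truly random.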
create index customer_idx on customer(region);
create index product_idx on product(category);

begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/
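
To see how the plan changes after each set of indexes is created, the plan can be displayed with EXPLAIN PLAN and DBMS_XPLAN. This is a minimal sketch, assuming the sample tables above; the two literal pairs are illustrative values chosen to match the generated data, not the asker's real criteria.

```sql
-- Ask the optimizer for the plan of a two-block version of the count query.
-- The category and region literals are illustrative values only.
explain plan for
select count(*)
from product
join customer on product.customer_id = customer.customer_id
where (product.category = 'Category 1' and customer.region = 'Region 1')
   or (product.category = 'Category 2' and customer.region = 'Region 2');

-- Show the plan that was just stored in the plan table.
select * from table(dbms_xplan.display);
```

Running this again after each index change shows whether the optimizer actually picks up the new indexes.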

Bitmap indexes - slightly better
This improves performance slightly, to about 61 seconds.
drop index customer_idx;
drop index product_idx;

create bitmap index customer_bidx on customer(region);
create bitmap index product_bidx on product(category);

begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/

Bitmap join index - very fast
Now the query returns almost instantly; my IDE reports it as 0 seconds.
--Drop the bitmap indexes created in the previous step.
drop index customer_bidx;
drop index product_bidx;

create bitmap index customer_product_bjix
on product(product.category, customer.region)
FROM product, customer
where product.CUSTOMER_ID = customer.customer_id;

begin
    dbms_stats.gather_table_stats(user, 'CUSTOMER');
    dbms_stats.gather_table_stats(user, 'PRODUCT');
end;
/
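
As a quick check that the index was created as a bitmap join index and is usable, the data dictionary can be queried. This is a small sketch that assumes the index was created in the current schema under the name used above.

```sql
-- JOIN_INDEX is 'YES' for bitmap join indexes; STATUS should be 'VALID'.
select index_name, index_type, join_index, status
from user_indexes
where index_name = 'CUSTOMER_PRODUCT_BJIX';
```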

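Index costs
The bitmap join index takes slightly longer to create than the B-tree or bitmap indexes.
The B-tree indexes are very large compared with the bitmap or bitmap join indexes.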
select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('CUSTOMER_IDX', 'PRODUCT_IDX'
    ,'CUSTOMER_BIDX', 'PRODUCT_BIDX',  'CUSTOMER_PRODUCT_BJIX');


SEGMENT_NAME            MB
------------            --
CUSTOMER_IDX            726
PRODUCT_IDX             792
CUSTOMER_BIDX            88
PRODUCT_BIDX             96
CUSTOMER_PRODUCT_BJIX   184
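
The query above reads DBA_SEGMENTS, which requires DBA-level privileges. If those are not available, a similar check can probably be done against USER_SEGMENTS, assuming the indexes live in the current schema:

```sql
-- Same size check without DBA views; only sees segments owned by the current user.
select segment_name, round(bytes/1024/1024) as mb
from user_segments
where segment_type = 'INDEX'
  and segment_name in ('CUSTOMER_IDX', 'PRODUCT_IDX'
    ,'CUSTOMER_BIDX', 'PRODUCT_BIDX', 'CUSTOMER_PRODUCT_BJIX');
```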

Query style
This does not affect performance, but you can shorten the query as follows:
SELECT count(*) 
FROM product
JOIN customer ON product.CUSTOMER_ID = customer.customer_id
WHERE (product.category, customer.region)
    in (('Category 1', 'Region 1'),
        ('Category 2', 'Region 2'),
        ('Category 888', 'Region 888'));

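Indexes aren't free. I wouldn't add new indexes on these large tables just to support your count application. Also, how fresh do these counts need to be?

@tbone The data in both columns is refreshed at most once a day, so some precomputation could be done overnight.

That may be your answer. Precalculate with a simple materialized view that contains the counts you need. Then point your application at the mat view and refresh it daily during off-hours.

@tbone One problem is that each criterion has about 1,000 possible values. With 5 search criteria, that is 1000^5 different cases to count :/

You can still pre-aggregate; you really are doing DW/analytics work. You probably don't want to run live queries against production tables all day long. Post your table structure and a sample query.

I think you're only considering the performance of the query. Bitmaps are generally bad news for tables with even moderate DML activity. What the poster hasn't revealed is how the company uses these tables (beyond this particular need). I've seen too many tables with a large number of indexes (bitmap and otherwise) because most developers only think about their own immediate needs (and the company does few sanity checks before adding them).

Either way, you're right that bitmap indexes and DML don't mix well. Based on the comment "the data in both columns is refreshed at most once a day", it should be possible to set up a process that avoids those problems. It could be as simple as dropping the index, modifying the table, and then recreating the index.

I think he meant the freshness of the counts. I suspect the tables he's hitting are heavily used key tables with high DML activity. Anyway, I think I care too much at this point ;-)

@tbone Ah, in that case your materialized view idea would probably work best, perhaps with a regular bitmap index on it. BenoitParis - can you clarify whether the tables are updated once a day, or whether you only want the counts updated once a day?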
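
The materialized view idea from the comments could look roughly like the sketch below. It is written against the sample tables from the answer and covers only the two criteria columns used there; the view name product_customer_counts and the column cnt are hypothetical, and with five criteria columns the number of pre-aggregated groups could grow very large, as one comment points out.

```sql
-- Hypothetical pre-aggregated counts, one row per (category, region) pair.
create materialized view product_customer_counts
build immediate
refresh complete on demand
as
select product.category, customer.region, count(*) as cnt
from product, customer
where product.customer_id = customer.customer_id
group by product.category, customer.region;

-- Run once a day during off-hours ('C' requests a complete refresh).
begin
    dbms_mview.refresh('PRODUCT_CUSTOMER_COUNTS', method => 'C');
end;
/

-- The application then sums pre-computed counts instead of joining 30M-row tables.
-- nvl covers the case where no pair matches and sum returns null.
select nvl(sum(cnt), 0) as total
from product_customer_counts
where (category, region)
    in (('Category 1', 'Region 1'),
        ('Category 2', 'Region 2'),
        ('Category 888', 'Region 888'));
```

The trade-off is freshness: the counts are only as current as the last refresh, which fits the once-a-day update described in the comments.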