SQL: associating items between two tables incrementally
Tags: sql, oracle, oracle11g
I have two tables. The first contains activations, the second contains deactivations.
I have to associate each deactivation with one activation, using the following rules:
The activation must come before the deactivation, but no earlier than 92 days before it.
An activation that is already associated with a deactivation cannot be associated again.
So, with some data:
--a activations, b - deactivations
create table a (id1 integer, date1 date);
create table b (id2 integer, date2 date);
insert into a values (1, '1-Feb-2013');
insert into a values (2, '2-Feb-2013');
insert into a values (3, '3-Feb-2013');
insert into a values (4, '1-Mar-2013');
insert into a values (5, '2-Mar-2013');
insert into a values (6, '1-May-2013');
insert into a values (7, '19-May-2013');
insert into b values (1, '1-May-2013');
insert into b values (2, '1-May-2013');
insert into b values (3, '15-May-2013');
insert into b values (4, '16-May-2013');
insert into b values (5, '17-May-2013');
insert into b values (6, '18-May-2013');
Expected output:
id1  date1                             id2  date2                        act_ord  deact_ord
1    February, 01 2013 00:00:00+0000   1    May, 01 2013 00:00:00+0000   1        1
2    February, 02 2013 00:00:00+0000   2    May, 01 2013 00:00:00+0000   2        2
4    March, 01 2013 00:00:00+0000      3    May, 15 2013 00:00:00+0000   4        3
5    March, 02 2013 00:00:00+0000      4    May, 16 2013 00:00:00+0000   5        4
6    May, 01 2013 00:00:00+0000        5    May, 17 2013 00:00:00+0000   6        5
The query that generates the candidate pairs would be:
select id1, date1, id2, date2
from a
join b
on a.date1 >= b.date2 - 91
and b.date2 >= a.date1;
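To get a feel for how quickly the candidate set grows, one could count the candidate activations per deactivation with the same join condition. This is only a sketch against the sample tables a and b; the alias candidate_count is introduced here for illustration and does not appear in the original post.

```sql
-- Count how many activations fall into the 91-day window of each deactivation.
-- A LEFT JOIN keeps deactivations that have no candidate at all (count = 0).
SELECT b.id2,
       b.date2,
       COUNT(a.id1) AS candidate_count   -- hypothetical alias
FROM b
LEFT JOIN a
  ON a.date1 >= b.date2 - 91
 AND a.date1 <= b.date2
GROUP BY b.id2, b.date2
ORDER BY b.date2, b.id2;
```

Large counts here mean that the hierarchical query has many rows to connect at every level.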
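I managed to write a correct query using CONNECT BY, but it is far too slow: I have millions of clients, and each client has thousands of device activations and deactivations. The example below is for a single client.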
with chrn as
(
select id1, date1, id2, date2,
dense_rank() over ( order by date1, id1) as act_ord,
dense_rank() over ( order by date2, id2) as deact_ord
from a
join b
on a.date1 >= b.date2 - 91
and b.date2 >= a.date1
)
select *
from (
select s.*, row_number() over (partition by lvl order by act_ord+deact_ord) as rnk
from (
select a1.*, level lvl
from chrn a1
connect by
prior deact_ord < deact_ord and
prior act_ord < act_ord and
(prior deact_ord = deact_ord - 1 or prior act_ord = act_ord - 1)
start with deact_ord = 1 and act_ord = 1
)s
)where rnk =1
;
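I would like to find a faster solution, perhaps one that uses only analytic functions. The recursive query is too slow because the number of candidates and paths is too large, or because I have not managed to reduce that number.
Your requirement does not scale well as the number of records grows, because all preceding pairs have to be found before the next pair can be determined. As long as you only do this once, there is no way around it. But if you have to find new pairs frequently, I strongly recommend adding a deact_id to the tables.
Try this: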
CREATE TABLE A ( ID1 INTEGER,
DATE1 DATE );
CREATE TABLE B ( ID2 INTEGER,
DATE2 DATE );
INSERT INTO
A
VALUES
( 1,
'1-Feb-2013' );
INSERT INTO
A
VALUES
( 2,
'2-Feb-2013' );
INSERT INTO
A
VALUES
( 3,
'3-Feb-2013' );
INSERT INTO
A
VALUES
( 4,
'1-Mar-2013' );
INSERT INTO
A
VALUES
( 5,
'2-Mar-2013' );
INSERT INTO
A
VALUES
( 6,
'1-May-2013' );
INSERT INTO
A
VALUES
( 7,
'19-May-2013' );
INSERT INTO
B
VALUES
( 1,
'1-May-2013' );
INSERT INTO
B
VALUES
( 2,
'1-May-2013' );
INSERT INTO
B
VALUES
( 3,
'15-May-2013' );
INSERT INTO
B
VALUES
( 4,
'16-May-2013' );
INSERT INTO
B
VALUES
( 5,
'17-May-2013' );
INSERT INTO
B
VALUES
( 6,
'18-May-2013' );
COMMIT;
BEGIN
DBMS_STATS.SET_TABLE_STATS ( OWNNAME => 'REALSPIRITUALS',
TABNAME => 'A',
NUMROWS => 100000000 );
END;
/
BEGIN
DBMS_STATS.SET_TABLE_STATS ( OWNNAME => 'REALSPIRITUALS',
TABNAME => 'B',
NUMROWS => 100000000 );
END;
/
Is the id related to the relationship between an activation and a deactivation?
No, in my example it is just a row identifier. I have another key.
First, you are not using a partition here, so use RANK instead of DENSE_RANK. Both give the same result, but RANK performs about 25% better, give or take. Look for the further changes in the query; you can try changing this.
Thanks, but I found that the problem is the row explosion in CONNECT BY, and ROW_NUMBER probably has to process the huge number of records produced by the connect.
Interesting idea. In fact I cannot change the facts table, but this is the ideal way to do it programmatically, without pure SQL: for each new deactivation, I would find the first activation that has not been assigned yet.
Sorry, this will not assign anything to deact no. 3, because deact_ord = act_ord. The correct algorithm would match deact no. 3 with act no. 4.
Your query, followed by the new query, each with its execution plan and statistics:
SET AUTOTRACE ON
WITH CHRN
AS (SELECT
ID1,
DATE1,
ID2,
DATE2,
DENSE_RANK ( )
OVER ( ORDER BY
DATE1,
ID1 )
AS ACT_ORD,
DENSE_RANK ( )
OVER ( ORDER BY
DATE2,
ID2 )
AS DEACT_ORD
FROM
A
JOIN
B
ON A.DATE1 >= B.DATE2
- 91
AND B.DATE2 >= A.DATE1)
SELECT
*
FROM
(SELECT
S.*,
ROW_NUMBER ( )
OVER ( PARTITION BY LVL
ORDER BY
ACT_ORD
+ DEACT_ORD )
AS RNK
FROM
(SELECT
A1.*,
LEVEL LVL
FROM
CHRN A1
CONNECT BY
PRIOR DEACT_ORD < DEACT_ORD
AND PRIOR ACT_ORD < ACT_ORD
AND ( PRIOR DEACT_ORD = DEACT_ORD
- 1
OR PRIOR ACT_ORD = ACT_ORD
- 1 )
START WITH
DEACT_ORD = 1
AND ACT_ORD = 1) S)
WHERE
RNK = 1;
ID1 DATE1 ID2 DATE2 ACT_ORD DEACT_ORD LVL RNK
---------- --------- ---------- --------- ---------- ---------- ---------- ----------
1 01-FEB-13 1 01-MAY-13 1 1 1 1
2 02-FEB-13 2 01-MAY-13 2 2 2 1
4 01-MAR-13 3 15-MAY-13 4 3 3 1
5 02-MAR-13 4 16-MAY-13 5 4 4 1
6 01-MAY-13 5 17-MAY-13 6 5 5 1
5 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=16 G Card=25000 G Bytes=2235174G)
1 0 TEMP TABLE TRANSFORMATION
2 1 LOAD AS SELECT
3 2 WINDOW SORT (Cost=7 G Card=25000 G Bytes=1024454G)
4 3 WINDOW SORT (Cost=7 G Card=25000 G Bytes=1024454G)
5 4 MERGE JOIN (Cost=2 G Card=25000 G Bytes=1024454G)
6 5 SORT JOIN (Cost=667123 Card=100 M Bytes=2G)
7 6 TABLE ACCESS FULL SRINIV.A (Cost=770 Card=100 M Bytes=2G)
8 5 FILTER
9 8 SORT JOIN (Cost=667123 Card=100 M Bytes=2G)
10 9 TABLE ACCESS FULL SRINIV.B (Cost=770 Card=100 M Bytes=2G)
11 1 VIEW (Cost=9 G Card=25000 G Bytes=2235174G)
12 11 WINDOW SORT PUSHED RANK (Cost=9 G Card=25000 G Bytes=1932494G)
13 12 VIEW (Cost=887 M Card=25000 G Bytes=1932494G)
14 13 CONNECT BY NO FILTERING WITH START-WITH
15 14 COUNT
16 15 VIEW (Cost=887 M Card=25000 G Bytes=1629814G)
17 16 TABLE ACCESS FULL SYS.SYS_TEMP_0FD9D6820_3AD00CE0 (Cost=887 M Card=25000 G Bytes=1024454G)
Statistics
----------------------------------------------------------
2 recursive calls
0 spare statistic 3
0 gcs messages sent
7 db block gets from cache
0 physical reads direct (lob)
0 queue position update
0 queue single row
0 queue ocp pages
0 HSC OLTP Compressed Blocks
0 HSC IDL Compressed Blocks
5 rows processed
New query:
SET AUTOTRACE ON
WITH CHRN
AS (SELECT
ID1,
DATE1,
ID2,
DATE2,
RANK ( )
OVER ( ORDER BY
DATE1,
ID1 )
AS ACT_ORD,
RANK ( )
OVER ( ORDER BY
DATE2,
ID2 )
AS DEACT_ORD
FROM
A,
B
WHERE
DATE2
- DATE1 < 92
AND ID1 = ID2)
SELECT
*
FROM
(SELECT
S.*,
ROW_NUMBER ( )
OVER ( PARTITION BY LVL
ORDER BY
ACT_ORD
+ DEACT_ORD )
AS RNK
FROM
(SELECT
A1.*,
LEVEL LVL
FROM
CHRN A1
CONNECT BY
PRIOR DEACT_ORD < DEACT_ORD
AND PRIOR ACT_ORD < ACT_ORD
AND ( PRIOR DEACT_ORD = DEACT_ORD
- 1
OR PRIOR ACT_ORD = ACT_ORD
- 1 )
START WITH
DEACT_ORD = 1
AND ACT_ORD = 1) S)
WHERE
RNK = 1;
ID1 DATE1 ID2 DATE2 ACT_ORD DEACT_ORD LVL RNK
---------- --------- ---------- --------- ---------- ---------- ---------- ----------
1 01-FEB-13 1 01-MAY-13 1 1 1 1
2 02-FEB-13 2 01-MAY-13 2 2 2 1
4 01-MAR-13 3 15-MAY-13 4 3 3 1
5 02-MAR-13 4 16-MAY-13 5 4 4 1
6 01-MAY-13 5 17-MAY-13 6 5 5 1
5 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer Mode=ALL_ROWS (Cost=538808 Card=5 M Bytes=457 M)
1 0 TEMP TABLE TRANSFORMATION
2 1 LOAD AS SELECT
3 2 WINDOW SORT (Cost=436441 Card=5 M Bytes=209 M)
4 3 WINDOW SORT (Cost=436441 Card=5 M Bytes=209 M)
5 4 HASH JOIN (Cost=324556 Card=5 M Bytes=209 M)
6 5 TABLE ACCESS FULL REALSPIRITUALS.A (Cost=770 Card=100 M Bytes=2G)
7 5 TABLE ACCESS FULL REALSPIRITUALS.B (Cost=770 Card=100 M Bytes=2G)
8 1 VIEW (Cost=102367 Card=5 M Bytes=457 M)
9 8 WINDOW SORT PUSHED RANK (Cost=102367 Card=5 M Bytes=395 M)
10 9 VIEW (Cost=5816 Card=5 M Bytes=395 M)
11 10 CONNECT BY NO FILTERING WITH START-WITH
12 11 COUNT
13 12 VIEW (Cost=5816 Card=5 M Bytes=333 M)
14 13 TABLE ACCESS FULL SYS.SYS_TEMP_0FD9D6822_3AD00CE0 (Cost=5816 Card=5 M Bytes=209 M)
Statistics
----------------------------------------------------------
2 recursive calls
0 spare statistic 3
0 gcs messages sent
7 db block gets from cache
0 physical reads direct (lob)
0 queue position update
0 queue single row
0 queue ocp pages
0 HSC OLTP Compressed Blocks
0 HSC IDL Compressed Blocks
5 rows processed
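The programmatic approach mentioned in the comments (for each new deactivation, take the first activation that has not been assigned yet) could be sketched in PL/SQL roughly as follows. This is a minimal sketch under stated assumptions, not a drop-in replacement for the CONNECT BY query: it handles a single client, it processes both tables in (date, id) order, and the result table matched_pairs is a hypothetical name that does not appear in the original post. Because both lists are sorted, one forward-moving pointer into the activations is enough, so each row is visited once.

```sql
-- Hypothetical result table for the matched pairs.
CREATE TABLE matched_pairs (id1 INTEGER, date1 DATE, id2 INTEGER, date2 DATE);

DECLARE
  TYPE act_tab IS TABLE OF a%ROWTYPE INDEX BY PLS_INTEGER;
  acts act_tab;            -- all activations, oldest first
  lo   PLS_INTEGER := 1;   -- index of the oldest activation not yet assigned
BEGIN
  SELECT * BULK COLLECT INTO acts FROM a ORDER BY date1, id1;

  FOR d IN (SELECT id2, date2 FROM b ORDER BY date2, id2) LOOP
    -- Activations older than the window can never match this or any later
    -- deactivation, because deactivations are processed in date order.
    WHILE lo <= acts.COUNT AND acts(lo).date1 < d.date2 - 91 LOOP
      lo := lo + 1;
    END LOOP;

    -- acts(lo) is now the first unassigned activation inside the lower bound;
    -- it matches only if it does not come after the deactivation.
    IF lo <= acts.COUNT AND acts(lo).date1 <= d.date2 THEN
      INSERT INTO matched_pairs (id1, date1, id2, date2)
      VALUES (acts(lo).id1, acts(lo).date1, d.id2, d.date2);
      lo := lo + 1;        -- mark this activation as used
    END IF;
  END LOOP;

  COMMIT;
END;
/
```

On the sample data this produces the pairs (1,1), (2,2), (4,3), (5,4) and (6,5) as (id1, id2), and leaves deactivation 6 unmatched, which agrees with the expected output in the question. For many clients the same loop would have to run once per client key, and the BULK COLLECT would probably need a LIMIT clause for large volumes; neither is shown here because the question does not include the client column.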