<< 返回文章列表 在 MogDB 和 openGauss 中,参数rewrite_rule用于控制查询重写,本文则介绍查询重写规则uniquecheck。uniquecheck表示提升无agg的where子查询。

现在有如下例子:
从执行计划上可以看到:
现在设置rewrite_rule=uniquecheck,我们再来看一下执行计划:
设置rewrite_rule=uniquecheck之后,子查询被提升了,t1和t2走了hash join,整个SQL耗时10秒,性能较好。
执行计划中Unique Check Required表示检查子查询JOIN列(t2.object_id)是否唯一,如果子查询JOIN列(t2.object_id)不唯一,SQL会报错:
如果子查询有agg函数,无需设置查询重写参数,可以自动提升:
MogDB 数据库查询重写规则uniquecheck详解
2024年7月4日
罗炳森
35
参考示例

orcl=> explain analyze select count(*)
from test01 t1
where t1.data_object_id =
(select data_object_id
from test02 t2
where t1.object_id = t2.object_id);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=369500968.48..369500968.49 rows=1 width=20) (actual time=122494.785..122494.785 rows=1 loops=1)
-> Seq Scan on test01 t1 (cost=0.00..369500411.84 rows=222656 width=12) (actual time=0.912..121981.180 rows=4696064 loops=1)
Filter: (data_object_id = (SubPlan 1))
Rows Removed by Filter: 39841280
SubPlan 1
-> Index Scan using idx_test02_objectid on test02 t2 (cost=0.00..8.27 rows=1 width=6) (actual time=94019.753..99841.735 rows=44537344 loops=44537344)
Index Cond: (t1.object_id = object_id)
Total runtime: 122494.942 ms
(8 rows)
-
test01 t1是大表,有4000多万行数据; -
子查询test02 t2没有被提升(Oracle的说法是没有被展开),走了filter,被扫描了4000多万次; -
整个SQL查询耗时122秒,性能较差。
orcl=> explain analyze select *+ set(rewrite_rule uniquecheck) */ count(*)
from test01 t1
where t1.data_object_id =
(select data_object_id
from test02 t2
where t1.object_id = t2.object_id);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1467229.99..1467230.00 rows=1 width=8) (actual time=10758.577..10758.577 rows=1 loops=1)
-> Hash Join (cost=5660.88..1467225.97 rows=1608 width=0) (actual time=66.574..10424.858 rows=4696064 loops=1)
Hash Cond: ((t1.object_id = subquery."?column?") AND (t1.data_object_id = subquery.data_object_id))
-> Seq Scan on test01 t1 (cost=0.00..1227776.53 rows=44531153 width=12) (actual time=0.003..3561.282 rows=44537344 loops=1)
-> Hash (cost=4356.08..4356.08 rows=86987 width=12) (actual time=65.462..65.462 rows=9172 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 1427kB
-> Subquery Scan on subquery (cost=2616.34..4356.08 rows=86987 width=12) (actual time=33.027..54.531 rows=86987 loops=1)
-> HashAggregate (cost=2616.34..3486.21 rows=86987 width=12) (actual time=33.025..47.518 rows=86987 loops=1)
Group By Key: t2.object_id
Unique Check Required
-> Seq Scan on test02 t2 (cost=0.00..2398.87 rows=86987 width=12) (actual time=0.005..9.467 rows=86987 loops=1)
Total runtime: 10758.989 ms
(12 rows)
orcl=> insert into test02 select * from test02;
INSERT 0 86987
orcl=> explain analyze select /*+ set(rewrite_rule uniquecheck) */ count(*)
from test01 t1
where t1.data_object_id =
(select data_object_id
from test02 t2
where t1.object_id = t2.object_id);
ERROR: more than one row returned by a subquery used as an expression
orcl=> explain analyze select count(*)
from test01 t1
where t1.data_object_id =
(select max(data_object_id)
from test02 t2
where t1.object_id = t2.object_id);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1470319.09..1470319.10 rows=1 width=8) (actual time=10345.027..10345.027 rows=1 loops=1)
-> Hash Join (cost=8749.93..1470315.02 rows=1628 width=0) (actual time=122.192..10044.758 rows=4696064 loops=1)
Hash Cond: ((t1.object_id = subquery."?column?") AND (t1.data_object_id = subquery.max))
-> Seq Scan on test01 t1 (cost=0.00..1227776.53 rows=44531153 width=12) (actual time=0.007..3708.561 rows=44537344 loops=1)
-> Hash (cost=7428.51..7428.51 rows=88095 width=38) (actual time=120.853..120.853 rows=9172 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 1427kB
-> Subquery Scan on subquery (cost=5666.61..7428.51 rows=88095 width=38) (actual time=79.485..110.452 rows=86987 loops=1)
-> HashAggregate (cost=5666.61..6547.56 rows=88095 width=44) (actual time=79.483..103.440 rows=86987 loops=1)
Group By Key: t2.object_id
-> Seq Scan on test02 t2 (cost=0.00..4796.74 rows=173974 width=12) (actual time=0.007..20.196 rows=173974 loops=1)
Total runtime: 10345.491 ms
(11 rows)