语言
<< 返回文章列表

通过 MogDB 数据库ASP功能定位性能问题根因

2024年7月18日
M
o
g
D
B
,
a
s
p
,
,
,
许玉晨
36
上篇文章介绍了ASP的基本原理,详细内容请回看文章《循序渐进丨在 MogDB 数据库中实现 Oracle ASH能力》。本文将继续进一步解读如何通过 MogDB 数据库ASP功能定位性能问题根因。

local_active_session视图

 

 

内存里的ASP信息保存在DBE_PERF.local_active_session视图中。视图DBE_PERF.local_active_session的实现是通过调用函数get_local_active_ session()获取g_instance.stat_cxt.active_sess_hist_arrary->active_sess_hist_info里的采样数据数据生成视图。这部分内容为实时的信息,存储在内存中,显示的是最近的活跃session信息。视图的具体内容如下表:

ASP系统表

 

 

ASP的表为GS_ASP,属于CATALOG里的系统表,在src/include/catalog/gs_asp.h里定义:CATALOG(gs_asp,9534)。具体字段同视图DBE_PERF.local_active_session。GS_ASP的数据来源为选取g_instance.stat_cxt.active_sess_hist_arrary->active_sess_hist_info 里的部分采样并持久化到磁盘中。具体选取多少采样值是由GUC参数asp_flush_rate设定。相对于视图DBE_PERF.local_active_session,ASP表中存放的是过去更长时间内的活跃会话历史信息,适合更长时间跨度的统计分析。

ASP Log 文件

 

 

当GUC参数asp_flush_mode设置为file时,g_instance.stat_cxt.active_sess_hist_arrary里的部分采样信息不会记录到GS_ASP表,而是会被持久化存放到ASP的log文件中。文件的地址和命名分别在GUC参数asp_log_directory和asp_log_filename中设置,默认为数据库asp_data目录下。
ASP 的log文件内容样例如下:
{"sampleid":"350430","sample_time":"2023-07-27T23:59:35.558198+08:00","need_flush_sample":true ,"databaseid":0,"thread_id":"139681843377920","sessionid":"139681843377920","global_sessionid" :"0:0#0","start_time":"2023-07-07T16:39:56.969878+08:00","xact_start_time":null,"query_start_t ime":null,"state":"active","event":"none","waitstatus":"none","lwtid":23455,"psessionid":null, "tlevel":0,"smpid":0,"userid":0,"application_name":"Wal Writer","locktag":null,"lockmode":null,"block_sessionid":null,"client_addr":null,"client_hostn ame":null,"client_port":null,"query_id":"","unique_query_id":null,"user_id":null,"cn_id":null, "unique_query":null}

实战:定位阻塞源

 

 

模拟故障
执行session1的两条语句后,再执行session2的一条语句,session2会被阻塞等待session1的锁释放。
session 1:更新表emp中id为3的记录中字段name值为c1。
begin;
update emp SET name = 'c1' where id = 3;

session2:更新表emp中id为3的记录中字段name值为c2。
update emp SET name = 'c2' where id = 3;

查看阻塞
如果是查询实时信息,那么我们可以通过如下SQL去查询阻塞链。
SELECT blocked_locks.pid     AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process
----------------+--------------+----------------+---------------+---------------------------------------------+---------------------------------------------
47810227406592 | omm | 47810357757696 | omm | update emp SET name = 'MogDB' where id = 3; | update emp SET name = 'MogDB' where id = 3;

可以看出,sessionid:47810227406592被阻塞。
通过ASP数据查询阻塞
SELECT sessionid, start_time, event, count
FROM (
SELECT sessionid, start_time, event, COUNT(*)
FROM dbe_perf.local_active_session
WHERE sample_time > now() - 10 (24 * 60)
GROUP BY sessionid, start_time, event) as t ORDER BY SUM(t.count) OVER (PARTITION BY t. sessionid, start_time)DESC, t.event;

sessionid | start_time | event | count
----------------+-------------------------------+-----------------------+-------
47810357757696 | 2024-04-21 21:07:23.08784+08 | wait cmd | 598
47810390652672 | 2024-04-21 21:12:03.224273+08 | HashJoin - build hash | 1
SELECT sessionid,start_time,event,wait_status,block_sessionid,final_block_sessionid FROM dbe_perf.local_active_session where block_sessionid in (47810357757696,47810390652672);

发现阻塞源头是sessionid:47810357757696,sessionid:47810390652672是被阻塞者,与前面结果一致。

增强ASP功能

 

 

MogDB 企业版增强的ASH能力,称为"SQL运行状态观测",主要是通过在采样数据中增加SQL执行算子的采样来完成的。在dbe_perf.local_active_session和GS_ASP中新增一列plan_node_id来记录SQL语句每个算子操作的执行情况。即可知道对于出现性能问题的SQL具体是慢在了执行计划的哪个步骤上。
开启增强ASP功能的参数:
resource_track_level参数指定为operator,则会开启算子采样能力,默认值是query,只会记录SQL级别采样。
模拟
创建表test:
MogDB=# create table test(c1 int);
CREATE TABLE
MogDB=# insert into test select generate_series(1, 1000000000);
查询出该SQL的query_id:
MogDB=# select query,query_id from pg_stat_activity where query like 'insert into test select%';
query | query_id
-----------------------------------------------------------+-----------------
insert into test select generate_series(1, 100000000000); | 562949953421368
(1 row)
查询带plan_node_id的执行计划:
Set resource_track_cost=10;


MogDB=# select query_plan from dbe_perf.statement_complex_runtime where queryid = 562949953421368;
query_plan
----------------------------------------------------------------------------
Coordinator Name: datanode1 +
1 | Insert on test (cost=0.00..17.51 rows=1000 width=8) +
2 | -> Subquery Scan on "*SELECT*" (cost=0.00..17.51 rows=1000 width=8) +
3 | -> Result (cost=0.00..5.01 rows=1000 width=0) +
+
(1 row)
SQL运行状态观测:
在session2中,根据query_id从采样视图dbe_perf.local_active_session中查询出该语句的采样情况,结合上面查询的执行计划做性能分析。
MogDB=# select plan_node_id, count(plan_node_id) from dbe_perf.local_active_session where query_id = 562949953421368 group by plan_node_id;
plan_node_id | count
--------------+-------
3 | 12
1 | 366
2 | 2
(3 rows)

结论

 

 

当发现insert into test select generate_series(1, 1000000000)存在性能瓶颈,通过以上的步骤定位发现,insert操作在整个SQL语句执行过程中被采样的数值最高( plan_node_id =1 ,count=366),可以对其进行优化。