GaussDB 100 OLTP: Diagnosing and Resolving the GS-00001 SGA Error
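Huawei GaussDB 100 OLTP is installed with a Python script: running install.py installs the software and creates the database.
On my first installation attempt I ran into the following problem. The output shows that the database instance failed to start: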
[root@localhost ]# python install.py -U omm:dbgrp -R /opt/gaussdb/app -D /opt/gaussdb/data -C LSNR_ADDR=127.0.0.1,192.168.1.121 -C LSNR_PORT=1888
Checking runner.
Checking parameters.
End check parameters.
Checking user.
End check user.
Checking old install.
End check old install.
Checking kernel parameters.
Checking directory.
Checking integrality of run file...
Decompressing run file.
Setting user env.
Checking data dir and config file
Initialize db instance.
Creating database.
Error: Can not get instance '/opt/gaussdb/data' process pid,The detailed information: 'instance startup failed '
Please refer to install log "/home/omm/zengineinstall.log" for more detailed information.
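From the log you can see that the instance initialized by default is named zengine, meaning "Z engine".
The error message printed to the console is: instance startup failed.
So what is the real reason the instance failed to start?
When a GaussDB installation fails, the installer rolls back all of its operations. The specific cause can be found by checking the run log.
The log below lists the database's initial parameter settings, and its final lines report that creating the SGA failed: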
[root@localhost ~]# cd /opt/gaussdb/log/run
[root@localhost run]# ls
zengine.rlog
[root@localhost run]# more zengine.rlog
UTC+8 2019-11-22 15:36:47.885|ZENGINE|00000|77309437941|INFO>[LOG] file '/opt/gaussdb/data/log/zenith_alarm.log' is added [srv_param.c:488]
UTC+8 2019-11-22 15:36:47.885|ZENGINE|00000|26613|INFO>[LOG] file '/opt/gaussdb/data/log/run/zengine.rlog' is added [cm_log.c:643]
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] LSNR_ADDR = 127.0.0.1,192.168.1.121
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] LSNR_PORT = 1888
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] DATA_BUFFER_SIZE = 2G
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] SHARED_POOL_SIZE = 1G
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] LOG_BUFFER_SIZE = 64M
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] LOG_BUFFER_COUNT = 8
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] TEMP_BUFFER_SIZE = 1G
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] SESSIONS = 1500
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] DBWR_PROCESSES = 8
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] INSTANCE_NAME = zenith
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] ENABLE_SYSDBA_LOGIN = TRUE
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|206158456821|INFO>starting instance(nomount)
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|ERROR>GS-00001 : Failed to allocate 4592381952 bytes for sga [srv_sga.c:170]
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|ERROR>failed to create sga
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|ERROR>Instance Startup Failed
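When the run log is longer than this one, it can help to pull out just the [PARAM] settings and the ERROR lines. The following is a minimal sketch that assumes every line follows the pipe-delimited layout seen above. The field names (`module`, `code`, `tid`) and the function name `summarize_rlog` are hypothetical labels chosen for illustration, not names documented by GaussDB.

```python
import re

# Matches lines shaped like the ones in zengine.rlog, for example:
# UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] SESSIONS = 1500
# The group names are guesses at what each field means.
LINE_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<module>[^|]+)\|(?P<code>[^|]+)\|(?P<tid>[^|]+)\|"
    r"(?P<level>[A-Z]+)>(?P<msg>.*)$"
)
PARAM_RE = re.compile(r"^\[PARAM\]\s+(?P<name>\w+)\s*=\s*(?P<value>.*)$")


def summarize_rlog(lines):
    """Return (params, errors) extracted from run-log lines."""
    params, errors = {}, []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip anything that does not look like a log record
        msg = m.group("msg").strip()
        if m.group("level") == "ERROR":
            errors.append(msg)
            continue
        p = PARAM_RE.match(msg)
        if p:
            params[p.group("name")] = p.group("value").strip()
    return params, errors


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] DATA_BUFFER_SIZE = 2G
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|INFO>[PARAM] SESSIONS = 1500
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|ERROR>GS-00001 : Failed to allocate 4592381952 bytes for sga [srv_sga.c:170]
UTC+8 2019-11-22 15:36:47.917|ZENGINE|00000|26613|ERROR>failed to create sga
""".splitlines()

if __name__ == "__main__":
    params, errors = summarize_rlog(SAMPLE)
    print(params)
    for e in errors:
        print("ERROR:", e)
```

To run it against a real file, pass an open file handle instead of `SAMPLE`, since the function only iterates over lines.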
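Note that here we have run into GaussDB's error number one: GS-00001. The message says that not enough memory could be allocated for the SGA. The request was 4592381952 bytes, roughly 4.28 GiB. To resolve it, increase the host's memory beyond 4 GB so that the full request can be satisfied.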
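To see where the figure in the GS-00001 message comes from, the size parameters from the log can be added up and compared with the request. This is a rough sketch, not a description of how GaussDB actually sizes its SGA. It assumes that suffixes such as G and M are binary units, and that MemAvailable in /proc/meminfo is a reasonable proxy for what the instance could obtain on a Linux host. The helper names `to_bytes` and `mem_available_bytes` are hypothetical.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size):
    """Convert a size such as '2G' or '64M' to bytes (binary units assumed)."""
    size = size.strip().upper()
    if size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


# Values copied from the [PARAM] lines in zengine.rlog.
BUFFERS = {
    "DATA_BUFFER_SIZE": "2G",
    "SHARED_POOL_SIZE": "1G",
    "TEMP_BUFFER_SIZE": "1G",
    "LOG_BUFFER_SIZE": "64M",
}
REQUESTED = 4592381952  # byte count from the GS-00001 message


def mem_available_bytes(meminfo_text):
    """Read MemAvailable (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    return None


if __name__ == "__main__":
    listed = sum(to_bytes(v) for v in BUFFERS.values())
    print(f"listed buffers: {listed} bytes")
    print(f"requested SGA : {REQUESTED} bytes ({REQUESTED / 1024 ** 3:.2f} GiB)")
    print(f"unaccounted   : {REQUESTED - listed} bytes")
    try:
        with open("/proc/meminfo") as f:
            avail = mem_available_bytes(f.read())
    except OSError:
        avail = None  # not a Linux host, or /proc is unavailable
    if avail is not None:
        print("enough memory available:", avail >= REQUESTED)
```

Under the binary-unit assumption, the four size parameters sum to 4362076160 bytes, so 230305792 bytes of the request are not explained by these parameters alone. What the remainder is used for is not stated in the log.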
For comparison, Oracle's error number one, ORA-00001, is a unique constraint violation:
ORA-00001: unique constraint (string.string) violated
Cause: An UPDATE or INSERT statement attempted to insert a duplicate key. For Trusted Oracle configured in DBMS MAC mode, you may see this message if a duplicate entry exists at a different level.
Action: Either remove the unique restriction or do not insert the key.
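Comparing the two, GaussDB's error number one is framed from the point of view of the database itself, whereas Oracle's error number one starts from the application and its data.
So where does Oracle's error number one rank in GaussDB? Let's run a test: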
SQL> create unique index idx_enmo_id on enmo(id);
Succeed.
SQL> insert into enmo values(1,'EYGLE');
1 rows affected.
SQL> insert into enmo values(1,'EYGLE');
GS-00729, Unique constraint violated, index IDX_ENMO_ID, duplicate key 1
In GaussDB, the unique constraint violation is error number 729.
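Scripts that run against either database may want to classify failures by error code rather than by message text. The sketch below extracts the code from the two message shapes that appear in this article (`GS-00001 : ...` and `GS-00729, ...`) as well as from the ORA-00001 text. The function name `error_code` is hypothetical, and the assumption that all codes are five digits is based only on the examples shown here.

```python
import re

# Five-digit codes with a GS- or ORA- prefix, as seen in this article.
CODE_RE = re.compile(r"\b(GS|ORA)-(\d{5})\b")


def error_code(message):
    """Return (prefix, number) for the first error code in message, or None."""
    m = CODE_RE.search(message)
    return (m.group(1), int(m.group(2))) if m else None


if __name__ == "__main__":
    for text in (
        "GS-00001 : Failed to allocate 4592381952 bytes for sga [srv_sga.c:170]",
        "GS-00729, Unique constraint violated, index IDX_ENMO_ID, duplicate key 1",
        "ORA-00001: unique constraint (string.string) violated",
        "Succeed.",
    ):
        print(error_code(text))
```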
For further GaussDB learning material, see 墨天轮:
Learning GaussDB starts from 100.