<< 返回文章列表

数据恢复| 使用ErrorStack进行Oracle错误跟踪及诊断

2017年12月26日
盖国强
1073

在使用Oracle数据库的过程中,可能会遇到各种各样的错误或异常,很多异常的提示并不具体,我们有必要了解一下Oracle的ErrorStack跟踪方式。

image.png

ErrorStack是Oracle提供的一种对于错误堆栈进行跟踪的方法,通过设置跟踪可以将一些错误的后台信息详尽地转储出来,写入跟踪文件,对于错误的研究与诊断非常有效。


设置ErrorStack主要有4个级别:

0 - 仅转储错误堆栈(0级已经被逐渐废弃)

1 - 转储错误堆栈和函数调用堆栈

2 - Level1 + ProcessState

3 - Level2 + Context area(显示所有cursors,着重显示当前cursor)


ErrorStack可以在实例级或会话级别设置,也可以在参数文件中设置,这个设置仅当某个特定的错误出现时才被触发,如设置ORA-00942事件的跟踪:

alter session set events '942 trace name errorstack level 1';


一个客户曾经出现如下ORA-01438错误,提示数据的精度超过允许值,是后台JOB调度的任务:

Mon Jul 13 10:27:312009
Errors in file /admin/erpdb/bdump/erpdb1_j000_447020.trc:
ORA-12012: error on auto execute of job 22
ORA-01438: value larger than specified precision allowed for this column
ORA-06512: at "ERP.TIMRDU", line 13
ORA-06512: at line 1


跟踪文件中默认的不会记录具体的SQL、绑定变量等信息,我们可以通过ErrorStack进行后台跟踪,获得更详细的信息,执行如下代码中的SQL:

alter system set events='1438 trace name errorstack forever,level 3';


然后可以手工执行出错的存储过程(代码如下),获得跟踪文件,再关闭跟踪:

alter system set events='1438 trace name errorstack off';


在Oracle 10g中,这样的操作会被记录到日志文件中:

Mon Jul 13 10:48:392009
OS Pid: 541528 executed alter system set events '1438 trace name Errorstack forever,level 3'
Mon Jul 13 10:56:06 2009
Errors in file /admin/erpdb/udump/erpdb1_ora_267056.trc:
ORA-01438: value larger than specified precision allowed for this column
Mon Jul 13 10:56:08 2009
Trace dumping is performing id=[cdmp_20090713105608]
Mon Jul 13 10:57:15 2009
OS Pid: 541528 executed alter system set events '1438 trace name Errorstack off'


接下来分析获得的跟踪文件,就可以获得SQL文本线索,找到根本问题。在这个案例中,我们得到的跟踪文件,其关键SQL内容如下,通过这个SQL对照数据表很快就找到了精度超过的Number型字段:

***SESSION ID:(857.16304) 2009-07-13 10:56:06.429
*** 2009-07-13 10:56:06.429
ksedmp: internal or fatal error
ORA-01438: value larger than specified precision allowed for this column
Current SQL statement for this session:
INSERT INTO CONTRAPAYM (IHCODE,GTICODE,IDX,HCODE,PORDATE,FCODE,FCY,CCODE,ECODE,FLAG,MATCHFLAG,BCODE,MONTHZL,STATUS,AUTOFLAG ,REMARK) SELECTDISTINCT IHCODE,'0000000000000000' GTICODE,(IDX+100) IDX,HCODE,:B3 PORDATE,FCODE,:B2FCY,CCODE,ECODE, FLAG,0 MATCHFLAG,BCODE,MONTHZL,STATUS,4 AUTOFLAG ,'Proc_AdjustContractpayment' REMARK FROM CONTRAPAYM WHERE IHCODE=:B1 ANDIDX=(SELECT MAX(IDX) FROM CONTRACTPAYMENT WHERE IHCODE=:B1 ) AND ROWNUM=1
----- PL/SQL Call Stack -----
 object      line object
 handle    number name
7000002ca366e80       100   procedure ERP.PROC_ADJCONTRAPAYM
700000336a1a070       236   procedure ERP.PROC_AUTOBATPROC
7000002ca367df0         5   procedure ERP.TIMRDU
700000342eb7c20         1   anonymous block


在跟踪文件中,还有大量的堆栈信息,对于复杂的问题,还可以通过进一步细致的堆栈分析进行深入追踪。

可以很容易地测试这一功能的使用,比如使用如下代码中的测试过程:

       SQL>  alter system set events='1438 trace name errorstack forever,level 3';

       SQL> connect eygle/eygle
       SQL> create table t (name varchar2(10),id number);

Table created.
SQL> insert into t values(a,1);
insert into t values(a,1)
                        *
ERROR at line 1:
ORA-00984: column not allowed here
SQL> alter system set events '984 trace name errorstack off';


在告警日志文件中就可以获得如下信息:

Mon Jul 13 22:55:592009
OS Pid: 2431 executed alter system set events '984 trace name ERRORSTACK level3'
Mon Jul 13 22:59:12 2009
Errors in file /opt/oracle/admin/mmstest/udump/mmstest_ora_2520.trc:
ORA-00984: column not allowed here
Mon Jul 13 23:01:01 2009
OS Pid: 2431 executed alter system set events '984 trace name errorstack off'


获得的跟踪文件里记录了insert的相关信息:

*** 2009-07-13 22:59:12.928
ksedmp: internal or fatal error
ORA-00984: column not allowed here
Current SQL statement for this session:
insert into t values(a,1)
----- Call Stack Trace -----
calling                 call      entry                   argument values in hex
location                type     point                    (?means dubious value)
--------------------    --------  --------------------     ----------------------------
ksedst()+27             all       ksedst1()                0? 1 ?


以下引用在ITPUB上的一个讨论,是使用ErrorStack跟踪解决问题的典型案例(原文链接指向ITPUB):

问题概述

在进行数据库例行维护时,遇到数据库正常导出,但导入时遇到有张表不能导入的问题。这张表存储工艺路线,大约共有160万条记录。表的不能导入影响一部分功能的实现,各单位反映强烈。

初步分析

首先排除是系统bug所带来的问题,因为前期导入导出未发现问题;另外由于是对同一数据库进行的导入导出,也排除了字符集问题。

单独再次导入此表,出现错误如下;

[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp


Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008

Copyright (c) 1982, 2005, Oracle.All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

With the Partitioning, OLAP and Data Mining options


Export file created by EXPORT:V10.02.01 via conventional path

import done in ZHS16GBK character set and ZHS16GBK NCHAR character set

. importing DBA_MGR's objects into DBA_MGR

. . importing table "T_ROUTDEF"

IMP-00058: ORACLE error 1438 encountered

ORA-01438: value larger than specified precision allowed for this column

IMP-00028: partial import of previous table rolled back: 20501 rows rolled back


解决方向:有可能是数据中存在问题,很有可能是输入一些特殊字符,这些字符和ORACLE自身使用的通配符一致,所以导致系统不能识别。


【云和恩墨,提供7*24最专业的数据恢复(Oracle,MySQL,SQL server)服务,致力于为您的数据库系统做最后一道安全防护!服务热线:010-59007017-7030】数据恢复|数据库运维|性能优化|安全保障


问题解决

1、对问题进行跟踪

对系统设置1438 errorstack跟踪:

SQL> alter system set events='1438 trace name Errorstack forever,level 10';

System altered

然后重新执行IMP;

2、查看UDUMP下日志,如下:

/opt/oracle/admin/orcl/udump/orcl_ora_32355.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

With the Partitioning, OLAP and Data Mining options

ORACLE_HOME = /opt/oracle/product/10.2.0

System name:Linux

Node name:zhx25.cccc.com

Release:2.6.9-22.ELsmp

Version:#1 SMP Mon Sep 19 18:00:54 EDT 2005

Machine:x86_64

Instance name: orcl

Redo thread mounted by this instance: 1

Oracle process number: 17

Unix process pid: 32355, image: oracle@zhx25.cccc.com (TNS V1-V3)


*** SERVICE NAMESYS$USERS) 2008-03-18 17:11:02.014

*** SESSION ID135.949) 2008-03-18 17:11:02.014

*** 2008-03-18 17:11:02.014

ksedmp: internal or fatal error

ORA-01438: value larger than specified precision allowed for this column

Current SQL statement for this session:

INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "T_ROUTDEF" ("PRTNO", "SEQNO", "SEQDESC", "PRTEFLAG", "SEQCC", "SEQNXTNO", "SEQDEPT", "WKCCODE", "WKCALTCODE", "SEQSETHR", "LAB_SEQSETHR", "SEQRUNHR", "LAB_SEQRUNHR", "SEQQUEHR", "LAB_SEQQUEHR", "SEQTRANHR", "LAB_SEQTRANHR", "SEQNOQLTRAT", "SEQCHGDAT", "SEQCHGRSN", "SEQCHGCTLR", "MACHCODE", "MACHALTCODE", "RESCODE", "KEYSEQ", "SEQCRYLOT", "INITOVERRATE", "COMWORK", "SEQLOTFLAG", "SEQEFDAT", "SEQIEFDAT", "SEQEFLOTNO", "SEQIEFLOTNO", "SEQECONO", "PRTSEQCST", "SEQSELFCST", "INVYORN", "SEQQTY", "SEQNOTE", "NOPRTAVELZ", "OUTSEQFLAG", "WXCST", "SEQGSCHGCTLR", "SEQGSCHGDAT", "SEQSHCTL", "SEQSHDAT", "SEQSHSIGN", "PRICE", "ACCNO", "ROUSTAT", "FIXMFLAG", "BANCL" VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29, :30, :31, :32, :33, :34, :35, :36, :37, :38, :39, :40, :41, :42, :43, :44, :45, :46, :47, :48, :49, :50, :51, :52)

----- Call Stack Trace -----

calling

call

entry

argument values in hex 


location

type

point

(? means dubious value)


…………


oacdef = 6a9477a0

Offsi = 48, Offsi = 1440


Opcode = 2

Bind Twotask Scalar Sql In (may be out) Copy 


oacdef = 6a9477d0

Offsi = 48, Offsi = 1472


Opcode = 2

Bind Twotask Scalar Sql In (may be out) Copy 


oacdef = 6a947800

Offsi = 48, Offsi = 1504


Opcode = 2

Bind Twotask Scalar Sql In (may be out) Copy 


oacdef = 6a947830

Offsi = 48, Offsi = 1536


Opcode = 2

Bind Twotask Scalar Sql In (may be out) Copy 


oacdef = 6a947860

Offsi = 48, Offsi = 1568


Opcode = 2

Bind Twotask Scalar Sql In (may be out) Copy 


oacdef = 6a947890

Offsi = 48, Offsi = 1600


Opcode = 2

Bind Twotask Scalar Sql In (may be out) Copy 


oacdef = 6a9478c0

Offsi = 48, Offsi = 1632

kkscoacd


Bind#0

oacdty=01 mxl=32(24) mxlc=00 mal=00 scl=00 pre=00

oacflg=03 fl2=1000010 frm=01 csi=852 siz=3344 off=0


kxsbbbfp=2a974dc398

bln=32

avl=23

flg=05


value="?


Bind#1

oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=32

kxsbbbfp=2a974dc3b8

bln=22 avl=22 flg=01


value=###

An invalid number has been seen.Memory contents are :

Dump of memory from 0x0000002A974CDE88 to 0x0000002A974CDE9E

2A974CDE80 3331312E 0406C102 [.113....]

2A974CDE90 C9BBC6C8 52024D01 30310248 00003304[.....M.RH.10.3..]


Bind#2

oacdty=01 mxl=32(30) mxlc=00 mal=00 scl=00 pre=00

oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=56

kxsbbbfp=2a974dc3d0

bln=32

avl=00

flg=01


Bind#3

oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00

oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=88

kxsbbbfp=2a974dc3f0

bln=32

avl=01

flg=01


value="M"

…………

从跟踪文件可以看到,问题是系统认为地址为2A974CDE88的数值非法,在导入会提示ORA-1438错误。本值在数据库中定义为number(8)类型,但从结果来看,系统把这个值当成了number(24)类型,并且由于格式不对,不能取出值。

3、找到DMP中相关数据

用winhex打开dmp文件,搜索十六进制串“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”(对trace文件中的值要进行顺序调整)


4、对导出数据进行更改

由于“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”不是一个有效的number类型,需要对它进行调整。调整如下,只要确保调整的值是number类型的即可。


5、对数据库相关数据项的类型进行调整

把数据项的类型由number(*)改为number(38);

6、重新进行数据导入

[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp


Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008

Copyright (c) 1982, 2005, Oracle.All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

With the Partitioning, OLAP and Data Mining options


Export file created by EXPORT:V10.02.01 via conventional path

import done in ZHS16GBK character set and ZHS16GBK NCHAR character set

. importing DBA_MGR's objects into DBA_MGR

. . importing table

"T_ROUTDEF"

1646404


本文通过介绍ErrorStack的应用,将这一常用的诊断工具介绍给大家,作为DBA,在遇到复杂问题时,通过这些手段深入分析,不仅有助于问题的解决,还将完善自己的知识体系,祝大家学习能够不断进步、深入。