数据恢复| 使用ErrorStack进行Oracle错误跟踪及诊断
在使用Oracle数据库的过程中,可能会遇到各种各样的错误或异常,很多异常的提示并不具体,我们有必要了解一下Oracle的ErrorStack跟踪方式。
ErrorStack是Oracle提供的一种对于错误堆栈进行跟踪的方法,通过设置跟踪可以将一些错误的后台信息详尽地转储出来,写入跟踪文件,对于错误的研究与诊断非常有效。
设置ErrorStack主要有4个级别:
0 - 仅转储错误堆栈(0级已经被逐渐废弃)
1 - 转储错误堆栈和函数调用堆栈
2 - Level1 + ProcessState
3 - Level2 + Context area(显示所有cursors,着重显示当前cursor)
ErrorStack可以在实例级或会话级别设置,也可以在参数文件中设置,这个设置仅当某个特定的错误出现时才被触发,如设置ORA-00942事件的跟踪:
alter session set events '942 trace name errorstack level 1';
一个客户曾经出现如下ORA-01438错误,提示数据的精度超过允许值,是后台JOB调度的任务:
Mon Jul 13 10:27:312009
Errors in file /admin/erpdb/bdump/erpdb1_j000_447020.trc:
ORA-12012: error on auto execute of job 22
ORA-01438: value larger than specified precision allowed for this column
ORA-06512: at "ERP.TIMRDU", line 13
ORA-06512: at line 1
跟踪文件中默认的不会记录具体的SQL、绑定变量等信息,我们可以通过ErrorStack进行后台跟踪,获得更详细的信息,执行如下代码中的SQL:
alter system set events='1438 trace name errorstack forever,level 3';
然后可以手工执行出错的存储过程(代码如下),获得跟踪文件,再关闭跟踪:
alter system set events='1438 trace name errorstack off';
在Oracle 10g中,这样的操作会被记录到日志文件中:
Mon Jul 13 10:48:392009
OS Pid: 541528 executed alter system set events '1438 trace name Errorstack forever,level 3'
Mon Jul 13 10:56:06 2009
Errors in file /admin/erpdb/udump/erpdb1_ora_267056.trc:
ORA-01438: value larger than specified precision allowed for this column
Mon Jul 13 10:56:08 2009
Trace dumping is performing id=[cdmp_20090713105608]
Mon Jul 13 10:57:15 2009
OS Pid: 541528 executed alter system set events '1438 trace name Errorstack off'
接下来分析获得的跟踪文件,就可以获得SQL文本线索,找到根本问题。在这个案例中,我们得到的跟踪文件,其关键SQL内容如下,通过这个SQL对照数据表很快就找到了精度超过的Number型字段:
***SESSION ID:(857.16304) 2009-07-13 10:56:06.429
*** 2009-07-13 10:56:06.429
ksedmp: internal or fatal error
ORA-01438: value larger than specified precision allowed for this column
Current SQL statement for this session:
INSERT INTO CONTRAPAYM (IHCODE,GTICODE,IDX,HCODE,PORDATE,FCODE,FCY,CCODE,ECODE,FLAG,MATCHFLAG,BCODE,MONTHZL,STATUS,AUTOFLAG ,REMARK) SELECTDISTINCT IHCODE,'0000000000000000' GTICODE,(IDX+100) IDX,HCODE,:B3 PORDATE,FCODE,:B2FCY,CCODE,ECODE, FLAG,0 MATCHFLAG,BCODE,MONTHZL,STATUS,4 AUTOFLAG ,'Proc_AdjustContractpayment' REMARK FROM CONTRAPAYM WHERE IHCODE=:B1 ANDIDX=(SELECT MAX(IDX) FROM CONTRACTPAYMENT WHERE IHCODE=:B1 ) AND ROWNUM=1
----- PL/SQL Call Stack -----
object line object
handle number name
7000002ca366e80 100 procedure ERP.PROC_ADJCONTRAPAYM
700000336a1a070 236 procedure ERP.PROC_AUTOBATPROC
7000002ca367df0 5 procedure ERP.TIMRDU
700000342eb7c20 1 anonymous block
在跟踪文件中,还有大量的堆栈信息,对于复杂的问题,还可以通过进一步细致的堆栈分析进行深入追踪。
可以很容易地测试这一功能的使用,比如使用如下代码中的测试过程:
SQL> alter system set events='1438 trace name errorstack forever,level 3';
SQL> connect eygle/eygle
SQL> create table t (name varchar2(10),id number);
Table created.
SQL> insert into t values(a,1);
insert into t values(a,1)
*
ERROR at line 1:
ORA-00984: column not allowed here
SQL> alter system set events '984 trace name errorstack off';
在告警日志文件中就可以获得如下信息:
Mon Jul 13 22:55:592009
OS Pid: 2431 executed alter system set events '984 trace name ERRORSTACK level3'
Mon Jul 13 22:59:12 2009
Errors in file /opt/oracle/admin/mmstest/udump/mmstest_ora_2520.trc:
ORA-00984: column not allowed here
Mon Jul 13 23:01:01 2009
OS Pid: 2431 executed alter system set events '984 trace name errorstack off'
获得的跟踪文件里记录了insert的相关信息:
*** 2009-07-13 22:59:12.928
ksedmp: internal or fatal error
ORA-00984: column not allowed here
Current SQL statement for this session:
insert into t values(a,1)
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (?means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+27 all ksedst1() 0? 1 ?
以下引用在ITPUB上的一个讨论,是使用ErrorStack跟踪解决问题的典型案例(原文链接指向ITPUB):
问题概述
在进行数据库例行维护时,遇到数据库正常导出,但导入时遇到有张表不能导入的问题。这张表存储工艺路线,大约共有160万条记录。表的不能导入影响一部分功能的实现,各单位反映强烈。
初步分析
首先排除是系统bug所带来的问题,因为前期导入导出未发现问题;另外由于是对同一数据库进行的导入导出,也排除了字符集问题。
单独再次导入此表,出现错误如下;
[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp
Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008
Copyright (c) 1982, 2005, Oracle.All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
import done in ZHS16GBK character set and ZHS16GBK NCHAR character set
. importing DBA_MGR's objects into DBA_MGR
. . importing table "T_ROUTDEF"
IMP-00058: ORACLE error 1438 encountered
ORA-01438: value larger than specified precision allowed for this column
IMP-00028: partial import of previous table rolled back: 20501 rows rolled back
解决方向:有可能是数据中存在问题,很有可能是输入一些特殊字符,这些字符和ORACLE自身使用的通配符一致,所以导致系统不能识别。
【云和恩墨,提供7*24最专业的数据恢复(Oracle,MySQL,SQL server)服务,致力于为您的数据库系统做最后一道安全防护!服务热线:010-59007017-7030】数据恢复|数据库运维|性能优化|安全保障
问题解决
1、对问题进行跟踪
对系统设置1438 errorstack跟踪:
SQL> alter system set events='1438 trace name Errorstack forever,level 10';
System altered
然后重新执行IMP;
2、查看UDUMP下日志,如下:
/opt/oracle/admin/orcl/udump/orcl_ora_32355.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORACLE_HOME = /opt/oracle/product/10.2.0
System name:Linux
Node name:zhx25.cccc.com
Release:2.6.9-22.ELsmp
Version:#1 SMP Mon Sep 19 18:00:54 EDT 2005
Machine:x86_64
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 17
Unix process pid: 32355, image: oracle@zhx25.cccc.com (TNS V1-V3)
*** SERVICE NAMESYS$USERS) 2008-03-18 17:11:02.014
*** SESSION ID135.949) 2008-03-18 17:11:02.014
*** 2008-03-18 17:11:02.014
ksedmp: internal or fatal error
ORA-01438: value larger than specified precision allowed for this column
Current SQL statement for this session:
INSERT /*+NESTED_TABLE_SET_REFS+*/ INTO "T_ROUTDEF" ("PRTNO", "SEQNO", "SEQDESC", "PRTEFLAG", "SEQCC", "SEQNXTNO", "SEQDEPT", "WKCCODE", "WKCALTCODE", "SEQSETHR", "LAB_SEQSETHR", "SEQRUNHR", "LAB_SEQRUNHR", "SEQQUEHR", "LAB_SEQQUEHR", "SEQTRANHR", "LAB_SEQTRANHR", "SEQNOQLTRAT", "SEQCHGDAT", "SEQCHGRSN", "SEQCHGCTLR", "MACHCODE", "MACHALTCODE", "RESCODE", "KEYSEQ", "SEQCRYLOT", "INITOVERRATE", "COMWORK", "SEQLOTFLAG", "SEQEFDAT", "SEQIEFDAT", "SEQEFLOTNO", "SEQIEFLOTNO", "SEQECONO", "PRTSEQCST", "SEQSELFCST", "INVYORN", "SEQQTY", "SEQNOTE", "NOPRTAVELZ", "OUTSEQFLAG", "WXCST", "SEQGSCHGCTLR", "SEQGSCHGDAT", "SEQSHCTL", "SEQSHDAT", "SEQSHSIGN", "PRICE", "ACCNO", "ROUSTAT", "FIXMFLAG", "BANCL" VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17, :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29, :30, :31, :32, :33, :34, :35, :36, :37, :38, :39, :40, :41, :42, :43, :44, :45, :46, :47, :48, :49, :50, :51, :52)
----- Call Stack Trace -----
calling
call
entry
argument values in hex
location
type
point
(? means dubious value)
…………
oacdef = 6a9477a0
Offsi = 48, Offsi = 1440
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a9477d0
Offsi = 48, Offsi = 1472
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947800
Offsi = 48, Offsi = 1504
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947830
Offsi = 48, Offsi = 1536
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947860
Offsi = 48, Offsi = 1568
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a947890
Offsi = 48, Offsi = 1600
Opcode = 2
Bind Twotask Scalar Sql In (may be out) Copy
oacdef = 6a9478c0
Offsi = 48, Offsi = 1632
kkscoacd
Bind#0
oacdty=01 mxl=32(24) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=01 csi=852 siz=3344 off=0
kxsbbbfp=2a974dc398
bln=32
avl=23
flg=05
value="?
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000000 frm=00 csi=00 siz=0 off=32
kxsbbbfp=2a974dc3b8
bln=22 avl=22 flg=01
value=###
An invalid number has been seen.Memory contents are :
Dump of memory from 0x0000002A974CDE88 to 0x0000002A974CDE9E
2A974CDE80 3331312E 0406C102 [.113....]
2A974CDE90 C9BBC6C8 52024D01 30310248 00003304[.....M.RH.10.3..]
Bind#2
oacdty=01 mxl=32(30) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=56
kxsbbbfp=2a974dc3d0
bln=32
avl=00
flg=01
Bind#3
oacdty=01 mxl=32(01) mxlc=00 mal=00 scl=00 pre=00
oacflg=03 fl2=1000010 frm=01 csi=852 siz=0 off=88
kxsbbbfp=2a974dc3f0
bln=32
avl=01
flg=01
value="M"
…………
从跟踪文件可以看到,问题是系统认为地址为2A974CDE88的数值非法,在导入会提示ORA-1438错误。本值在数据库中定义为number(8)类型,但从结果来看,系统把这个值当成了number(24)类型,并且由于格式不对,不能取出值。
3、找到DMP中相关数据
用winhex打开dmp文件,搜索十六进制串“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”(对trace文件中的值要进行顺序调整)
4、对导出数据进行更改
由于“2E313133 02C10604 C8C6BBC9 014D0252 48023130 0433F3FF”不是一个有效的number类型,需要对它进行调整。调整如下,只要确保调整的值是number类型的即可。
5、对数据库相关数据项的类型进行调整
把数据项的类型由number(*)改为number(38);
6、重新进行数据导入
[oracle@zhx25 ~]$ imp system/ceshi fromuser=dba_mgr touser=dba_mgr tables=t_routdef ignore=y file=1.dmp
Import: Release 10.2.0.1.0 - Production on Tue Mar 18 14:19:49 2008
Copyright (c) 1982, 2005, Oracle.All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Export file created by EXPORT:V10.02.01 via conventional path
import done in ZHS16GBK character set and ZHS16GBK NCHAR character set
. importing DBA_MGR's objects into DBA_MGR
. . importing table
"T_ROUTDEF"
1646404
本文通过介绍ErrorStack的应用,将这一常用的诊断工具介绍给大家,作为DBA,在遇到复杂问题时,通过这些手段深入分析,不仅有助于问题的解决,还将完善自己的知识体系,祝大家学习能够不断进步、深入。