<< 返回文章列表

从Oracle到PostgreSQL:最全控制文件(下)

2019年7月19日
李思雷
939


PostgreSQL控制文件恢复测试

测试过程如下(基于PostgreSQL 11.2) :

1. 新建测试数据, 用到with oids的表, 因为OID无法确定, 看看是否会有异常

postgres=# create table lsl_oid1(id int primary key) with oids; 
CREATE TABLE
postgres=#  insert into lsl_oid1 select generate_series(1,100000);
INSERT 0 100000
postgres=# select min(oid),max(oid) from lsl_oid1 ;  
  min  |  max  
-------+--------
16400 | 116399
(1 row)

2. 关闭数据,记录下控制文件信息

[postgres@lsl-test1 bin]$ /usr/pgsql-11/bin/pg_ctl  stop -D /pg/pg11/data
waiting for server to shut down.... done
server stopped

## 记下pg_controldata信息, 方便修复后进行比对

[postgres@lsl-test1 bin]$ /usr/pgsql-11/bin/pg_controldata  -D /pg/pg11/data
pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6691945724594983959
Database cluster state:               shut down
pg_control last modified:             Thu 30 May 2019 05:26:41 PM CST
Latest checkpoint location:           0/79E9888
Latest checkpoint's REDO location:    0/79E9888
Latest checkpoint's REDO WAL file:    000000010000000000000007
Latest checkpoint's TimeLineID:       1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:          0:1048585
Latest checkpoint's NextOID:          116400
Latest checkpoint's NextMultiXactId:  65536
Latest checkpoint's NextMultiOffset:  52352
Latest checkpoint's oldestXID:        561
Latest checkpoint's oldestXID's DB:   13878
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 13878
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint:            Thu 30 May 2019 05:26:40 PM CST
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location:     0/0
Min recovery ending loc's timeline:   0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
wal_level setting:                    replica
wal_log_hints setting:                off
max_connections setting:              100
max_worker_processes setting:         8
max_prepared_xacts setting:           0
max_locks_per_xact setting:           64
track_commit_timestamp setting:       off
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0
Mock authentication nonce:            0000000000000000000000000000000000000000000000000000000000000000   

3. 模拟控制文件故障,直接删除控制文件

[postgres@lsl-test1 global]$ pwd
/pg/pg11/data/global
[postgres@lsl-test1 global]$ rm -rf pg_control

4. 启动数据库,模拟控制文件丢失场景

[postgres@lsl-test1 global]$ /usr/pgsql-11/bin/pg_ctl  start
waiting for server to start....postgres: could not find the database system
Expected to find it in the directory "/pg/pg11/data",
but could not open file "/pg/pg11/data/global/pg_control": No such file or directory
stopped waiting
pg_ctl: could not start server
Examine the log output.

下面开始正式重建控制文件,使得数据库可以正常启动。

5. 首先创建一个名为pg_control的空文件

[postgres@lsl-test1 global]$ touch $PGDATA/global/pg_control  

6. 使用pg_resetwal修复pg_control

确认pg_resetwal参数值。

首先确认-c参数,上面参数详细分析发现目录下为空,因此暂时忽略此参数。

-e参数是设置下一个事务号的新纪元,我们测试环境没有其它复制系统因此也可以忽略。

-l参数,查看pg_wal下文件文件名,大于文件名最大值即可。

[postgres@lsl-test1 ~]$ cd /pg/pg11/data/pg_wal[postgres@lsl-test1 pg_wal]$ ls -ltotal 32768-rw-------. 1 postgres postgres 16777216 May 30 17:26 000000010000000000000007-rw-------. 1 postgres postgres 16777216 May 30 17:26 000000010000000000000008drwx------. 2 postgres postgres        6 May 17 17:22 archive_status之类要大于最大值,因此我们可以去-l=000000010000000000000009-m参数取pg_multixact/offsets目录下最大值加1然后乘以65536(0x10000)和最小值 乘以65536(0x10000)[postgres@lsl-test1 pg_commit_ts]$ cd ../pg_multixact/offsets/[postgres@lsl-test1 offsets]$ ls -ltotal 8-rwx------. 1 postgres postgres 8192 May 17 18:04 0000

因此-m可以取0x10000,0x00000。

-o参数不确定时,由于测试没有复制软件因此可以暂时忽略。

-O  查找pg_multixact/members目录下数值最大的文件名,+1乘以52352 (0xCC80)。

[postgres@lsl-test1 offsets]$ cd ../members/
[postgres@lsl-test1 members]$ ls -l
total 8
-rwx------. 1 postgres postgres 8192 May 17 17:22 0000

因此-O=0xCC80。


-x参数查找pg_xact目录下可以查看数值最大的文件名,+1乘以 1048576 (0x100000)。

[postgres@lsl-test1 members]$ cd ../../pg_xact/
[postgres@lsl-test1 pg_xact]$ ls -l
total 16
-rwx------. 1 postgres postgres 8192 May 17 18:04 0000
-rw-------. 1 postgres postgres 8192 May 30 17:26 0001

因此-x=0x200000。


不加-f参数时可以查看要写入控制文件中的参数内容。

[postgres@lsl-test1 pg_xact]$ /usr/pgsql-11/bin/pg_resetwal   -l 000000010000000000000009 -m  0x10000,0x00000  -O  0xCC80  -x 0x200000 -D /pg/pg11/data
pg_resetwal: oldest multitransaction ID (-m) must not be 0
[postgres@lsl-test1 pg_xact]$ /usr/pgsql-11/bin/pg_resetwal   -l 000000010000000000000009 -m  0x10000,0x00001  -O  0xCC80  -x 0x200000 -D /pg/pg11/data
pg_resetwal: pg_control exists but is broken or wrong version; ignoring it
Guessed pg_control values:
pg_control version number:            1100
Catalog version number:               201809051
Database system identifier:           6696828635748080009
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0:3
Latest checkpoint's NextOID:          10000
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        3
Latest checkpoint's oldestXID's DB:   0
Latest checkpoint's oldestActiveXID:  0
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 0
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Size of a large-object chunk:         2048
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
Data page checksum version:           0

Values to be changed:
First log segment after reset:        000000010000000000000009
NextMultiXactId:                      65536
OldestMultiXid:                       1
OldestMulti's DB:                     0
NextMultiOffset:                      52352
NextXID:                              2097152
OldestXID:                            2297064448
OldestXID's DB:                       0
If these values seem acceptable, use -f to force reset.
[postgres@lsl-test1 pg_xact]$ cd ../global/
[postgres@lsl-test1 global]$ ls -l pg_control
-rw-r--r--. 1 postgres postgres 0 May 30 17:36 pg_control


7. 启动数据库

确认控制文件参数无误后加上-f会写入到控制文件里。

[postgres@lsl-test1 global]$ /usr/pgsql-11/bin/pg_resetwal   -l 000000010000000000000009 -m  0x10000,0x00001  -O  0xCC80  -x 0x200000 -D /pg/pg11/data  -f
pg_resetwal: pg_control exists but is broken or wrong version; ignoring it
Write-ahead log reset

启动数据库。

[postgres@lsl-test1 global]$ /usr/pgsql-11/bin/pg_ctl  start -D /pg/pg11/data/
waiting for server to start....2019-05-30 22:42:50.946 CST [2471] LOG:  listening on IPv6 address "::1", port 5432
2019-05-30 22:42:50.946 CST [2471] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2019-05-30 22:42:50.949 CST [2471] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2019-05-30 22:42:50.961 CST [2471] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-05-30 22:42:50.987 CST [2471] LOG:  redirecting log output to logging collector process
2019-05-30 22:42:50.987 CST [2471] HINT:  Future log output will appear in directory "log".
done
server started

8. 查看测试数据是否正常,然后插入新数据看数据库是否可用。

[postgres@lsl-test1 global]$ psql
psql (11.2)
Type "help" for help.
postgres=# select min(oid),max(oid),count(*) from lsl_oid1 ;   
  min  |  max   | count 
-------+--------+--------
16400 | 116399 | 100000
(1 row)
postgres=# insert into lsl_oid1 select generate_series(100001,200000);   
INSERT 0 100000
postgres=#  select min(oid),max(oid),count(*) from lsl_oid1 ;   
  min  |  max   | count 
-------+--------+--------
16384 | 116399 | 200000
(1 row)

数据库可以正常访问。

至此,本文关于Oracle和PostgreSQL的控制文件内容介绍到这里。