前面文章提到,这周末帮一个客户测试报错场景:

客户通过duplicate生产备库的方式创建cascade备库。

发现每次都会遇到两个文件报错,ORA-17628: Oracle error 19505错误,且每一次跑,报错文件不一样。 现在想帮客户验证,这属于是正常现象还是bug。

一、问题具体分析

最开始查看相关MOS文档,锁定的是这两篇文章:

  • ORA-17628, ORA-19505 during RMAN DUPLICATE FROM ACTIVE (Doc ID 1331986.1)

One data file is not using OMF name while the rest of the data files are using OMF name.

怀疑是对应数据文件名字的问题,但后来排除了此原因。

因为比对报错数据文件和正常的数据文件并无特殊之处,且每次报错文件并不一样。

另外一篇文档是:

  • Known issues on ORA-17628: Oracle error 19505 returned by remote Oracle server Reported in Rman Active Duplicate (Doc ID 1401333.1)
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/28/2011 19:17:43
ORA-17628: Oracle error 19505 returned by remote Oracle server

这篇文档比较全,把已知问题枚举了下,可作为参考。

得到的启示是,不但要看duplicate的日志,还要去看auxiliary数据库的alert日志,有可能会有更详细的日志描述。

另外我重新审视,增加关键字,还匹配到更精确的一篇文档:

  • Duplicate from Active Database Failing with: RMAN-03009, ORA-17628, and ORA-19505 (Doc ID 1439632.1)

结论是说目标路径有重复文件,这...其实上篇文档也有提到。

另外,1401333.1也说了,

  • ORA-17628 / ORA-19505 is a very generic error and we need more debugging to get the exact cause .

这个错误其实是很通用的一个错误号,如果已知问题都不能匹配,那么就需要更多的诊断来找到确切的原因。

也就是又强调了下,我们还要去看比如auxiliary数据库的alert日志等详细信息。

二、解决方案验证

下面按客户要求,想测试当只有部分几个文件报错,是否可以单独备份过去?

这其实算workaround的范畴,我们通常不建议这样来解决一个实际问题。

但客户实在是不想反复重跑这个duplicate,因为库太大了,老这么重跑时间上确实有些伤不起...

所以本着客户关怀角度,也就协助测试下这个可能性:因为这种问题确实不好直接回答是或否,只能构造场景实际测试下:

首先从一级备库duplicate到二级备库会报错:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/14/2023 23:54:15
RMAN-05501: aborting duplication of target database
RMAN-05531: a mounted database cannot be duplicated while datafiles are fuzzy

先改为从主库duplicate到二级备库,重新跑脚本。

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 05/14/2023 23:56:57
RMAN-05501: aborting duplication of target database
RMAN-03015: error occurred in stored script Memory Script
RMAN-03009: failure of backup command on prmy1 channel at 05/14/2023 23:56:57
ORA-17628: Oracle error 19505 returned by remote Oracle server

看看!这里有同样的错误号,但是并不是复现问题,因为这个错误号有太多的可能性。

此时正好按上面文档习得的经验,去查看二级备库(auxiliary)的alert日志有没有更详细的信息:

Sun May 14 23:56:56 2023
Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_ora_6028.trc:
ORA-19505: failed to identify file "/u01/oradata/jingyu/control01.ctl"
ORA-27040: file create error, unable to create file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 1
Sun May 14 23:57:41 2023
destination database instance is 'started' not 'mounted'

果然,是因为一个简单的文件创建失败的错误,应该是对应的文件夹不存在,直接创建即可:

[oracle@db11gcas ~]$ cd /u01/oradata/jingyu
-bash: cd: /u01/oradata/jingyu: No such file or directory
[oracle@db11gcas ~]$ cd /u01/oradata
[oracle@db11gcas oradata]$ ls -ld
drwxr-xr-x. 2 oracle oinstall 6 May 14 23:49 .
[oracle@db11gcas oradata]$ mkdir jingyu
[oracle@db11gcas oradata]$ cd /u01/oradata/jingyu
[oracle@db11gcas jingyu]$

创建对应的文件夹后再次跑脚本成功。

但是为了构造客户的问题,我这里将成功的文件故意删除掉1个。

...
channel prmy1: starting datafile copy
input datafile file number=00004 name=/u01/oradata/jingyu/users01.dbf
output file name=/u01/oradata/JYCAS/datafile/o1_mf_dbs_i_ji_1r1s5uap_.dbf tag=TAG20230515T000118
channel prmy2: datafile copy complete, elapsed time: 00:00:01
output file name=/u01/oradata/JYCAS/datafile/o1_mf_t2_1s1s5uap_.dbf tag=TAG20230515T000118
channel prmy3: datafile copy complete, elapsed time: 00:00:01
output file name=/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf tag=TAG20230515T000118
channel prmy1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 2023-05-15 00:01:31
...

就删除上面这个4号数据文件吧!删除之后肯定数据库是无法open的。

/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf
[oracle@db11gcas datafile]$ rm o1_mf_users_1t1s5uaq_.dbf SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-10458: standby database requires recovery
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf'

而且,这个时候开启mrp进程,也是不成功的:

SQL> recover managed standby database disconnect;
Media recovery complete.
SQL> !ps -ef|grep mrp
oracle 7079 6943 0 00:15 pts/1 00:00:00 /bin/bash -c ps -ef|grep mrp
oracle 7081 7079 0 00:15 pts/1 00:00:00 grep mrp

没有成功启动mrp进程,去看alert日志,清楚的告诉你是因为4号文件找不到:

Mon May 15 00:15:04 2023
ALTER DATABASE RECOVER managed standby database disconnect
Attempt to start background Managed Standby Recovery process (jingyu)
Mon May 15 00:15:04 2023
MRP0 started with pid=26, OS id=7056
MRP0: Background Managed Standby Recovery process started (jingyu)
started logmerger process
Mon May 15 00:15:09 2023
Managed Standby Recovery not using Real Time Apply
Mon May 15 00:15:09 2023
Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_dbw0_5520.trc:
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
MRP0: Background Media Recovery terminated with error 1110
Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_pr00_7062.trc:
ORA-01110: data file 4: '/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf'
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf'
Slave exiting with ORA-1110 exception
Errors in file /u01/app/oracle/diag/rdbms/jycas/jingyu/trace/jingyu_pr00_7062.trc:
ORA-01110: data file 4: '/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf'
ORA-01157: cannot identify/lock data file 4 - see DBWR trace file
ORA-01110: data file 4: '/u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf'
Completed: ALTER DATABASE RECOVER managed standby database disconnect
Recovery Slave PR00 previously exited with exception 1110
MRP0: Background Media Recovery process shutdown (jingyu)

此时尝试去一级备库备份这个4号数据文件,然后传输到二级备库:

RMAN> backup datafile 4 format '/tmp/dbfile4.bak'; 

Starting backup at 2023-05-15 00:14:27
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=12 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00004 name=/u01/oradata/JYADG/datafile/o1_mf_users_0r1s27bm_.dbf
channel ORA_DISK_1: starting piece 1 at 2023-05-15 00:14:27
channel ORA_DISK_1: finished piece 1 at 2023-05-15 00:14:28
piece handle=/tmp/dbfile4.bak tag=TAG20230515T001427 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 2023-05-15 00:14:28 [oracle@db11gadg ~]$ scp /tmp/dbfile4.bak 192.168.1.20:/tmp
oracle@192.168.1.20's password:
dbfile4.bak 100% 1344KB 80.2MB/s 00:00
[oracle@db11gadg ~]$

然后在二级备库恢复这个缺失的4号数据文件:

RMAN> list backup of datafile 4;

using target database control file instead of recovery catalog
specification does not match any backup in the repository RMAN> catalog start with '/tmp/dbfile4.bak'; searching for all files that match the pattern /tmp/dbfile4.bak List of Files Unknown to the Database
=====================================
File Name: /tmp/dbfile4.bak Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done List of Cataloged Files
=======================
File Name: /tmp/dbfile4.bak RMAN> restore datafile 4; Starting restore at 2023-05-15 00:22:12
using channel ORA_DISK_1 channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00004 to /u01/oradata/JYCAS/datafile/o1_mf_users_1t1s5uaq_.dbf
channel ORA_DISK_1: reading from backup piece /tmp/dbfile4.bak
channel ORA_DISK_1: piece handle=/tmp/dbfile4.bak tag=TAG20230515T001427
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 2023-05-15 00:22:14 RMAN>

此时查看alert日志,也是正常恢复:

Mon May 15 00:22:13 2023
Full restore complete of datafile 4 /u01/oradata/JYCAS/datafile/o1_mf_users_l622onyr_.dbf. Elapsed time: 0:00:00
checkpoint is 5941162
last deallocation scn is 3

需要注意下因为使用OMF,恢复的4号数据文件跟之前的名字不一样了!但是不影响。

现在再尝试启动mrp进程恢复就OK了。

SQL> select file#, CHECKPOINT_CHANGE#, STATUS from v$datafile;

     FILE# CHECKPOINT_CHANGE# STATUS
---------- ------------------ -------
1 5941162 SYSTEM
2 5941162 ONLINE
3 5941162 ONLINE
4 5941162 ONLINE
5 5941162 ONLINE
6 5941162 ONLINE
7 5541098 ONLINE
8 5541108 ONLINE 8 rows selected. SQL> recover managed standby database disconnect;
Media recovery complete.
SQL> !ps -ef|grep mrp
oracle 7687 1 0 00:26 ? 00:00:00 ora_mrp0_jingyu
oracle 7713 7606 0 00:27 pts/1 00:00:00 /bin/bash -c ps -ef|grep mrp
oracle 7715 7713 0 00:27 pts/1 00:00:00 grep mrp SQL> select file#, CHECKPOINT_CHANGE#, STATUS from v$datafile; FILE# CHECKPOINT_CHANGE# STATUS
---------- ------------------ -------
1 5942030 SYSTEM
2 5942030 ONLINE
3 5942030 ONLINE
4 5942030 ONLINE
5 5942030 ONLINE
6 5942030 ONLINE
7 5541098 ONLINE
8 5541108 ONLINE 8 rows selected. SQL> @dg NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------
transport lag +00 00:17:22 day(2) to second(0) interval 05/15/2023 00:28:48 05/15/2023 00:27:53
apply lag day(2) to second(0) interval 05/15/2023 00:28:48
apply finish time day(2) to second(3) interval 05/15/2023 00:28:48
estimated startup time 9 second 05/15/2023 00:28:48

此时主库切换一下日志,再次查询级联备库状态:

SQL> @dg

NAME                   VALUE                  UNIT                           TIME_COMPUTED                  DATUM_TIME
---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------
transport lag +00 00:00:00 day(2) to second(0) interval 05/15/2023 00:29:14 05/15/2023 00:29:05
apply lag +00 00:00:00 day(2) to second(0) interval 05/15/2023 00:29:14 05/15/2023 00:29:05
apply finish time day(2) to second(3) interval 05/15/2023 00:29:14
estimated startup time 9 second 05/15/2023 00:29:14 SQL> select file#, CHECKPOINT_CHANGE#, STATUS from v$datafile; FILE# CHECKPOINT_CHANGE# STATUS
---------- ------------------ -------
1 5943460 SYSTEM
2 5943460 ONLINE
3 5943460 ONLINE
4 5943460 ONLINE
5 5943460 ONLINE
6 5943460 ONLINE
7 5541098 ONLINE
8 5541108 ONLINE 8 rows selected.
SQL> @dg NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
---------------------- ---------------------- ------------------------------ ------------------------------ ------------------------------
transport lag +00 00:00:48 day(2) to second(0) interval 05/15/2023 00:29:56 05/15/2023 00:29:54
apply lag +00 00:00:48 day(2) to second(0) interval 05/15/2023 00:29:56 05/15/2023 00:29:54
apply finish time day(2) to second(3) interval 05/15/2023 00:29:56
estimated startup time 9 second 05/15/2023 00:29:56

说明这个恢复缺失数据文件再开启mrp进程的方案可行。

但出于从根本解决问题的初衷,我们还是会建议找到具体原因重新跑的方式来恢复。

问题:Duplicate报错RMAN-03009, ORA-17628, ORA-19505的更多相关文章

  1. DG duplicate报错:RMAN-05001:auxiliary file name /u01/app/oracle/oradata/fratbs01.dbf conflicts with a file used by the target database

    问题:rman duplicate时报错: RMAN-05001:auxiliary file name /u01/app/oracle/oradata/fratbs01.dbf conflicts ...

  2. 添加spring-boot-devtools热部署报错

    使用的eclipse部署的spring boot,百度了下,大部分的问题都是说IDE工具使用热部署无法成功的解决方案,看了很懵逼 <!-- 热部署模块 --> <dependency ...

  3. 转 rman 恢复报错

    ###sample 1   原因是新库起了FRA 区,FRA 区的旧文件属于之前的归档日志文件产生,这样会导致无法识别的问题. 解决办法,清空FRA或者恢复时候不启用FRA. RMAN RESTORE ...

  4. [问题]编译报错:clang: error: linker command failed with exit code 1及duplicate symbol xxxx in错误解决方法之一

    今天添加了一个新类(包括m,h,xib文件),还没有调用,-编译遇到如下错误,根据错误提示, duplicate symbol param1 in: /Users/xxxx/Library/Devel ...

  5. Oracle 12.2 报错:ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_7458"

    alert报错 2019-01-12T10:10:11.499130+08:00Errors in file /u01/app/oracle/diag/rdbms/rac1/rac112/trace/ ...

  6. Oracle数据库误删文件导致rman备份报错RMAN-06169解决办法

    Oracle数据库误删文件导致rman备份报错RMAN-06169解决办法 可能是误删文件导致在使用rman备份时候出现以下提示 RMAN-06169: could not read file hea ...

  7. rman备份报错,全zero错误处理一例(bbed)

    问题:某数据库在执行rman全备的时候,发现alert日志中有报错,报错提示, file 10,block 305076全部为zero,内容全零,处理过程如下 分析处理: 1. 这个问题可能是 系统或 ...

  8. [报错]编译报错:clang: error: linker command failed with exit code 1及duplicate symbol xxxx in错误解决方法之一

    今天添加了一个新类(包括m,h,xib文件),还没有调用,—编译遇到如下错误,根据错误提示, duplicate symbol param1 in: /Users/xxxx/Library/Devel ...

  9. RDS 在线DDL诡异报错ERROR 1062 (23000): Duplicate entry

    RDS上执行报错如下: MySQL [ad_billing]> ALTER TABLE ad_spending ADD COLUMN impr bigint(20) NOT NULL DEFAU ...

  10. Aptana Studion出现 duplicate location重复定位报错

    1.下载SVN的时候出现报错 duplicate location 2.点击“available software sites”查看已可用的软件网站 3.在这里可以查看到SVN,勾选SVN复选框,点击 ...

随机推荐

  1. 打破监控壁垒,棉花厂3D可视化建设让生产加工更加智能化

    前言 现在的棉花加工行业还停留在传统的反应式维护模式当中,当棉花加下厂的设备突然出现故障时,控制程序需要更换.这种情况下,首先需要客户向设备生产厂家请求派出技术人员进行维护,然后生产厂家才能根据情况再 ...

  2. el-menu菜单过长,显示不全问题

  3. 机器学习-无监督机器学习-kmeans衍生的算法-18

    目录 1. k-Medoids 2. 二分KMEANS 3. KMeans++ 4. elkan KMeans 5. min batch KMeans算法 6.小结: 1. k-Medoids 之前的 ...

  4. 第1篇 numpy 语法

    import numpy as np A = np.array([ [1, 2, 3, 4], [5, 6, 7, 8], ], dtype=int) # dtype指定数据类型int float p ...

  5. python · ssh · SQL | python 连接远程 SQL 数据库

    python 连接本地 SQL 的 教程存档. 如果要连接远程的 SQL 数据库,需要先开一个 ssh 连接,在 ssh 连接里写 pymysql 的 connect 代码. 代码如下: ''' pi ...

  6. 程序&命名-执行环境

    开发程序执行环境 系统级别 -- 编译器或解释器 程序级别 -- 命令行参数.配置文件 执行级别 -- 进程.线程.协程运行时上下文(树(命名空间 -- 函数-局部变量.包或模块-全局变量)) 命令行 ...

  7. [转帖]TiKV 缩容不掉如何解决?

    TiKV节点缩容不掉,通常遇到的情况: 1.经常遇到的情况是:3个节点的tikv集群缩容肯定会一直卡着,因为没有新节点接受要下线kv的region peer. 2.另外就是除缩容tikv外,剩下的KV ...

  8. Tidb 使用minio 进行br备份数据库的过程

    Tidb 使用minio 进行br备份数据库的过程 背景 br 备份恢复时一般需要共享存储. 前段时间一直使用的是nfs 进行共享文件备份. 这样需要所有的机器在 相同的目录下面挂载相同的nfs. 并 ...

  9. [转帖]decimal,float和double的区别是什么?

    https://zhuanlan.zhihu.com/p/352503879 今天复习mysql理论知识,在看常用数据类型的时候发现float和decimal类型都是表示小数,就展开搜索学习了一下区别 ...

  10. [转帖]金仓数据库KingbaseES数据目录结构

    KingbaseES数据库结构 [kingbase@postgresV8]$ tree -LP2data/ . ├── data │ ├── base # 存储用户创建的数据库文件及隶属于用户数据库的 ...