11g RAC 11.2.0.1.0: Handling an Instance Evict Failure

AIX 7.1

References:

https://blogs.oracle.com/database4cn/rac

Resolving ORA-481 and "terminating the instance due to error 481" (Doc ID 1950963.1)

ORA-00481 After "The instance eviction reason is 0x2" due to Lack of Ticket (Doc ID 1644015.1)

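Ultimately this is a flow-control problem in message transfer between the RAC nodes. Once network and hardware causes are ruled out, it has to be fixed by patching.

Below are the alert logs as they stood before the first round of remediation.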

◆node1

Wed May 24 01:59:24 2017

Remote instance 2 kill is issued with system inc 228

LMON received an instance eviction notification from instance 1

The instance eviction reason is 0x2

The instance eviction map is 2

Reconfiguration started (old inc 228, new inc 230)

List of instances:

1 (myinst: 1)

Wed May 24 01:59:27 2017

Trace dumping is performing id=[cdmp_20170524015904]

Global Resource Directory frozen

* dead instance detected - domain 0 invalid = TRUE

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Wed May 24 01:59:28 2017

LMS 1: 14 GCS shadows cancelled, 1 closed, 0 Xw survived

Wed May 24 01:59:28 2017

LMS 0: 23 GCS shadows cancelled, 0 closed, 0 Xw survived

Wed May 24 01:59:28 2017

LMS 2: 21 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Wed May 24 01:59:32 2017

Instance recovery: looking for dead threads

Beginning instance recovery of 1 threads

Wed May 24 01:59:48 2017

parallel recovery started with 32 processes

Started redo scan

Wed May 24 01:59:49 2017

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Reconfiguration complete

Wed May 24 02:00:00 2017

Completed redo scan

read 2094379 KB redo, 237345 data blocks need recovery

Wed May 24 02:00:03 2017

Reconfiguration started (old inc 230, new inc 232)

List of instances:

1 2 (myinst: 1)

Global Resource Directory frozen

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Wed May 24 02:00:04 2017

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Wed May 24 02:00:04 2017

Wed May 24 02:00:04 2017

LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Wed May 24 02:02:27 2017

... (a large number of 12170 entries logged here) ...

Wed May 24 02:05:34 2017

LMON (ospid: 35979900): terminating the instance due to error 481

Wed May 24 02:05:34 2017

System state dump is made for local instance

System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_40894656.trc

Instance terminated by LMON, pid = 35979900

Wed May 24 08:12:27 2017

Starting ORACLE instance (normal)

sskgpgetexecname failed to get name

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect

Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST

Autotune of undo retention is turned on.

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Starting up:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options.

Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl1.ora

System parameters with non-default values:

processes = 2000

timed_statistics = TRUE

sga_max_size = 70G

spfile = "+DATA1/orcl/spfileorcl.ora"

sga_target = 50G

control_files = "+DATA1/orcl/controlfile/current.261.832429113"

control_files = "+ARCDG/orcl/controlfile/current.259.832429113"

db_block_size = 8192

compatible = "11.2.0.0.0"

log_archive_format = "%t_%s_%r.dbf"

cluster_database = TRUE

db_create_file_dest = "+DATA1"

db_recovery_file_dest = "+ARCDG"

db_recovery_file_dest_size= 2048G

thread = 1

undo_tablespace = "UNDOTBS1"

undo_retention = 10800

instance_number = 1

remote_login_passwordfile= "NONE"

db_domain = ""

service_names = "orcl"

dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"

local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.4)(PORT = 1521))"

remote_listener = "rac-scan:1521"

result_cache_max_size = 268736K

audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"

audit_trail = "DB"

db_name = "orcl"

open_cursors = 1000

sql_trace = FALSE

optimizer_index_caching = 90

pga_aggregate_target = 20G

deferred_segment_creation= FALSE

aq_tm_processes = 5

diagnostic_dest = "/u01/app/11.2.0/oracle"

Deprecated system parameters with specified values:

sql_trace

End of deprecated system parameter listing

Cluster communication is configured to use the following interface(s) for this instance

192.168.0.1

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

Wed May 24 08:12:32 2017

PMON started with pid=2, OS id=35652038

Wed May 24 08:12:32 2017

VKTM started with pid=3, OS id=33489120 at elevated priority

VKTM running at (10)millisec precision with DBRM quantum (100)ms

Wed May 24 08:12:32 2017

GEN0 started with pid=4, OS id=34209898

Wed May 24 08:12:32 2017

DIAG started with pid=5, OS id=36438538

Wed May 24 08:12:33 2017

DBRM started with pid=6, OS id=24576446

Wed May 24 08:12:33 2017

PING started with pid=7, OS id=29687920

Wed May 24 08:12:33 2017

PSP0 started with pid=8, OS id=36635296

Wed May 24 08:12:33 2017

ACMS started with pid=9, OS id=34341462

Wed May 24 08:12:33 2017

DIA0 started with pid=10, OS id=32047870

Wed May 24 08:12:33 2017

LMON started with pid=11, OS id=36045304

Wed May 24 08:12:35 2017

LMD0 started with pid=12, OS id=31391860

Wed May 24 08:12:35 2017

LMS0 started with pid=13, OS id=35979918 at elevated priority

Wed May 24 08:12:35 2017

LMS1 started with pid=14, OS id=22741294 at elevated priority

Wed May 24 08:12:36 2017

LMS2 started with pid=15, OS id=29229354 at elevated priority

Wed May 24 08:12:36 2017

RMS0 started with pid=16, OS id=20185780

Wed May 24 08:12:36 2017

LMHB started with pid=17, OS id=29425998

Wed May 24 08:12:36 2017

MMAN started with pid=18, OS id=9699954

Wed May 24 08:12:36 2017

DBW0 started with pid=19, OS id=31850686

Wed May 24 08:12:36 2017

DBW1 started with pid=20, OS id=36045442

Wed May 24 08:12:36 2017

DBW2 started with pid=21, OS id=40894536

Wed May 24 08:12:36 2017

DBW3 started with pid=22, OS id=29819564

Wed May 24 08:12:36 2017

DBW4 started with pid=23, OS id=36634890

Wed May 24 08:12:36 2017

LGWR started with pid=24, OS id=35783370

Wed May 24 08:12:37 2017

CKPT started with pid=25, OS id=35914258

Wed May 24 08:12:37 2017

SMON started with pid=26, OS id=32374796

Wed May 24 08:12:37 2017

RECO started with pid=27, OS id=35652280

Wed May 24 08:12:37 2017

RBAL started with pid=28, OS id=36438322

Wed May 24 08:12:37 2017

ASMB started with pid=29, OS id=39387144

Wed May 24 08:12:37 2017

MMON started with pid=30, OS id=2621900

NOTE: initiating MARK startup

Wed May 24 08:12:37 2017

Starting background process MARKMMNL started with pid=31, OS id=35979732

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

Wed May 24 08:12:37 2017

MARK started with pid=32, OS id=35717688

NOTE: MARK has subscribed

starting up 1 shared server(s) ...

lmon registered with NM - instance number 1 (internal mem no 0)

Reconfiguration started (old inc 0, new inc 236)

List of instances:

1 2 (myinst: 1)

Global Resource Directory frozen

* allocate domain 0, invalid = TRUE

Communication channels reestablished

* domain 0 valid according to instance 2

* domain 0 valid = 1 according to instance 2

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Wed May 24 08:12:45 2017

Submitted all GCS remote-cache requests

Fix write in gcs resources

Reconfiguration complete

Wed May 24 08:12:52 2017

LCK0 started with pid=34, OS id=24379778

Wed May 24 08:12:52 2017

Starting background process RSMN

Wed May 24 08:12:52 2017

RSMN started with pid=36, OS id=30802026

ORACLE_BASE from environment = /u01/app/11.2.0/oracle

Wed May 24 08:12:53 2017

ALTER DATABASE MOUNT

Wed May 24 08:12:53 2017

NOTE: Loaded library: System

Wed May 24 08:12:53 2017

SUCCESS: diskgroup DATA1 was mounted

SUCCESS: diskgroup ARCDG was mounted

Wed May 24 08:12:53 2017

NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established

NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established

Wed May 24 08:12:57 2017

Successful mount of redo thread 1, with mount id 1472569957

Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)

Lost write protection disabled

Completed: ALTER DATABASE MOUNT

Wed May 24 08:12:58 2017

ALTER DATABASE OPEN

Block change tracking file is current.

Picked broadcast on commit scheme to generate SCNs

Wed May 24 08:12:58 2017

SUCCESS: diskgroup DATA2 was mounted

NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established

SUCCESS: diskgroup DATA3 was mounted

NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established

Thread 1 advanced to log sequence 255944 (thread open)

Thread 1 opened at log sequence 255944

Current log# 2 seq# 255944 mem# 0: +ARCDG/orcl/onlinelog/group_2.261.840376035

Current log# 2 seq# 255944 mem# 1: +ARCDG/orcl/onlinelog/group_2.2113.840376039

Current log# 2 seq# 255944 mem# 2: +ARCDG/orcl/onlinelog/group_2.2114.840376041

Current log# 2 seq# 255944 mem# 3: +ARCDG/orcl/onlinelog/group_2.2116.840376043

Successful open of redo thread 1

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

Starting background process CTWR

Wed May 24 08:12:59 2017

CTWR started with pid=39, OS id=32768642

Block change tracking service is active.

Wed May 24 08:13:00 2017

SMON: enabling cache recovery

Successfully onlined Undo Tablespace 2.

Verifying file header compatibility for 11g tablespace encryption..

Verifying 11g file header compatibility for tablespace encryption completed

SMON: enabling tx recovery

Database Characterset is AL32UTF8

No Resource Manager plan active

Starting background process GTX0

Wed May 24 08:13:07 2017

GTX0 started with pid=42, OS id=35062526

Starting background process RCBG

Wed May 24 08:13:07 2017

RCBG started with pid=43, OS id=27721944

replication_dependency_tracking turned off (no async multimaster replication found)

Wed May 24 08:13:08 2017

Starting background process QMNC

Wed May 24 08:13:08 2017

QMNC started with pid=44, OS id=34013894

Completed: ALTER DATABASE OPEN

Wed May 24 08:13:17 2017

Starting background process CJQ0

Wed May 24 08:13:17 2017

CJQ0 started with pid=41, OS id=11469004

Wed May 24 08:13:43 2017

Starting background process SMCO

Wed May 24 08:13:43 2017

SMCO started with pid=120, OS id=42336480

...

Wed May 24 09:08:16 2017

Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j001_33161436.trc:

ORA-12012: error on auto execute of job 239

ORA-00001: unique constraint (CM.UK_LTE_ZY_CHECK_HISTORY) violated

ORA-06512: at "CM.JOB_LTE_ZY_CHECK", line 27

ORA-06512: at line 1
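
With excerpts this long, it can help to pull out only the lines that mark the eviction, the termination and the restart. The following is a minimal sketch, assuming the alert log uses the timestamp-line-then-messages layout shown above; the function names and the keyword list are illustrative choices, not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp layout seen in the excerpts, e.g. "Wed May 24 01:59:24 2017".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Message fragments of interest, copied from the log excerpts (illustrative list).
KEYWORDS = re.compile(
    r"eviction|terminating the instance|Instance terminated|"
    r"Reconfiguration started|Starting ORACLE instance"
)


def parse_alert_log(text):
    """Yield (timestamp, message); each message inherits the last timestamp line seen."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, so treat it as a message
        yield current, line


def key_events(text):
    """Return only the entries that match one of the keywords."""
    return [(ts, msg) for ts, msg in parse_alert_log(text) if KEYWORDS.search(msg)]


# Lines taken from the node2 excerpt of this article.
SAMPLE = """\
Wed May 24 01:59:01 2017
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x2
The instance eviction map is 2
Wed May 24 01:59:04 2017
PMON (ospid: 50725354): terminating the instance due to error 481
License high water mark = 516
Instance terminated by PMON, pid = 50725354
Wed May 24 01:59:29 2017
Starting ORACLE instance (normal)
"""

for ts, msg in key_events(SAMPLE):
    print(ts.strftime("%H:%M:%S"), msg)
```

Running the same filter over both nodes' logs and sorting by timestamp gives a single timeline, which makes it easier to see which instance initiated the eviction.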

◆node2

Wed May 24 01:59:01 2017

LMS1 (ospid: 26411086) received an instance eviction notification from instance 1 [2]

Wed May 24 01:59:01 2017

LMON received an instance eviction notification from instance 1

The instance eviction reason is 0x2

The instance eviction map is 2

Wed May 24 01:59:04 2017

PMON (ospid: 50725354): terminating the instance due to error 481

Wed May 24 01:59:04 2017

System state dump is made for local instance

System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_42336686.trc

Wed May 24 01:59:04 2017

ORA-1092 : opitsk aborting process

Wed May 24 01:59:04 2017

License high water mark = 516

Instance terminated by PMON, pid = 50725354

USER (ospid: 50987430): terminating the instance

Instance terminated by USER, pid = 50987430

Wed May 24 01:59:29 2017

Starting ORACLE instance (normal)

sskgpgetexecname failed to get name

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect

Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface

Picked latch-free SCN scheme 3

Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST

Autotune of undo retention is turned on.

LICENSE_MAX_USERS = 0

SYS auditing is disabled

Starting up:

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options.

Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl2.ora

System parameters with non-default values:

processes = 2000

timed_statistics = TRUE

sga_max_size = 70G

spfile = "+DATA1/orcl/spfileorcl.ora"

sga_target = 50G

control_files = "+DATA1/orcl/controlfile/current.261.832429113"

control_files = "+ARCDG/orcl/controlfile/current.259.832429113"

db_block_size = 8192

compatible = "11.2.0.0.0"

log_archive_format = "%t_%s_%r.dbf"

cluster_database = TRUE

db_create_file_dest = "+DATA1"

db_recovery_file_dest = "+ARCDG"

db_recovery_file_dest_size= 2048G

thread = 2

undo_tablespace = "UNDOTBS2"

undo_retention = 10800

instance_number = 2

remote_login_passwordfile= "NONE"

db_domain = ""

dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"

local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.5)(PORT = 1521))"

remote_listener = "rac-scan:1521"

result_cache_max_size = 268736K

audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"

audit_trail = "DB"

db_name = "orcl"

open_cursors = 1000

sql_trace = FALSE

optimizer_index_caching = 90

pga_aggregate_target = 20G

deferred_segment_creation= FALSE

aq_tm_processes = 5

diagnostic_dest = "/u01/app/11.2.0/oracle"

Deprecated system parameters with specified values:

sql_trace

End of deprecated system parameter listing

Cluster communication is configured to use the following interface(s) for this instance

192.168.0.2

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

Wed May 24 01:59:34 2017

PMON started with pid=2, OS id=66584656

Wed May 24 01:59:34 2017

VKTM started with pid=3, OS id=66846744 at elevated priority

VKTM running at (10)millisec precision with DBRM quantum (100)ms

Wed May 24 01:59:34 2017

GEN0 started with pid=4, OS id=26608090

Wed May 24 01:59:34 2017

DIAG started with pid=5, OS id=26083716

Wed May 24 01:59:34 2017

DBRM started with pid=6, OS id=24510972

Wed May 24 01:59:34 2017

PING started with pid=7, OS id=65077306

Wed May 24 01:59:34 2017

PSP0 started with pid=8, OS id=66781402

Wed May 24 01:59:34 2017

ACMS started with pid=9, OS id=66978040

Wed May 24 01:59:34 2017

DIA0 started with pid=10, OS id=66519050

Wed May 24 01:59:34 2017

LMON started with pid=11, OS id=66453694

Wed May 24 01:59:37 2017

LMD0 started with pid=12, OS id=23658758

Wed May 24 01:59:37 2017

LMS0 started with pid=13, OS id=66322614 at elevated priority

Wed May 24 01:59:37 2017

LMS1 started with pid=14, OS id=65798210 at elevated priority

Wed May 24 01:59:37 2017

LMS2 started with pid=15, OS id=15204812 at elevated priority

Wed May 24 01:59:38 2017

RMS0 started with pid=16, OS id=65732820

Wed May 24 01:59:38 2017

LMHB started with pid=17, OS id=65339438

Wed May 24 01:59:38 2017

MMAN started with pid=18, OS id=52298036

Wed May 24 01:59:38 2017

DBW0 started with pid=19, OS id=52232458

Wed May 24 01:59:38 2017

DBW1 started with pid=20, OS id=65273872

Wed May 24 01:59:38 2017

DBW2 started with pid=21, OS id=65208354

Wed May 24 01:59:38 2017

DBW3 started with pid=22, OS id=65994980

Wed May 24 01:59:38 2017

DBW4 started with pid=23, OS id=61210752

Wed May 24 01:59:38 2017

LGWR started with pid=24, OS id=6095298

Wed May 24 01:59:38 2017

CKPT started with pid=25, OS id=65470648

Wed May 24 01:59:38 2017

SMON started with pid=26, OS id=64946374

Wed May 24 01:59:38 2017

RECO started with pid=27, OS id=51642854

Wed May 24 01:59:39 2017

RBAL started with pid=28, OS id=64618728

Wed May 24 01:59:39 2017

ASMB started with pid=29, OS id=64553208

Wed May 24 01:59:39 2017

MMON started with pid=30, OS id=64422054

NOTE: initiating MARK startup

Wed May 24 01:59:39 2017

MMNL started with pid=31, OS id=42336716

Starting background process MARK

starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

Wed May 24 01:59:39 2017

MARK started with pid=32, OS id=51577168

NOTE: MARK has subscribed

Wed May 24 01:59:39 2017

starting up 1 shared server(s) ...

lmon registered with NM - instance number 2 (internal mem no 1)

Reconfiguration started (old inc 0, new inc 232)

List of instances:

1 2 (myinst: 2)

Global Resource Directory frozen

* allocate domain 0, invalid = TRUE

Communication channels reestablished

* domain 0 valid = 0 according to instance 1

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Wed May 24 01:59:48 2017

Submitted all GCS remote-cache requests

Wed May 24 02:05:10 2017

Trace dumping is performing id=[cdmp_20170524020534]

Wed May 24 02:05:18 2017

Reconfiguration started (old inc 232, new inc 234)

List of instances:

2 (myinst: 2)

Nested reconfiguration detected.

Global Resource Directory frozen

* dead instance detected - domain 0 invalid = TRUE

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Wed May 24 02:05:19 2017

LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Wed May 24 02:05:19 2017

LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Wed May 24 02:05:19 2017

LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Reconfiguration complete

Wed May 24 02:05:25 2017

LCK0 started with pid=34, OS id=51118396

Wed May 24 02:05:25 2017

Starting background process RSMN

Wed May 24 02:05:25 2017

RSMN started with pid=36, OS id=51773786

ORACLE_BASE not set in environment. It is recommended

that ORACLE_BASE be set in the environment

Reusing ORACLE_BASE from an earlier startup = /u01/app/11.2.0/oracle

Wed May 24 02:05:25 2017

ALTER DATABASE MOUNT

This instance was first to mount

Wed May 24 02:05:26 2017

NOTE: Loaded library: System

Wed May 24 02:05:26 2017

SUCCESS: diskgroup DATA1 was mounted

Wed May 24 02:05:26 2017

NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established

SUCCESS: diskgroup ARCDG was mounted

NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established

Wed May 24 02:05:30 2017

Successful mount of redo thread 2, with mount id 1472569957

Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)

Lost write protection disabled

Completed: ALTER DATABASE MOUNT

ALTER DATABASE OPEN

This instance was first to open

Wed May 24 02:05:30 2017

SUCCESS: diskgroup DATA2 was mounted

NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established

SUCCESS: diskgroup DATA3 was mounted

NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established

Block change tracking file is current.

Beginning crash recovery of 2 threads

parallel recovery started with 32 processes

Started redo scan

Wed May 24 02:05:44 2017

Completed redo scan

read 2158203 KB redo, 230150 data blocks need recovery

Started redo application at

Thread 1: logseq 255942, block 163388

Thread 2: logseq 236286, block 2067920

Recovery of Online Redo Log: Thread 1 Group 3 Seq 255942 Reading mem 0

Mem# 0: +ARCDG/orcl/onlinelog/group_3.2085.840375427

Mem# 1: +ARCDG/orcl/onlinelog/group_3.2084.840375445

Mem# 2: +ARCDG/orcl/onlinelog/group_3.263.840375447

Mem# 3: +ARCDG/orcl/onlinelog/group_3.2089.840375449

Recovery of Online Redo Log: Thread 2 Group 7 Seq 236286 Reading mem 0

Mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587

Mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591

Mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593

Mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597

Recovery of Online Redo Log: Thread 2 Group 5 Seq 236287 Reading mem 0

Mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445

Mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483

Mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489

Mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493

Wed May 24 02:06:05 2017

Completed redo application of 1702.07MB

Completed crash recovery at

Thread 1: logseq 255942, block 291036, scn 12495715727030

Thread 2: logseq 236287, block 2742038, scn 12495715600520

230150 data blocks read, 229989 data blocks written, 2158203 redo k-bytes read

Thread 1 advanced to log sequence 255943 (thread recovery)

Picked broadcast on commit scheme to generate SCNs

Wed May 24 02:06:06 2017

Thread 2 advanced to log sequence 236288 (thread open)

Thread 2 opened at log sequence 236288

Current log# 6 seq# 236288 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575

Current log# 6 seq# 236288 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579

Current log# 6 seq# 236288 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581

Current log# 6 seq# 236288 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585

Successful open of redo thread 2

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

Starting background process CTWR

Wed May 24 02:06:07 2017

CTWR started with pid=89, OS id=49349114

Block change tracking service is active.

Wed May 24 02:06:07 2017

SMON: enabling cache recovery

Successfully onlined Undo Tablespace 4.

Verifying file header compatibility for 11g tablespace encryption..

Verifying 11g file header compatibility for tablespace encryption completed

SMON: enabling tx recovery

Database Characterset is AL32UTF8

No Resource Manager plan active

Starting background process GTX0

Wed May 24 02:06:11 2017

GTX0 started with pid=98, OS id=42992042

Starting background process RCBG

Wed May 24 02:06:11 2017

RCBG started with pid=52, OS id=40370648

replication_dependency_tracking turned off (no async multimaster replication found)

Starting background process QMNC

Wed May 24 02:06:13 2017

QMNC started with pid=119, OS id=47514086

Completed: ALTER DATABASE OPEN

SMON: Parallel transaction recovery tried

Starting background process SMCO

Wed May 24 02:06:17 2017

SMCO started with pid=153, OS id=50724962

Wed May 24 02:06:17 2017

db_recovery_file_dest_size of 2097152 MB is 4.35% used. This is a

user-specified limit on the amount of space that will be used by this

database for recovery-related files, and does not reflect the amount of

space available in the underlying filesystem or ASM diskgroup.

Wed May 24 02:06:19 2017

Starting background process CJQ0

Wed May 24 02:06:19 2017

CJQ0 started with pid=46, OS id=43581760

Wed May 24 02:07:35 2017

Thread 2 advanced to log sequence 236289 (LGWR switch)

Current log# 7 seq# 236289 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587

Current log# 7 seq# 236289 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591

Current log# 7 seq# 236289 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593

Current log# 7 seq# 236289 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597

Wed May 24 02:08:09 2017

Thread 2 advanced to log sequence 236290 (LGWR switch)

Current log# 5 seq# 236290 mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445

Current log# 5 seq# 236290 mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483

Current log# 5 seq# 236290 mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489

Current log# 5 seq# 236290 mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493

Wed May 24 02:08:51 2017

Thread 2 advanced to log sequence 236291 (LGWR switch)

Current log# 6 seq# 236291 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575

Current log# 6 seq# 236291 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579

Current log# 6 seq# 236291 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581

Current log# 6 seq# 236291 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585

Wed May 24 02:09:30 2017

Thread 2 advanced to log sequence 236292 (LGWR switch)

Current log# 7 seq# 236292 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587

Current log# 7 seq# 236292 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591

Current log# 7 seq# 236292 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593

Current log# 7 seq# 236292 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
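
The node2 excerpt ends with several LGWR switches in quick succession. A small sketch such as the one below can quantify the interval between switches; it assumes the same alert-log layout as above, and the helper names are hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def lgwr_switch_times(text):
    """Collect the timestamp that precedes each '(LGWR switch)' message."""
    times = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        try:
            current = datetime.strptime(line, TS_FORMAT)
        except ValueError:
            # Not a timestamp line: check whether it records a log switch.
            if "(LGWR switch)" in line and current is not None:
                times.append(current)
    return times


def switch_intervals(times):
    """Seconds between consecutive log switches."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# Lines taken from the node2 excerpt of this article.
SAMPLE = """\
Wed May 24 02:07:35 2017
Thread 2 advanced to log sequence 236289 (LGWR switch)
Wed May 24 02:08:09 2017
Thread 2 advanced to log sequence 236290 (LGWR switch)
Wed May 24 02:08:51 2017
Thread 2 advanced to log sequence 236291 (LGWR switch)
Wed May 24 02:09:30 2017
Thread 2 advanced to log sequence 236292 (LGWR switch)
"""

print(switch_intervals(lgwr_switch_times(SAMPLE)))
```

For these four entries the switches are roughly 35 to 40 seconds apart, a rate at which the size and number of online redo log groups deserve a look.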

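A number of configuration corrections and optimizations were applied to this RAC cluster, mainly the following.

◆Database configuration tuning

0. Earlier, a large number of log switches could not complete, which caused the database to hang. New log groups were added and redundant log members were dropped.

1. Increased the ASM instance memory from 350M to 2G.

2. Forced RAC parallel execution processes to be allocated on the local instance.

3. Disabled auditing to reduce its performance impact.

4. Extended AWR retention from 7 days to 30 days.

5. Increased the SGA by 10G, to 60G.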
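
The article does not record the exact statements used for items 1 to 5. As a rough sketch only, the snippet below maps each change to the initialization parameter or package call that commonly controls it and prints candidate statements. Every parameter name, scope and value format here is an assumption to be verified against the Oracle documentation for 11.2.0.1 before use.

```python
def awr_retention_minutes(days):
    """AWR retention is specified in minutes when changing snapshot settings."""
    return days * 24 * 60


def candidate_statements(awr_days=30, sga_target="60G", asm_memory="2G"):
    """Return candidate SQL for the listed changes (assumed, not from the source)."""
    return [
        # Item 1: run on the ASM instance; memory_target is an assumed parameter.
        f"ALTER SYSTEM SET memory_target={asm_memory} SCOPE=SPFILE SID='*';",
        # Item 2: keep parallel execution servers on the local instance.
        "ALTER SYSTEM SET parallel_force_local=TRUE SCOPE=BOTH SID='*';",
        # Item 3: audit_trail is static, so it takes effect after a restart.
        "ALTER SYSTEM SET audit_trail=NONE SCOPE=SPFILE SID='*';",
        # Item 4: retention expressed in minutes.
        "EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS("
        f"retention => {awr_retention_minutes(awr_days)});",
        # Item 5: the startup log shows sga_max_size = 70G, so 60G fits under it.
        f"ALTER SYSTEM SET sga_target={sga_target} SCOPE=SPFILE SID='*';",
    ]


for statement in candidate_statements():
    print(statement)
```

Item 0 (adding log groups and dropping members) is left out on purpose, because the group numbers and sizes are not given in the article.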

◆Network-related kernel parameter changes

/usr/sbin/no -p -o tcp_sendspace=4194304

/usr/sbin/no -p -o tcp_recvspace=4194304

/usr/sbin/no -p -o rfc1323=1

/usr/sbin/no -p -o sb_max=8388608

/usr/sbin/no -p -o udp_ephemeral_low=9000

/usr/sbin/no -p -o tcp_ephemeral_low=9000

◆Interconnect NIC parameter changes

chdev -l en1 -a tcp_sendspace=1048576 -a rfc1323=1 -a tcp_recvspace=1048576
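
Two consistency rules apply to the values above: a socket buffer size should not exceed sb_max, and a TCP window larger than 64 KB only works when rfc1323 (window scaling) is enabled. The sketch below checks the system-wide and en1 values from the commands above against those two rules; the function and dictionary names are illustrative.

```python
# Values copied from the `no` commands above (system-wide).
NO_SETTINGS = {
    "tcp_sendspace": 4194304,
    "tcp_recvspace": 4194304,
    "rfc1323": 1,
    "sb_max": 8388608,
    "udp_ephemeral_low": 9000,
    "tcp_ephemeral_low": 9000,
}

# Values copied from the `chdev -l en1` command above (interface-specific).
EN1_SETTINGS = {"tcp_sendspace": 1048576, "tcp_recvspace": 1048576, "rfc1323": 1}


def check_buffers(settings, sb_max):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    for name in ("tcp_sendspace", "tcp_recvspace"):
        value = settings[name]
        if value > sb_max:
            problems.append(f"{name}={value} exceeds sb_max={sb_max}")
        if value > 65535 and settings.get("rfc1323") != 1:
            problems.append(f"{name}={value} needs rfc1323=1")
    return problems


sb_max = NO_SETTINGS["sb_max"]
print("system-wide:", check_buffers(NO_SETTINGS, sb_max) or "ok")
print("en1:", check_buffers(EN1_SETTINGS, sb_max) or "ok")
```

Both sets of values pass: 4194304 and 1048576 are below the sb_max of 8388608, and rfc1323 is set to 1 in each case.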

◆SWAP

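Physical memory exceeds 16 GB, but the swap space is under 16 GB, which does not meet Oracle's standard installation recommendation. The paging space (PS) was enlarged: chps -s 192 hd6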
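
The -s flag of chps takes a number of logical partitions to add, so the space gained depends on the physical partition (PP) size of the volume group, which the article does not state. The sketch below shows the arithmetic for a few assumed PP sizes; the actual value can be read from the "PP SIZE" field of lsvg.

```python
def added_paging_space_mb(logical_partitions, pp_size_mb):
    """Space added by `chps -s <logical_partitions>` for a given PP size in MB."""
    return logical_partitions * pp_size_mb


# PP sizes below are hypothetical examples, not values from the article.
for pp_size_mb in (64, 128, 256):
    added = added_paging_space_mb(192, pp_size_mb)
    print(f"PP size {pp_size_mb} MB -> {added} MB added ({added / 1024:.0f} GB)")
```

With a 128 MB PP size, for example, 192 partitions add 24 GB, which would bring the paging space above the 16 GB mentioned above.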

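◆Time zone and clock check: left unchanged for now, to be on the safe side.

◆NTP check: the GI cluster's own time synchronization is used to keep the nodes in sync. The current offset is a few tens of seconds, which is not a problem.

◆HA cluster configuration changes: when the cluster was first installed, parts of it were not implemented according to the official documentation. This round applied fixes, but not comprehensively, so unknown risks may remain.

After the restart on May 25 and a period of observation, the situation improved, but at 09:00 on June 7 node 2 went down again.

This time, however, it was not a node eviction, which suggests that the earlier corrections and tuning had an effect. The preliminary judgment is that the improvement is related to the network parameter changes. The log follows; it contains no evict messages.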

Wed Jun 07 09:02:57 2017

Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc  (incident=1023882):

ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []

Incident details in: /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/incident/incdir_1023882/orcl2_lms0_7930030_i1023882.trc

Wed Jun 07 09:02:59 2017

Trace dumping is performing id=[cdmp_20170607090259]

Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc:

ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []

LMS0 (ospid: 7930030): terminating the instance due to error 484

Instance terminated by LMS0, pid = 7930030

Wed Jun 07 09:14:01 2017

Starting ORACLE instance (normal)
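
When looking up an ORA-00600 in the support knowledge base, the first argument (here kjbrref:pkey) is the key to search on. The following minimal sketch extracts the non-empty arguments from such a line; the function name is illustrative.

```python
import re

ORA600 = re.compile(r"ORA-00600: internal error code, arguments: (.*)")


def ora600_arguments(line):
    """Return the non-empty bracketed arguments of an ORA-00600 line, or None."""
    match = ORA600.search(line)
    if match is None:
        return None
    return [arg for arg in re.findall(r"\[([^\]]*)\]", match.group(1)) if arg]


# Line copied from the node2 alert log above.
LINE = (
    "ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], "
    "[6], [7493495], [0], [], [], [], [], [], [], []"
)

print(ora600_arguments(LINE))
```

The first element of the returned list identifies the failing internal function, and the remaining values are context that a support engineer may ask for.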

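Preliminary conclusion on this ORA-600 internal error:

After the tuning and restart on May 25, one more node-down incident occurred on June 7. Investigation largely confirmed that it is an Oracle bug that brings a node down under certain conditions. The database is currently at PSU level 11.2.0.1.0 and needs to be patched to the latest PSU, 11.2.0.1.6. The patch should be applied at a suitable opportunity.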
