Platform: AIX 7.1

Reference documents:

https://blogs.oracle.com/database4cn/rac

Resolving ORA-481 and "terminating the instance due to error 481" (Doc ID 1950963.1)

ORA-00481 After "The instance eviction reason is 0x2" due to Lack of Ticket (Doc ID 1644015.1)

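The root cause is flow control of the messages exchanged between RAC nodes. Apart from network and hardware causes, resolving it requires a patch.

Below are the alert logs from before the first round of fixes.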
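The "Lack of Ticket" in the title of Doc ID 1644015.1 refers to this kind of flow control on inter-instance messages. The toy Python model below only illustrates the general credit (ticket) idea under simplifying assumptions: a fixed pool of tickets, one ticket consumed per message, and a ticket returned when the receiver acknowledges a message. It is not Oracle's LMS implementation, and `TicketChannel` is a hypothetical name.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TicketChannel:
    """Toy credit-based flow control between two nodes (hypothetical model).

    A send needs a ticket; each message consumes one ticket, and a ticket is
    returned only when the receiver acknowledges a message. Without a ticket,
    the message waits in a backlog.
    """
    tickets: int
    backlog: deque = field(default_factory=deque)
    in_flight: deque = field(default_factory=deque)
    stalled_sends: int = 0

    def send(self, msg: str) -> None:
        if self.tickets > 0:
            self.tickets -= 1
            self.in_flight.append(msg)
        else:
            # No ticket available: the message must wait ("lack of ticket").
            self.stalled_sends += 1
            self.backlog.append(msg)

    def receiver_ack(self, n: int = 1) -> None:
        # The receiver processes up to n messages and returns one ticket each.
        for _ in range(min(n, len(self.in_flight))):
            self.in_flight.popleft()
            self.tickets += 1
        # Returned tickets are used immediately to drain the backlog.
        while self.tickets > 0 and self.backlog:
            self.tickets -= 1
            self.in_flight.append(self.backlog.popleft())


ch = TicketChannel(tickets=2)
for i in range(5):
    ch.send(f"msg{i}")
print(ch.tickets, len(ch.in_flight), len(ch.backlog), ch.stalled_sends)  # 0 2 3 3
ch.receiver_ack(2)
print(ch.tickets, len(ch.in_flight), len(ch.backlog))  # 0 2 1
```

If acknowledgements (and therefore tickets) come back more slowly than messages are produced, the backlog keeps growing; a persistent shortage of this kind is the situation the Doc ID 1644015.1 title describes.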

node1:
  Wed May 24 01:59:24 2017
  Remote instance 2 kill is issued with system inc 228
  LMON received an instance eviction notification from instance 1
  The instance eviction reason is 0x2
  The instance eviction map is 2
  Reconfiguration started (old inc 228, new inc 230)
  List of instances:
  1 (myinst: 1)
  Wed May 24 01:59:27 2017
  Trace dumping is performing id=[cdmp_20170524015904]
  Global Resource Directory frozen
  * dead instance detected - domain 0 invalid = TRUE
  Communication channels reestablished
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  Wed May 24 01:59:28 2017
  LMS 1: 14 GCS shadows cancelled, 1 closed, 0 Xw survived
  Wed May 24 01:59:28 2017
  LMS 0: 23 GCS shadows cancelled, 0 closed, 0 Xw survived
  Wed May 24 01:59:28 2017
  LMS 2: 21 GCS shadows cancelled, 0 closed, 0 Xw survived
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Post SMON to start 1st pass IR
  Wed May 24 01:59:32 2017
  Instance recovery: looking for dead threads
  Beginning instance recovery of 1 threads
  Wed May 24 01:59:48 2017
  parallel recovery started with 32 processes
  Started redo scan
  Wed May 24 01:59:49 2017
  Submitted all GCS remote-cache requests
  Post SMON to start 1st pass IR
  Fix write in gcs resources
  Reconfiguration complete
  Wed May 24 02:00:00 2017
  Completed redo scan
  read 2094379 KB redo, 237345 data blocks need recovery
  Wed May 24 02:00:03 2017
  Reconfiguration started (old inc 230, new inc 232)
  List of instances:
  1 2 (myinst: 1)
  Global Resource Directory frozen
  Communication channels reestablished
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  Wed May 24 02:00:04 2017
  LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Wed May 24 02:00:04 2017
  Wed May 24 02:00:04 2017
  LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Post SMON to start 1st pass IR
  Wed May 24 02:02:27 2017

  (a large number of ORA-12170 errors were logged here)

  Wed May 24 02:05:34 2017
  LMON (ospid: 35979900): terminating the instance due to error 481
  Wed May 24 02:05:34 2017
  System state dump is made for local instance
  System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_40894656.trc
  Instance terminated by LMON, pid = 35979900
  Wed May 24 08:12:27 2017
  Starting ORACLE instance (normal)
  sskgpgetexecname failed to get name
  LICENSE_MAX_SESSION = 0
  LICENSE_SESSIONS_WARNING = 0
  Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect
  Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface
  Picked latch-free SCN scheme 3
  Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
  Autotune of undo retention is turned on.
  LICENSE_MAX_USERS = 0
  SYS auditing is disabled
  Starting up:
  Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
  With the Partitioning, Real Application Clusters, OLAP, Data Mining
  and Real Application Testing options.
  Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl1.ora
  System parameters with non-default values:
  processes = 2000
  timed_statistics = TRUE
  sga_max_size = 70G
  spfile = "+DATA1/orcl/spfileorcl.ora"
  sga_target = 50G
  control_files = "+DATA1/orcl/controlfile/current.261.832429113"
  control_files = "+ARCDG/orcl/controlfile/current.259.832429113"
  db_block_size = 8192
  compatible = "11.2.0.0.0"
  log_archive_format = "%t_%s_%r.dbf"
  cluster_database = TRUE
  db_create_file_dest = "+DATA1"
  db_recovery_file_dest = "+ARCDG"
  db_recovery_file_dest_size= 2048G
  thread = 1
  undo_tablespace = "UNDOTBS1"
  undo_retention = 10800
  instance_number = 1
  remote_login_passwordfile= "NONE"
  db_domain = ""
  service_names = "orcl"
  dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
  local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.4)(PORT = 1521))"
  remote_listener = "rac-scan:1521"
  result_cache_max_size = 268736K
  audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"
  audit_trail = "DB"
  db_name = "orcl"
  open_cursors = 1000
  sql_trace = FALSE
  optimizer_index_caching = 90
  pga_aggregate_target = 20G
  deferred_segment_creation= FALSE
  aq_tm_processes = 5
  diagnostic_dest = "/u01/app/11.2.0/oracle"
  Deprecated system parameters with specified values:
  sql_trace
  End of deprecated system parameter listing
  Cluster communication is configured to use the following interface(s) for this instance
  192.168.0.1
  cluster interconnect IPC version:Oracle UDP/IP (generic)
  IPC Vendor 1 proto 2
  Wed May 24 08:12:32 2017
  PMON started with pid=2, OS id=35652038
  Wed May 24 08:12:32 2017
  VKTM started with pid=3, OS id=33489120 at elevated priority
  VKTM running at (10)millisec precision with DBRM quantum (100)ms
  Wed May 24 08:12:32 2017
  GEN0 started with pid=4, OS id=34209898
  Wed May 24 08:12:32 2017
  DIAG started with pid=5, OS id=36438538
  Wed May 24 08:12:33 2017
  DBRM started with pid=6, OS id=24576446
  Wed May 24 08:12:33 2017
  PING started with pid=7, OS id=29687920
  Wed May 24 08:12:33 2017
  PSP0 started with pid=8, OS id=36635296
  Wed May 24 08:12:33 2017
  ACMS started with pid=9, OS id=34341462
  Wed May 24 08:12:33 2017
  DIA0 started with pid=10, OS id=32047870
  Wed May 24 08:12:33 2017
  LMON started with pid=11, OS id=36045304
  Wed May 24 08:12:35 2017
  LMD0 started with pid=12, OS id=31391860
  Wed May 24 08:12:35 2017
  LMS0 started with pid=13, OS id=35979918 at elevated priority
  Wed May 24 08:12:35 2017
  LMS1 started with pid=14, OS id=22741294 at elevated priority
  Wed May 24 08:12:36 2017
  LMS2 started with pid=15, OS id=29229354 at elevated priority
  Wed May 24 08:12:36 2017
  RMS0 started with pid=16, OS id=20185780
  Wed May 24 08:12:36 2017
  LMHB started with pid=17, OS id=29425998
  Wed May 24 08:12:36 2017
  MMAN started with pid=18, OS id=9699954
  Wed May 24 08:12:36 2017
  DBW0 started with pid=19, OS id=31850686
  Wed May 24 08:12:36 2017
  DBW1 started with pid=20, OS id=36045442
  Wed May 24 08:12:36 2017
  DBW2 started with pid=21, OS id=40894536
  Wed May 24 08:12:36 2017
  DBW3 started with pid=22, OS id=29819564
  Wed May 24 08:12:36 2017
  DBW4 started with pid=23, OS id=36634890
  Wed May 24 08:12:36 2017
  LGWR started with pid=24, OS id=35783370
  Wed May 24 08:12:37 2017
  CKPT started with pid=25, OS id=35914258
  Wed May 24 08:12:37 2017
  SMON started with pid=26, OS id=32374796
  Wed May 24 08:12:37 2017
  RECO started with pid=27, OS id=35652280
  Wed May 24 08:12:37 2017
  RBAL started with pid=28, OS id=36438322
  Wed May 24 08:12:37 2017
  ASMB started with pid=29, OS id=39387144
  Wed May 24 08:12:37 2017
  MMON started with pid=30, OS id=2621900
  NOTE: initiating MARK startup
  Wed May 24 08:12:37 2017
  Starting background process MARKMMNL started with pid=31, OS id=35979732

  starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
  Wed May 24 08:12:37 2017
  MARK started with pid=32, OS id=35717688
  NOTE: MARK has subscribed
  starting up 1 shared server(s) ...
  lmon registered with NM - instance number 1 (internal mem no 0)
  Reconfiguration started (old inc 0, new inc 236)
  List of instances:
  1 2 (myinst: 1)
  Global Resource Directory frozen
  * allocate domain 0, invalid = TRUE
  Communication channels reestablished
  * domain 0 valid according to instance 2
  * domain 0 valid = 1 according to instance 2
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Wed May 24 08:12:45 2017
  Submitted all GCS remote-cache requests
  Fix write in gcs resources
  Reconfiguration complete
  Wed May 24 08:12:52 2017
  LCK0 started with pid=34, OS id=24379778
  Wed May 24 08:12:52 2017
  Starting background process RSMN
  Wed May 24 08:12:52 2017
  RSMN started with pid=36, OS id=30802026
  ORACLE_BASE from environment = /u01/app/11.2.0/oracle
  Wed May 24 08:12:53 2017
  ALTER DATABASE MOUNT
  Wed May 24 08:12:53 2017
  NOTE: Loaded library: System
  Wed May 24 08:12:53 2017
  SUCCESS: diskgroup DATA1 was mounted
  SUCCESS: diskgroup ARCDG was mounted
  Wed May 24 08:12:53 2017
  NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established
  NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established
  Wed May 24 08:12:57 2017
  Successful mount of redo thread 1, with mount id 1472569957
  Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
  Lost write protection disabled
  Completed: ALTER DATABASE MOUNT
  Wed May 24 08:12:58 2017
  ALTER DATABASE OPEN
  Block change tracking file is current.
  Picked broadcast on commit scheme to generate SCNs
  Wed May 24 08:12:58 2017
  SUCCESS: diskgroup DATA2 was mounted
  NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established
  SUCCESS: diskgroup DATA3 was mounted
  NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established
  Thread 1 advanced to log sequence 255944 (thread open)
  Thread 1 opened at log sequence 255944
  Current log# 2 seq# 255944 mem# 0: +ARCDG/orcl/onlinelog/group_2.261.840376035
  Current log# 2 seq# 255944 mem# 1: +ARCDG/orcl/onlinelog/group_2.2113.840376039
  Current log# 2 seq# 255944 mem# 2: +ARCDG/orcl/onlinelog/group_2.2114.840376041
  Current log# 2 seq# 255944 mem# 3: +ARCDG/orcl/onlinelog/group_2.2116.840376043
  Successful open of redo thread 1
  MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
  Starting background process CTWR
  Wed May 24 08:12:59 2017
  CTWR started with pid=39, OS id=32768642
  Block change tracking service is active.
  Wed May 24 08:13:00 2017
  SMON: enabling cache recovery
  Successfully onlined Undo Tablespace 2.
  Verifying file header compatibility for 11g tablespace encryption..
  Verifying 11g file header compatibility for tablespace encryption completed
  SMON: enabling tx recovery
  Database Characterset is AL32UTF8
  No Resource Manager plan active
  Starting background process GTX0
  Wed May 24 08:13:07 2017
  GTX0 started with pid=42, OS id=35062526
  Starting background process RCBG
  Wed May 24 08:13:07 2017
  RCBG started with pid=43, OS id=27721944
  replication_dependency_tracking turned off (no async multimaster replication found)
  Wed May 24 08:13:08 2017
  Starting background process QMNC
  Wed May 24 08:13:08 2017
  QMNC started with pid=44, OS id=34013894
  Completed: ALTER DATABASE OPEN
  Wed May 24 08:13:17 2017
  Starting background process CJQ0
  Wed May 24 08:13:17 2017
  CJQ0 started with pid=41, OS id=11469004
  Wed May 24 08:13:43 2017
  Starting background process SMCO
  Wed May 24 08:13:43 2017
  SMCO started with pid=120, OS id=42336480



  Wed May 24 09:08:16 2017
  Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j001_33161436.trc:
  ORA-12012: error on auto execute of job 239
  ORA-00001: unique constraint (CM.UK_LTE_ZY_CHECK_HISTORY) violated
  ORA-06512: at "CM.JOB_LTE_ZY_CHECK", line 27
  ORA-06512: at line 1

node2:
  Wed May 24 01:59:01 2017
  LMS1 (ospid: 26411086) received an instance eviction notification from instance 1 [2]
  Wed May 24 01:59:01 2017
  LMON received an instance eviction notification from instance 1
  The instance eviction reason is 0x2
  The instance eviction map is 2
  Wed May 24 01:59:04 2017
  PMON (ospid: 50725354): terminating the instance due to error 481
  Wed May 24 01:59:04 2017
  System state dump is made for local instance
  System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_42336686.trc
  Wed May 24 01:59:04 2017
  ORA-1092 : opitsk aborting process
  Wed May 24 01:59:04 2017
  License high water mark = 516
  Instance terminated by PMON, pid = 50725354
  USER (ospid: 50987430): terminating the instance
  Instance terminated by USER, pid = 50987430
  Wed May 24 01:59:29 2017
  Starting ORACLE instance (normal)
  sskgpgetexecname failed to get name
  LICENSE_MAX_SESSION = 0
  LICENSE_SESSIONS_WARNING = 0
  Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect
  Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface
  Picked latch-free SCN scheme 3
  Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
  Autotune of undo retention is turned on.
  LICENSE_MAX_USERS = 0
  SYS auditing is disabled
  Starting up:
  Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
  With the Partitioning, Real Application Clusters, OLAP, Data Mining
  and Real Application Testing options.
  Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl2.ora
  System parameters with non-default values:
  processes = 2000
  timed_statistics = TRUE
  sga_max_size = 70G
  spfile = "+DATA1/orcl/spfileorcl.ora"
  sga_target = 50G
  control_files = "+DATA1/orcl/controlfile/current.261.832429113"
  control_files = "+ARCDG/orcl/controlfile/current.259.832429113"
  db_block_size = 8192
  compatible = "11.2.0.0.0"
  log_archive_format = "%t_%s_%r.dbf"
  cluster_database = TRUE
  db_create_file_dest = "+DATA1"
  db_recovery_file_dest = "+ARCDG"
  db_recovery_file_dest_size= 2048G
  thread = 2
  undo_tablespace = "UNDOTBS2"
  undo_retention = 10800
  instance_number = 2
  remote_login_passwordfile= "NONE"
  db_domain = ""
  dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
  local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.5)(PORT = 1521))"
  remote_listener = "rac-scan:1521"
  result_cache_max_size = 268736K
  audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"
  audit_trail = "DB"
  db_name = "orcl"
  open_cursors = 1000
  sql_trace = FALSE
  optimizer_index_caching = 90
  pga_aggregate_target = 20G
  deferred_segment_creation= FALSE
  aq_tm_processes = 5
  diagnostic_dest = "/u01/app/11.2.0/oracle"
  Deprecated system parameters with specified values:
  sql_trace
  End of deprecated system parameter listing
  Cluster communication is configured to use the following interface(s) for this instance
  192.168.0.2
  cluster interconnect IPC version:Oracle UDP/IP (generic)
  IPC Vendor 1 proto 2
  Wed May 24 01:59:34 2017
  PMON started with pid=2, OS id=66584656
  Wed May 24 01:59:34 2017
  VKTM started with pid=3, OS id=66846744 at elevated priority
  VKTM running at (10)millisec precision with DBRM quantum (100)ms
  Wed May 24 01:59:34 2017
  GEN0 started with pid=4, OS id=26608090
  Wed May 24 01:59:34 2017
  DIAG started with pid=5, OS id=26083716
  Wed May 24 01:59:34 2017
  DBRM started with pid=6, OS id=24510972
  Wed May 24 01:59:34 2017
  PING started with pid=7, OS id=65077306
  Wed May 24 01:59:34 2017
  PSP0 started with pid=8, OS id=66781402
  Wed May 24 01:59:34 2017
  ACMS started with pid=9, OS id=66978040
  Wed May 24 01:59:34 2017
  DIA0 started with pid=10, OS id=66519050
  Wed May 24 01:59:34 2017
  LMON started with pid=11, OS id=66453694
  Wed May 24 01:59:37 2017
  LMD0 started with pid=12, OS id=23658758
  Wed May 24 01:59:37 2017
  LMS0 started with pid=13, OS id=66322614 at elevated priority
  Wed May 24 01:59:37 2017
  LMS1 started with pid=14, OS id=65798210 at elevated priority
  Wed May 24 01:59:37 2017
  LMS2 started with pid=15, OS id=15204812 at elevated priority
  Wed May 24 01:59:38 2017
  RMS0 started with pid=16, OS id=65732820
  Wed May 24 01:59:38 2017
  LMHB started with pid=17, OS id=65339438
  Wed May 24 01:59:38 2017
  MMAN started with pid=18, OS id=52298036
  Wed May 24 01:59:38 2017
  DBW0 started with pid=19, OS id=52232458
  Wed May 24 01:59:38 2017
  DBW1 started with pid=20, OS id=65273872
  Wed May 24 01:59:38 2017
  DBW2 started with pid=21, OS id=65208354
  Wed May 24 01:59:38 2017
  DBW3 started with pid=22, OS id=65994980
  Wed May 24 01:59:38 2017
  DBW4 started with pid=23, OS id=61210752
  Wed May 24 01:59:38 2017
  LGWR started with pid=24, OS id=6095298
  Wed May 24 01:59:38 2017
  CKPT started with pid=25, OS id=65470648
  Wed May 24 01:59:38 2017
  SMON started with pid=26, OS id=64946374
  Wed May 24 01:59:38 2017
  RECO started with pid=27, OS id=51642854
  Wed May 24 01:59:39 2017
  RBAL started with pid=28, OS id=64618728
  Wed May 24 01:59:39 2017
  ASMB started with pid=29, OS id=64553208
  Wed May 24 01:59:39 2017
  MMON started with pid=30, OS id=64422054
  NOTE: initiating MARK startup
  Wed May 24 01:59:39 2017
  MMNL started with pid=31, OS id=42336716
  Starting background process MARK
  starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
  Wed May 24 01:59:39 2017
  MARK started with pid=32, OS id=51577168
  NOTE: MARK has subscribed
  Wed May 24 01:59:39 2017
  starting up 1 shared server(s) ...
  lmon registered with NM - instance number 2 (internal mem no 1)
  Reconfiguration started (old inc 0, new inc 232)
  List of instances:
  1 2 (myinst: 2)
  Global Resource Directory frozen
  * allocate domain 0, invalid = TRUE
  Communication channels reestablished
  * domain 0 valid = 0 according to instance 1
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Wed May 24 01:59:48 2017
  Submitted all GCS remote-cache requests
  Wed May 24 02:05:10 2017
  Trace dumping is performing id=[cdmp_20170524020534]
  Wed May 24 02:05:18 2017
  Reconfiguration started (old inc 232, new inc 234)
  List of instances:
  2 (myinst: 2)
  Nested reconfiguration detected.
  Global Resource Directory frozen
  * dead instance detected - domain 0 invalid = TRUE
  Communication channels reestablished
  Master broadcasted resource hash value bitmaps
  Non-local Process blocks cleaned out
  Wed May 24 02:05:19 2017
  LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Wed May 24 02:05:19 2017
  LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Wed May 24 02:05:19 2017
  LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
  Set master node info
  Submitted all remote-enqueue requests
  Dwn-cvts replayed, VALBLKs dubious
  All grantable enqueues granted
  Post SMON to start 1st pass IR
  Submitted all GCS remote-cache requests
  Post SMON to start 1st pass IR
  Fix write in gcs resources
  Reconfiguration complete
  Wed May 24 02:05:25 2017
  LCK0 started with pid=34, OS id=51118396
  Wed May 24 02:05:25 2017
  Starting background process RSMN
  Wed May 24 02:05:25 2017
  RSMN started with pid=36, OS id=51773786
  ORACLE_BASE not set in environment. It is recommended
  that ORACLE_BASE be set in the environment
  Reusing ORACLE_BASE from an earlier startup = /u01/app/11.2.0/oracle
  Wed May 24 02:05:25 2017
  ALTER DATABASE MOUNT
  This instance was first to mount
  Wed May 24 02:05:26 2017
  NOTE: Loaded library: System
  Wed May 24 02:05:26 2017
  SUCCESS: diskgroup DATA1 was mounted
  Wed May 24 02:05:26 2017
  NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established
  SUCCESS: diskgroup ARCDG was mounted
  NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established
  Wed May 24 02:05:30 2017
  Successful mount of redo thread 2, with mount id 1472569957
  Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
  Lost write protection disabled
  Completed: ALTER DATABASE MOUNT
  ALTER DATABASE OPEN
  This instance was first to open
  Wed May 24 02:05:30 2017
  SUCCESS: diskgroup DATA2 was mounted
  NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established
  SUCCESS: diskgroup DATA3 was mounted
  NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established
  Block change tracking file is current.
  Beginning crash recovery of 2 threads
  parallel recovery started with 32 processes
  Started redo scan
  Wed May 24 02:05:44 2017
  Completed redo scan
  read 2158203 KB redo, 230150 data blocks need recovery
  Started redo application at
  Thread 1: logseq 255942, block 163388
  Thread 2: logseq 236286, block 2067920
  Recovery of Online Redo Log: Thread 1 Group 3 Seq 255942 Reading mem 0
  Mem# 0: +ARCDG/orcl/onlinelog/group_3.2085.840375427
  Mem# 1: +ARCDG/orcl/onlinelog/group_3.2084.840375445
  Mem# 2: +ARCDG/orcl/onlinelog/group_3.263.840375447
  Mem# 3: +ARCDG/orcl/onlinelog/group_3.2089.840375449
  Recovery of Online Redo Log: Thread 2 Group 7 Seq 236286 Reading mem 0
  Mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
  Mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
  Mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
  Mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
  Recovery of Online Redo Log: Thread 2 Group 5 Seq 236287 Reading mem 0
  Mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445
  Mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483
  Mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489
  Mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493
  Wed May 24 02:06:05 2017
  Completed redo application of 1702.07MB
  Completed crash recovery at
  Thread 1: logseq 255942, block 291036, scn 12495715727030
  Thread 2: logseq 236287, block 2742038, scn 12495715600520
  230150 data blocks read, 229989 data blocks written, 2158203 redo k-bytes read
  Thread 1 advanced to log sequence 255943 (thread recovery)
  Picked broadcast on commit scheme to generate SCNs
  Wed May 24 02:06:06 2017
  Thread 2 advanced to log sequence 236288 (thread open)
  Thread 2 opened at log sequence 236288
  Current log# 6 seq# 236288 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575
  Current log# 6 seq# 236288 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579
  Current log# 6 seq# 236288 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581
  Current log# 6 seq# 236288 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585
  Successful open of redo thread 2
  MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
  Starting background process CTWR
  Wed May 24 02:06:07 2017
  CTWR started with pid=89, OS id=49349114
  Block change tracking service is active.
  Wed May 24 02:06:07 2017
  SMON: enabling cache recovery
  Successfully onlined Undo Tablespace 4.
  Verifying file header compatibility for 11g tablespace encryption..
  Verifying 11g file header compatibility for tablespace encryption completed
  SMON: enabling tx recovery
  Database Characterset is AL32UTF8
  No Resource Manager plan active
  Starting background process GTX0
  Wed May 24 02:06:11 2017
  GTX0 started with pid=98, OS id=42992042
  Starting background process RCBG
  Wed May 24 02:06:11 2017
  RCBG started with pid=52, OS id=40370648
  replication_dependency_tracking turned off (no async multimaster replication found)
  Starting background process QMNC
  Wed May 24 02:06:13 2017
  QMNC started with pid=119, OS id=47514086
  Completed: ALTER DATABASE OPEN
  SMON: Parallel transaction recovery tried
  Starting background process SMCO
  Wed May 24 02:06:17 2017
  SMCO started with pid=153, OS id=50724962
  Wed May 24 02:06:17 2017
  db_recovery_file_dest_size of 2097152 MB is 4.35% used. This is a
  user-specified limit on the amount of space that will be used by this
  database for recovery-related files, and does not reflect the amount of
  space available in the underlying filesystem or ASM diskgroup.
  Wed May 24 02:06:19 2017
  Starting background process CJQ0
  Wed May 24 02:06:19 2017
  CJQ0 started with pid=46, OS id=43581760
  Wed May 24 02:07:35 2017
  Thread 2 advanced to log sequence 236289 (LGWR switch)
  Current log# 7 seq# 236289 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
  Current log# 7 seq# 236289 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
  Current log# 7 seq# 236289 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
  Current log# 7 seq# 236289 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
  Wed May 24 02:08:09 2017
  Thread 2 advanced to log sequence 236290 (LGWR switch)
  Current log# 5 seq# 236290 mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445
  Current log# 5 seq# 236290 mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483
  Current log# 5 seq# 236290 mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489
  Current log# 5 seq# 236290 mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493
  Wed May 24 02:08:51 2017
  Thread 2 advanced to log sequence 236291 (LGWR switch)
  Current log# 6 seq# 236291 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575
  Current log# 6 seq# 236291 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579
  Current log# 6 seq# 236291 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581
  Current log# 6 seq# 236291 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585
  Wed May 24 02:09:30 2017
  Thread 2 advanced to log sequence 236292 (LGWR switch)
  Current log# 7 seq# 236292 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
  Current log# 7 seq# 236292 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
  Current log# 7 seq# 236292 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
  Current log# 7 seq# 236292 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
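Alert logs this long are easier to compare across nodes once the key events are pulled out with their timestamps. A minimal Python sketch that does this for the eviction, kill, reconfiguration and termination messages; the regular expressions are assumptions based only on the message formats that appear in the logs above, and the event names are hypothetical labels.

```python
import re
from datetime import datetime

# Alert-log timestamp header, e.g. "Wed May 24 01:59:24 2017".
TS_RE = re.compile(r"^\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
# Messages worth putting on a timeline for this incident.
EVENT_RES = {
    "eviction": re.compile(r"instance eviction notification from instance (\d+)"),
    "kill": re.compile(r"Remote instance (\d+) kill is issued"),
    "terminate": re.compile(r"terminating the instance due to error (\d+)"),
    "reconfig": re.compile(r"Reconfiguration started \(old inc (\d+), new inc (\d+)\)"),
}


def timeline(lines):
    """Yield (timestamp, event, groups) for interesting alert-log lines."""
    current = None  # most recent timestamp header seen
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        for name, rx in EVENT_RES.items():
            m = rx.search(line)
            if m:
                yield current, name, m.groups()


sample = """Wed May 24 01:59:24 2017
Remote instance 2 kill is issued with system inc 228
LMON received an instance eviction notification from instance 1
Reconfiguration started (old inc 228, new inc 230)
Wed May 24 02:05:34 2017
LMON (ospid: 35979900): terminating the instance due to error 481
""".splitlines()

for ts, name, groups in timeline(sample):
    print(ts.strftime("%H:%M:%S"), name, groups)
```

Running the same function over the node1 and node2 logs and merging the results by time gives a single cross-node sequence of the evictions, restarts and reconfigurations.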

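A number of configuration corrections and optimizations were made to this RAC cluster. The main ones are listed below.

◆ Database configuration tuning

0. Earlier, many log switches could not complete, which hung the database. To address this, redo log groups were added and redundant log members were dropped.

1. Increased the ASM instance memory from 350 MB to 2 GB.

2. Forced RAC parallel execution processes to be allocated on the local instance.

3. Disabled auditing to reduce its performance impact.

4. Increased AWR retention from 7 days to 30 days.

5. Increased the SGA by 10 GB, to 60 GB.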
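The post does not show the commands behind items 3 to 5. The Python sketch below builds plausible SQL for them and checks the new SGA size against sga_max_size = 70G from the alert log. The statements are illustrative assumptions, not the ones actually executed, and items 0 to 2 are not covered.

```python
# Values taken from the alert log and the list above.
SGA_MAX_SIZE_GB = 70                        # sga_max_size = 70G
OLD_SGA_TARGET_GB = 50                      # sga_target = 50G
NEW_SGA_TARGET_GB = OLD_SGA_TARGET_GB + 10  # item 5: +10G -> 60G
AWR_RETENTION_DAYS = 30                     # item 4


def tuning_statements():
    """Build illustrative SQL*Plus statements for items 3-5 (hypothetical)."""
    if NEW_SGA_TARGET_GB > SGA_MAX_SIZE_GB:
        raise ValueError("sga_target must not exceed sga_max_size")
    retention_minutes = AWR_RETENTION_DAYS * 24 * 60  # AWR retention is in minutes
    return [
        # Item 3: audit_trail is static, so it only takes effect after a restart.
        "ALTER SYSTEM SET audit_trail=NONE SCOPE=SPFILE SID='*';",
        # Item 4: keep AWR snapshots for 30 days.
        "EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS("
        f"retention => {retention_minutes});",
        # Item 5: sga_target can be raised online up to sga_max_size.
        f"ALTER SYSTEM SET sga_target={NEW_SGA_TARGET_GB}G SCOPE=BOTH SID='*';",
    ]


for stmt in tuning_statements():
    print(stmt)
```

Because the spfile is shared (+DATA1/orcl/spfileorcl.ora), `SID='*'` applies each setting to both instances.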

◆ Modified network-related kernel parameters

/usr/sbin/no -p -o tcp_sendspace=4194304

/usr/sbin/no -p -o tcp_recvspace=4194304

/usr/sbin/no -p -o rfc1323=1

/usr/sbin/no -p -o sb_max=8388608

/usr/sbin/no -p -o udp_ephemeral_low=9000

/usr/sbin/no -p -o tcp_ephemeral_low=9000
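On AIX, sb_max is the upper limit for socket buffer sizes (tcp_sendspace, tcp_recvspace, udp_sendspace, udp_recvspace), and TCP windows larger than 64 KB need RFC 1323 window scaling (rfc1323=1). A minimal Python sketch that checks those two rules against the values set above; it assumes the `name = value` line format printed by `no -a`, and the function names are hypothetical.

```python
import re

# The values set by the "no -p -o" commands above, in `no -a` style.
SETTINGS = """\
        tcp_sendspace = 4194304
        tcp_recvspace = 4194304
              rfc1323 = 1
               sb_max = 8388608
    udp_ephemeral_low = 9000
    tcp_ephemeral_low = 9000
"""

LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(\d+)\s*$")


def parse_no_output(text):
    """Parse 'name = value' lines (assumed `no -a` format) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values


def check(values):
    """Return rule violations; an empty list means the set looks consistent."""
    problems = []
    sb_max = values.get("sb_max")
    for name in ("tcp_sendspace", "tcp_recvspace", "udp_sendspace", "udp_recvspace"):
        size = values.get(name)
        if size is None:
            continue
        # sb_max caps every socket buffer.
        if sb_max is not None and size > sb_max:
            problems.append(f"{name}={size} exceeds sb_max={sb_max}")
        # TCP windows above 65535 bytes need RFC 1323 window scaling.
        if name.startswith("tcp") and size > 65535 and values.get("rfc1323") != 1:
            problems.append(f"{name}={size} needs rfc1323=1")
    return problems


print(check(parse_no_output(SETTINGS)))  # []
```

With the values above both rules hold: the 4 MB TCP buffers fit under the 8 MB sb_max, and rfc1323 is enabled.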

◆ Modified the interconnect NIC parameters

chdev -l en1 -a tcp_sendspace=1048576 -a rfc1323=1 -a tcp_recvspace=1048576
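The chdev command sets interface-specific network options (ISNO) on en1. When use_isno is enabled (the AIX default), these take precedence over the global `no` values for that interface. A small Python sketch of the resulting effective values; modelling the override as a dictionary merge is a simplification, and the names are hypothetical.

```python
# Global values from the "no" commands above.
GLOBAL_OPTS = {"tcp_sendspace": 4194304, "tcp_recvspace": 4194304, "rfc1323": 1}
# Interface-specific (ISNO) values set on en1 by the chdev command.
EN1_OPTS = {"tcp_sendspace": 1048576, "tcp_recvspace": 1048576, "rfc1323": 1}


def effective_options(global_opts, iface_opts, use_isno=1):
    """Interface values override global ones when use_isno is enabled."""
    merged = dict(global_opts)
    if use_isno:
        merged.update(iface_opts)
    return merged


print(effective_options(GLOBAL_OPTS, EN1_OPTS)["tcp_sendspace"])  # 1048576
```

So for TCP traffic on the interconnect interface en1, the effective buffers are 1 MB rather than the 4 MB set globally.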

◆SWAP

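Physical memory is larger than 16 GB but swap space was below 16 GB, which does not meet Oracle's standard installation recommendation. The paging space was enlarged with:

chps -s 192 hd6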
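Two quick calculations behind this step, as a Python sketch: the swap size the Oracle 11gR2 installation guides recommend for a given amount of RAM, and the space `chps -s` adds, which is the number of logical partitions times the volume group's physical partition (PP) size. Neither the RAM size nor the PP size is given in the post, so 128 GB and 128 MB below are hypothetical values (the real PP size is shown by `lsvg rootvg`).

```python
def oracle_11gr2_swap_gb(ram_gb: float) -> float:
    """Swap recommended by the Oracle 11gR2 install guides for a given RAM size."""
    if ram_gb <= 2:
        return ram_gb * 1.5      # 1-2 GB RAM: 1.5 x RAM
    if ram_gb <= 16:
        return float(ram_gb)     # 2-16 GB RAM: equal to RAM
    return 16.0                  # more than 16 GB RAM: 16 GB


def chps_added_gb(logical_partitions: int, pp_size_mb: int) -> float:
    """Space added by `chps -s <n> hd6`: n logical partitions x PP size."""
    return logical_partitions * pp_size_mb / 1024


RAM_GB = 128      # hypothetical RAM size
PP_SIZE_MB = 128  # hypothetical PP size
print(oracle_11gr2_swap_gb(RAM_GB))       # 16.0
print(chps_added_gb(192, PP_SIZE_MB))     # 24.0
```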

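◆ Time zone and system time were checked; to be safe, they were left unchanged for now.

◆ NTP was checked: the nodes are kept in sync by the Grid Infrastructure cluster's own time synchronization (CTSS). The current offset is a few tens of seconds, which is not considered a problem.

◆ HA cluster configuration changes: parts of the original cluster installation did not follow the official documentation. This was remediated this time, but not comprehensively, so unknown risks may remain.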
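The offset is also visible in the May 24 logs. If we assume that the following pairs of log lines record the same moment on both nodes (message and dump latency is ignored), a Python sketch shows node1's clock running about 23 to 24 seconds ahead of node2's. The pairing rests on the trace dump ids (cdmp_YYYYMMDDHHMMSS) matching the other node's termination time, which is an interpretation, not something the logs state.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# (node1 clock, node2 clock) for what should be the same moment, all on May 24.
SAME_EVENTS = [
    ("01:59:24", "01:59:01"),  # node1 issues the kill / node2 receives the eviction
    ("01:59:27", "01:59:04"),  # cdmp_20170524015904: node2 terminates, node1 logs the dump
    ("02:05:34", "02:05:10"),  # cdmp_20170524020534: node1 terminates, node2 logs the dump
]


def skew_seconds(node1_time: str, node2_time: str) -> int:
    """Seconds by which node1's clock appears to be ahead of node2's."""
    t1 = datetime.strptime(node1_time, FMT)
    t2 = datetime.strptime(node2_time, FMT)
    return int((t1 - t2).total_seconds())


skews = [skew_seconds(n1, n2) for n1, n2 in SAME_EVENTS]
print(skews)  # [23, 23, 24]
```

Keeping this skew in mind makes it easier to line up the two nodes' logs when reading them side by side.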

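After the restart on May 25 and a period of observation, the situation improved, but at about 09:00 on June 7 node 2 went down again.

This time, however, it was not an eviction, which suggests the earlier corrections and tuning had an effect; the preliminary judgment is that the network parameter changes are what helped. The log is below; it contains no eviction messages.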

  Wed Jun 07 09:02:57 2017
  Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc (incident=1023882):
  ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []
  Incident details in: /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/incident/incdir_1023882/orcl2_lms0_7930030_i1023882.trc
  Wed Jun 07 09:02:59 2017
  Trace dumping is performing id=[cdmp_20170607090259]
  Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc:
  ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []
  LMS0 (ospid: 7930030): terminating the instance due to error 484
  Instance terminated by LMS0, pid = 7930030
  Wed Jun 07 09:14:01 2017
  Starting ORACLE instance (normal)
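When researching an ORA-00600 on My Oracle Support, the first argument (here kjbrref:pkey) is usually the key to search on. A small Python sketch that extracts the non-empty arguments from the alert log line; the function name is hypothetical.

```python
import re

ORA600_RE = re.compile(r"ORA-00600: internal error code, arguments: (.*)$")


def ora600_args(line: str):
    """Return the non-empty bracketed arguments of an ORA-00600 line."""
    m = ORA600_RE.search(line)
    if not m:
        return []
    return [arg for arg in re.findall(r"\[([^\]]*)\]", m.group(1)) if arg]


line = ("ORA-00600: internal error code, arguments: [kjbrref:pkey], "
        "[3881577], [6], [7493495], [0], [], [], [], [], [], [], []")
args = ora600_args(line)
print(args[0])  # kjbrref:pkey
```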

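Preliminary conclusion on this ORA-600 internal error:

After the tuning and restart on May 25, a node went down once more on June 7. Investigation essentially confirmed an Oracle bug that can bring a node down under certain conditions. The database is currently at 11.2.0.1.0 and needs to be patched to the latest PSU, 11.2.0.1.6. The patch will be applied at a suitable maintenance window.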
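A trivial Python check of where the database stands relative to the target PSU. Treating the fifth number as the PSU level follows Oracle's 11.2 version numbering; the function names are hypothetical.

```python
def version_tuple(version: str) -> tuple:
    """'11.2.0.1.6' -> (11, 2, 0, 1, 6)."""
    return tuple(int(part) for part in version.split("."))


def same_release(current: str, target: str) -> bool:
    """Same first four numbers: a PSU within 11.2.0.1, not a patch-set upgrade."""
    return version_tuple(current)[:4] == version_tuple(target)[:4]


def needs_patch(current: str, target: str) -> bool:
    """True if the current version is older than the target."""
    return version_tuple(current) < version_tuple(target)


print(same_release("11.2.0.1.0", "11.2.0.1.6"), needs_patch("11.2.0.1.0", "11.2.0.1.6"))
# True True
```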
