Had to fail... for sure! Curious things about Oracle ® Database Management: Fri, 13th :o(

I am not superstitious, but when a database crashes due to a storage failure on a Fri, 13^th it is hard to avoid the jokes about bad luck :o)

This is a RAC 11.2.0.2 database with 3 nodes, and oracle alert log files show no evidence of crash (of course, as the ADR base directory points to the crashed storage). Checking AIX Error Log facility it is possible to notice the storage issue:

LABEL: DSI_PROC

IDENTIFIER: <<Identifier>>

Date/Time: Fri Jul 13 11:43:56 CDT 2012

Sequence Number: 1072

Machine Id: <<machine ID>>

Node Id: <<server name>>

Class: S

Type: PERM

WPAR: Global

Resource Name: SYSVMM

Description

DATA STORAGE INTERRUPT, PROCESSOR

Probable Causes

SOFTWARE PROGRAM

Failure Causes

SOFTWARE PROGRAM

Recommended Actions

IF PROBLEM PERSISTS THEN DO THE FOLLOWING

CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data

DATA STORAGE INTERRUPT STATUS REGISTER

0000 0000 4000 0000

SEGMENT REGISTER, SEGREG

0000 001B 9853 9590

DATA STORAGE INTERRUPT ADDRESS REGISTER

F000 0000 2000 0010

After the storage issue was fixed, it was time to bring the database back up.

$ srvctl start database -d DB_NAME

PRCR-1079 : Failed to start resource ora.db_name.db

CRS-5017: The resource action "ora.db_name.db start" encountered the following error:

ORA-03113: end-of-file on communication channel

Process ID: 0

Session ID: 0 Serial number: 0

CRS-2674: Start of 'ora.db_name.db' on 'node-1' failed

CRS-5017: The resource action "ora.db_name.db start" encountered the following error:

ORA-03113: end-of-file on communication channel

Process ID: 0

Session ID: 0 Serial number: 0

CRS-2674: Start of 'ora.reap101.db' on 'node-2' failed

CRS-2632: There are no more servers to try to place resource 'ora.db_name.db' on that would satisfy its placement policy

CRS-5017: The resource action "ora.db_name.db start" encountered the following error:

ORA-03113: end-of-file on communication channel

Process ID: 0

Session ID: 0 Serial number: 0

CRS-2674: Start of 'ora.db_name.db' on 'node-3' failed

The actual error is identified by checking the alert log file:

Errors in file .../DB_NAME_lms1_11235114.trc (incident=251123):

ORA-04031: unable to allocate 1048528 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","gcs resources")

The ORA-04031 error during the startup occurred due to multiple partially-allocated permanent shared pool chunks – analyzing the SGA heap dump generated by oracle it is possible to see several chunks with allocated value less than the chunk size.

The issue is pretty simple to resolve: increase SGA or memory target when using Automatic MemoryManagement or the shared pool size when setting memory manually.

Considering the database was up and running before the storage failure, the real question here is: what has changed to cause the error during the database startup?

Previously the database was started up using automatic memory management by setting sga_target. The shared pool size was set also (the value specified worked as a minimum size for the shared pool), but it was not enough as oracle increased the shared pool size. This can be confirmed by checking the server parameter file for the presence of __shared_pool_size with a value greater than the value specified to the shared_pool_size parameter. Later the memory management was changed from auto to manual by setting sga_target to 0 (the database would be restarted during a scheduled maintenance window), but the parameter shared_pool_size was not changed to reflect the real value being used by oracle.

It is interesting to note SGA resizes continue occurring in 11.2 databases despite the fact that automatic memory management is disabled – see oracle note 1269139.1. Therefore, it is important to keep in mind to check the current memory size used by the SGA components by querying gv$sga_dynamic_components before changing memory parameters.

Had to fail... for sure! Curious things about Oracle ® Database Management

Wednesday, July 18, 2012

Fri, 13th :o(

No comments:

Post a Comment

Search This Blog