Working Around an ASM Startup Problem

Scenario

I recently hit the problem covered in MOS note “ASM Instance Is Not Coming Up ORA-00064 (1,4468736,Kfchl Array) Kfchl Array (Doc ID 1328620.1)” midway through a Grid Infrastructure upgrade from 11.2.0.2 BP16 to 11.2.0.4 BP3 on Exadata. Specifically, the problem occurred while applying prerequisite patch 17783101 (required for downgrades) to node 3. At this point nodes 1 and 2 had been successfully patched with the prerequisite patch, and rootupgrade.sh for 11.2.0.4 had been run without issue on both, i.e., the cluster was in the middle of a rolling upgrade.

The approach used to resolve the problem (detailed below) is applicable to similar issues on non-Exadata systems, so even if you don’t work on Exadata this might be interesting.

The support note cites a few options to fix the problem of ASM not starting:

  1. Recreate the spfile from a pfile from which the offending parameter (__shared_pool_size) has been removed. This would address the startup problem for +ASM3, but there are a couple of points to note: a) “create spfile from pfile” would need to place the new spfile in a different diskgroup from the one holding the original spfile (at least as an intermediate step), because an ASM parameter file cannot be created in the same diskgroup as an in-use ASM parameter file[1]; b) “create spfile from pfile” updates the Grid Plug and Play (GPnP) profile on all nodes to point to the newly created spfile, so any ASM instance that was restarted would be running from a different spfile than the one in use by the instances that had not been restarted. Neither of these consequences is catastrophic, but I was keen to resolve the issue without needing to restart ASM/Grid Infrastructure on any node other than node 3.
  2. Use “alter system reset” from a running ASM instance on another node to remove the offending parameter (__shared_pool_size) for the problem instance from the spfile. That was not possible: the command returned ORA-32000 (write to SPFILE requested but SPFILE is not modifiable) because of the in-flight rolling upgrade.

A third solution is mentioned that is very similar to number 1: it also requires creating a pfile and then recreating the spfile from it, so it was ruled out for the same reasons.

It is worth noting that the support note states that it applies to “Oracle Exadata Storage Server Software – Version 11.2.2.1.0 to 11.2.2.3.0 [Release 11.2]”, but the system where this was encountered was running 11.2.3.2.1, so that applicability statement is not accurate.

What next?

Step 1

The first thing I tried was to start +ASM3 from a pfile from which I’d manually removed the “__shared_pool_size” entry for that instance. It worked: ASM started and then everything else sprang into life.
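The pfile edit in this step can be scripted. The following is a minimal sketch, assuming the pfile was produced with something like “create pfile='/tmp/asm.ora' from spfile” and that the offending entry appears on its own line as “+ASM3.__shared_pool_size=<value>”. The function name strip_instance_param is hypothetical, not part of the MOS note or any Oracle tooling.

```shell
#!/bin/sh
# Hypothetical helper (not from the MOS note): remove one instance-specific
# parameter line such as "+ASM3.__shared_pool_size=..." from a text pfile.
# Assumes one "SID.parameter=value" entry per line, as written by
# "create pfile ... from spfile". A backup copy (<pfile>.bak) is kept.
strip_instance_param() {
    pfile=$1     # e.g. /tmp/asm.ora
    sid=$2       # e.g. +ASM3
    param=$3     # e.g. __shared_pool_size
    key="${sid}.${param}="

    cp "$pfile" "${pfile}.bak" || return 1
    # index() is a literal substring match, so "+" and "." in the key are not
    # regex metacharacters; keep every line that does not start with the key.
    awk -v key="$key" 'index($0, key) != 1' "${pfile}.bak" > "$pfile"
}

# Usage (assumed paths):
#   strip_instance_param /tmp/asm.ora +ASM3 __shared_pool_size
# then, in SQL*Plus connected to +ASM3 as SYSASM:
#   startup pfile='/tmp/asm.ora'
```

Only the entry for the named instance is removed; the same parameter for other instances, and all “*.” entries, are left in place.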

The above got the database instances on node 3 up, but left me with a problem. If Grid Infrastructure was restarted, which it would be during the pending second attempt to apply the prerequisite patch and again when rootupgrade.sh ran as part of the actual upgrade to 11.2.0.4, ASM would start from the spfile referenced in the GPnP profile and fail with ORA-00064 again.

An interesting situation.

Step 2

After a bit of pondering I realised that one option was to update the GPnP profile on node 3 only, using gpnptool. This would allow +ASM3 to be started via “crsctl start crs” using the pfile I’d already used to start it successfully, while leaving the GPnP profile on all other nodes pointing to the current spfile stored in ASM.

Updating the GPnP Profile:

[+ASM1@c01db03 ~]$ gpnptool edit -asm:asm_spf=/tmp/asm.ora -p=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -o=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -ovr
Resulting profile written to "/u01/app/11.2.0.2/grid/gpnp/c01db03/profiles/peer/profile.xml".
Success.
[+ASM1@c01db03 ~]$

Signing the Profile:

[+ASM1@c01db03 ~]$ gpnptool sign -p=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -o=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -ovr -w=file:$ORACLE_HOME/gpnp/$(hostname -s)/wallets/peer -rmws
Resulting profile written to "/u01/app/11.2.0.2/grid/gpnp/c01db03/profiles/peer/profile.xml".
Success.
[+ASM1@c01db03 ~]$
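Before restarting anything, it can be reassuring to confirm which spfile each node’s profile now references. The sketch below reads the local profile.xml directly. It assumes the ASM settings are stored in an ASM-Profile element with an SPFile attribute, which is how an 11.2 profile is expected to look, but check your own profile.xml before relying on it. The function name profile_asm_spfile is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the spfile recorded in a GPnP profile.xml.
# Assumption: the profile contains an element like
#   <orcl:ASM-Profile id="asm" DiscoveryString="..." SPFile="..."/>
# Prints nothing if no such element/attribute is found.
profile_asm_spfile() {
    sed -n 's/.*ASM-Profile[^>]* SPFile="\([^"]*\)".*/\1/p' "$1"
}

# Usage (path layout as in the transcripts above):
#   profile_asm_spfile "$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml"
```

On node 3 this should print /tmp/asm.ora after the edit and sign steps; on the other nodes it should still print the spfile stored in ASM.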

After the profile change, ASM (and the rest of Grid Infrastructure) was restarted using “crsctl stop crs” followed by “crsctl start crs”. The Grid Infrastructure stack started as expected, so the patching and upgrade work could continue.

Problem solved, with no cluster-wide outage and no need to restart ASM/Grid Infrastructure on any node other than the one where the problem was encountered.

Step 3

The important “step 3” came once the upgrade to 11.2.0.4 had been completed: use “alter system reset” to remove the offending parameter for +ASM3 from the spfile, reverse the update to the GPnP profile, and restart +ASM3 so that all ASM instances reference the same spfile again. Note that the reversal of the GPnP profile change needed to be run from the 11.2.0.4 Grid Infrastructure Home, as it was performed after the upgrade had completed.
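The step 3 commands can be generated rather than typed, which helps make sure gpnptool is run from the 11.2.0.4 Grid home. The sketch below is a dry run that only prints the statements for review. The function name print_step3_commands and its arguments are hypothetical; the spfile argument should be whatever “asmcmd spget” reports on one of the unaffected nodes, and the restart line mirrors the crsctl approach used in Step 2 rather than being the only way to restart +ASM3.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does not execute) the step 3 commands.
# Arguments (supplied by the caller; the usage values are only examples):
#   $1 = Grid Infrastructure home after the upgrade (the 11.2.0.4 home)
#   $2 = short hostname of the problem node (e.g. c01db03)
#   $3 = ASM instance on that node (e.g. +ASM3)
#   $4 = spfile that the other nodes' GPnP profiles still reference
print_step3_commands() {
    gh=$1; host=$2; sid=$3; spf=$4
    prof="$gh/gpnp/$host/profiles/peer/profile.xml"
    wallet="$gh/gpnp/$host/wallets/peer"

    printf '%s\n' "-- SQL*Plus on a running ASM instance (underscore parameters must be double-quoted):"
    printf '%s\n' "alter system reset \"__shared_pool_size\" scope=spfile sid='$sid';"
    printf '%s\n' "# On $host, as the Grid Infrastructure owner, using the upgraded home:"
    printf '%s\n' "$gh/bin/gpnptool edit -asm:asm_spf=$spf -p=$prof -o=$prof -ovr"
    printf '%s\n' "$gh/bin/gpnptool sign -p=$prof -o=$prof -ovr -w=file:$wallet -rmws"
    printf '%s\n' "# Then restart $sid, e.g. as root: crsctl stop crs; crsctl start crs"
}

# Usage (assumed values):
#   print_step3_commands /u01/app/11.2.0.4/grid c01db03 +ASM3 "<path from asmcmd spget>"
```

Printing rather than executing keeps a human in the loop: the reset can only succeed once the rolling upgrade is complete, and the profile should only be pointed back at the ASM spfile after that reset has been made.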

All done.

Footnotes

[1] – Attempting to create an ASM spfile in a diskgroup containing an spfile that is in use (by any ASM instance), including via an alias, results in either ORA-32002 or ORA-17502, depending on whether or not the diskgroup is specified for the new spfile:

SYS@+ASM1> !asmcmd spget
+FRA/c01/ASMPARAMETERFILE/registry.253.843340551

SYS@+ASM1> !asmcmd find --type asmparameterfile + \*
+DATA/c01/ASMPARAMETERFILE/REGISTRY.253.825178021
+FRA/c01/ASMPARAMETERFILE/REGISTRY.253.843340551

SYS@+ASM1> create spfile from pfile='/tmp/asm.ora';
create spfile from pfile='/tmp/asm.ora'
*
ERROR at line 1:
ORA-32002: cannot create SPFILE already being used by the instance

SYS@+ASM1> create spfile='+FRA' from pfile='/tmp/asm.ora';
create spfile='+FRA' from pfile='/tmp/asm.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +FRA
ORA-15268: internal Oracle file +FRA.253.1 already exists.

SYS@+ASM1> create spfile='+FRA/asmspfile.ora' from pfile='/tmp/asm.ora';
create spfile='+FRA/asmspfile.ora' from pfile='/tmp/asm.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +FRA/asmspfile.ora
ORA-15268: internal Oracle file +FRA.253.1 already exists.

SYS@+ASM1>