Working Around an ASM Startup Problem

Scenario

I recently hit the problem covered in MOS note “ASM Instance Is Not Coming Up ORA-00064 (1,4468736,Kfchl Array) Kfchl Array (Doc ID 1328620.1)” mid-way through a Grid Infrastructure upgrade from 11.2.0.2 BP16 to 11.2.0.4 BP3 on Exadata. Specifically, it was the application of prerequisite patch 17783101 (required for downgrades) to node 3 that hit the problem. At this point nodes 1 and 2 had been successfully patched with the prerequisite patch and rootupgrade.sh for 11.2.0.4 had been run without issue on these nodes, i.e., the cluster was in a rolling upgrade.

The approach used to resolve the problem (detailed below) is applicable to similar issues on non-Exadata systems, so even if you don’t work on Exadata this might be interesting.

The support note cites a few options to fix the problem of ASM not starting:

  1. Recreate the spfile from a pfile where the offending parameter (__shared_pool_size) has been removed – This would allow the startup problem for +ASM3 to be addressed, but there are a couple of points to note: a) “create spfile from pfile” would need to use a different diskgroup for the spfile than the one the original spfile resides in (at least as an intermediate step) due to the limitation of not being able to create an ASM parameter file in the same diskgroup as an in-use ASM parameter file[1]; b) “create spfile from pfile” updates the Grid Plug and Play (GPnP) profile to point to the newly created spfile on all nodes, so any restarted ASM instance would be running from a different spfile than the one in use by all currently running ASM instances. Neither of these consequences is catastrophic, but I was keen to resolve the issue without needing to restart ASM/Grid Infrastructure on any node other than node 3.
  2. Update the spfile using “alter system reset” from a running ASM instance on another node to remove the offending parameter (__shared_pool_size) for the problem instance – That was not possible because the command returned ORA-32000 (write to SPFILE requested but SPFILE is not modifiable) as a result of the rolling upgrade that was in-flight. A sketch of the attempt follows this list.
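For illustration, the attempt in option 2 would look something like the following (a sketch, not a capture of the actual session; the double quotes are required because the parameter name starts with underscores):

SYS@+ASM1> alter system reset "__shared_pool_size" scope=spfile sid='+ASM3';

During the in-flight rolling upgrade this returned ORA-32000 as described above.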

A third solution is mentioned that is very similar to number 1. It also requires the creation of a pfile followed by recreation of the spfile from the pfile. That was out for the same reasons.

It seems worth commenting that the support note states that it applies to “Oracle Exadata Storage Server Software – Version 11.2.2.1.0 to 11.2.2.3.0 [Release 11.2]”, but the system where this was encountered was running 11.2.3.2.1, so the stated version range is not accurate.

What next?

Step 1

The first thing I tried was to start +ASM3 from a pfile from which I’d manually removed the “__shared_pool_size” entry for that instance – it worked. ASM started and then everything else sprang into life.
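A minimal sketch of that step, assuming the pfile is generated from the spfile via a surviving instance and then edited (the path is hypothetical):

-- On a surviving node, e.g. node 1:
SYS@+ASM1> create pfile='/tmp/asm.ora' from spfile;

-- Copy /tmp/asm.ora to node 3, delete the +ASM3.__shared_pool_size line, then on node 3:
SYS@+ASM3> startup pfile='/tmp/asm.ora';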

The above got the database instances on node 3 up, but left me with a problem. If Grid Infrastructure was restarted, which it would be during the pending second attempt to apply the prerequisite patch and again during the execution of rootupgrade.sh as part of the actual upgrade to 11.2.0.4, then ASM would fail due to ORA-00064 again.

An interesting situation.

Step 2

After a bit of pondering I realised that an option to try would be updating the GPnP profile only on node 3 using gpnptool. This would allow +ASM3 to be started via “crsctl start crs” using the pfile I’d already used to successfully start it, but leave the GPnP profile on all other nodes pointing to the current spfile stored in ASM.

Updating the GPnP Profile:

[+ASM1@c01db03 ~]$ gpnptool edit -asm:asm_spf=/tmp/asm.ora -p=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -o=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -ovr
Resulting profile written to "/u01/app/11.2.0.2/grid/gpnp/c01db03/profiles/peer/profile.xml".
Success.
[+ASM1@c01db03 ~]$

Signing the Profile:

[+ASM1@c01db03 ~]$ gpnptool sign -p=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -o=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -ovr -w=file:$ORACLE_HOME/gpnp/$(hostname -s)/wallets/peer -rmws
Resulting profile written to "/u01/app/11.2.0.2/grid/gpnp/c01db03/profiles/peer/profile.xml".
Success.
[+ASM1@c01db03 ~]$

After the profile change ASM (and the rest of Grid Infrastructure) was restarted using “crsctl stop crs” followed by “crsctl start crs”. The Grid Infrastructure stack started as expected, which meant the patching and upgrade work could continue.
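For reference, the restart commands are run as root on node 3:

# crsctl stop crs
# crsctl start crs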

Problem solved with no cluster-wide outage or need to restart ASM/Grid Infrastructure on any node other than the one where the problem was encountered.

Step 3

The important “step 3” was to use “alter system reset” to remove the offending parameter from the spfile for +ASM3 once the upgrade to 11.2.0.4 had been completed, reverse the update to the GPnP profile and restart +ASM3 to get all ASM instances referencing the same spfile again. Note that the reversal of the GPnP profile change needed to be run from the 11.2.0.4 Grid Infrastructure home as it was performed after the upgrade completed. The reversal is sketched below.
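A sketch of the GPnP reversal, mirroring the earlier gpnptool commands and reusing the spfile location reported by “asmcmd spget” in the footnote (run from the 11.2.0.4 home):

$ gpnptool edit -asm:asm_spf=+FRA/c01/ASMPARAMETERFILE/registry.253.843340551 -p=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -o=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -ovr
$ gpnptool sign -p=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -o=$ORACLE_HOME/gpnp/$(hostname -s)/profiles/peer/profile.xml -ovr -w=file:$ORACLE_HOME/gpnp/$(hostname -s)/wallets/peer -rmws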

All done.

Footnotes

[1] – Attempting to create an ASM spfile in a diskgroup containing an in-use (by any ASM instance) spfile, including via an alias, results in either ORA-32002 or ORA-17502, depending on whether or not a diskgroup is specified for the spfile:

SYS@+ASM1> !asmcmd spget
+FRA/c01/ASMPARAMETERFILE/registry.253.843340551

SYS@+ASM1> !asmcmd find --type asmparameterfile + \*
+DATA/c01/ASMPARAMETERFILE/REGISTRY.253.825178021
+FRA/c01/ASMPARAMETERFILE/REGISTRY.253.843340551

SYS@+ASM1> create spfile from pfile='/tmp/asm.ora';
create spfile from pfile='/tmp/asm.ora'
*
ERROR at line 1:
ORA-32002: cannot create SPFILE already being used by the instance

SYS@+ASM1> create spfile='+FRA' from pfile='/tmp/asm.ora';
create spfile='+FRA' from pfile='/tmp/asm.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +FRA
ORA-15268: internal Oracle file +FRA.253.1 already exists.

SYS@+ASM1> create spfile='+FRA/asmspfile.ora' from pfile='/tmp/asm.ora';
create spfile='+FRA/asmspfile.ora' from pfile='/tmp/asm.ora'
*
ERROR at line 1:
ORA-17502: ksfdcre:4 Failed to create file +FRA/asmspfile.ora
ORA-15268: internal Oracle file +FRA.253.1 already exists.

SYS@+ASM1>

Internet Access with VirtualBox & Host-only Networks (on OS X Mavericks)

Introduction

When creating VMs on my laptop I like to configure the minimum number of network interfaces. I also tend to end up with environments where I want multiple VMs to be able to see each other, see the internet and see my physical host. Many people using VirtualBox take the approach of having a “Host-only Adapter” interface and a “NAT” interface. My only objection to this is that a “Host-only Adapter” can reach the wider world via the physical host, which makes the NAT interface surplus to requirements.

In the past, when running Snow Leopard, I’d worked out that enabling “Internet Sharing” on the Mac allowed my VMs with a “Host-only Adapter” to be routed out through whatever network my Mac was connected to (assuming the network, resolv.conf, etc. were appropriately configured on the VM). I’ve suggested this approach to others in the past without questioning which OS X version they were running. I had the odd report of it not working and of people resorting to adding a NAT interface, but didn’t have the opportunity to investigate.

Anyway, I got a new computer recently and it came with OS X Mavericks. On creating my first VM in VirtualBox I opted for my preferred approach of using “Host-only Adapter”. I then spent a while working out how to get my Mac to NAT the VMs. Here is the solution I came up with.

Commands

1. Enable IP forwarding:

$ sudo sysctl -w net.inet.ip.forwarding=1

If you want this to be persistent across reboots then you can add it to /etc/sysctl.conf.
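For example, the following line in /etc/sysctl.conf (creating the file if necessary) applies the setting at boot:

net.inet.ip.forwarding=1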

2. Edit the pfctl configuration file (/etc/pf.conf) adding the following line below “nat-anchor”:

nat on en0 from vboxnet0:network -> (en0)

The above assumes that your Mac is connected to the internet over Airport/WiFi (en0) and that you want to allow the first VirtualBox “Host-only Network” (vboxnet0) to be NAT’d.
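To show the placement, the relevant part of /etc/pf.conf looks roughly like this after the edit (the anchor lines are the stock Mavericks content):

scrub-anchor "com.apple/*"
nat-anchor "com.apple/*"
nat on en0 from vboxnet0:network -> (en0)
rdr-anchor "com.apple/*"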

It is also possible to use natd & ipfw, as covered here, but they are deprecated in Mavericks, so you should probably adopt pfctl now.

3. Once pf.conf has been modified the file needs to be loaded:

$ sudo pfctl -f /etc/pf.conf

4. … And you need to ensure that PF is enabled:

$ sudo pfctl -e

The VMs will now be able to access the internet via their “Host-only Adapter” through the physical host.
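You can check that the NAT rule has been loaded with:

$ sudo pfctl -s nat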

Result

This gives me a situation where my Mac can see all my VMs, my VMs can see each other and importantly my VMs can get out to the internet – All with a single interface on each VM. KISS.

[Update] Persistent Through Reboot

As mentioned above the sysctl change can be made persistent through reboots by writing it to /etc/sysctl.conf, which you will probably need to create unless you’ve already been tinkering.

The changes to /etc/pf.conf will remain in place through restarts, but there are two changes required to ensure that PF is brought up automatically after startup. The first change is covered in section 2 of this Mavericks Server knowledge base article from Apple. The part that is relevant is:

$ sudo defaults write /System/Library/LaunchDaemons/com.apple.pfctl ProgramArguments '(pfctl, -f, /etc/pf.conf, -e)'

The chmod & plutil commands that follow the above in the article were not required in my case, as the permissions on the file were already appropriate and the plist file was already in XML format. That said, there is no harm in running them.

The other change you will need to make is to modify the syntax used in /etc/pf.conf. At the point PF is started the “vboxnet0” interface will not exist, and therefore PF will not be able to determine the required network information for it, which results in PF not being enabled. To avoid this problem it is necessary to switch to the following syntax (assuming you haven’t changed vboxnet0 from the default configuration):

nat on en0 from 192.168.56.0/24 -> (en0)

If you have changed from the default configuration or use multiple “Host-only Networks” in your VirtualBox environment then I’d imagine you can work out how to match the above to your environment.
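For reference, a guest configured for the default vboxnet0 addressing might look like this (a RHEL-style sketch; the device name and guest address are hypothetical):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=192.168.56.101
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
ONBOOT=yes

# /etc/resolv.conf - any resolver reachable once the NAT is in place
nameserver 8.8.8.8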

Kdump to NFS in UEK (Solution)

I’ve previously written about a problem I encountered when kdump is configured to write to an NFS location with UEK (in Exadata software version 11.2.3.2.1). I’m pleased to report that the root cause of the problem has been identified and there is a very simple workaround.

There were some frustrating times working this particular SR, the most notable being a response that was effectively, “It works for me (and so I’ll just put the status to Customer Working).”

After a bit more to-ing and fro-ing it emerged that the environment where Oracle had demonstrated kdump could write to NFS had the NFS server on the same subnet as the host where kdump was being tested. After a quick test of my own, using a second compute node in the Exadata system as the NFS server, I confirmed that kdump was able to write to an NFS location on the same subnet in my environment as well.

Soon after reporting the above test in the SR I was pointed to MOS note 1533611.1, which unfortunately is not publicly available (yet) and so I cannot read it… The crux of the issue is that the network interface configuration files have BOOTPROTO=none and kdump is not handling this appropriately, which results in an incomplete network configuration for bond1 when switching to the dump kernel during a crash.

The fix: Change BOOTPROTO=none to BOOTPROTO=static
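In context, the change goes in the interface configuration file used by kdump, which for the bond1 interface mentioned above would be something like the following (a sketch with the other entries omitted):

# /etc/sysconfig/network-scripts/ifcfg-bond1
BOOTPROTO=static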

The part of all this that I find interesting is that the RHEL 5, RHEL 6 and even RHEL 7 documentation all list the following options for BOOTPROTO:

  • none
  • bootp
  • dhcp

“static” does not appear to be a formally valid value. In an attempt to find more information about the behaviour I only got as far as Red Hat BZ#805803 and BZ#802928, neither of which I can access directly, but I can see a summary of each.

In conclusion, it appears that the issue is actually a kdump bug, or more specifically a mkdumprd bug. Thankfully the workaround is simple; it just took a long time to get to it.

Hidden Perils of Thin-style Service Name Syntax with UCP & FCF

I’ve touched on this issue before over at my previous blog and also talked about it at UKOUG Tech13. The presentation at Tech13 was more than 90% demos so hasn’t left a lasting record of the issue and the previous blog post doesn’t go into significant detail. I think it is worth highlighting what I see as a potentially very significant issue when configuring Oracle’s Universal Connection Pool (UCP) for Fast Connection Failover (FCF).

I’ll cut straight to the chase: if you use Thin-style Service Name Syntax (aka Easy Connect) in the URL for UCP when using FCF, then you are leaving yourself open to wondering why your connection pool has not opened any connections on an instance where you have just started the service it uses. If you have enabled logging for UCP (11.2) then you will find yourself looking at the following:

oracle.ucp.jdbc.oracle.OracleDatabaseInstanceInfoList.getConnectionToNamedInstance URL invalid for connecting to named instance

The scenario in which you encounter this is specific: A service is started on an instance that didn’t offer the service at the time the connection pool was established.

A few examples of when this can happen are:

  1. You have a service that is defined as preferred on some instances and available on others then one of the instances where the service is preferred dies and it starts on an available instance.
  2. You have a service that is preferred on an instance, but that instance is not up and therefore not offering the service at the time the application connection pool is established.
  3. You have a service that has failed over to an available instance at the time the application connection pool is established and you then want to move the service from an available instance to a preferred instance without forcing a disconnect.

Returning to the error message “URL invalid for connecting to named instance” and considering the full syntax for Easy Connect, specifically the option to include the “instance_name”, this seems like a JDBC bug (as I think I recall Tim Hopkins suggesting).

So what’s the answer? Simply ensure that your clients use an Oracle Net connection descriptor; the two URL styles are contrasted below. In case it needs stating: there is no problem with using the SCAN in the Oracle Net connection descriptor. I currently believe that using LDAP or TNS entries will also work fine, as they ultimately result in the client having an Oracle Net connection descriptor, but I haven’t found the time to test them yet.
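To make the distinction concrete, here are the two URL styles side by side (the host and service names are hypothetical):

# Thin-style Service Name Syntax (Easy Connect) - exposed to the issue:
jdbc:oracle:thin:@//scan.example.com:1521/demo_srv

# Oracle Net connection descriptor - behaves correctly with FCF:
jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=scan.example.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=demo_srv)))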

Anyway, I live in a world where restarting your application or relying on inactiveConnectionTimeout in order to establish connections to all instances that offer the relevant service would not be seen as acceptable, and rightly so. We have the technology available to allow us to move database services around without the application feeling it. You just need to make sure you have everything set up exactly how it needs to be.

It’s worth mentioning that Martin Bach recently published a post on Application Continuity where he also encountered a problem using the Thin-style Service Name Syntax.

12c Improvement in Database Service Management

Removing a service via srvctl has not historically resulted in the service being fully removed from the database; it would still be visible in DBA_SERVICES, as shown below in an 11.2.0.3 database:

Create the service:

$ srvctl add service -d orcl -s demo -r "ORCL1,ORCL2"

Have a look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
ORCL

Start the service:

$ srvctl start service -d orcl -s demo

Have another look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
ORCL
demo

Stop the service:

$ srvctl stop service -d orcl -s demo

Remove the service:

$ srvctl remove service -d orcl -s demo

Have yet another look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
ORCL
demo

So as you can see the service remains in the database and is visible through DBA_SERVICES.

The only way I’m aware of to remove the service from the database is via DBMS_SERVICE.DELETE_SERVICE:

SYS@ORCL1> exec dbms_service.delete_service('demo') 

PL/SQL procedure successfully completed.

SYS@ORCL1>

A final look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
ORCL

I can imagine you’re thinking, “So what?”

Well, this can have some knock on implications as I recently demonstrated during my presentation on Fast Connection Failover at UKOUG Tech13.

Anyway, I’m pleased to report that Oracle have done a bit of tidying up in 12c and now when you remove a service via srvctl it is also removed from the database (most of the time).

Repeating the tests above in a 12.1.0.1 database [Strictly I should be using the new syntax for srvctl in the 12c example that follows, but as the old syntax still works and is consistent with the 11.2.0.3 example above I’ve stuck with it here; the new-style equivalent is sketched below]:
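For reference, the new-style 12c service creation would be something along these lines (a sketch using the full-word options introduced in 12c):

$ srvctl add service -db orcl -service demo -preferred "ORCL1,ORCL2"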

Create the service:

$ srvctl add service -d orcl -s demo -r "ORCL1,ORCL2"

Have a look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
ORCLXDB
ORCL

Start the service:

$ srvctl start service -d orcl -s demo

Have another look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
demo
ORCLXDB
ORCL

Stop the service:

$ srvctl stop service -d orcl -s demo

Have another look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
demo
ORCLXDB
ORCL

It’s still there.

Remove the service:

$ srvctl remove service -d orcl -s demo

Have yet another look in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
ORCLXDB
ORCL

SYS@ORCL1> 

It’s gone.

As I write this I realise most people (even those working with Oracle databases) aren’t going to be that thrilled by this minor improvement, but it pleases me.

The question that immediately occurred to me is, “So what happens if the database is down when the service is removed?”

Add and start the service:

$ srvctl add service -d orcl -s demo -r "ORCL1,ORCL2"
$ srvctl start service -d orcl -s demo

Verify that it exists in DBA_SERVICES:

SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
demo
ORCLXDB
ORCL

Stop the database, remove the service and start the database:

$ srvctl stop database -d orcl 
$ srvctl remove service -d orcl -s demo
$ srvctl start database -d orcl

Reconnect to the database and have a look in DBA_SERVICES:

SYS@ORCL1> conn / as sysdba
Connected.
SYS@ORCL1> select name service_name from dba_services;

SERVICE_NAME
----------------------------------------------------------------
SYS$BACKGROUND
SYS$USERS
demo
ORCLXDB
ORCL

SYS@ORCL1> 

So in this situation you’re left to clear things up with DBMS_SERVICE yourself.

At the risk of going overboard with testing scenarios, I also wondered what would happen if only the local instance was down. I’ll save you the output and simply state that if the local instance is down then the service is still cleaned up in the database (by oraagent.bin).

Kdump to NFS Broken in UEK

There were problems that affected UEK and NFS when 11.2.3.2.1 was initially released (as covered by Andy Colvin). As mentioned in the comments of Andy’s post: Oracle released an updated ISO with fixes for this problem (patch 16432033).

There were also problems with kdump not functioning after 11.2.3.2.1 as listed in “Issue 1.17” of Oracle MOS “Exadata 11.2.3.2.1 release and patch (14522699) for Exadata 11.1.3.3, 11.2.1.2.x, 11.2.2.2.x, 11.2.2.3.x, 11.2.2.4.x, 11.2.3.1.x, 11.2.3.2.x (Doc ID 1485475.1)”.

After updating to 11.2.3.2.1 using the updated ISO (with NFS fixes) and installing the later version of kexec-tools as per the fix for “Issue 1.17”, I was not able to get kdump to write to NFS during a crash.
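For context, directing kdump at an NFS location amounts to entries along these lines in /etc/kdump.conf (a sketch; the server and paths are hypothetical):

# /etc/kdump.conf
net nfs-server.example.com:/export/crash
path /node3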

I had previously tested kdump to NFS when running Exadata software version 11.2.2.4.2, so I knew that worked. 11.2.2.4.2 uses the RHEL kernel (“non-UEK” as Oracle would have you say) so I decided to test 11.2.3.2.1 after switching to the RHEL kernel, which also involves reverting the kexec-tools to the earlier version. That confirmed the issue was only evident when using UEK. Time for an SR…

Bug 17730336 – UNABLE TO KDUMP OVER NFS WITH KEXEC-TOOLS-2.0.3-3.0.4.EL5.X86_64.RPM INSTALLED

The bug listed above has been raised, but no fix has been supplied yet.

Not being able to direct kdump to NFS is probably not going to be your biggest worry, but definitely something to be aware of if you’re running UEK.

Note that I have only tested with 2.6.32-400.21.1.el5uek. Oracle have confirmed they have reproduced the issue with 2.6.39-400.126.1.el5uek. Other UEK kernels may or may not be affected.

Sudo Keystroke Optimisation

If, like me and a couple of others I’ve spoken to recently, you were not previously aware of “sudo -i”[1], then you might be interested to know that you can save yourself two keystrokes by switching from:

sudo su -

To:

sudo -i

From the man page:

-i, --login

Run the shell specified by the target user’s password database entry as a login shell. This means that login-specific resource files such as .profile or .login will be read by the shell. If a command is specified, it is passed to the shell for execution via the shell’s ‑c option. If no command is specified, an interactive shell is executed. sudo attempts to change to that user’s home directory before running the shell. The command is run with an environment similar to the one a user would receive at log in. The Command Environment section in the sudoers(5) manual documents how the ‑i option affects the environment in which a command is run when the sudoers policy is in use.

Simple, but useful if you’re someone who enjoys the quest to reduce keystrokes.

Footnotes
[1] – This post was originally written about “-s”, but as pointed out by Paul in the comments this is not equivalent to “sudo su -”.

UKOUG Tech13

I will be presenting on two topics at the UKOUG Tech13 conference in Manchester.

Goodbye KVM… Hello KVM - Monday (2nd December) @ 16:50 in Exchange 7 (45 mins) - If you use virtualisation in your “home lab”, but have never considered KVM then this session is aimed at you.

Pitfalls, Pain and Pleasure with RAC Connectivity - Wednesday (4th December) @ 08:30 in Exchange 10 (45 mins) - If you plan to implement Fast Connection Failover (FCF) for your connection pools to 11gR2 databases then there are some valuable “lessons learnt” in this presentation. If you’ve already implemented FCF to 11gR2 databases there’s still probably a few useful points covered.

Hope to see you there.

Creating That Just Scanned Look

Have you ever found yourself being asked to provide a scan of a document that you’ve printed and signed? Have you ever found yourself in that position without a printer and/or scanner?

That happened to me recently. I’d been asked to provide a signed copy of a document, so I pasted an image of my signature into the appropriate place in the electronic (word processor) document, generated a PDF from it and emailed it to those concerned.

An hour or so later I got a response stating that the document was not acceptable as it had been “electronically signed”. I responded pointing out that it was a little behind the times to be demanding scans of hand-signed documents and asked whether it would be acceptable if I took the time to make the PDF appear to be a scan of a piece of paper that I had signed.

I didn’t get a written response to my question, but received a phone call explaining that asking for a scan of a signed print-out of a word-processed document was to protect me from fraudulent activity. After questioning what additional assurance of my identity they were getting, considering they already had a scan of my passport, I took on the challenge of creating a PDF that appeared to be scanned…

May I introduce the excellent www.lookslikescanned.com. As it says, “Beat the bureaucrats.”

I sent the new document and shortly after was informed that everything was now in order. What a mad world!

It seems worth pointing out that the PDF you download from www.lookslikescanned.com after it has been “scanned” has some embedded data that could give you away if the person receiving the PDF is really on the ball. Having spotted this it seemed prudent to generate a PDF from the PDF before emailing it.

Resource Manager Privilege Management Bug

When working with Oracle Database Resource Manager recently I ran into what seemed like unexpected behaviour, which turned out to be bug 13101791, as documented in MOS ID 13101791.8. The note gives the workaround as “none”, but that depends on what you view as an acceptable workaround.

The bug description starts:

The Resource Manager consumer group is not changed to the right one when it is granted via a role.

That gives a high level view of the issue, but there are some subtleties that seem worth noting.

Demonstration

  • SESSION 1 is SYS and is used to grant privileges and manage the Resource Manager configuration
  • SESSION 2 is a test user named MARTIN and is used to verify the Resource Manager behaviour by logging in and checking the assigned consumer group

Validation of the current privileges for switching consumer group

SESSION 1> select * from dba_rsrc_consumer_group_privs;

GRANTEE      GRANTED_GROUP                  GRA INI
------------ ------------------------------ --- ---
PUBLIC       DEFAULT_CONSUMER_GROUP         YES YES
SYSTEM       SYS_GROUP                      NO  YES
PUBLIC       LOW_GROUP                      NO  NO

3 rows selected.

Validation of the current consumer group mappings

SESSION 1> select * from dba_rsrc_group_mappings;

ATTRIBUTE         VALUE        CONSUMER_GROUP   STATUS
----------------- ------------ ---------------- -------
SERVICE_NAME      DBRM_SRV1    GOLD_GROUP
ORACLE_USER       SYS          SYS_GROUP
ORACLE_USER       SYSTEM       SYS_GROUP
ORACLE_FUNCTION   BACKUP       BATCH_GROUP
ORACLE_FUNCTION   COPY         BATCH_GROUP
ORACLE_FUNCTION   DATALOAD     ETL_GROUP

6 rows selected.

Connection as user MARTIN to service DBRM_SRV1 and verification of consumer group

SESSION 2> select username, resource_consumer_group, service_name from v$session where sid = sys_context('userenv','sid');

USERNAME   RESOURCE_CONSUMER_GROUP   SERVICE_NAME
---------- ------------------------- ---------------
MARTIN     OTHER_GROUPS              dbrm_srv1

Nothing unexpected here – MARTIN does not have permission to switch to GOLD_GROUP, so he can’t.

Granting MARTIN permission to switch to GOLD_GROUP directly

SESSION 1> exec dbms_resource_manager_privs.grant_switch_consumer_group(grantee_name => 'martin',consumer_group => 'GOLD_GROUP',grant_option => FALSE)

PL/SQL procedure successfully completed.

Querying V$SESSION for MARTIN (no reconnection)

SESSION 2> select username, resource_consumer_group, service_name from v$session where sid = sys_context('userenv','sid');

USERNAME   RESOURCE_CONSUMER_GROUP   SERVICE_NAME
---------- ------------------------- ---------------
MARTIN     GOLD_GROUP                dbrm_srv1

Revoking from MARTIN permission to switch to GOLD_GROUP directly

SESSION 1> exec dbms_resource_manager_privs.revoke_switch_consumer_group(revokee_name => 'martin',consumer_group => 'GOLD_GROUP')

PL/SQL procedure successfully completed.

Querying V$SESSION for MARTIN (no reconnection)

SESSION 2> select username, resource_consumer_group, service_name from v$session where sid = sys_context('userenv','sid');

USERNAME   RESOURCE_CONSUMER_GROUP   SERVICE_NAME
---------- ------------------------- ---------------
MARTIN     OTHER_GROUPS              dbrm_srv1

Again, not too surprising – granting user MARTIN the privilege to switch to GOLD_GROUP moves him into the group, and revoking it moves him back out.

Creating a ROLE named APP_ROLE and granting that to MARTIN

SESSION 1> create role app_role;

Role created.

SESSION 1> grant app_role to martin;

Grant succeeded.

Enable the newly granted APP_ROLE for user MARTIN in current session

SESSION 2> set role all;

Role set.

SESSION 2> select role from session_roles;

ROLE
-----------------------
SELECT_CATALOG_ROLE
HS_ADMIN_SELECT_ROLE
APP_ROLE

Granting APP_ROLE permission to switch to GOLD_GROUP directly

SESSION 1> exec dbms_resource_manager_privs.grant_switch_consumer_group(grantee_name => 'app_role',consumer_group => 'GOLD_GROUP',grant_option => FALSE)

PL/SQL procedure successfully completed.

Querying V$SESSION for MARTIN (no reconnection)

SESSION 2> select username, resource_consumer_group, service_name from v$session where sid = sys_context('userenv','sid');

USERNAME   RESOURCE_CONSUMER_GROUP   SERVICE_NAME
---------- ------------------------- ---------------
MARTIN     OTHER_GROUPS              dbrm_srv1

Validation of the current privileges for switching consumer group

SESSION 1> select * from dba_rsrc_consumer_group_privs;

GRANTEE      GRANTED_GROUP                  GRA INI
------------ ------------------------------ --- ---
PUBLIC       DEFAULT_CONSUMER_GROUP         YES YES
APP_ROLE     GOLD_GROUP                     NO  NO
SYSTEM       SYS_GROUP                      NO  YES
PUBLIC       LOW_GROUP                      NO  NO

4 rows selected.

Testing switching consumer group manually using DBMS_SESSION.SWITCH_CURRENT_CONSUMER_GROUP

SESSION 2> var old_group varchar2(10)
SESSION 2> exec dbms_session.switch_current_consumer_group('GOLD_GROUP',:old_group,FALSE)

PL/SQL procedure successfully completed.

Querying V$SESSION for MARTIN (no reconnection)

SESSION 2> select username, resource_consumer_group, service_name from v$session where sid = sys_context('userenv','sid');

USERNAME   RESOURCE_CONSUMER_GROUP   SERVICE_NAME
---------- ------------------------- ---------------
MARTIN     GOLD_GROUP                dbrm_srv1

So the consumer group was successfully switched.

Revoking from APP_ROLE permission to switch to GOLD_GROUP directly

SESSION 1> exec dbms_resource_manager_privs.revoke_switch_consumer_group(revokee_name => 'app_role',consumer_group => 'GOLD_GROUP')

PL/SQL procedure successfully completed.

Querying V$SESSION for MARTIN (no reconnection)

SESSION 2> select username, resource_consumer_group, service_name from v$session where sid = sys_context('userenv','sid');

USERNAME   RESOURCE_CONSUMER_GROUP   SERVICE_NAME
---------- ------------------------- ---------------
MARTIN     OTHER_GROUPS              dbrm_srv1

Summary

When the privilege to switch consumer group is granted directly to a user, and a session for that user is connected to a service that maps to the same consumer group, the session is automatically switched to that consumer group. However, when the privilege is granted via a role the same behaviour is not observed: the session is only switched to the consumer group when this is explicitly requested, for example via DBMS_SESSION.SWITCH_CURRENT_CONSUMER_GROUP.

Conclusions

MOS 13101791.8 states that the bug is fixed in 12.1.0.1 and 11.2.0.4. I haven’t tested to verify this. I expect the vast majority of organisations are not yet running these versions, so it seems worth discussing workarounds. The two options that occur to me are:

  1. Granting the privilege to switch at a user level – Depending on the number of database users, the rate of new users and the method with which they are provisioned this could be more or less painful.
  2. Use a logon trigger to call DBMS_SESSION.SWITCH_CURRENT_CONSUMER_GROUP to switch to the desired consumer group – I’m less keen on this option, but its appropriateness relative to workaround (1) would depend on the specifics of the environment. A sketch of such a trigger follows this list.
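A minimal sketch of workaround (2), reusing the service and consumer group names from the demonstration above (the trigger name is hypothetical, and you would want to confirm the privilege behaviour and add error handling before using anything like this):

create or replace trigger logon_set_gold_group
after logon on database
declare
  l_old_group varchar2(30);
begin
  -- only touch sessions that connected via the mapped service
  if sys_context('userenv', 'service_name') = 'DBRM_SRV1' then
    dbms_session.switch_current_consumer_group('GOLD_GROUP', l_old_group, FALSE);
  end if;
end;
/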