Friday, June 29, 2012

DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL - when a wildcard becomes a villain

Fine-grained access control to the UTL_TCP, UTL_SMTP, UTL_MAIL, UTL_HTTP, or UTL_INADDR packages using Oracle XML DB is one of the big changes introduced with Oracle Database 11g.

What was controlled in the past through a simple execute privilege is now controlled through an access control list (ACL), which requires Oracle XML DB to be installed.

A database supported by my team has a stored procedure that uses the UTL_INADDR package. The application calls this stored procedure to retrieve the local database server IP during user login.
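The relevant call can be sketched like this (the block below is my illustration, not the actual application code):

```sql
-- Hypothetical sketch of the procedure's relevant call.
DECLARE
  l_ip VARCHAR2(64);
BEGIN
  -- With no argument, GET_HOST_ADDRESS returns the IP address of the
  -- local host. Resolving it still requires the "resolve" privilege in
  -- an ACL assigned to a matching host entry; otherwise Oracle raises
  -- ORA-24247: network access denied by access control list (ACL).
  l_ip := UTL_INADDR.GET_HOST_ADDRESS;
  DBMS_OUTPUT.PUT_LINE('Local IP: ' || l_ip);
END;
/
```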

As a required task, fine-grained access to external network services was configured after upgrading the database to 11g, and everything was working perfectly, until it failed :o( Suddenly it just stopped working, and the application started receiving the error ORA-24247: network access denied by access control list (ACL) whenever a user tried to establish a new connection to the database.

When I was engaged, my first thought was: well, someone removed the access. To my surprise it had not been removed, but there was a new ACL: oracle-sysman-ocm-Resolve-Access.xml. Take a look at the events that led to the error:

1- The stored procedure calls the function UTL_INADDR.get_host_address without any parameters, so it returns the IP address of the local host.

2- After upgrading the database a new Access Control List (ACL) was created and the privilege "resolve" was granted to the application user.

3- That new ACL was assigned to the database host computer using * as the host name, instead of the individual host name with its domain.

4- Months later, the primary DBA for the database was working with Oracle to resolve an ORA-600 error. He was asked to run the Remote Diagnostic Agent (RDA).

5- RDA was installed and set to configure Oracle Configuration Manager (OCM) in connected mode. At this point the problem started, as OCM created the ACL oracle-sysman-ocm-Resolve-Access.xml, and assigned it to the database host using its name.
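Putting steps 2, 3, and 5 together, the sequence looked roughly like this (the application ACL name, principal, and host name below are hypothetical; only oracle-sysman-ocm-Resolve-Access.xml is real, and that ACL was actually created by OCM, not by hand):

```sql
BEGIN
  -- Step 2: ACL created after the upgrade, granting "resolve" to the
  -- application user (names are illustrative).
  DBMS_NETWORK_ACL_ADMIN.CREATE_ACL(
    acl         => 'app-resolve.xml',
    description => 'Resolve privilege for the application user',
    principal   => 'APP_USER',
    is_grant    => TRUE,
    privilege   => 'resolve');

  -- Step 3: the assignment that looked harmless at the time.
  DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL(
    acl  => 'app-resolve.xml',
    host => '*');                              -- the wildcard villain

  -- Step 5: OCM later assigned its own ACL to the individual host name,
  -- which takes precedence over the wildcard assignment above.
  DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL(
    acl  => 'oracle-sysman-ocm-Resolve-Access.xml',
    host => 'dbhost.example.com');             -- hypothetical host name
  COMMIT;
END;
/
```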

From the Oracle documentation: "The ACL assigned to a domain takes a lower precedence than the other ACLs assigned to sub-domains, which take a lower precedence than the ACLs assigned to the individual hosts". For example, the following hosts are listed in decreasing order of precedence:

localhost.localdomain
*.localdomain
*

Therefore, the ACL oracle-sysman-ocm-Resolve-Access.xml took precedence over the ACL created after the database upgrade, and as the application user was not part of oracle-sysman-ocm-Resolve-Access.xml, Oracle was just working as it should, i.e., denying access to a non-authorized user.
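When diagnosing this kind of ORA-24247, the assignments and privileges can be inspected through the data dictionary, along these lines:

```sql
-- Which ACL is assigned to which host entry? Precedence follows the
-- host column: individual host > *.domain > *
SELECT host, lower_port, upper_port, acl
  FROM dba_network_acls
 ORDER BY host;

-- Who holds the "resolve" privilege, and in which ACL?
SELECT acl, principal, privilege, is_grant
  FROM dba_network_acl_privileges
 WHERE privilege = 'resolve';
```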

Probably the reason for using * instead of the host name was a misunderstanding about which host names and IP addresses the application could retrieve - think about a request like this: the application must be able to retrieve the host name or IP address of any remote host.

Well, wildcards are not villains (I cannot imagine UNIX without them); we just need to take care with how we use them.

Sunday, June 17, 2012

When checking Oracle Support first is not a good idea!

It was not complex, but it was tricky.

The DBA was running the Oracle Database 11g Release 2 installer for 64-bit Linux, and it was failing with this error:

An internal error occurred within cluster verification framework
Unable to get the current group.

The install log file provided more information:

SEVERE: [FATAL] [INS-32038] The operating system group specified for central inventory (oraInventory) ownership is invalid.
CAUSE: User performing installation is not a member of the operating system group specified for central inventory(oraInventory) ownership.
ACTION: Specify an operating system group that the installing user is a member of. All the members of this operating system group will have write permission to the central inventory directory (oraInventory).

At that point the DBA decided to check Oracle Support and found this note:

HP-UX: 11gR2 runInstaller Fails with "An internal error occurred within cluster verification framework" [ID 983713.1], which states the cause is OS patch PHCO_40381 not being installed. Although the problem reported to Oracle Support was related to HP-UX Itanium, the DBA started to wonder whether he was facing a similar problem on Linux.

Actually, there is no missing patch for Linux, nor any issue with the Oracle installer - the problem was resolved with a simple log off and log in.

The DBA installing Oracle created 3 groups: oracle, dba, and oinstall - ids 502, 503, and 504 respectively. By mistake, the user oracle was created with oracle as its primary group instead of oinstall (we are not using Job Role Separation):
/usr/sbin/useradd -u 501 -g oracle -G dba,oinstall oracle
Realizing the mistake, the DBA removed the group oracle but did not log in again. So the id command was reporting something like this:

uid=501(oracle) gid=502 groups=504(oinstall),503(dba),502

Note the missing group name after gid=502 - the correct user settings will report something like this:

uid=501(oracle) gid=504(oinstall) groups=503(dba),504(oinstall)
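A quick sanity check before running the installer would have caught this: verify that the session's primary gid still maps to a named group. A minimal sketch:

```shell
# If the session's numeric primary gid no longer maps to a named group
# (as happens after deleting the group without logging in again),
# getent finds nothing for it.
gid=$(id -g)
if getent group "$gid" > /dev/null; then
    echo "primary gid $gid resolves to group '$(id -gn)'"
else
    echo "primary gid $gid has no name - log out and back in after fixing groups"
fi
```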

This caused the error Unable to get the current group. Bottom line: sometimes we start searching for solutions too fast, without fully understanding the issue. Of course Oracle Support (and Google) is an invaluable source of information, but in this case the search produced an answer that led to wasted time.

I have a saying I like to keep in mind: who does not know what he is looking for does not understand what he finds :o)

Monday, June 11, 2012

"Not AUTO segment space management" error during upgrade to 11g

I must admit my only thought about the statements used to check SYSAUX attributes in the catupstr script (the CATalog UPgrade STaRt script, called when upgrading a database) was that they could be more elegant - something like using an anonymous block to check and raise the error if necessary. I never realized they could fail, as Oracle states clearly and concisely how the SYSAUX tablespace must be created: locally managed, with automatic segment space management.

These are the checking statements related to SYSAUX performed by the script:

SELECT TO_NUMBER('No SYSAUX tablespace') FROM dual
WHERE 'SYSAUX' NOT IN (SELECT name from ts$);

SELECT TO_NUMBER('Not ONLINE for READ/WRITE') from ts$
WHERE name='SYSAUX' AND online$ !=1;

SELECT TO_NUMBER ('Not PERMANENT') from ts$
WHERE name='SYSAUX' AND
(contents$ !=0 or (contents$ = 0 AND bitand(flags, 16)= 16));

SELECT TO_NUMBER ('Not LOCAL extent management') from ts$
WHERE name='SYSAUX' AND bitmapped = 0;

SELECT TO_NUMBER ('Not AUTO segment space management') from ts$
WHERE name='SYSAUX' AND bitand(flags,32) != 32;
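For what it's worth, the "more elegant" version I had in mind would be a single anonymous block along these lines - my sketch with the same checks and bitmasks as above, not Oracle's code:

```sql
DECLARE
  l_online   ts$.online$%TYPE;
  l_contents ts$.contents$%TYPE;
  l_bitmap   ts$.bitmapped%TYPE;
  l_flags    ts$.flags%TYPE;
BEGIN
  -- Query ts$ once and raise one descriptive error per violated rule.
  SELECT online$, contents$, bitmapped, flags
    INTO l_online, l_contents, l_bitmap, l_flags
    FROM ts$
   WHERE name = 'SYSAUX';

  IF l_online != 1 THEN
    RAISE_APPLICATION_ERROR(-20001, 'Not ONLINE for READ/WRITE');
  ELSIF l_contents != 0 OR BITAND(l_flags, 16) = 16 THEN
    RAISE_APPLICATION_ERROR(-20002, 'Not PERMANENT');
  ELSIF l_bitmap = 0 THEN
    RAISE_APPLICATION_ERROR(-20003, 'Not LOCAL extent management');
  ELSIF BITAND(l_flags, 32) != 32 THEN
    RAISE_APPLICATION_ERROR(-20004, 'Not AUTO segment space management');
  END IF;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    RAISE_APPLICATION_ERROR(-20005, 'No SYSAUX tablespace');
END;
/
```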

While trying to upgrade a database to 11g, the upgrade failed because the SYSAUX tablespace had been created with manual segment space management. Well, there is not much to do other than recreate the database. This is an old database, created with 9i and upgraded manually to 10g. But there is a mystery: in order to upgrade to 10g, creating SYSAUX is mandatory, and the 10g upgrade script performs the same checks as 11g. So how come the database was upgraded to 10g? Answer: creativity!
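For the record, this layout can be spotted before any upgrade without touching ts$, through the documented dictionary view:

```sql
-- For a supported SYSAUX: extent_management must be LOCAL and
-- segment_space_management must be AUTO.
SELECT tablespace_name,
       extent_management,
       segment_space_management
  FROM dba_tablespaces
 WHERE tablespace_name = 'SYSAUX';
```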

The DBA running the upgrade to 10g simply commented out the AUTO segment space management check. My best guess for how this played out is:

- the SYSAUX was created in 9i with manual space management
- the database was started for upgrade with the compatible parameter set to 10
- when the DBA started the catupgrd script it just failed, as the SYSAUX did not meet the requirements
- at this point the DBA was stuck: it was not possible to start the database with 9i to recreate SYSAUX, nor to proceed with the upgrade to 10g.

Instead of restoring the database backup and scheduling a new outage for the upgrade, the DBA opted for the creative path – this is the kind of innovation that is not appreciated :o)

About The COMPATIBLE Initialization Parameter