Wednesday, August 22, 2012

Changes...

Sometimes we get so used to the way things are that we forget things change, and such changes usually push us out of our comfort zone. Last Saturday I was working on another project, installing a three-node RAC database, 11g Release 2. Oracle has improved its installers, so they normally do a good job of checking and fixing grid and database prerequisites, especially on Linux platforms. But this time I could not enjoy the convenience of a GUI, as I was working remotely through a (slow) VPN. So before running a silent installation I manually set up the SSH configuration between the nodes and ran the Cluster Verification Utility:

 

 

node1:grid: ./runcluvfy.sh stage -pre crsinst -n node1,node2,node3 -fixup

 

WARNING:

Could not access or create trace file path "/tmp/bootstrap/cv/log". Trace information could not be collected

 

Performing pre-checks for cluster services setup

 

Checking node reachability...

node1.domain: node1.domain

 

Check: Node reachability from node "null"

  Destination Node                    Reachable?

  ------------------------------------  ------------------------

  node1                                        no

  node2                                        no

  node3                                        no

 

Result: Node reachability check failed from node "null"

 

ERROR:

Unable to reach any of the nodes

Verification cannot proceed

 

 

At first it looked like a DNS problem, but nslookup resolved the name:

 

node1:grid: /usr/bin/nslookup node1

Server:         10.xxx.x.254

Address:        10.xxx.x.254#53

 

Name:   node1

Address: 200.xxx.xxx.54

 

 

Ping between nodes worked…

 

node1:grid: ping 200.xxx.xxx.54

PING 200.xxx.xxx.54 (200.xxx.xxx.54) 56(84) bytes of data.

64 bytes from 200.xxx.xxx.54: icmp_seq=1 ttl=64 time=0.027 ms

64 bytes from 200.xxx.xxx.54: icmp_seq=2 ttl=64 time=0.026 ms

 

 

Google returned two or three posts, but all of them were about wrong server/domain names or server names not set correctly in DNS.

 

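Finally I found the problem: the file /etc/nsswitch.conf was wrong. DNS was not listed as a source for host name resolution, so lookups that go through the system resolver never queried it, while nslookup, which talks to the DNS server directly, kept working :o(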

 

The file should have a line like this:

hosts:  dns files

 

However the "hosts:" service line was:

hosts:  files

And the server names were not defined in the /etc/hosts files.
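A quick way to catch this before running cluvfy is to check the "hosts:" line on every node. The snippet below is only a minimal sketch: the function names hosts_sources and hosts_uses_dns are made up for illustration, and it assumes a standard nsswitch.conf layout with one "hosts:" line.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this post) to inspect the
# "hosts:" line of an nsswitch.conf-style file.

# Print the lookup sources listed on the "hosts:" line.
# $1: path to the file (defaults to /etc/nsswitch.conf)
hosts_sources() {
    awk '{ sub(/#.*/, "") }                     # drop trailing comments
         $1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }' \
        "${1:-/etc/nsswitch.conf}"
}

# Succeed only if "dns" is one of the listed sources.
# $1: path to the file (defaults to /etc/nsswitch.conf)
hosts_uses_dns() {
    hosts_sources "$1" | tr ' ' '\n' | grep -qx 'dns'
}
```

For example, "hosts_uses_dns || echo 'dns missing from hosts line'" could be run on each node over SSH. To test resolution the same way applications see it, "getent hosts node1" follows nsswitch.conf, unlike nslookup.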

 

The Installation Guides did not mention the nsswitch.conf file at all. I found it in this book: Oracle® Database 2 Day + Real Application Clusters Guide, topic 8.

 

The funny thing is that the file was wrong on just one node (the node that had been used to run some connectivity tests before installing Oracle). Well, documentation is my friend, and I got one more item for my checklist – things change: all the time ;o)