Archive for February, 2010

How linux LVM snapshots work, part 1

February 12, 2010



Creating an example volume

Creating a snapshot

Data consistency



Since everything I’ll show is based on linux LVM, If you haven’t used any LVM before, it is necessary to read LVM HOWTO to familiarize yourself with LVM concepts, and if you used one, but not on linux, it’s still good have a look there for available commands and naming.

Before we can talk about all the cool things that can be done with Oracle and LVM, we’ll need to talk about LVM snapshots themselves. Not only what they do, but also how they work.

And since in linux LVM is just a frontend (much more user-friendly) to a subsystem called Device Mapper, we’ll focus on that backend part more, because at some point we’ll reach the limits of LVM and will be forced to use device mapper only.

In part one we’ll create a small test environment to show how snapshots work and in part two I’ll discuss the internals and more specific snapshot features.

Creating an example volume

First, let’s create a demo volume and take a look at it.

[root@tullus ~]# lvcreate -n demo -L 100M vg00
Rounding up size to full physical extent 128,00 MB
Logical volume "demo" created

And since LVM is just a frontend to device mapper, let’s have a look what our command really did:
it created an entry in /dev/mapper directory named VolumeGroupName-LogicalVolumeName

[root@tullus ~]# ls -l /dev/mapper/*demo*
brw-rw---- 1 root disk 253, 11 feb 11 10:56 /dev/mapper/vg00-demo

This entry coresponds to the following map

[root@tullus ~]# dmsetup table /dev/mapper/vg00-demo
0 262144 linear 9:2 872677760

The fields mean:
0 – first block of device vg00-demo that the entry describes
262144 – number of blocks in the entry
linear – type of mapping – linear means 1:1 block mapping
9:2 – major and minor of the source device
872677760 starting block on the source device

So what this map really means? It means that block 0 of vg00 will really be block 872677760 of device 9:2 (/dev/md2 in my case) and so on. The map can contain many entries. If we extend our volume by another 128MB, we’ll see it

[root@tullus ~]# lvextend -L+128M /dev/vg00/demo /dev/hdb5
Extending logical volume demo to 256.00 MB
Logical volume demo successfully resized
[root@tullus ~]# dmsetup table vg00-demo
0 262144 linear 9:2 872677760
262144 262144 linear 3:69 628752768

So now there are two entries with the second saying, that 262144 blocks (128MB) of device vg00-demo starting from block 262144 are really mapped to device 3:69 starting from block 628752768 of /dev/hdb5 and so on.

We can now create a filesystem on it and put some contents there.

[root@tullus ~]# mkfs.ext3 -m 0 /dev/vg00/demo
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
65536 inodes, 262144 blocks
0 blocks (0.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
32 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
8193, 24577, 40961, 57345, 73729, 204801, 221185

Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@tullus ~]# mkdir /mnt/demo
[root@tullus ~]# mount /dev/vg00/demo /mnt/demo
[root@tullus ~]# echo "This is the first file" > /mnt/demo/first
[root@tullus ~]# echo "This is the second file" > /mnt/demo/second
[root@tullus ~]# ls /mnt/demo -l
total 14
-rw-r--r-- 1 root root 23 feb 11 11:12 first
drwx------ 2 root root 12288 feb 11 11:11 lost+found
-rw-r--r-- 1 root root 24 feb 11 11:12 second

Creating a snapshot

When we have out test subject ready, we can briefly look at what snapshots do.

[root@tullus ~]# lvcreate -s -n demo_snap -l 1 /dev/vg00/demo
Logical volume "demo_snap" created
[root@tullus ~]# lvscan | grep demo
ACTIVE Original '/dev/vg00/demo' [256,00 MB] inherit
ACTIVE Snapshot '/dev/vg00/demo_snap' [64,00 MB] inherit

As we can see, the snapshot was created and is one extent in size. From the operating system point of view it’ll be a block device containing the information from /dev/vg00/demo from the moment the snapshot was taken.

Let’s mount it and make some changes to our files.

[root@tullus ~]# mkdir /mnt/demo_snap
[root@tullus ~]# mount /dev/vg00/demo_snap /mnt/demo_snap
[root@tullus ~]# df -hP | grep demo
/dev/mapper/vg00-demo 248M 11M 238M 5% /mnt/demo
/dev/mapper/vg00-demo_snap 248M 11M 238M 5% /mnt/demo_snap
[root@tullus ~]# ls -l /mnt/demo_snap/
total 14
-rw-r--r-- 1 root root 23 feb 11 11:12 first
drwx------ 2 root root 12288 feb 11 11:11 lost+found
-rw-r--r-- 1 root root 24 feb 11 11:12 second

The contents are the same – it even claims that there’s the same amount of free space as on the original volume, which may be illusive, but we’ll get to that.

[root@tullus ~]# echo "changing second file on snapshot" >> /mnt/demo_snap/second
[root@tullus ~]# echo "changing first file on original volume" >> /mnt/demo/first

So we changed one file on the original volume and the other on the snapshot. The results are:
[root@tullus ~]# cat /mnt/demo/first
This is the first file
changing first file on original volume
[root@tullus ~]# cat /mnt/demo_snap/first
This is the first file

[root@tullus ~]# cat /mnt/demo/second
This is the second file
[root@tullus ~]# cat /mnt/demo_snap/second
This is the second file
changing second file on snapshot

So we can see that both devices are independent from each other and changes made on one of them don’t show up on the other.

Data consistency

I won’t discuss this issue in details because this subject generally is quite broad, but a few words have need to be said. When taking a snapshot, from a filesystem’s point of view it’s like pulling the plug on the drive at that moment, so similar restrictions apply, since we mostly take snapshots of mounted filesystems. Therefore in the beginning it will be in an inconsistent state on the snapshot. For journaled filesystem like ext3 it is not a problem, since automatic recovery will take place upon mount, but i.e. ext2 will complain about it:
EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended

It is (or at least I think so) safe if ext3 is mounted with data=ordered (default) or data=journal options, but I’m not sure what will happen when data=writeback is used – the man page for mount says that it may cause data (not internal metadata) consistency issues after a crash, so the same probably applies to snapshots, but for the same reason you probably don’t want to use it to store your database in the first place;)

Another thing worth thinking about is sync. I’m not sure, but I think that lvm syncs the volume before taking a snapshot, and personally I don’t call sync before creating snapshots, but it may be worth considering.

Summing up – when using default settings on Centos for ext3 (data=ordered) I’ve never had and data corruption problems so far,  but it’s always a good idea to have data consistency in mind.


Ok, so what are snapshots good for? Very many things

  • consistent filesystem/database backups – at least thats what I see people most often do
  1. Create a snapshot
  2. Take a backup based on a snapshot
  3. Delete a snapshot

It’s probably the reason snapshots were created in the first place, and it doesn’t even require them to be writeable – in fact in AIX they are read only by design, unfortunately.

This usage however implies that snapshots have mostly short lifespan and don’t contain important data. It is probably the reason snapshot handling in LVM lacks few features that could be very useful. Most painful is that you can’t move a snapshot using pvmove, you can’t mirror a snapshot and they don’t grow automatically when they fill up, so if they fill up, they will fail, but in those usage cases it won’t matter anyway. You also can’t copy a snapshot and take a snapshot of a snapshot (but the other is device mapper limit, not LVM).

But since in linux they are writeable, more interesting things can be done with them, but that is also when lack of above mentioned features becomes a pain…

to be continued…


Configuring Oracle Gateway 10.2 for ODBC for PostgreSQL

February 4, 2010

I’ll describe here how to configure an Oracle ODBC gateway to connect to a postgres database. As it turns out, it’s fairy straightforward – if it works 🙂 With 10.2 these are three simple steps and voila 😉 With 11.1 there are very similar steps, but so far I didn’t manage to make it work :).

The process involves three steps and it’s good to complete and test them one at a time, so that any occurring problems are easier to locate and fix.

#1. Check connectivity with the database

Ensure that we can connect to the database with a native client:

[root@ora ~]# psql -h -p 5432 -d callcenter -U cc
Welcome to psql 8.3.1 (server 8.1.11), the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
WARNING: You are connected to a server with major version 8.1,
but your psql client is major version 8.3. Some backslash commands,
such as \d, might not work properly.


So this works. Most common problems here could be firewalls not allowing the connection or missing configuration entry in the pg_hba.conf

#2. Configure and check ODBC

Next we have to configure ODBC connectivity. I’m using unixODBC package supplied with Centos 5. It’s system-wide configuration is in /etc/odbc.ini and I’ll be using that file, but if you don’t have access to it, you can use ~/.odbc.ini in oracle owner home directory. Actually you can use any filename as long as you configure Oracle to use it too, but only these two are seen by isql and operating system ODBC tools.

We add a  section describing our datasource:


Driver = PostgreSQL
Database = callcenter
Servername =
Port = 5432
ReadOnly = No
Username = cc
Password = xxxxx

We could also use the ODBCConfig tool from unixODBC-kde to configure it “the clicky way”. Anyway, now we can test if ODBC works:

[root@ora ~]# isql cctest
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |

And it works. At least for me – I really don’t have any troubleshooting experience with ODBC, because it worked for me out-of-the-box, but if something is wrong at this point, recheck the config file and if it’s ok – strace is your friend – this tool can sometimes provide some answers really quickly.

#3. Configure Oracle Gateway

Configuring Oracle is also a three step process:

  • create initSID.ora file for the agent
  • setup a listener and a service
  • add a TNS entry pointing to our listener

First I’ll show it for the 10.2 engine and then point out the differences between 11.1.

Create initSID.ora

In $ORACLE_HOME/hs/admin we create initSID.ora file, where SID is the name for an instance of Heterogeneous Services Agent that will serve out ODBC datasource. It doesn’t have to be the same as datasource name, but here it is for simplicity sake.


set ODBCINI=/etc/odbc.ini

The HS_FDS_CONNECT_INFO however points to our datasource and so it has to be the same as in /etc/odbc.ini and HS_FDS_SHAREABLE_NAME should point to the actual driver.

Configure the listener

Next we add a new listener and a service (we can also use an existing listener if we want)


        (ADDRESS_LIST =
                        ADDRESS= (PROTOCOL=tcp)
                        (HOST = ora)
                        (PORT = 1526)

        (SID_LIST =
                (SID_DESC =
                        (SID_NAME = cctest)
                        (ORACLE_HOME = /u00/oracle/product/10.2.0/db_1)
                        (PROGRAM = hsodbc)

The SID_NAME parameter must match the SID from initSID.ora file, not necessarily the name of a datasource

And now start the listener:

oracle@ora[DB25]:/u00/oracle/product/10.2.0/db_1/network/admin> lsnrctl start listenerhs 

LSNRCTL for Linux: Version - Production on 04-LUT-2010 21:41:02

Copyright (c) 1991, 2006, Oracle.  All rights reserved.

Starting /u00/oracle/product/10.2.0/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version - Production
System parameter file is /u00/oracle/product/10.2.0/db_1/network/admin/listener.ora
Log messages written to /u00/oracle/product/10.2.0/db_1/network/log/listenerhs.log
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ora)(PORT=1526)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=ora)(PORT=1526))
Alias                     listenerhs
Version                   TNSLSNR for Linux: Version - Production
Start Date                04-LUT-2010 21:41:02
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u00/oracle/product/10.2.0/db_1/network/admin/listener.ora
Listener Log File         /u00/oracle/product/10.2.0/db_1/network/log/listenerhs.log
Listening Endpoints Summary...
Services Summary...
Service "cctest" has 1 instance(s).
  Instance "cctest", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Add a TNS entry

And all that’s left is to add a line to tnsnames.ora pointing to our listener:

cctest = (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = ora) (PORT = 1526)) (CONNECT_DATA = (SID = cctest)) (HS = OK))

For Heterogeneous Services to work, the HS = OK part must be present.

Now we can create a database link to a Postgres  database:


And now we can do SELECT * from “sometable”@cctest;

User and object names must be quoted. I’ll explain why at a later time.