HOWTO use OpenDLM as the locking manager for OpenGFS (V0.04)

Authors:  Ben Cahill
          Stanley Wang

Comments or questions?  Contact the authors.

This document describes how to use OpenDLM as the locking manager in an
OpenGFS cluster.  It provides a simple example configuration for a 2-node
cluster.  Within this document, we'll try to provide the basics of getting
started, without the need to study the various components before setting up
OpenGFS.  However, you *should* study the projects sometime!

Recommended reading:

OpenGFS:  WHATIS-opengfs
          HOWTO-nopool (some steps are required reading!)

OpenDLM:  WHATIS-opendlm
          dlmbook_final.pdf (Programmer's Guide)
          HOWTO (Build, Install, and Configure OpenDLM ... required reading!)

You can find OpenDLM docs at:

http://opendlm.sourceforge.net/docs.php


SOFTWARE COMPONENTS:
--------------------

OpenDLM is a Distributed Lock Manager.  It provides an alternative to the
single point of failure characteristic of OpenGFS' legacy "memexp" locking
protocol; even though memexp's lock management is distributed among the
computer nodes, memexp has only a single lock storage server.  In contrast,
OpenDLM distributes both lock management *and* lock storage among all of the
computer nodes in the cluster.  If one of the nodes crashes, the surviving
nodes can recover the relevant lock state.

See the HOWTO doc on the OpenDLM project site for information on the
software components it depends on (linux-ha heartbeat or ccm, and libnet).


BUILDING AND INSTALLING OPENDLM AS LOCK SERVICE FOR OPENGFS
-----------------------------------------------------------

The following instructions should cover all types of Linux distributions,
since they describe how to download source code tarballs and build from
scratch.  For best results, we recommend following this download/build
procedure on each machine in the cluster (rather than building on a single
build machine, then installing on the cluster machines).

1. Patch your kernel for OpenGFS, and build the kernel:

   See steps 1. and 2.
   in OpenGFS' HOWTO-nopool doc, then return here.

   IMPORTANT:  Use the OpenGFS *CVS* code base.  OpenDLM is not yet
   supported in any OpenGFS release.  We're still working on code
   stability, so you should use the latest CVS!

   HINT:  Don't build the OpenGFS code yet (step 3 in HOWTO-nopool).
   You'll need the OpenDLM source before doing that.

   HINT:  Once you're done with this step, you should have rebooted and
   be running the patched kernel!

2. Build, install, and configure OpenDLM:

   Just to be safe, since the OpenDLM build requires kernel source, we
   don't build OpenDLM until *after* we've patched the kernel (although
   it probably doesn't make a difference).

   Follow the instructions in the OpenDLM HOWTO doc, at:

   http://opendlm.sourceforge.net/docs.php

   You will need to follow all instructions up to, but not including,
   "Start Locking Service".  If you want to, you could even start the
   locking service at this point (and finish the OpenDLM HOWTO), but it
   won't be needed quite yet.

3. Build OpenGFS:

   Now that you've obtained the OpenDLM source code, you can build
   OpenGFS.  See step 3 in OpenGFS' HOWTO-nopool doc, using (at least)
   the following options for ./configure:

   --enable-opendlm
   --with-opendlm_includes=/your/path/to/opendlm/src/include

   then return here.

   IMPORTANT:  Use the OpenGFS CVS code base.  OpenDLM is not yet
   supported in any OpenGFS release.  We're still working on code
   stability, so you should use the latest CVS!

4. If you have a pre-existing OpenGFS filesystem, you do *not* need to
   lose all of your data by re-partitioning and re-making your
   filesystem!!  You can switch back and forth between the legacy memexp
   and the new OpenDLM lock protocols without changing anything on disk.

   To specify OpenDLM as your lock protocol, add the following option to
   the mount command line during step 9 (later in this document):

   -o lockproto=opendlm

   Alternatively, you may set opendlm as the default locking protocol,
   using the ogfs_tool utility (see man page for ogfs_tool).
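   As a concrete sketch, the two ways of selecting OpenDLM might look
   like the following.  The mount option is exactly the one from this
   document; the ogfs_tool invocation is an *assumption* (check
   `man ogfs_tool` on your system for the real subcommand), and the
   FSDEV/MNT names are placeholders.  The functions only echo the
   commands (a dry run), so you can inspect them before running anything
   as root:

```shell
#!/bin/sh
# Dry-run sketch: each function echoes its command instead of running it.
# FSDEV and MNT are example placeholders -- substitute your own names.
FSDEV=${FSDEV:-/dev/sdx2}
MNT=${MNT:-/ogfs}

# Per-mount: override the superblock default for this mount only.
mount_with_opendlm() {
    echo "mount -t ogfs $FSDEV $MNT -o lockproto=opendlm"
}

# Persistent: set "opendlm" as the default protocol in the superblock.
# ASSUMED subcommand -- verify with `man ogfs_tool`; the filesystem
# must be unmounted while the superblock is changed.
set_default_opendlm() {
    echo "ogfs_tool sb $FSDEV proto opendlm"
}

mount_with_opendlm
set_default_opendlm
```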
   This writes "opendlm" into the filesystem superblock, so you will not
   need to specify opendlm in the mount command line.

   In either case, you will not need to partition a drive or make the
   filesystem, so skip the next two steps, and continue with step 7,
   Start Locking Service.

5. Partition the shared drive (using only one computer) into 2 partitions
   (based on HOWTO-nopool, but using only 2 partitions, no cidev):

   HINT:  If you are creating a new filesystem, and want to be able to
   switch back and forth between the legacy memexp and the new OpenDLM
   lock protocols, see HOWTO-nopool for information on creating a
   cluster information device (cidev), which requires its own small
   partition in addition to the filesystem partition and any external
   journal partitions.  HOWTO-nopool also describes how to make the
   filesystem.  Once done, return to this document at step 7, Start
   Locking Service.

   For a strictly OpenDLM (no option to switch to memexp) setup, we'll
   create an example configuration using 2 partitions.  A medium
   partition (~128MB) will be used for an external journal.  A large
   partition (the rest of the drive) will be used for the filesystem
   data and an internal journal.

   These instructions assume you are using a "real" device, but you may
   instead want to use a volume manager to partition your drive.  See
   OpenGFS HOWTO-evms for information on doing this with EVMS.

   Use the dev name of your shared drive in place of "sdX" below.  See
   the man page for sfdisk for more information:

   # sfdisk -R /dev/sdX   (make sure no drive partitions are in use)
   # sfdisk /dev/sdX      (this partitions the disk; follow the prompts)

   HINT:  sfdisk works in units of "cylinders".  When partitioning my
   drive, sfdisk showed that each cylinder was 1048576 bytes.  So, for
   the first (medium) partition, I entered 0,128 to start at cylinder 0,
   with a size of 128 cylinders (~128MB).  For the second (large)
   partition, I entered nothing (except the Enter key), which defaulted
   to use the rest of the drive.
   For the next two (sfdisk asks for 4 partitions), I also entered
   nothing (except the Enter key).  These last 2 "partitions", of course,
   are empty.  After you enter all 4 partitions, sfdisk asks you if you
   really want to write to the disk, so you can experiment a bit before
   committing.

   Check for success:  cat /proc/partitions shows the new partitions

   HINT:  These partitions must show up on all computers in the cluster,
   and must be named consistently.  After you create new partitions
   using one machine, you may need to find a way for the other machines
   to re-scan for partitions, or simply reboot the other machines.

   HINT:  If you find that the other computers see the partitions with
   different names, or if there is any chance you will be re-configuring
   your cluster computers and thereby affecting the device names, you
   will *need* to use a volume manager such as EVMS (see OpenGFS
   HOWTO-evms for information).  If using EVMS, you will need to create
   "native volumes".  These provide consistent naming from machine to
   machine.

   Check for success:  cat /proc/partitions shows the new partitions on
                       every machine, with identical names

   In the following instructions, we'll call the two partitions
   /dev/sdx1 and /dev/sdx2, but you will need to substitute appropriate
   names in their stead.

6. Make the OpenGFS Filesystem (using only one computer):
   (copied from HOWTO-nopool and edited for OpenDLM)

   Use the OpenGFS tool "mkfs.ogfs" to create the file system on disk.
   This step writes one superblock and a number of resource group
   (a.k.a. block group) headers for the filesystem, and creates
   journals.  In the superblock, it writes strings indicating the name
   of the default locking protocol (e.g. "opendlm") and a cluster-wide
   filesystem identifier (a.k.a. lock namespace or lockspace), as
   specified in the mkfs.ogfs command line.  See the man page for
   mkfs.ogfs for more information on options.

   A. Edit a configuration file named journal.cf to read like the
      following.
      You can find a copy of this file, with comments, as
      opengfs/docs/journal.cf.  This file, and all other opengfs
      configuration files, may reside anywhere on your computer.
      Remember, you will need to substitute appropriate names for sdx2
      (the filesystem device, large partition) and sdx1 (the external
      journal device, medium partition).

      fsdev    /dev/sdx2
      journals 2
      journal  0 int 256
      journal  1 ext /dev/sdx1

   B. Run the following command to make sure everything is okay.  The
      -v prints extra information, and -n prevents mkfs.ogfs from
      writing anything to disk.  Check the output to verify device
      sizes, etc.  The external journal should start at a *very* high
      number.

      The -t option tells the lock protocol something to identify the
      cluster-wide lock namespace of the filesystem you are mounting.
      For the legacy memexp protocol, this string was a path to a device
      (the "cidev") that contained "cluster information".  Such a device
      is not used for OpenDLM, but if you've been using memexp, you may
      continue to use the same identifier string.  Or, if you're just
      going to use OpenDLM (and never memexp), just make one up
      yourself.  You *must* have a unique name for each OpenGFS
      filesystem that you mount.  For now, let's just use "/dev/sdx001"
      (an arbitrary, meaningless name) for this filesystem.

      HINT:  If you want to be able to switch back and forth between
      OpenDLM and memexp, you *must* use the cidev identifier.  See
      HOWTO-nopool for info on creating the cidev.  Remember, for
      memexp, you will need to substitute an appropriate name for
      sdx001 (the cluster information device, small partition).

      # mkfs.ogfs -p opendlm -t /dev/sdx001 -c journal.cf -v -n

      HINT:  If you want even more output, try the -d option.

   C. Run the following command to write the filesystem and internal
      journal onto the filesystem device, and the external journal onto
      the external journal device.

      # mkfs.ogfs -p opendlm -t /dev/sdx001 -c journal.cf

      HINT:  This can take some time to write to disk, and shows no
      output until it is done.
      Be patient.

7. Start locking service (on each computer):

   Follow the instructions in the OpenDLM HOWTO doc, at:

   http://opendlm.sourceforge.net/docs.php

   You will need to follow all instructions in the step labeled "Start
   Locking Service" (unless you already did this in step 2, above).

8. Update /lib/modules/[version]/modules.dep, and insert OpenGFS
   modules.  Root privileges are required for all following operations:

   # depmod -a
   # modprobe ogfs
   # modprobe opendlm

   HINT:  For debug output from the opendlm lock module, use:

   # modprobe opendlm debug=1

   Check for success:  cat /proc/filesystems shows "ogfs", among others
   Check for success:  cat /proc/modules shows "ogfs", among others

9. Mount the Filesystem (on each computer).

   The following commands mount the ogfs file system at the /ogfs mount
   point.  You will need to substitute an appropriate name for sdx2 (the
   filesystem device, large partition).

   # mkdir /ogfs
   # mount -t ogfs /dev/sdx2 /ogfs

   HINT:  If you are using OpenDLM with a pre-existing OpenGFS
   filesystem, use the following additional option to use opendlm
   instead of your pre-existing default lock protocol (memexp):

   -o lockproto=opendlm

   The -o hostdata=192.168.0.x option, required when using memexp, is
   not needed for OpenDLM.

   HINT:  If you see an error from mount like "mount: dev/sdc3 is not a
   valid block device", a) make sure you are using a valid /dev/* in the
   command line, or b) you may be trying the mount from a machine
   *other* than the one you used for creating the new partitions, and
   this machine hasn't seen the new partitions yet.  Try rebooting so
   *this* machine can detect the new partitions.

   HINT:  If the mount freezes, check that your /etc/dlm.conf files are
   correct (especially the lines describing the nodes).  See the OpenDLM
   HOWTO.

   HINT:  If you encounter an assertion regarding LVB size, check to
   make sure that you verified the value of MAXLOCKVAL in dlm.h.  See
   the OpenDLM HOWTO.

   HINT:  If mount fails, check your syslog (e.g.
   /var/log/messages) for messages about the cause.  A mount will fail
   if OpenDLM has not put its node table together yet (as of this
   writing, it seems to take 10 - 20 seconds or more).  You may be able
   to simply try mounting again, and be successful.

That's it, you are done!  You should now be able to use this like any
other filesystem.


SHUTTING DOWN CLEANLY
---------------------

1. Unmount the file system:

   # umount /ogfs

2. Stop OpenDLM and HA heartbeat:

   # killall dlmdu
   # /etc/init.d/heartbeat stop   (this also kills ccm, if you're using it)

3. Unload the modules:

   # modprobe -r opendlm
   # modprobe -r ogfs
   # modprobe -r libdlmk
   # modprobe -r dlmdk.core


STARTING OpenGFS (e.g. after boot-up)
-------------------------------------

As an alternative to manually executing the steps below, look in
opengfs/scripts for pool.* and ogfs.* startup scripts for Debian and Red
Hat distributions.  These will require modification for your particular
setup (especially since they were written before OpenDLM was an option!).

Once OpenGFS has been installed on your computers and storage media, only
a few steps are needed to get it going after a boot-up.  The following
steps assume that your storage media hardware and drivers are installed
and visible to all computers in the cluster.

Check for success:  cat /proc/partitions shows shared drive(s)

You will need root privilege for all steps below.  See the OpenDLM HOWTO
for information on the first 3 steps:

1. Start heartbeat (on each computer).

   # /etc/init.d/heartbeat start

2. If you're using CCM for OpenDLM membership, start it (on each
   computer).

   # /usr/lib/heartbeat/ccm &

3. Start OpenDLM (on each computer, after all nodes' heartbeats are
   started).

   # /usr/local/sbin/dlmdu -C /etc/dlm.conf

4. Load OpenGFS and OpenDLM kernel modules (on each computer).
   # modprobe ogfs
   # modprobe libdlmk
   # modprobe opendlm

   Check for success:  cat /proc/filesystems shows "ogfs", among others
   Check for success:  cat /proc/modules shows "opendlm", "ogfs", and
                       "libdlmk", among others

5. Mount the Filesystem (on each computer).

   Remember, you will need to substitute an appropriate name for sdx2
   (the filesystem device, large partition).

   # mount -t ogfs /dev/sdx2 /ogfs -o lockproto=opendlm

   (The -o lockproto= option is needed only if opendlm is *not* the
   default lock protocol for your OpenGFS filesystem.  See step 9,
   above.)

That's it, you are done!  You should now be able to use this like any
other filesystem.


Copyright 2002-2004 The OpenGFS Project
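
APPENDIX:  Example start-up script sketch
-----------------------------------------

The per-node start-up steps in "STARTING OpenGFS" above can be collected
into one small script.  This is only a sketch, not a tested init script:
the paths (/etc/init.d/heartbeat, /usr/local/sbin/dlmdu, /etc/dlm.conf)
are the ones used in this document, but the DRY_RUN/USE_CCM variables and
the run() wrapper are invented here for illustration, and the 20-second
pause reflects the 10 - 20 second node-table delay mentioned in the mount
hints.  With DRY_RUN=1 (the default) it only prints the commands:

```shell
#!/bin/sh
# Sketch of the per-node start-up sequence.  DRY_RUN=1 only prints the
# commands; set DRY_RUN=0 to actually execute them (as root).
DRY_RUN=${DRY_RUN:-1}
USE_CCM=${USE_CCM:-0}          # set to 1 if OpenDLM membership uses CCM
FSDEV=${FSDEV:-/dev/sdx2}      # filesystem device (large partition)
MNT=${MNT:-/ogfs}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi
}

run /etc/init.d/heartbeat start             # step 1: heartbeat
if [ "$USE_CCM" = 1 ]; then                 # step 2: CCM (background)
    if [ "$DRY_RUN" = 1 ]; then echo "/usr/lib/heartbeat/ccm &"
    else /usr/lib/heartbeat/ccm & fi
fi
run /usr/local/sbin/dlmdu -C /etc/dlm.conf  # step 3: OpenDLM daemon
run modprobe ogfs                           # step 4: kernel modules
run modprobe libdlmk
run modprobe opendlm
# OpenDLM may need 10 - 20 seconds to build its node table; a mount
# attempted before that can fail, so wait before mounting (step 5).
run sleep 20
run mount -t ogfs "$FSDEV" "$MNT" -o lockproto=opendlm
```

A dry run prints the exact command sequence, so you can review (and edit)
it before running anything for real on a cluster node.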