HOWTO build and install OpenGFS (nopool, new) (Jul 29 2004)



     HOWTO install OpenGFS No-pool

Author:
Ben Cahill

This document describes how to build/install/start/stop OpenGFS so a cluster
of computers can share a single storage device.  It provides a simple example
configuration for one shared drive with 3 partitions, shared between
2 computers, using one internal and one external journal.  The external
journal capability is new, and supports certain OpenGFS features, without
depending on OpenGFS' legacy "pool" volume manager module.

NOTE:  This document applies for current (June, 2003 and later) CVS code, and
for releases after that time.  For release 0.2.1 (or earlier), see the
HOWTO-generic document.

NOTE:  Use kernel 2.4.20 or later for nopool support.  2.5.x and 2.6.x are
not yet supported by OpenGFS.

NOTE:  It is now (for CVS code since July 30, 2003, and any releases after
that) a *requirement* that patches for the DM device mapper be applied to
your 2.4.x kernel!  See step 2 in the instructions below.

This document describes how to set up OpenGFS *without* using OpenGFS' legacy
"pool" clustered volume manager module and utilities (see "Pool and Utilities"
document, ogfs-pool).  Instead of pool, you may now use other volume
managers/mappers (or even raw, unmapped devices, if each computer node sees
them with a consistent name).  The no-pool code achieves this by supporting
"external" journals on specific devices/partitions separate from the main
filesystem volume, in addition to internal journals within the filesystem
volume . . .

Background:  mkfs.ogfs, in conjunction with pool (via a private ioctl),
has traditionally supported the ability to assign particular journals
(associated with particular computer nodes) to particular devices within the
makeup of the filesystem volume, which can consist of many physical devices.
Such assignment can enhance system performance, reliability, and manageability.
To do this, mkfs.ogfs needed to know the details of how the filesystem volume
was assembled by the pool volume manager.

The no-pool code can now create journals on particular external devices
(outside of the filesystem volume), so it does not need to know the makeup of
the filesystem volume at all.  This avoids the need for the private ioctl
(supported only by pool), and thus supports the use of "generic" volume
managers/mappers.

You may still use OpenGFS with pool (see OpenGFS HOWTO-generic).  You may even
create external journals on separate pools.  After all, pool *is* a volume
mapper.  The OpenGFS software base, build configuration, and binaries are
identical for pool and no-pool usage; the choice between them is made by the
options you give mkfs.ogfs when creating the filesystem, and (obviously) by
whether and how you use pool.  However, the OpenGFS project will not be
maintaining pool, so we recommend using other commonly available volume
managers/mappers instead.

Depending on your setup, you may want to see other documentation as well:

  OpenDLM (Distributed Lock Manager):  This can now be used as the inter-node
  locking protocol for OpenGFS, avoiding the single point of failure
  characteristic of the OpenGFS legacy memexpd lock storage server.
  See the OpenGFS HOWTO-opendlm document.

  EVMS (Enterprise Volume Management System):  This is a good cluster-aware
  volume manager for use with OpenGFS.  See the OpenGFS HOWTO-evms document,
  and use it in conjunction with this document when setting up OpenGFS with EVMS.

  IEEE-1394 (firewire-attached) devices:  See the OpenGFS HOWTO-1394 document
  for instructions on patching and re-building the kernel, including some
  1394-specific patches.  Then return to this document for the remaining steps
  of setting up OpenGFS.

  iSCSI (Internet SCSI):  See the OpenGFS HOWTO-iSCSI document for instructions
  on setting up iSCSI, then return to this document for setting up OpenGFS.

  UML (User-Mode Linux):  See the OpenGFS HOWTO-uml document.  This provides
  an environment for debugging kernel code in user space.  All instructions
  for building and installing OpenGFS under UML are in the HOWTO-uml document,
  but you may want to return to this document in addition, for more information.

No-Pool Features/Requirements:

-- The filesystem device can be any device, real (e.g. /dev/sdc1) or
   virtual (mapped, e.g. /dev/evms/whatever).  For clustered usage, the
   filesystem device must appear as /dev/something identically on each computer
   in the cluster.  This identity must stay consistent over time/reboots.  You
   may want to use a volume manager/mapper (e.g. EVMS) to assure compliance
   with this requirement.

-- External journals can likewise use any devices, real or virtual.  You must
   assign each external journal to a unique device or partition (external
   journals cannot currently share a partition).  As with the filesystem device,
   each journal device must appear identically on each computer in the cluster,
   over the course of time and reboots.

-- Internal journals (within the filesystem device) are still supported.

-- Pool is still supported (but not required, unless you want to keep using
   a pre-existing filesystem that used pool).

-- Filesystem expansion (after enlarging the available space on the filesystem
   device by means of a volume manager) is supported via ogfs_expand utility.

-- Journal addition is supported via ogfs_jadd utility.  Internal journals
   require enlarging the available space on the filesystem device.  External
   journals require one dedicated device or partition for each journal.

Restrictions when not using pool:

-- No support via pool for hardware-based (DMEP) locking.  As an alternative,
   OpenGFS supports(?) hardware DMEP via the kernel's generic SCSI support.
   Currently, DMEP support is not maintained in OpenGFS . . . good luck!

-- No support for striping.  Pool supports striping, but we will rely on
   other volume managers/mappers for this from now on.


INSTALLATION PRELIMINARIES
-------------------------------

This document assumes that you already have set up a shared storage device.
Typically, this means installing interface hardware (e.g. a fibre channel
host bus adapter) and drivers (e.g. qla2300 driver for QLogic 23x0 HBA) on
each computer that will be sharing the storage device(s), and connecting
each computer to the storage device (e.g. with fibre cable).  You do not
need to partition or initialize the storage device/drives (yet).  Installation
success is indicated by the drive(s) appearing in /proc/partitions on each
computer in the shared storage cluster.

HINT:  Take a look at the "big convenience" patches described in step 2A
below.  They contain some popular fibre channel drivers, among other things.
If you use a "big" patch, your fibre channel drives will not show up in
/proc/partitions until after you apply the patch and build your kernel in
step 2 below.

Check for success:  cat /proc/partitions shows shared drive(s)
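
For example, if your shared drive shows up as sdc (a name used here only for
illustration; yours may differ), a quick check on each computer might be:

  grep sdc /proc/partitions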

This document also assumes that your cluster member computers are
interconnected by a LAN, and that you know the IP address of each computer.
You will *probably* want to use static IP addresses for this purpose!

Most of the steps below must be executed on *all* computers in the cluster,
while some (the steps that write to the shared storage device) need to be
executed just once, on only one computer.

You may find it easiest to follow the instructions by building the code
separately on each computer in the cluster (especially if different computers
run different kernel versions for some reason).  However, if you want to do
one build (e.g. on your development platform), and install on multiple target
computers, there are two options:

  The opengfs.spec file in the CVS download version allows you to create an
  RPM package for easy installation on multiple computers.

  Use --prefix and --with-linux_moduledir options when running ./configure
  (see below), to install the build results into isolated directories, which
  you can copy to the other machines.
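
  For example (an untested sketch; the paths and host name below are made up),
  you might configure the build to install into a staging directory, copy that
  directory to each target computer, merge its contents into place there, and
  run depmod on each target:

	./configure --with-linux_srcdir=/usr/src/linux-2.4.21-ogfs \
		--prefix=/tmp/ogfs-stage \
		--with-linux_moduledir=/tmp/ogfs-stage/modules
	make
	make install
	scp -r /tmp/ogfs-stage othernode:/tmp/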


BUILDING AND INSTALLING OpenGFS
-------------------------------

1. Get the OpenGFS source, either from CVS or via tarball (some releases are
	also available as source RPMs).  Copy to each computer in the cluster.

	A.  Retrieve (check out) the OpenGFS source from CVS:

	export CVSROOT=:pserver:anonymous@cvs.opengfs.sourceforge.net:/cvsroot/opengfs
	cvs login		(just hit "enter" key for the password)
	cvs -z3 co opengfs	(-z3 invokes compression, if desired)

	HINT:  Use "cvs up opengfs" to retrieve subsequent updates to files
	already checked out, or "cvs co opengfs" to make sure that you pick
	up any new files.

		*OR*

	B.  Download a release tarball from http://opengfs.sourceforge.net.
	Click on "Downloads", then on "Sourceforge Download Page", and
	download the release you want, then:

	tar -xvzf opengfs-n.n.n.tar.gz	(substitute version for n.n.n)

	or use bunzip2 if the tarball has a .bz2 suffix.
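
	For example, for a .bz2 tarball (again substituting the version
	for n.n.n):

	bunzip2 -c opengfs-n.n.n.tar.bz2 | tar -xvf -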

	HINT:  The no-pool code was checked into CVS on 08 May, 2003.
	Support for filesystem expansion and journal addition for no-pool
	was checked into CVS between 23 May and 30 May, 2003.
	Please keep this in mind when selecting tarballs!
	Release 0.2.1 (13 May, 2003) does *not* support no-pool operation;
	even though its release date overlaps the CVS checkin, it does not
	contain no-pool support!

2. Patch/rebuild kernel with the OpenGFS and DM patches (on each computer).
	The OpenGFS patches only modify the existing kernel code.  They do not
	contain the OpenGFS kernel modules.  Those will be built in step 3.

	OpenGFS now (as of July 30, 2003, and any releases after that date)
        requires patches for the DM device mapper to be in your kernel.
	You have (at least) 2 choices for obtaining them:

	A.  Apply "big" patch.
		If you are going to use EVMS, or want to use QLogic or Feral
		drivers for a QLogic host bus adaptor with kernel 2.4.20 or
		later, you may wish to use one of the big "convenience" patches
		available from the OpenGFS download site (*not* in CVS!):

		http://opengfs.sourceforge.net/download.php

		These tarballs contain kernel patches for OpenGFS, EVMS, DM, and
		KDB (the kernel debugger), plus drivers for QLogic HBAs (read the
		comments on the download page for the contents of each patch).
		Depending on your kernel, and features desired, look for the
		following, or other appropriate patches as they may appear:

		linux-2.4.20-ogfs3.patch.bz2
		linux-2.4.21-ogfs1.patch.bz2

		Use bunzip2 to uncompress, then apply patches to your kernel:

		cd /usr/src/linux	(or wherever your kernel source lives)
		cat /path/to/big/patch/linux-2.4.21-ogfs1.patch | patch -p1

		HINT:  To avoid patch conflicts, you may want to start with
		pristine kernel source in a new directory.  You may want to
		copy your .config file from your old source directory to your
		new one, to save a lot of reconfiguration time.  The big
		patches place an EXTRAVERSION value (e.g. "-ogfs1") in Makefile,
		to differentiate the build from your normal build.

		HINT:  When reconfiguring, select no more than one QLogic
		driver (2100/2200/2300).

		HINT:  The big patches on the OpenGFS website may not contain
		the latest patches for various non-OpenGFS components.  If you
		want to use the latest, you may need to round up all the
		patches yourself.  If you do, please consider contributing
		your results (i.e. a new big patch) to our download page.

	B.  OR, apply minimal (only the OpenGFS and DM) patches.
		If you will not use EVMS, want to keep your patching compact,
		or want to make sure you're using the latest patches, obtain
		DM patches directly from DM project website:

		http://sources.redhat.com/dm/

		Select the latest patches for your kernel, and apply them:

		cd /usr/src/linux	(or wherever your kernel source lives)
		cat /path/to/dm/patches/*.patch | patch -p1

		In addition, you must apply the OpenGFS kernel patches:

		cd /usr/src/linux	(or wherever your kernel source lives)
		cat /path/to/opengfs/kernel_patches/2.4.x/*.patch | patch -p1

		HINT:  You may want to edit your kernel Makefile to put in a
		value for EXTRAVERSION (e.g. "-ogfs") to differentiate this
		kernel from your normal kernel.
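
		For example, the top of a patched 2.4.21 kernel Makefile would
		then look something like this ("-ogfs" is only a suggested
		suffix):

		VERSION = 2
		PATCHLEVEL = 4
		SUBLEVEL = 21
		EXTRAVERSION = -ogfs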


	Then (for A or B):

	Reconfigure kernel (e.g. make oldconfig) for new patches
	Rebuild kernel (make bzImage, make modules, make modules_install)
	Install kernel (e.g. mkinitrd, lilo, etc.)
	Reboot
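
	For example, one possible sequence for a 2.4 kernel (an untested
	sketch; adjust the paths, initrd step, and boot loader for your
	distribution):

	cd /usr/src/linux		(or wherever your kernel source lives)
	make oldconfig
	make dep
	make bzImage
	make modules
	make modules_install
	cp arch/i386/boot/bzImage /boot/vmlinuz-2.4.21-ogfs
	mkinitrd /boot/initrd-2.4.21-ogfs.img 2.4.21-ogfs
	lilo			(after adding an entry to /etc/lilo.conf)
	reboot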

	HINT:  With, for example, linux-2.4.21, and EXTRAVERSION=-ogfs,
	make modules_install will place kernel modules in:

	/lib/modules/2.4.21-ogfs/ogfs

	HINT:  If you don't add patches for DM to the kernel, you will get
	complaints during the build about "b_journal_head".


(Return here from HOWTO-1394 . . . )
3. Build and install OpenGFS kernel modules/tools/man-pages (on each computer).
	The bootstrap step is required only if you downloaded from CVS.  It is
	not required for the tarball.

	Make sure you provide a correct path to your patched(!) kernel source,
	using --with-linux_srcdir, when running ./configure.  Some of the
	patches modify kernel include files needed for building ogfs.

	cd /path/to/opengfs
	./bootstrap (ONLY FOR CVS DOWNLOADS)
	./configure [options]

	HINT:  If the bootstrap script spits out cryptic warnings, double check
	that you have the right versions of autoconf and automake installed.
	Some versions of RedHat and Debian Linux (and maybe others) came with
	a flawed wrapper script that tried to allow installing old and new
	versions of the auto... tools in parallel.  Using them usually ends in
	disaster.  Deinstall all autoconf and automake versions from your
	system, download the latest autoconf tarball and the automake-1.6.x
	tarball and install the tools from them, then try again.

	HINT:  ./configure --help shows options.  Some interesting ones:
	--with-linux_srcdir=/some/path, location of patched(!) linux source
	--with-opendlm_includes=/some/path, OpenDLM source, see HOWTO-opendlm
	--prefix=/some/path, installs user binaries and man pages under here
	--with-linux_moduledir=/some/path, installs kernel modules under here
	--enable-extras, builds OpenGFS test tools
	--enable-*-debug, enables debug features in various OpenGFS components
	--enable-*-stats, enables statistics in various OpenGFS components
	--enable-uml, compile for User-Mode Linux, see HOWTO-uml
	--enable-opendlm, compile with OpenDLM lock module, see HOWTO-opendlm
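
	For example, a minimal invocation (the kernel source path below is
	only an illustration; point it at your own patched tree):

	./configure --with-linux_srcdir=/usr/src/linux-2.4.21-ogfs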

	make
	su		(you need root privilege for all following steps)
	make install

	Check for success*:  /sbin contains "ptool" and many other ogfs tools
	Check for success*:  /lib/modules/2.4.x/ogfs contains ogfs.o + others
	Check for success*:  /usr/man/man8 contains ogfs.8, memexpd.8 + others

	* Default locations without --prefix or --with-linux_moduledir options.


4. Insert OpenGFS modules (on each computer):

	modprobe memexp
	modprobe ogfs

	Hint:  If modprobe says it can't find memexp, try running depmod to
	update /lib/modules/(version)/modules.dep
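
	For example (as root):

	depmod -a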

	Check for success:  cat /proc/filesystems shows "ogfs", among others
	Check for success:  cat /proc/modules shows "memexp", "ogfs"
			+ other modules used by memexp and ogfs, among others


5. Partition the shared drive (using only one computer) into 3 partitions.

	A small partition (~4MB) will be used for Cluster Information (ci).
	A medium partition (~128MB) will be used for an external journal.
	A large partition (the rest of the drive) will be used for the
	filesystem data and an internal journal.

	These instructions assume you are using a "real" device, but you
	may instead want to use a volume manager to partition your drive.
	See OpenGFS HOWTO-evms for information on doing this with EVMS.

	Use the dev name of your shared drive in place of "sdX" below.
	See the man page for sfdisk for more information:

	sfdisk -R /dev/sdX  (make sure no drive partitions are in use)
	sfdisk /dev/sdX	    (this partitions the disk; follow the prompts)

	Hint:  sfdisk works in units of "cylinders".  When partitioning
	my drive, sfdisk showed that each cylinder was 1048576 bytes.
	So, for the first (small) partition, I entered 0,4 to start at
	cylinder 0, with a size of 4 cylinders (~4MB).
	For the second (medium) partition, I entered 4,128 to start at
	cylinder 4, with a size of 128 cylinders (~128MB).
	For the third (large) partition, I entered nothing (except the
	Enter key), which defaulted to use the rest of the drive.
	For the next one (sfdisk asks for 4 partitions), I also entered
	nothing (except the Enter key).  This last "partition", of course,
	is empty.

	After you enter all 4 partitions, sfdisk asks you if you really
	want to write to the disk, so you can experiment a bit
	before committing.
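
	If you would rather script this than answer the prompts, something
	along these lines should produce the same three-partition layout
	(an untested sketch; remember that cylinder sizes vary by drive, and
	";" stands for an empty partition).  Put the following four lines in
	a file (e.g. parts.txt), then run "sfdisk /dev/sdX < parts.txt":

0,4
4,128
,
;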

	Check for success:  cat /proc/partitions shows the new partitions

	HINT:  These partitions must show up on all computers in the cluster,
	and must be named consistently.  After you create new partitions using
	one machine, you may need to find a way for the other machines to
	re-scan for partitions, or simply reboot the other machines (don't
	forget to re-modprobe memexp and ogfs).
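
	HINT:  One way to make another machine re-scan without rebooting is
	to ask its kernel to re-read the partition table (this fails if
	anything on the drive is mounted or otherwise in use):

	sfdisk -R /dev/sdX
	blockdev --rereadpt /dev/sdX	(alternative, if your util-linux has it)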

	HINT:  If you find that the other computers see the partitions with
	different names, or if there is any chance you will be re-configuring
	your cluster computers and thereby affecting the device names, you will
	*need* to use a volume manager such as EVMS (see OpenGFS HOWTO-evms
	for information).  If using EVMS, you will need to create "native
	volumes".  These provide consistent naming from machine to machine.

	Check for success:  cat /proc/partitions shows the new partitions
		on every machine, with identical names

	In the following instructions, we'll call the three partitions
	/dev/sdx1, /dev/sdx2, and /dev/sdx3, but you will need to substitute
	appropriate names in their stead.


6. Make the OpenGFS Filesystem (using only one computer).
	Use the OpenGFS tool "mkfs.ogfs" to create the file system on disk.
	This step writes a superblock and resource group (a.k.a. block group)
	headers for the filesystem, and creates journals.  In the superblock,
	it writes the default locking protocol (e.g. "memexp") and the cluster
	information device, as specified in the mkfs.ogfs command line.
	See the man page for mkfs.ogfs for more information on options.

	A.  Edit a configuration file named journal.cf to read like the
		following.  You can find a copy of this file, with comments,
		as opengfs/docs/journal.cf.  This file, and all other opengfs
		configuration files, may reside anywhere on your computer.

		Remember, you will need to substitute appropriate names for
		sdx3 (the filesystem device, large partition) and sdx2
		(the external journal device, medium partition).

fsdev  /dev/sdx3

journals  2

journal  0  int 256
journal  1  ext /dev/sdx2

	B.  Run the following command to make sure everything is okay.  The
		-v prints extra information, and -n prevents mkfs.ogfs from
		writing anything to disk.  Check the output to verify device
		sizes, etc.  The external journal should start at a *very*
		high number.

		Remember, you will need to substitute an appropriate name for
		sdx1 (the cluster information device, small partition).

	mkfs.ogfs -p memexp -t /dev/sdx1 -c journal.cf -v -n

		HINT:  If you want even more output, try the -d option.

	C.  Run the following command to write the filesystem and internal
		journal onto the filesystem device, and the external journal
		onto the external journal device.

	mkfs.ogfs -p memexp -t /dev/sdx1 -c journal.cf

		HINT:  This can take some time to write to disk, and shows
		no output until it is done.  Be patient.


7. Configure the OpenGFS Cluster, write it to disk (using only one computer).
	Now that the file system has been created on the storage media, we
	need to describe the computers (nodes/machines) that will be sharing
	the media.  This includes the STOMITH ("Shoot The Other Machine In
	The Head") method(s) used for resetting a computer when it fails,
	and the heartbeat period used for detecting failure.  For more
	information, see the man page for ogfsconf.  Also see
	opengfs/docs/ogfscftemplate to understand the configuration file.

	You need to choose one computer to be the lock storage server.  This
	computer stores the inter-node lock status of all OpenGFS locks within
	the cluster.  We're assuming a two-computer cluster in this example.
	You may use one of these two computers as the lock storage server, or
	use a third computer instead.

	HINT:  When using Manual STOMITH, after resetting a dead computer
	(but *not* now, though, while you initially set up the cluster!),
	you must run the following OpenGFS tool to continue the recovery
	process.  See the man page for do_manual for more information:

	do_manual -s $IP_ADDR_OF_DEAD_NODE  (don't do this now, though)

	A.  Edit a configuration file named ogfscf.cf to read like the
		following.  You'll need to substitute appropriate
		IP addresses for the lock storage server computer and the
		cluster member computers (nodes).  However, you'll want
		to use the same port numbers (15697 and 3001).
		See ogfsconf man page.

		Remember, you will also need to substitute appropriate names
		for sdx3 (the filesystem device, large partition) and
		sdx1 (the cluster information device, small partition).

datadev: /dev/sdx3
cidev: /dev/sdx1
lockdev: 192.168.0.37:15697
cbport: 3001

timeout: 30
STOMITH: manual
name: manual

node: 192.168.0.37 0 SM: manual 1
node: 192.168.0.203 1 SM: manual 2


	B.  Write the config info to the Cluster Information (ci) partition.

	ogfsconf -c ogfscf.cf


8. Start the Lock Server (on only one computer, the lock storage server).
	This daemon provides the centralized lock storage facility for the
	memexp locking protocol.  Launch the OpenGFS memexpd lock storage
	server daemon (see memexpd man page):

	memexpd &


9. Mount the Filesystem (on each computer).
	The following commands mount the ogfs file system at the /ogfs
	mount point.  For hostdata, you'll need to substitute the IP address
	of the computer on which you are currently mounting the filesystem
	(i.e. *this* computer).

	Remember, you will also need to substitute an appropriate name
	for sdx3 (the filesystem device, large partition).

	mkdir /ogfs
	mount -t ogfs /dev/sdx3 /ogfs -o hostdata=192.168.0.x

	HINT:  If you see an error from mount like:
	"mount:  /dev/sdc3 is not a valid block device",
	a)  make sure you are using a valid /dev/* in the command line, or
	b)  you may be trying the mount from a machine *other* than the one you
	used for creating the new partitions.  Try rebooting so *this* machine
	can detect the new partitions.
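
	HINT:  If you will be mounting often, an /etc/fstab entry along these
	lines (an untested sketch; keep "noauto" so the filesystem is not
	mounted at boot before the modules and lock server are ready) lets
	you shorten the command to "mount /ogfs":

/dev/sdx3   /ogfs   ogfs   noauto,hostdata=192.168.0.x   0 0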

That's it, you are done!
You should now be able to use this like any other filesystem.



SHUTTING DOWN CLEANLY
--------------------
As an alternative to manually executing the steps below, look in
opengfs/scripts for pool.* and ogfs.* startup scripts for Debian and Red Hat
distributions.  These may require modification for your particular setup.

If you want to shut down your computers, you will need to unmount OpenGFS.

1.  Unmount the Filesystem (on each computer).

	umount /ogfs   (this assumes you mounted onto /ogfs)


2.  If you find that you need to unload the kernel modules for some
	reason (usually not necessary for shutting down the computer), you
	may first need to run the following to remove all knowledge of pools
	from the pool module and reduce its usage count to 0:

	passemble -r all (not necessary unless you're using pool)

	Then unload the modules:

	modprobe -r ogfs
	modprobe -r memexp


3.  If you want to uninstall the tools/modules/man-pages, go to your
	build directory, and run:

	make uninstall

	If you deleted the tree (oops!), it can be rebuilt from scratch, but
	make sure you use the same paths.


STARTING OpenGFS (e.g. after boot-up)
------------------------------------
As an alternative to manually executing the steps below, look in
opengfs/scripts for pool.* and ogfs.* startup scripts for Debian and Red Hat
distributions.  These may require modification for your particular setup.

Once OpenGFS has been installed on your computers and storage media, only a
few steps are needed to get it going after a boot-up.  The following steps
assume that your storage media hardware and drivers are installed and visible
by all computers in the cluster.

Check for success:  cat /proc/partitions shows shared drive(s)

You will need root privilege for all steps below:

1.  Load OpenGFS kernel modules (on each computer).

	modprobe memexp
	modprobe ogfs

	Check for success:  cat /proc/filesystems shows "ogfs", among others
	Check for success:  cat /proc/modules shows "memexp", "ogfs"
			+ other modules used by memexp and ogfs, among others


2.  Start the Lock Server (on only one computer, the lock storage server).
	This must be started before mounting the filesystem on *any* of the
	computers in the cluster.

	memexpd &  (on lock storage server computer only!)


3.  Mount the Filesystem (on each computer).
	Remember, you will need to substitute an appropriate name
	for sdx3 (the filesystem device, large partition).

	Remember also, "hostdata" is the IP address of the computer on which
	you are currently mounting the filesystem (i.e. *this* computer).

	mount -t ogfs /dev/sdx3 /ogfs -o hostdata=192.168.0.x


That's it, you are done!
You should now be able to use this like any other filesystem.

Copyright 2002-2003 The OpenGFS Project