An SSI cluster is a collection of computers that work together as if they are a single highly-available supercomputer. There are at least three reasons to create an SSI cluster of virtual UML machines.
Allow new users to easily experiment with SSI clustering, before investing time and hardware resources into creating a hardware-based cluster.
Provide a friendly testing and debugging environment for SSI developers.
Let developers test hundred-node clusters with only ten or so physical machines.
The raison d'être of the SSI Clustering project is to provide a full, highly available SSI environment for Linux. Goals for this project include availability, scalability and manageability, using standard servers. Technology pieces include: membership, single root and single init, single process space and process migration, load leveling, single IPC, device and networking space, and single management space.
The SSI project was seeded with HP's NonStop Clusters for UnixWare (NSC) technology. It also leverages other open source technologies, such as Cluster Infrastructure (CI), Global File System (GFS), keepalive/spawndaemon, Linux Virtual Server (LVS), and the Mosix load-leveler, to create the best general-purpose clustering environment on Linux.
The CI project is developing a common infrastructure for Linux clustering by extending the Cluster Membership Subsystem (CLMS) and Internode Communication Subsystem (ICS) from HP's NonStop Clusters for Unixware (NSC) code base.
GFS is a parallel physical file system for Linux. It allows multiple computers to simultaneously share a single drive. The SSI Clustering project uses GFS for its single, shared root. GFS was originally developed and open-sourced by Sistina Software. Later they decided to close the GFS source, which prompted the creation of the OpenGFS project to maintain a version of GFS that is still under the GPL.
keepalive is a process monitoring and restart daemon that was ported from HP's Non-Stop Clusters for UnixWare (NSC). It offers significantly more flexibility than the respawn feature of init.
spawndaemon provides a command-line interface for keepalive. It's used to control which processes keepalive monitors, along with various other parameters related to monitoring and restart.
Keepalive/spawndaemon is currently incompatible with the GFS shared root. keepalive makes use of shared writable memory mapped files, which OpenGFS does not yet support. It's only mentioned for the sake of completeness.
LVS allows you to build highly scalable and highly available network services over a set of cluster nodes. LVS offers various ways to load-balance connections (e.g., round-robin, least connection, etc.) across the cluster. The whole cluster is known to the outside world by a single IP address.
The SSI project will become more tightly integrated with LVS in the future. An advantage will be greatly reduced administrative overhead, because SSI kernels have the information necessary to automate most LVS configuration. Another advantage will be that the SSI environment allows much tighter coordination among server nodes.
LVS support is turned off in the current binary release of SSI/UML. To experiment with it you must build your own kernel as described in Section 4.
The Mosix load-leveler provides automatic load-balancing within a cluster. Using the Mosix algorithms, the load of each node is calculated and compared to the loads of the other nodes in the cluster. If it's determined that a node is overloaded, the load-leveler chooses a process to migrate to the best underloaded node.
Only the load-leveling algorithms have been taken from Mosix. The SSI Clustering project is using its own process migration model, membership mechanism and information sharing scheme.
The Mosix load-leveler is turned off in the current binary release of SSI/UML. To experiment with it you must build your own kernel as described in Section 4.
User-Mode Linux (UML) allows you to run one or more virtual Linux machines on a host Linux system. It includes virtual block, network, and serial devices to provide an environment that is almost as full-featured as a hardware-based machine.
The following are various cluster types found in use today. If you use or intend to use one of these cluster types, you may want to consider SSI clustering as an alternative or addition.
High performance (HP) clusters, typified by Beowulf clusters, are constructed to run parallel programs (weather simulations, data mining, etc.).
Load-leveling clusters, typified by Mosix, are constructed to allow a user on one node to spread his workload transparently across all nodes in the cluster. This can be very useful for compute intensive, long running jobs that aren't massively parallel.
Web-service clusters, typified by the Linux Virtual Server (LVS) project and Piranha, do a different kind of load leveling. Incoming web service requests are load-leveled by a front end system across a set of standard servers.
Storage clusters, typified by Sistina's GFS and the OpenGFS project, consist of nodes which supply parallel, coherent, and highly available access to filesystem data.
Database clusters, typified by Oracle 9I RAC (formerly Oracle Parallel Server), consist of nodes which supply parallel, coherent, and HA access to a database.
High Availability clusters, typified by Lifekeeper, FailSafe and Heartbeat, are also often known as failover clusters. Resources, most importantly applications and nodes, are monitored. When a failure is detected, scripts are used to fail over IP addresses, disks, and filesystems, as well as restarting applications.
For more information about how SSI clustering compares to the cluster types above, read Bruce Walker's Introduction to Single System Image Clustering.
To create an SSI cluster of virtual UML machines, you need an Intel x86-based computer running any Linux distribution with a 2.2.15 or later kernel. About two gigabytes of available hard drive space are needed for each node's swap space, the original disk image, and its working copy.
A reasonably fast processor and sufficient memory are necessary to ensure good performance while running several virtual machines. The systems I've used so far have not performed well.
One was a 400 MHz PII with 192 MB of memory running Sawfish as its window manager. Bringing up a three node cluster was quite slow and sometimes failed, maybe due to problems with memory pressure in either UML or the UML port of SSI.
Another was a two-way 200 MHz Pentium Pro with 192 MB of memory that used a second machine as its X server. A three node cluster booted quicker and failed less often, but performance was still less than satisfactory.
More testing is needed to know what the appropriate system requirements are. User feedback would be most useful, and can be sent to <ssic-linux-devel@lists.sf.net>.
The latest version of this HOWTO will always be made available on the SSI project website, in a variety of formats:
Feedback is most certainly welcome for this document. Please send your additions, comments and criticisms to the following email address: <ssic-linux-devel@lists.sf.net>.
This document is copyrighted © 2002 Hewlett-Packard Company and is distributed under the terms of the Linux Documentation Project (LDP) license, stated below.
Unless otherwise stated, Linux HOWTO documents are copyrighted by their respective authors. Linux HOWTO documents may be reproduced and distributed in whole or in part, in any medium physical or electronic, as long as this copyright notice is retained on all copies. Commercial redistribution is allowed and encouraged; however, the author would like to be notified of any such distributions.
All translations, derivative works, or aggregate works incorporating any Linux HOWTO documents must be covered under this copyright notice. That is, you may not produce a derivative work from a HOWTO and impose additional restrictions on its distribution. Exceptions to these rules may be granted under certain conditions; please contact the Linux HOWTO coordinator at the address given below.
In short, we wish to promote dissemination of this information through as many channels as possible. However, we do wish to retain copyright on the HOWTO documents, and would like to be notified of any plans to redistribute the HOWTOs.
If you have any questions, please contact <linux-howto@en.tdlp.org>
No liability for the contents of this documents can be accepted. Use the concepts, examples and other content at your own risk. As this is a new edition of this document, there may be errors and inaccuracies, that may of course be damaging to your system. Proceed with caution, and although this is highly unlikely, the author(s) do not take any responsibility for that.
All copyrights are held by their by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.
Naming of particular products or brands should not be seen as endorsements.
You are strongly recommended to make a backup of your system before major installations, and back up at regular intervals.