An Introduction To Clustering Under Linux
=========================================

Shanker Balan
http://shankerbalan.com/
Tue Apr 29 18:13:10 IST 2003

Linux based clustering is big these days for the following reasons:

 - Ability to use COTS (Commercial Off The Shelf) hardware
 - GPL software with source code
 - Easy to set up and maintain
 - Support from Linux newsgroups and user groups
 - Extensive documentation
 - Availability of cluster optimized applications, both GPL and
   commercial
 - Low TCO

There are many types of clusters. The most common ones are:

 - Beowulf clusters
 - MOSIX clusters
 - High-Availability clusters

Which one you should use depends on what you wish to use your cluster
for. Applications usually fall into one of the following three
categories: computationally intensive, I/O intensive and high
availability.

Beowulf clusters
----------------

Beowulf clusters are currently the most common and popular clusters.
They are used for computationally intensive jobs like simulations and
other number crunching problems typically found in research centers and
laboratories.

Beowulf clusters use parallel computing to reduce the time required for
a CPU intensive job. By distributing the workload across the nodes of
the cluster and executing the instructions in parallel, it is possible
to reduce the job execution time considerably, depending on the number
of nodes in the cluster.

Beowulf clusters can use either PVM (Parallel Virtual Machine) or MPI
(Message Passing Interface) as distributed application programming
environments. There are others as well.

PVM
---

PVM (Parallel Virtual Machine) is a freely-available, portable,
message-passing library generally implemented on top of sockets. It is
clearly established as the de-facto standard for message-passing cluster
parallel computing.

See http://www.linuxdoc.org/HOWTO/Parallel-Processing-HOWTO-3.html#ss3.4

MPI
---

Although PVM is the de-facto standard message-passing library, MPI
(Message Passing Interface) is the relatively new official standard.
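
To make the message-passing idea behind both libraries concrete, here is
a minimal sketch in Python. It is not PVM or MPI code: threads stand in
for cluster nodes and in-memory queues stand in for the network, and the
names split, worker and parallel_sum_of_squares are hypothetical ones
chosen for this illustration. A master splits a job into chunks, sends
one chunk to each worker, and combines the partial results.

```python
import queue
import threading


def split(data, parts):
    """Divide data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def worker(rank, inbox, outbox):
    """Receive one chunk, compute a partial sum of squares, send it back."""
    chunk = inbox.get()                        # blocking "receive"
    outbox.put((rank, sum(x * x for x in chunk)))  # "send" to the master


def parallel_sum_of_squares(data, nodes=4):
    """Master: scatter chunks to workers, then gather and combine results."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(nodes)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(nodes)
    ]
    for t in threads:
        t.start()
    # Scatter: one message per worker.
    for inbox, chunk in zip(inboxes, split(data, nodes)):
        inbox.put(chunk)
    for t in threads:
        t.join()
    # Gather: exactly one result message per worker.
    return sum(results.get()[1] for _ in range(nodes))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10)), nodes=3))  # prints 285
```

On a real Beowulf cluster the workers would be processes on separate
machines, and the send and receive steps would go through the PVM or MPI
library instead of a local queue.
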
PVM vs MPI has always been a religious war akin to vi vs Emacs. The
author of the Linux Parallel Processing HOWTO has provided an unbiased
summary of the differences between the two at
http://www.linuxdoc.org/HOWTO/Parallel-Processing-HOWTO-3.html#ss3.5

Though MPI has clearly established itself as a standard, PVM still has
the edge over MPI because it has more applications and a strong base of
developers. This is expected to change as MPI becomes more popular.

MOSIX clusters
--------------

Now let's get to another type of cluster: the MOSIX cluster. MOSIX is a
software package that was specifically designed to enhance the Linux
kernel with cluster computing capabilities. The core of MOSIX consists
of adaptive (on-line) load-balancing, memory ushering and file I/O
optimization algorithms that respond to variations in the use of the
cluster resources, e.g., uneven load distribution or excessive disk
swapping due to lack of free memory in one of the nodes. In such cases,
MOSIX initiates process migration from one node to another, to balance
the load, to move a process to a node that has sufficient free memory,
or to reduce the number of remote file I/O operations.

MOSIX clusters are typically used in data centers and data warehouses.

See http://www.mosix.org/txt_whatis.html

High-Availability clusters
--------------------------

As the name suggests, these clusters provide highly fault-tolerant
server systems for services where 100% uptime is required. If one node
fails, the other nodes in the cluster take over the functionality of
the failed node transparently.

HA clusters are typically used for DNS servers, proxies and web servers.

See http://www.linux-ha.org/
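
The takeover behaviour described above can be sketched with a simple
heartbeat monitor. This is an illustration only, written in Python, and
it is not how the Linux-HA software is implemented: the class name
FailoverMonitor, its methods, the node names and the 3 second timeout
are all assumptions made for this example. Each node reports a
heartbeat; when the active node stays silent for longer than the
timeout, the first node that is still alive becomes active.

```python
class FailoverMonitor:
    """Tracks heartbeats and picks which node should serve requests."""

    def __init__(self, nodes, timeout=3.0):
        self.order = list(nodes)          # failover preference order
        self.timeout = timeout            # seconds of silence tolerated
        self.last_seen = {name: 0.0 for name in self.order}
        self.active = self.order[0]       # the first node starts as active

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now` (in seconds)."""
        self.last_seen[node] = now

    def is_alive(self, node, now):
        """A node is alive if its last heartbeat is within the timeout."""
        return now - self.last_seen[node] <= self.timeout

    def check(self, now):
        """Return the active node, failing over if it has gone silent."""
        if self.is_alive(self.active, now):
            return self.active
        for node in self.order:
            if self.is_alive(node, now):
                self.active = node        # take over from the failed node
                break
        # If no node is alive, the previous active node is left unchanged.
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor(["node-a", "node-b"], timeout=3.0)
    monitor.heartbeat("node-a", now=1.0)
    monitor.heartbeat("node-b", now=1.0)
    print(monitor.check(now=2.0))   # prints node-a
    monitor.heartbeat("node-b", now=4.0)   # node-a has stopped reporting
    print(monitor.check(now=5.0))   # prints node-b
```

Time is passed in explicitly so the example is deterministic. A real HA
setup also has to move the service address and any shared storage to the
new node, which this sketch leaves out.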