problem-description.tex

% Problem description


\chapter{Introducing Zimbra High Availability}
This thesis was written to reflect the state of the art in High Availability methods for Zimbra Open Source Edition.  

\section {High Availability}
High availability is a system design approach and associated service implementation that ensures that a prearranged level of operational performance will be met during a contractual measurement period.

One of the most common high availability examples are web servers. Two web servers share the same information thanks to a shared storage. If one of the web servers fails to serve pages the other server can reclaim its primary role and shot the other node in the head (stonith) so that it can serve pages instead of the original primary node. The amount of time since the detection of the first server failure to its reestablishment is denoted as downtime. A contract for High Availability might stipulate that in a month time web servers service might be in downtime status for no more than five minutes.

High Availability, or HA as it is abbreviated, refers to the availability of resources in a computer system, in the event of component failures in the system. This can be achieved in a variety of ways, either with custom and redundant hardware to ensure availability or with software solutions using off-the-shelf hardware components.

The former class of solutions provide a higher degree of availability, but are significantly more expensive than the latter. This has led to the popularity of the latter class, with almost all vendors of computer systems offering various HA products. Typically, these products endure single points of failure in the system. (\cite{TaskForceHA})

As an example for more expensive systems we can mention OVH.co.uk web hosting service which uses more than 1,000 servers not only for ensuring high availability but also for dealing with high loads of visitor's queries. SQL servers seem to achieve high availability by using several RAID-1 (mirror) hard disks.

These HA systems usually ensure that a prearranged level of operational performance will be met during a contractual measurement period. (\cite{WikipediaHA})

High availability systems typically operate 24x7 and usually require built-in redundancy to minimize the risk of downtime due to hardware and/or telecommunication failures. 

Availability can be measured relative to "100\% operational" or "never failing." A widely-held but difficult-to-achieve standard of availability for a system or product is known as "five 9s" (99.999 percent) availability. (\cite{BCMHA})

From now on high availability will be refered as HA.

\section {Vmware Zimbra OSE}
VMware Zimbra is a complete email, address book, calendar and tasks solution that can be accessed from the Zimbra Web Client, Zimbra Desktop offline client, Outlook and a variety of other standards-based email clients and mobile devices. It can be deployed as a traditional binary install on Linux, or as a software virtual appliance, commonly referred to as Zimbra appliance.

Among the Zimbra Collaboration Server (ZCS) versions this thesis will approach the ZCS Open Source Edition also known as Zimbra OSE.

Vmware Zimbra OSE will be referred most of the times as Zimbra.

For more information about Vmware Zimbra you can visit: \cite{ZimbraLearn}.

\section {Vmware Zimbra OSE High Availability}
Zimbra OSE High Availability is a project which attempts to attain HA to each one of the Zimbra Collaboration Server components so that the risk of downtime due to hardware and/or telecommunication failures is minimized. High availability is usually implemented using High Availability software aimed at Gnu/Linux originally which is adapted to the Zimbra OSE setup.


\section{\label{sec:history}History}

Prior documentation about Zimbra High Availability was written with Zimbra 6 version in mind which is devoted to work (among others) in Ubuntu 8.04 64 bit. That documentation was based on Gnu/Linux High Availability software (heartbeat) which is no longer used for High Availability purposes nowadays.

To the best of the author's knowledge there is no updated documentation on how to setup this system.

On September 13th, 2012 Vmware announced Zimbra 8 which could be run in Ubuntu 12.04 (\cite{VmwareZimbra8Announce}).

\section {Main thesis topic}
The main topic of this thesis is: \\
\textit{Design and test the setup of a Zimbra 8 High Availability System in Ubuntu 12.04 and write a detailed report of this setup procedure.}

\section {Chosen technologies and solution}
We have decided to use the following proved open source HA technologies:\\ Corosync, Distributed Replicated Block Device (DRBD), OCF, and Pacemaker.

Corosync is a software meant to synchronize configuration files. Synchronize configuration files in a cluster is essential because all the cluster members have to share the same information and knowledge of the other cluster members. DRBD allows us to share a common storage between hosts as if a network RAID-1 hard disk it was. Finally, in order to setup and manage the whole cluster we will use Pacemaker cluster which uses OCF scripts as Cluster control scripts.

The reason for using these technologies is because they are the best well known and most used open source HA technologies. They have supersesed old technologies like heartbeat which we mentioned in \textbf{section {\ref{sec:history} History}}.

A set of virtual machines will be used to setup Zimbra 8 and Ubuntu 12.04 HA system. The neccessary steps to carry out this setup are: Setup network in both hosts, setup DRBD, initial Pacemaker setup, corosync installation, pacemaker installation, corosync setup, startup script disabling, DRBD script boot disabling, Pacemaker final setup, Pacemaker case of use tests.

This report describes in detail the setup procedure of each of these steps.

\section {Notation remarks}
Zimbra will be run in two \textit{servers} in our solution. These servers will be refered to as \textit{virtual servers} when we take the point of view of Virtualbox or virtualization point of view as in section \textbf {\ref{sec:virtualbox-implementation} Virtualbox implementation}.

When the HA system will be described these same servers will be refered to as \textit{nodes} as this is the way most HA software refer to their own cluster machines.

\section {Structure of the document}

Chapter 2 explains the high availability system that will be tested through the thesis.

Chapters 3 to 12 explain the detailed setup procedure of each element that constitutes the final system: \textit{\ref{chap:ha-schema} - High availability schema},
\textit{\ref{chap:operating-system-installation} - Operating System installation},
\textit{\ref{chap:network-setup} - Network setup},
\textit{\ref{chap:zimbra-installation} - Zimbra installation},
\textit{\ref{chap:drbd-setup} - DRBD Setup},
\textit{\ref{chap:zimbra-drbd-startup-script-disabling} - Zimbra and DRBD Startup Script Disabling},
\textit{\ref{chap:corosync-setup} - Corosync Setup},
\textit{\ref{chap:zimbra-ocf} - Zimbra OCF Resource Agent development},
and
\textit{\ref{chap:pacemaker-setup} - Pacemaker Setup}.
Final system is a Zimbra OSE High Availability system with two servers.

We learn how to manage our new High Availability system thanks to the \textit{\ref{chap:ha-system-management} - High Availability System Management} chapter.

Finally both conclusions and some of the ways this thesis can be improved are described in \textit{\ref{chap:conclusions-and-future-work}} chapter.