HOWTO: Use GlusterFS for IMAP Spools
GlusterFS is a distributed filesystem with built-in redundancy and self-healing features that allows individual storage volumes to be aggregated into larger storage volumes.
This HOWTO sets up a single Kolab server using an IMAP spool mounted over GlusterFS, as illustrated in GlusterFS Replicated Volume.
To illustrate the GlusterFS volume scaling, we expand this original GlusterFS volume in GlusterFS Distributed Replicated Volume.
The initial setup consists of the following systems:
- System gfs1.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.11.
- System gfs2.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.12.
- System kolab.example.org.
The IN A record for gfs.example.org is made to resolve to both the .11 and .12 IP addresses.
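To confirm that gfs.example.org resolves to both addresses, you can query it from any of the systems. This is just an illustrative check; it assumes the dig utility (from the bind-utils package) is installed, and the output shown is what one would expect given the records above:
# dig +short gfs.example.org A
192.168.122.11
192.168.122.12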
GlusterFS Replicated Volume
The initial setup looks as follows:
[Figure: GlusterFS Replicated Volume. The Kolab Server mounts the GlusterFS volume, which consists of Brick #1 and Brick #2 replicating one another.]
In this scenario, the Kolab server uses a GlusterFS volume mount for its IMAP spool, which is redundant because both bricks contain the same data.
Partition /dev/vdb on gfs1 and gfs2 as follows:
# parted /dev/vdb
GNU Parted 3.1
Using /dev/vdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
# mklabel gpt
Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost.
Do you want to continue? Yes/No? yes
# unit GB
# mkpart primary 0GB 10GB
# set 1 lvm on
Create a physical volume, then a volume group, then a logical volume on both gfs1 and gfs2:
# pvcreate /dev/vdb
# vgcreate vg_gfs /dev/vdb
# lvcreate -L 9GB -n lv_brick vg_gfs
Note
The logical volume lv_brick leaves 10% of the volume group unused for two purposes:
- Filesystem checks can be performed on a logical volume snapshot, without interrupting the storage availability, and
- Backups can be made using logical volume snapshots without interrupting storage availability.
A minimal snapshot example is shown below.
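As an illustration of how that reserved space can be used, here is a minimal sketch of checking the filesystem via a snapshot; the snapshot name lv_brick_snap and the 1GB snapshot size are arbitrary choices for this example:
# lvcreate -s -L 1GB -n lv_brick_snap /dev/vg_gfs/lv_brick
# e2fsck -f /dev/vg_gfs/lv_brick_snap
# lvremove /dev/vg_gfs/lv_brick_snap
The same snapshot approach applies to making backups of the brick without taking the storage offline.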
On both gfs1 and gfs2, create a filesystem on the new logical volume:
# mkfs.ext4 /dev/vg_gfs/lv_brick
Create a mount point for the filesystem:
# mkdir -p /srv/gfs
Configure the mount to be made on system startup and mount:
# echo "/dev/vg_gfs/lv_brick /srv/gfs ext4 defaults 1 2" >> /etc/fstab # mount -a
Create the directory to be exported as a brick:
# mkdir -p /srv/gfs/brick
Warning
Do not use the filesystem root directory /srv/gfs/ as the brick to export, for its lost+found/ directory will be rendered corrupt and useless.
Install the glusterfs, glusterfs-fuse and glusterfs-server packages on gfs1 and gfs2:
# yum -y install glusterfs{,-fuse,-server}
Start the glusterd service and configure it to start when the system boots:
# service glusterd start
# chkconfig glusterd on
Use gfs1 and probe the other GlusterFS node:
# gluster peer probe gfs2.example.org
Create the GlusterFS volume to provide to kolab.example.org:
# gluster volume create imap0 replica 2 gfs1.example.org:/srv/gfs/brick/ gfs2.example.org:/srv/gfs/brick/
Start the new volume:
# gluster volume start imap0
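Before handing the volume to the client, it is worth verifying that the peer is connected and that the volume was indeed created as a replicated volume; the exact output varies by GlusterFS version, but the volume type should be reported as Replicate with 1 x 2 = 2 bricks:
# gluster peer status
# gluster volume info imap0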
Continue with Configuring the GlusterFS Client.
GlusterFS Distributed Replicated Volume
This part of the HOWTO assumes we are expanding a GlusterFS Replicated Volume and that you have already followed Configuring the GlusterFS Client.
We'll be expanding the GlusterFS storage volume from 10GB to 20GB by configuring the GlusterFS volume to become a distributed volume (on top of being replicated).
The number of nodes required for this is four: files are distributed over two replica sets, each consisting of a brick and its replica. We will therefore add two nodes:
- System gfs3.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.13.
- System gfs4.example.org with a second disk volume vdb of 10GB and IP address 192.168.122.14.
Partition /dev/vdb on gfs3 and gfs4 as follows:
# parted /dev/vdb
GNU Parted 3.1
Using /dev/vdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
# mklabel gpt
Warning: The existing disk label on /dev/vdb will be destroyed and all data on this disk will be lost.
Do you want to continue? Yes/No? yes
# unit GB
# mkpart primary 0GB 10GB
# set 1 lvm on
Create a physical volume, then a volume group, then a logical volume on both gfs3 and gfs4:
# pvcreate /dev/vdb
# vgcreate vg_gfs /dev/vdb
# lvcreate -L 9GB -n lv_brick vg_gfs
Note
The logical volume lv_brick leaves 10% of the volume group unused for two purposes:
- Filesystem checks can be performed on a logical volume snapshot, without interrupting the storage availability, and
- Backups can be made using logical volume snapshots without interrupting storage availability.
On both gfs3 and gfs4, create a filesystem on the new logical volume:
# mkfs.ext4 /dev/vg_gfs/lv_brick
Create a mount point for the filesystem:
# mkdir -p /srv/gfs
Configure the mount to be made on system startup and mount:
# echo "/dev/vg_gfs/lv_brick /srv/gfs ext4 defaults 1 2" >> /etc/fstab # mount -a
Create the directory to be exported as a brick:
# mkdir -p /srv/gfs/brick
Warning
Do not use the filesystem root directory /srv/gfs/ as the brick to export, for its lost+found/ directory will be rendered corrupt and useless.
Install the glusterfs, glusterfs-fuse and glusterfs-server packages on gfs3 and gfs4:
# yum -y install glusterfs{,-fuse,-server}
Start the glusterd service and configure it to start when the system boots:
# service glusterd start
# chkconfig glusterd on
Use gfs1 and probe the new GlusterFS nodes:
# gluster peer probe gfs3.example.org
# gluster peer probe gfs4.example.org
Add the new bricks to the existing volume:
# gluster volume add-brick imap0 gfs3.example.org:/srv/gfs/brick gfs4.example.org:/srv/gfs/brick
Rebalance the bricks (use gfs1 or gfs2):
# gluster volume rebalance imap0 start
# watch -n 1 gluster volume rebalance imap0 status
When the rebalancing of the volume has completed, remount the volume on the GlusterFS client(s) so that they pick up the increased storage capacity:
# mount -o remount /var/spool/imap/
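To confirm the expansion took effect, check the volume type on one of the GlusterFS nodes and the capacity now seen on the client; the figures shown by df will vary with your brick sizes, but in this example roughly 20GB should be available:
# gluster volume info imap0
# df -h /var/spool/imap/
The volume should now be reported as Distributed-Replicate with 2 x 2 = 4 bricks.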
[Figure: GlusterFS Distributed Replicated Volume. The Kolab Server mounts the GlusterFS volume, which now consists of four bricks: Brick #1 replicating with Brick #2, and Brick #3 replicating with Brick #4.]
Configuring the GlusterFS Client
On kolab.example.org, this procedure configures the GlusterFS client to mount the imap0 volume.
Install the glusterfs and glusterfs-fuse packages:
# yum -y install glusterfs{,-fuse}
Configure the mount to be made on system startup and mount:
# echo "gfs.example.org:/imap0 /var/spool/imap/ glusterfs defaults,_netdev 0 0" >> /etc/fstab
# mount -a -t glusterfs
Change the directory ownership back to its original owner and group:
# chown cyrus:mail /var/spool/imap/
# chmod 750 /var/spool/imap/
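As an optional sanity check, confirm that the spool is in fact served by GlusterFS rather than the local disk:
# df -h /var/spool/imap/
# mount | grep /var/spool/imap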
FAQ
What happens when a GlusterFS node fails?
In a replica n volume, up to n-1 nodes can fail. For each individual brick, at least one replica must stay online.
In situations where you expect, or are required to take into account, the simultaneous failure of multiple nodes that replicate one another (as might be the case when using old desktop PCs for your storage), you should increase the number of replicas.
There is a significant initial performance hit for the GlusterFS client while it is still working out that one of the volume's bricks is no longer available.
Write performance should not be impacted significantly, but read performance is, not unlike with a RAID 1 replicated disk volume.
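If that initial stall is problematic, the timeout after which the client gives up on an unresponsive brick can be lowered. This is only a sketch: network.ping-timeout is a standard GlusterFS volume option (42 seconds by default on most releases), and the 10 second value here is merely an example, with the usual caveat that very low values can cause spurious disconnects under load:
# gluster volume set imap0 network.ping-timeout 10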
You can recognize peers that are unavailable by their Disconnected state:
# gluster peer status
Number of Peers: 3

Hostname: gfs2.example.org
Uuid: 5e68482a-4164-4cfb-af2c-61a64cf894a7
State: Peer in Cluster (Connected)

Hostname: gfs3.example.org
Uuid: 89073c71-1cf7-4d6e-af93-dab8f13cee14
State: Peer in Cluster (Disconnected)

Hostname: gfs4.example.org
Uuid: fb7db59d-aaee-4dcc-98e3-c852243c8024
State: Peer in Cluster (Connected)
When the node comes back online, it will automatically repair itself before it is deemed connected. During the downtime, and during the repair, it is crucially important that the other replica(s) do not fail as well.
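The progress of that repair can be followed from one of the remaining nodes, assuming a GlusterFS release that provides the self-heal commands:
# gluster volume heal imap0 info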
Replica x, Distribute y - how much storage, how many nodes?
The total storage volume available is determined primarily by the number of replicas; the distribution is simply a JBOD-style aggregation of the replica sets.
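As a rule of thumb, assuming equal-sized bricks, the usable capacity is the combined size of all bricks divided by the replica count, and the number of nodes equals the number of distributed sets multiplied by the replica count. In the example used throughout this HOWTO, a replica 2 volume distributed over 2 sets uses 4 nodes with one 10GB brick each, yielding (4 x 10GB) / 2 = 20GB of usable storage.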