Creating a GlusterFS Volume on CentOS 7

Need high availability (HA) for a dataset? Sync your data across multiple nodes using GlusterFS.

Overview

GlusterFS is a distributed file system that keeps data in sync across multiple nodes. Think of it as a {Dropbox, Box, Google Drive}-style sharing solution that you host yourself and that mounts like an NFS share. Here are some common questions and answers:

Q: How does GlusterFS differ from a simple NFS share?
A: With a replicated volume (the type built in this guide), GlusterFS keeps a copy of your data on every node in the replica set, so the data stays available if one node goes offline. With a plain NFS share, if the NFS server goes offline, none of your clients can reach the data.

Q: Why not use BitTorrent Sync (now Resilio) instead of GlusterFS?
A: The commercial version of BitTorrent Sync is not free while GlusterFS is.

Q: Why not use {Dropbox, Box, Google Drive} instead of GlusterFS?
A: First, think about all of the traffic crossing your WAN every time a file is added or modified. Using GlusterFS keeps that traffic internal. Second, the cost of storing data in the cloud can be high for large datasets. Last, sensitive information probably shouldn’t be stored in the cloud (of course, this varies from case to case).

Q: Why would anyone need to use GlusterFS?
A: Think HA for web server root, documents, photos, configurations, etc.

Environment: CentOS 7 w/SELinux disabled

Installing and Configuring GlusterFS (Server)

Adding a Disk and Formatting a Partition (on each node)

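Let’s start by adding a disk to each node and creating an empty partition. The examples assume the new disk shows up as /dev/sdc; in the fdisk prompts below, (default) means pressing Enter to accept the suggested value.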

fdisk /dev/sdc
n
p
(default)
(default)
(default)
w

Now that we have a new empty partition on our newly added disk, let’s format it.

mkfs.xfs -i size=512 /dev/sdc1
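
The steps above format the partition but never mount it. A minimal sketch of that missing step follows, assuming the partition should be mounted at /data so that the /data/Documents folder created later lives on the new disk. The mount point, the fstab options, and the fstab_entry helper are assumptions for illustration, not part of the original walkthrough.

```shell
# Hypothetical helper: print an /etc/fstab line for an XFS brick partition.
fstab_entry() {
    local device="$1" mount_point="$2"
    printf '%s %s xfs defaults 1 2\n' "$device" "$mount_point"
}

# Preview the entry; this call changes nothing on disk.
fstab_entry /dev/sdc1 /data

# To apply it as root on each node, something like this should work:
#   mkdir -p /data
#   fstab_entry /dev/sdc1 /data >> /etc/fstab
#   mount /data
```

Printing the line first makes it easy to review before appending it to /etc/fstab.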

Install GlusterFS (on each node)

First, we’ll need to install the centos-release-gluster package since we’re on CentOS.

yum install -y centos-release-gluster

Now, we can go ahead and install glusterfs-server.

yum install -y glusterfs-server

Now that GlusterFS is installed, let’s enable and start the service.

systemctl enable glusterd
systemctl start glusterd
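
Before moving on, it can help to confirm that the service actually came up on each node. The sketch below wraps systemctl is-active in a small helper; the service_state name is hypothetical, and the "unknown" fallback is an assumption for machines where systemctl cannot answer.

```shell
# Hypothetical helper: report a unit's state, or "unknown" when systemctl
# is unavailable or prints nothing (for example, in a minimal container).
service_state() {
    local state
    state=$(systemctl is-active "$1" 2>/dev/null) || true
    printf '%s\n' "${state:-unknown}"
}

service_state glusterd
```

On a healthy node this would be expected to print active.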

Creating the Connection

Let’s set up our node-to-node connection. On storage-a, probe storage-b:

gluster peer probe storage-b.example.com

That command should return: peer probe: success. You can verify the connection from storage-b by running:

gluster peer status
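
If you want to check the peer state from a script, one option is to count the peers reported as connected. This is a rough sketch: the count_connected_peers helper is hypothetical, and the sample text only illustrates the output format, with a placeholder UUID.

```shell
# Hypothetical helper: count peers reported as connected.
# Reads `gluster peer status` output on standard input.
count_connected_peers() {
    grep -cF 'State: Peer in Cluster (Connected)' || true
}

# Illustrative sample of the output format; the UUID is a placeholder.
sample='Number of Peers: 1

Hostname: storage-b.example.com
Uuid: 00000000-0000-0000-0000-000000000000
State: Peer in Cluster (Connected)'

printf '%s\n' "$sample" | count_connected_peers

# On a real node: gluster peer status | count_connected_peers
```

With two nodes in the pool, each node should report one connected peer.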

Creating and Using the Volume

Let’s create a folder where our data will live (on both nodes):

mkdir -p /data/Documents

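Now that our folder has been created, we can create the GlusterFS volume (on one node only). The replica count of 2 matches the two bricks listed, one per node. The trailing force overrides GlusterFS safety checks, such as its refusal to place a brick on the root partition, so drop it if your bricks live on a dedicated file system.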

gluster volume create Documents replica 2 transport tcp storage-a.example.com:/data/Documents storage-b.example.com:/data/Documents force

This should return: volume create: Documents: success: please start the volume to access data. The final step (server side) is to start the volume (on one node only).

gluster volume start Documents

This should return: volume start: Documents: success. You can confirm your setup by running:

[root@storage-a ~]# gluster volume info
  
Volume Name: Documents
Type: Replicate
Volume ID: 342bdcd5-c486-4360-a85e-3e33c8e04c40
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.211.55.3:/data/Documents
Brick2: 10.211.55.5:/data/Documents
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
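
To check a single field from that output in a script, a small awk filter is usually enough. The volume_field helper below is a hypothetical name, and the sample lines are copied from the output above.

```shell
# Hypothetical helper: print the value of one field from `gluster volume info`
# output read on standard input, for example "Status" or "Type".
volume_field() {
    awk -F': ' -v key="$1" '$1 == key { print $2; exit }'
}

# Sample lines taken from the output shown above.
info='Volume Name: Documents
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2'

printf '%s\n' "$info" | volume_field Status
printf '%s\n' "$info" | volume_field 'Number of Bricks'

# On a real node: gluster volume info Documents | volume_field Status
```

For the sample, this prints Started followed by 1 x 2 = 2.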

Installing and Using GlusterFS (Client)

Let’s install the GlusterFS client on the machine that will mount the volume.

yum install -y centos-release-gluster
yum install -y glusterfs-client

We’ll need to create a mount point for this storage:

mkdir /mnt/Documents

Mounting the GlusterFS share is easy and quick:

mount -t glusterfs storage-a.example.com:/Documents /mnt/Documents
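
The mount command above does not survive a reboot. One way to make it persistent is an /etc/fstab entry, sketched here under the assumption that the defaults,_netdev options suit your setup; the gluster_fstab_entry helper is a hypothetical name.

```shell
# Hypothetical helper: print an /etc/fstab line for a GlusterFS client mount.
# _netdev marks the mount as network-dependent so it waits for networking.
gluster_fstab_entry() {
    local server="$1" volume="$2" mount_point="$3"
    printf '%s:/%s %s glusterfs defaults,_netdev 0 0\n' \
        "$server" "$volume" "$mount_point"
}

# Preview the entry; append it to /etc/fstab as root once it looks right.
gluster_fstab_entry storage-a.example.com Documents /mnt/Documents
```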

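To test the setup, create a file in /mnt/Documents on the client and then look in /data/Documents on each node. A copy should appear on every node in the pool almost immediately. Always write through the mounted volume rather than directly into a brick folder, since GlusterFS does not track changes made behind its back.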
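
To go one step beyond eyeballing the folders, you could compare checksums of the same file on the client and on each node. The sketch below assumes sha256sum is available (it ships with coreutils on CentOS 7) and uses a hypothetical file_digest helper; the demonstration runs on two temporary files so it works anywhere.

```shell
# Hypothetical helper: print only the SHA-256 digest of a file.
file_digest() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Self-contained demonstration using two temporary files with equal content.
a=$(mktemp)
b=$(mktemp)
echo "replication test" > "$a"
echo "replication test" > "$b"
if [ "$(file_digest "$a")" = "$(file_digest "$b")" ]; then
    echo "digests match"
else
    echo "digests differ"
fi
rm -f "$a" "$b"

# In practice, run file_digest on the file under /mnt/Documents on the client
# and on the same file under /data/Documents on each node, then compare.
```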