DRBD: Redundant NFS Storage on CentOS 6
A pair of CentOS NFS servers can be a great way to build an inexpensive, reliable, redundant fileserver. Here we are going to use DRBD to replicate the data between the NFS nodes and Heartbeat to provide high availability to the cluster. In this example each node is a RackSpace Cloud Server with attached Cloud Block Storage.
Make sure that DNS resolves correctly for each server’s hostname, and to really make sure, put an entry in /etc/hosts. We’ll use fileserver-1 as the primary and fileserver-2 as the backup, and share the /dev/xvdb1 device under the DRBD resource name “data”. It will eventually be available to the filesystem as /dev/drbd1.
10.0.0.1 fileserver-1 fileserver-1.example.com
10.0.0.2 fileserver-2 fileserver-2.example.com
Install EL Repository
If you don’t already have the EL repository for yum installed, install it using rpm:
rpm -ivh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm
Install & Configure DRBD
Now install DRBD and its userland utilities using yum, then load the kernel module with modprobe.
yum install -y kmod-drbd84 drbd84-utils
modprobe drbd
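You can confirm that the module actually loaded with a quick check (not part of the original steps, just a sanity check):
lsmod | grep drbd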
Next we need to create a new DRBD resource file by editing /etc/drbd.d/data.res
. Make sure to use the correct IP address and devices for your server nodes.
resource data {
    startup {
        wfc-timeout 30;
        outdated-wfc-timeout 20;
        degr-wfc-timeout 30;
    }
    net {
        protocol C;
        cram-hmac-alg sha1;
        shared-secret "Secret Password for DRBD";
    }
    disk {
        resync-rate 100M;
    }
    syncer {
        rate 100M;
        verify-alg sha1;
    }
    on fileserver-1 {
        volume 0 {
            device minor 1;
            disk /dev/xvdb1;
            meta-disk internal;
        }
        address 10.0.0.1:7789;
    }
    on fileserver-2 {
        volume 0 {
            device minor 1;
            disk /dev/xvdb1;
            meta-disk internal;
        }
        address 10.0.0.2:7789;
    }
}
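If iptables is running on the nodes you will also need to allow the DRBD replication port defined above (7789) between the two servers. A minimal sketch, assuming the default INPUT chain and the addresses from this example:
# on fileserver-1, allow DRBD traffic from fileserver-2 (swap the source address on the other node)
iptables -I INPUT -p tcp -s 10.0.0.2 --dport 7789 -j ACCEPT
service iptables save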
Run the following commands on each server to initialize the storage metadata, start the DRBD service, and bring up the “data” resource.
drbdadm create-md data
service drbd start
drbdadm up data
You can monitor the progress by checking /proc/drbd
. It should look something like the following, with a status of “Inconsistent/Inconsistent” being expected at this point.
[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:209708764
On the primary node only, run the following command to initialize the synchronization between the two nodes.
drbdadm primary --force data
Again we can monitor the status by watching /proc/drbd
– notice that the status is now “UpToDate/Inconsistent” along with a sync status (at 4.8% in my example).
[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:9862244 nr:0 dw:0 dr:9863576 al:0 bm:601 lo:8 pe:2 ua:11 ap:0 ep:1 wo:f oos:199846748
    [>....................] sync'ed: 4.8% (195160/204792)M
    finish: 1:57:22 speed: 28,364 (22,160) K/sec
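Rather than re-running cat by hand, you can keep an eye on the sync progress with watch:
watch -n1 cat /proc/drbd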
Once the DRBD device has synced between the two nodes you will see an “UpToDate/UpToDate” message and you are ready to proceed.
[root@fileserver-1 ~]# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:209823780 nr:8 dw:3425928 dr:206400390 al:1763 bm:12800 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
Format & Mount
Once the device has synchronized between your nodes you can prepare it on the primary node and then mount it. Note that in a standard Primary/Secondary configuration with a traditional filesystem such as ext3, the device can only be mounted on one node at a time. It is possible to create a Dual Primary configuration in which the data is accessible from both nodes at the same time, but that requires a clustered filesystem such as GFS or OCFS2 (Oracle Cluster File System v2), the latter of which is used here.
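For reference, a Dual Primary setup also has to be enabled in the resource’s net section. This is a sketch only, since the resource file shown above sticks with a single primary:
net {
    protocol C;
    allow-two-primaries yes;
}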
OCFS2 isn’t available from the default repositories, so we have to install the Oracle Open Source yum repository, import their key, and install ocfs2-tools so we can set up a clustered configuration.
yum -y install yum-utils
cd /etc/yum.repos.d
wget --no-check-certificate https://public-yum.oracle.com/public-yum-ol6.repo
rpm --import http://public-yum.oracle.com/RPM-GPG-KEY-oracle-ol6
yum-config-manager --disable ol6_latest
yum -y install ocfs2-tools kernel-uek
reboot
You will need to edit /boot/grub/grub.conf to default to the correct kernel – it is very important that the installed driver match the kernel version.
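For example, if the kernel you want to boot ends up as the second entry in the GRUB menu, the default line would look like this (entries are numbered from 0, so check your own menu before changing anything):
# /boot/grub/grub.conf
default=1
timeout=5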
mkfs -t ext3 /dev/drbd1
mkdir -p /mnt/data
mount -t ext3 -o noatime,nodiratime /dev/drbd1 /mnt/data
If you want to test that the replicated device is in fact replicating, try the following commands to create a test file, demote the primary server to secondary, promote the secondary to primary, and mount the device on the backup server.
[root@fileserver-1 ~]
cd ~
touch /mnt/data/test_file
umount /mnt/data
drbdadm secondary data
[root@fileserver-2 ~]
drbdadm primary data
mount /dev/drbd1 /mnt/data
cat /proc/drbd
ls -la /mnt/data
Reverse the process to change back to your primary server.
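For completeness, reversing it looks roughly like this, using the same commands as above:
[root@fileserver-2 ~]
umount /mnt/data
drbdadm secondary data
[root@fileserver-1 ~]
drbdadm primary data
mount -t ext3 -o noatime,nodiratime /dev/drbd1 /mnt/data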
Set Up NFS
Next we need to share the replicated storage over NFS so that it can be used by other systems. You’ll need these packages on both nodes of your storage cluster as well as any clients that are going to connect to them.
yum -y install nfs-utils nfs-utils-lib
service rpcbind start
Some guides will tell you to enable the service on boot using chkconfig; however, since we will be using Heartbeat to manage the cluster, we don’t want to do this.
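If anything did get switched on automatically, you can make sure it stays out of the boot sequence so that the cluster manager is the only thing starting it. These commands are a suggestion rather than part of the original steps:
chkconfig nfs off
chkconfig nfslock off
chkconfig drbd off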
Edit the /etc/exports
file to share your directory with your clients.
/mnt/data 10.0.0.0/24(rw,async,no_root_squash,no_subtree_check)
- 10.0.0.0/24 – Share with 10.0.0.0-10.0.0.255
- rw – Read/Write access.
- async – Better performance, at the risk of data loss if the NFS server reboots before the data is committed to stable storage; the server tells the client the write was successful before it actually is.
- no_root_squash – Allow root to connect to this share.
- no_subtree_check – Increases performance but lowers security by skipping the check of parent directory permissions when accessing the share.
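After saving /etc/exports you can load the export table on whichever node is currently active. A minimal sketch for testing outside of Heartbeat (once the cluster manager is in charge it should be the one starting NFS):
service nfs start       # start the NFS daemons for this test
exportfs -ra            # re-read /etc/exports
showmount -e localhost  # confirm the share is listed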
Next all that is left is to connect to the NFS server from your client.
mkdir -p /mnt/data
showmount -e fileserver-cluster
mount -v -t nfs -o 'vers=3' fileserver-cluster:/mnt/data /mnt/data
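If you want the client to mount the share at boot, an /etc/fstab entry along these lines should work (assuming the fileserver-cluster name resolves for the client; adjust the options to taste):
fileserver-cluster:/mnt/data  /mnt/data  nfs  vers=3,_netdev  0 0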
Configuring Heartbeat
The last step of this guide should be the configuration of Heartbeat to manage the NFS cluster; however, it is omitted because I ended up going a different route and instead used Pacemaker to control DRBD in a Dual Primary configuration. Since you might have come here looking for a HOWTO with Heartbeat as well, the best I can do is provide a link to a Heartbeat Configuration Guide on the DRBD site.
Comments
Quick question:
Is the /dev/xvdb1 a shared device presented to each node from a SAN/NAS, or is it an LVM device on each node?
A bit of both? They were LVM mapped devices but hosted on external block storage. This particular setup was at RackSpace so each NFS server was a VM that mounted a block storage device which was shared over NFS and synced with DRBD.
Just found what looks like a great guide, but: where’re you configuring fileserver-cluster and heartbeat?
Oops, I got busted. My original design called for using Heartbeat to manage the cluster (which is why it was indicated here) but ultimately I decided to use GFS2, Pacemaker, and a hardware load balancer for my NFS cluster (http://justinsilver.com/technology/linux/dual-primary-drbd-centos-6-gfs2-pacemaker/). The downside is that it requires some additional hardware ($$$) and configuration but does allow both nodes to be used at the same time.
Since I said I was going to use Heartbeat for the ha-cluster here I’ll update this post (as soon as I can) with the details of what that configuration would look like and give you a ping – thanks for reading!
One other thing you might consider looking at is Gluster, which appears to provide a highly-available distributed and fault-tolerant file system with a much less complicated setup. Dual-primary DRBD would concern me a little, especially if it behaved anything like master-master MySQL does. Given storage is more your field than mine, would you consider giving Gluster a go?
Gluster is definitely a good option to check out. I’m usually more of a cutting edge type person but in this case I had some experience with DRBD & Pacemaker so decided to go that route instead of Gluster, though I would like to play around with it some more (and probably will on future projects). It’s definitely a tricky situation with dual primary which is why the fencing/STONITH setup is so important, but so far I haven’t run into any issues with this config, knock on wood.