Software RAID monitoring w/notifications Ubuntu Server 10.04

Post by dedwards » Wed Mar 17, 2010 9:36 am

This how-to assumes you have Ubuntu Server installed with Postfix. Postfix or a similar MTA is required; setting one up from scratch is not covered by this tutorial. This guide was tested on Ubuntu Server 10.04. For the purposes of this tutorial we'll call the array /dev/md0.

1. Install mdadm. It should have been installed automatically when you set up the array; either way, the command is here for reference.

Code: Select all

sudo apt-get install mdadm


Accept all dependencies. If you get prompted to choose the server configuration type for Postfix, ensure you select Internet Site. Once everything is installed, ensure that mdadm is set to start automatically as a service by editing the /etc/default/mdadm file:

Code: Select all

sudo vi /etc/default/mdadm


Ensure the following fields are set as follows:

Code: Select all

AUTOCHECK=true
START_DAEMON=true
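
For context: START_DAEMON controls whether the mdadm monitoring daemon is started at boot, and AUTOCHECK enables the periodic redundancy check that the Ubuntu/Debian mdadm package schedules via cron. If you'd like to kick off a check by hand, the package ships a helper script for this (path assumed from the stock Ubuntu package):

Code: Select all

sudo /usr/share/mdadm/checkarray /dev/md0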


Save the file and then start the service:

Code: Select all

sudo service mdadm restart


Ensure the service starts with no errors.
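
To double-check that the monitoring daemon is actually running, one quick way is to look for its process:

Code: Select all

ps -ef | grep [m]dadm

You should see an "mdadm --monitor" entry in the output (the exact arguments may vary).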

Ensure that postfix is installed:

Code: Select all

sudo apt-get install postfix


If it's already installed you will get a message similar to the following:

Code: Select all

postfix is already the newest version.
postfix set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.


If it's not installed you will be prompted for the server configuration type. You must choose Internet Site. At the System mail name prompt, enter a fully qualified host name that is resolvable from the outside. All the rest of the prompts can be left at their defaults.
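
If you picked the wrong configuration type during the install, you can re-run the Postfix configuration wizard at any time:

Code: Select all

sudo dpkg-reconfigure postfix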

Once installed, edit the /etc/postfix/main.cf file and verify the following fields:

Code: Select all

myhostname = hostname.domain.tld
myorigin = domain.tld
mydestination =


You may have to add the "myorigin" directive.
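
To verify the values Postfix will actually use, you can query them with postconf:

Code: Select all

postconf myhostname myorigin mydestination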

Ensure the "domain.tld" is a real resolvable domain name from the outside world or emails informing of a problem with your array will NOT be delivered!!. I would also be nice if you actually owned that domain also.

Reload and restart Postfix:

Code: Select all

sudo postfix reload
sudo service postfix restart
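
Before wiring up mdadm, it's worth confirming Postfix can deliver mail at all. A minimal test, using the same placeholder address as in the next step:

Code: Select all

printf "Subject: postfix test\n\ntest body\n" | /usr/sbin/sendmail someone@yourdomain.com

If this message never arrives, fix Postfix first; mdadm's notifications go through the same path.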


2. Next, edit the /etc/mdadm/mdadm.conf file: set the "MAILADDR" entry to your email address (in place of "someone@yourdomain.com") and add a "MAILFROM" entry as it appears below, substituting "raidhost@yourdomain.com" with an address matching the host name of the server whose array you are monitoring:

Code: Select all

sudo vi /etc/mdadm/mdadm.conf


Code: Select all

# instruct the monitoring daemon where to send mail alerts
MAILADDR someone@yourdomain.com
MAILFROM raidhost@yourdomain.com - mdadm
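
A quick way to confirm the entries took:

Code: Select all

grep ^MAIL /etc/mdadm/mdadm.conf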


Next, restart mdadm:

Code: Select all

sudo service mdadm restart


3. Ensure the mdadm email notifications work by sending a test email. From the console or an SSH (PuTTY) session, issue the following command:

Code: Select all

sudo mdadm --monitor --scan --test --oneshot


Check your email. You should have an email similar to the one below:

This is an automatically generated mail message from mdadm
running on raidhost
A TestMessage event had been detected on md device /dev/md0.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
152512 blocks [2/2] [UU]
unused devices: <none>
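
If no email shows up, Postfix's log (standard location on Ubuntu) will usually tell you whether the message was handed off and where delivery failed:

Code: Select all

sudo tail -n 20 /var/log/mail.log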


One thing to remember: you will receive a separate email for each array that you have.
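
To see at a glance how many arrays (and therefore how many emails) that means:

Code: Select all

cat /proc/mdstat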

4. Simulate a drive failure and see if the notifications actually work. First, get a listing of the drives in your array by issuing the following command:

Code: Select all

sudo mdadm --detail /dev/md0


You should get something similar to the list below:

Code: Select all

/dev/md0:
        Version : 0.90
  Creation Time : Sun Sep  6 11:14:29 2009
     Raid Level : raid1
     Array Size : 152512 (148.96 MiB 156.17 MB)
  Used Dev Size : 152512 (148.96 MiB 156.17 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Thu Jan 14 17:31:52 2010
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : f1083564:3bbcaf72:8be9ee6c:8f9af153
         Events : 0.72
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1


Pick a drive whose failure you want to simulate. I picked /dev/sdb1. Now mark it as faulty:

Code: Select all

sudo mdadm --manage --set-faulty /dev/md0 /dev/sdb1


You should get an email right after you issue the command above. It will look similar to the email below:

This is an automatically generated mail message from mdadm
running on raidhost
A Fail event had been detected on md device /dev/md0.
It could be related to component device /dev/sdb1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1](F)
152512 blocks [2/1] [U_]
unused devices: <none>



If not, there is something wrong. Make sure mdadm is running and check the system logs for any errors.
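
One way to do that (assuming the standard Ubuntu syslog location):

Code: Select all

grep -i mdadm /var/log/syslog | tail -n 20


Assuming you got the email, let's fix the array again. First, remove the "failed" drive from the array: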

Code: Select all

sudo mdadm /dev/md0 -r /dev/sdb1


Now, let's add it back to the array:

Code: Select all

sudo mdadm /dev/md0 -a /dev/sdb1
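
While the drive re-syncs, you can watch the rebuild progress live (press Ctrl-C to exit):

Code: Select all

watch -n 5 cat /proc/mdstat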


Now, if you check the array:

Code: Select all

sudo mdadm --detail /dev/md0


You should get an output similar to the one below:

Code: Select all

/dev/md0:
        Version : 0.90
  Creation Time : Sun Sep  6 11:14:29 2009
     Raid Level : raid1
     Array Size : 152512 (148.96 MiB 156.17 MB)
  Used Dev Size : 152512 (148.96 MiB 156.17 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Thu Jan 14 17:47:13 2010
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
 Rebuild Status : 1% complete
           UUID : f1083564:3bbcaf72:8be9ee6c:8f9af153
         Events : 0.1634
    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       2       8       17        1      spare rebuilding   /dev/sdb1

If you look at the "State" line you will see that the array is degraded and recovering: the drive we re-added is simply rebuilding. Once the rebuild finishes, the state will return to clean. That's it!