Basic Backup Script in BASH


Introduction
Everyone should be backing up their data. This doesn't just apply to sysadmins, but also to people at home who never even think about it. Just like everything else in I.T., hard drives eventually fail. If you don't make regular backups, you're at the mercy of your hardware: once your drive no longer spins up, all that data is gone.

I try to make as many of my tasks as easy as possible by using Bash scripting. It's about as portable as it gets: 99.9% of the time, Bash is already installed wherever you need to run the script...and even if it isn't, most of the code I write works with other shells as well.

I'm going to copy/paste parts of my script, and give details about each block of code. At the end, I'll also provide a link to the full script.

What Will This (Not) Do?
This script will do the following:

  • Back up the specified directories
  • Create a directory based on year, month & day
  • Create a separate backup for each given device
  • E-mail you the results (this assumes a working ssmtp setup, which is out of the scope of this particular guide)
  • Anything else is out of scope. This is a quick and dirty backup solution, and isn't intended to be the definitive answer. I originally wrote it for my own needs and my needs only.

Let's Begin
    DAY=$(date +%d)
    MONTH=$(date +%m)
    YEAR=$(date +%Y)

Here we grab the date components used to name and organize the backups. Pretty straightforward, eh?
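One caveat: date is called three times, so a run that starts a split second before midnight could mix values from two different days. A minimal sketch of one way around that, assuming Bash (it uses a here-string), is to ask date for all three fields in a single call:

```bash
#!/usr/bin/env bash
# Ask date for all three fields at once, so they always describe the same
# instant, then split the result on whitespace into three variables.
read -r YEAR MONTH DAY <<< "$(date '+%Y %m %d')"

echo "Backup directory would be: /backups/$YEAR/$MONTH/$DAY"
```

The rest of the script can keep using $YEAR, $MONTH and $DAY unchanged.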

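    # Backup directory to use (e.g. 2011/08/31 for 08.31.2011)
    BKDIR="/backups/$YEAR/$MONTH/$DAY"
    
    # -p creates missing parent directories and does nothing if BKDIR already
    # exists; stop right away if the directory can't be created
    mkdir -p "$BKDIR" || exit 1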

We make the directory where the backup files will be stored (-p creates any missing parent directories). The script needs write permission on /backups/, or else it will fail.
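If you'd rather have the script stop with a clear message than fail halfway through, a small pre-flight check can test for write permission up front. This is a sketch; check_writable is a hypothetical helper of my own, not part of the original script:

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the directory exists and is writable by
# the current user, otherwise prints an error to stderr and returns 1.
check_writable() {
        local dir="$1"
        if [ -d "$dir" ] && [ -w "$dir" ]; then
                return 0
        fi
        echo "Error: $dir does not exist or is not writable" >&2
        return 1
}

# In the backup script, this would go before the mkdir, e.g.:
#   check_writable /backups || exit 1
```

Keep in mind that root passes the -w test on almost any directory, so this mainly helps when the script runs as a regular user.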

    BKLOG="/backups/$YEAR/$MONTH/$DAY.log"

This is where the log goes (it will make more sense later). It sits right next to the day's backup directory, since I like things to be consistent.

    ARRPOS=0

ARRPOS is an array index counter. The loop further down walks through the DRIVE array by value, and ARRPOS keeps track of the matching position in the BACKUP array, so the first drive is paired with the first path, the second drive with the second path, and so on.

    DRIVE=('sda' 'sdb')

This backup solution is device-based, and my server has two devices (one main, another with misc. data). The end result will be one archive per device at $BKDIR/${DRIVE[$ARRPOS]}.tar.gz (e.g. /backups/2012/01/26/sda.tar.gz).

    BACKUP=('/' '/pub')

The directory to back up for each device, in the same order as DRIVE (for me, this covers everything).

    SDAEX=('/media' '/tmp' '/dev' '/proc' '/sys' '/mnt' '/pub' '/var/cache' '/backups')

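These paths are excluded when backing up sda. Most of them aren't needed: /proc and /sys are virtual filesystems generated by the kernel, /dev is populated at boot, /tmp and /var/cache hold throwaway data, /media and /mnt are mount points for other devices, and /pub already gets its own backup as sdb's entry in BACKUP. We also don't want to back up our backups.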

    touch "$BKLOG" || exit 1

Create an empty log file up front, which also confirms that it can be created; if it can't, the script stops here.

    {
            echo "To: someone@somewhere.com"
            echo "From: Backups"
            echo "Subject: Generated backup report for $(hostname) on $YEAR.$MONTH.$DAY"
            # A blank line separates the mail headers from the body
            echo
            echo ">> Backup for: $YEAR.$MONTH.$DAY started @ $(date +%H:%M:%S)"
            echo
    } > "$BKLOG"

$BKLOG records the status of the backup process. When everything is done, we e-mail it out as the report (see the "To:" field), which is why the first lines are mail headers followed by a blank line. You can change this however you want; this is just how I did it for myself.

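    # Checks to see if day = 1, and if so, backs up the last month's backups
    if [ "$DAY" = "01" ]; then
            # 10# forces base 10, so "08" and "09" aren't parsed as octal
            M=$((10#$MONTH - 1))
            OLDYEAR=$YEAR
            if [ "$M" -eq 0 ]; then
                    # In January, the previous month is December of last year
                    M=12
                    OLDYEAR=$((YEAR - 1))
            fi
            # Zero-pad to match the directory names created by date +%m
            OLD=$(printf '%02d' "$M")
    
            echo "- New month detected.  Backing up previous month's ($OLD) backups." >> "$BKLOG"
            echo "   + Backup file: /backups/$OLDYEAR/$OLD.tar.gz" >> "$BKLOG"
            # tar's own error messages go to the log; only time's report is captured
            SD=$( { time tar -cpPzf "/backups/$OLDYEAR/$OLD.tar.gz" "/backups/$OLDYEAR/$OLD/" 2>>"$BKLOG"; } 2>&1 )
            STATUS=$?
    
            # Only delete the daily folders if the archive was created successfully
            if [ "$STATUS" -eq 0 ]; then
                    rm -rf "/backups/$OLDYEAR/$OLD"
            else
                    echo "   ! tar failed (exit $STATUS), keeping /backups/$OLDYEAR/$OLD" >> "$BKLOG"
            fi
    
            # "real    1m23.456s" -> MIN=1, SEC=23.456s
            SD=$(echo "$SD" | grep real)
            MIN=$(echo "$SD" | awk '{ split($2, t, "m"); print t[1] }')
            SEC=$(echo "$SD" | awk '{ split($2, t, "m"); print t[2] }')
            echo -e "- done [ ${MIN}m ${SEC} ].\n" >> "$BKLOG"
    fi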

As the comment states, this is a monthly step. On the 1st of the month, it archives the previous month's daily backups into a single file and removes the daily folders, before starting the backup for the current day. Combined with other routines on the system, this keeps backups around for a long time. This is also why we excluded /backups from our routine...WAY too many backups of backups if you ask me.
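Working out the previous month is trickier than it looks: dropping the leading character of "08" to get 8 breaks for October ("10" becomes "0"), January has no month 0, and the result has to stay zero-padded to match the directory names created by date +%m. A minimal sketch in plain Bash arithmetic; prev_month is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a year and a zero-padded month (as produced by
# date +%Y and date +%m), print the previous month's year and zero-padded month.
prev_month() {
        local year=$1
        # 10# forces base 10, so "08" and "09" aren't rejected as invalid octal
        local month=$((10#$2 - 1))
        if [ "$month" -eq 0 ]; then
                # January rolls back to December of the previous year
                month=12
                year=$((year - 1))
        fi
        printf '%d %02d\n' "$year" "$month"
}

read -r OLDYEAR OLD <<< "$(prev_month 2012 01)"
echo "Would archive /backups/$OLDYEAR/$OLD/ into /backups/$OLDYEAR/$OLD.tar.gz"
```

For example, prev_month 2012 01 prints "2011 12", and prev_month 2011 10 prints "2011 09".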

One thing I want to talk about, since it's the meat of the actual backup routine, is this line:
    tar -cpPzf /backups/$YEAR/$OLD.tar.gz /backups/$YEAR/$OLD/

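This tells tar to create (-c) a gzip-compressed (-z) archive file (-f) named /backups/$YEAR/$OLD.tar.gz containing the data found in the /backups/$YEAR/$OLD/ directory, preserving permissions (-p) and using absolute file names (-P), i.e. not stripping the leading "/" from member names. Strictly speaking, -p only matters when extracting; tar records permissions when it creates an archive either way. The -P switch is used because without it, tar strips the leading "/" and warns about it ("Removing leading `/' from member names"), which makes the output ugly. The flip side is that the archive holds absolute paths, so extracting it with -P writes files straight back over their original locations.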
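Since a backup you can't restore is worthless, it's worth checking each archive after tar finishes. A minimal sketch, assuming GNU tar and gzip are installed; verify_archive is a hypothetical helper name. gzip -t tests the compressed stream, and tar -tzf lists every member, which fails if the archive is truncated or corrupt:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the gzip stream is intact and
# tar can read the complete list of members in the archive.
verify_archive() {
        local archive="$1"
        gzip -t "$archive" 2>/dev/null || return 1
        tar -tzf "$archive" > /dev/null 2>&1 || return 1
}

# Example: build a tiny archive in a temporary directory and check it
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
echo "hello" > "$tmp/file.txt"
tar -czf "$tmp/test.tar.gz" -C "$tmp" file.txt

if verify_archive "$tmp/test.tar.gz"; then
        echo "archive OK"
else
        echo "archive is damaged"
fi
```

In the backup loop, a failed check could be noted in the report, e.g. verify_archive "$BKDIR/$d.tar.gz" || echo "   ! $d.tar.gz failed verification" >> "$BKLOG".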

    Continuing on...

    # Cycle through each drive and back up each
    for d in "${DRIVE[@]}"; do
            echo "- Backing up drive $d" >> "$BKLOG"
    
            # By default, at least don't back up lost+found directories.
            # An array keeps each --exclude option a separate, safely quoted word.
            EX=(--exclude=lost+found)
    
            # If we are backing up drive 1 (/dev/sda), there's more to exclude
            if [ "$d" = "sda" ]; then
                    for e in "${SDAEX[@]}"; do
                            EX+=("--exclude=$e")
                    done
            fi
    
            # Do the magic work; tar's own errors go to the log, time's report is captured
            SD=$( { time tar -cpPzf "$BKDIR/$d.tar.gz" "${EX[@]}" "${BACKUP[$ARRPOS]}" 2>>"$BKLOG"; } 2>&1 )
            SD=$(echo "$SD" | grep real)
            MIN=$(echo "$SD" | awk '{ split($2, t, "m"); print t[1] }')
            SEC=$(echo "$SD" | awk '{ split($2, t, "m"); print t[2] }')
            # Human-readable size of the archive (5th field of ls -lh, e.g. 1.2G)
            SIZE=$(ls -lh "$BKDIR/$d.tar.gz" | awk '{ print $5 }')
            echo -e "- done [ ${MIN}m ${SEC}, $SIZE ].\n" >> "$BKLOG"
    
            ARRPOS=$((ARRPOS + 1))
    done

This is the code that backs up the current data, and it's where ARRPOS comes in: it picks the entry in BACKUP that belongs to the current drive. Most of this is pretty self-explanatory, to be honest. The biggest difference from the monthly block, besides the for loop over the DRIVE array, is that we also get the size of the created backup file. So, let's rewind a little and look at the for block...

Another block of code I didn't discuss earlier (since it's in a couple of spots) is this:
            SD=$( { time tar -cpPzf $BKDIR/$d.tar.gz $EX ${BACKUP[$ARRPOS]}; } 2>&1 )
            SD=`echo -n "$SD" | grep real`
            MIN=`echo -n "$SD" | awk '{ split($2, t, "m"); print t[1] }'`
            SEC=`echo -n "$SD" | awk '{ split($2, t, "m"); print t[2] }'`

If you run time by itself, you'll get output like this:
    [ehansen@sfu ~]$ time
    
    real    0m0.000s
    user    0m0.000s
    sys    0m0.000s

What this block does is measure how long it takes to create the backup file (the tar command), and then keep only the "real" line. "real" is the wall-clock time from start to finish, while "user" and "sys" only count the CPU time the process itself spent in user mode and in the kernel. A process doesn't run non-stop, even on a multitasking system like Linux: it waits on the disk, and the scheduler may pause it to let other users' programs (or system actions like signal handling) run. So "real" is the time the command actually took, which is what we want in the report. We then use my best friend, Mr. awk, to split the minutes and seconds out of that line.
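If "real" is the only figure you care about, there is a shortcut: Bash's time keyword formats its report according to the TIMEFORMAT variable, and %R expands to the elapsed real time in seconds (three decimal places by default). A minimal sketch, assuming the script runs under Bash; sleep stands in for the tar command:

```bash
#!/usr/bin/env bash
# %R = elapsed real (wall-clock) seconds, e.g. 0.201
TIMEFORMAT='%R'

# time writes its report to the group's stderr, so 2>&1 captures it
SD=$( { time sleep 0.2; } 2>&1 )

echo "- done [ ${SD}s ]."
```

Setting TIMEFORMAT affects every later use of time in the script, so the grep/awk parsing of the "real" line would have to go at the same time.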

    for d in "${DRIVE[@]}"; do

d will be whatever value is currently at ${DRIVE[$ARRPOS]}. For example, the first time around, d will be sda; the second time, it will be sdb.
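Because DRIVE and BACKUP are parallel arrays, another way to keep them in step, without a separate counter, is to loop over the array indices with "${!DRIVE[@]}". A sketch using the same arrays as the script; show_pairs is a hypothetical helper, and its echo stands in for the real tar command:

```bash
#!/usr/bin/env bash
DRIVE=('sda' 'sdb')
BACKUP=('/' '/pub')

# Hypothetical helper: prints each drive next to the path backed up for it.
# "${!DRIVE[@]}" expands to the indices of DRIVE (here 0 and 1), so the same
# index i selects matching entries from both arrays on every pass.
show_pairs() {
        local i
        for i in "${!DRIVE[@]}"; do
                echo "${DRIVE[$i]} -> ${BACKUP[$i]}"
        done
}

show_pairs
```

Running it prints "sda -> /" and then "sdb -> /pub". In the real loop, ${BACKUP[$i]} would take the place of ${BACKUP[$ARRPOS]} and the counter could go away.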

            # By default, at least don't back up lost+found directories
            EX="--exclude=lost+found"
    
            # If we are backing up drive 1 (/dev/sda), there's more to exclude
            if [ $d == "sda" ]; then
                    for e in "${SDAEX[@]}"; do
                            EX="`echo -n $EX` --exclude=$e"
                    done
            fi

ext3 and ext4 file systems come with a directory called lost+found, where fsck puts orphaned files it recovers after corruption. Personally, I'm not a fan of it, because every time I try to restore corrupted data from it, I just get my bottom handed to me; besides that, it's a folder that really has no place in a routine backup. If we're working on the first drive (sda, where /boot, /home, /var, etc. are stored), we also exclude the directories listed in SDAEX, whose contents are temporary, generated at runtime, or backed up elsewhere.

The line EX="`echo -n $EX` --exclude=$e" simply appends " --exclude=$e" to EX, like EX .= " --exclude=$e" in PHP or Perl. It's more roundabout than it needs to be: EX="$EX --exclude=$e" does the same thing, and Bash 3.1 and later also support EX+=" --exclude=$e". Storing the options in an array is safer still, because an unquoted $EX is split on whitespace when it's passed to tar.
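For comparison, here is a sketch of both approaches built from the SDAEX list defined earlier. The string version relies on word splitting when $EX is used unquoted, so it would break on a path containing a space; the array version keeps each option as its own word:

```bash
#!/usr/bin/env bash
SDAEX=('/media' '/tmp' '/dev' '/proc' '/sys' '/mnt' '/pub' '/var/cache' '/backups')

# String version: += appends to the string (Bash 3.1 and later)
EX="--exclude=lost+found"
for e in "${SDAEX[@]}"; do
        EX+=" --exclude=$e"
done

# Array version: each option is a separate element
EXARR=(--exclude=lost+found)
for e in "${SDAEX[@]}"; do
        EXARR+=("--exclude=$e")
done

# The array would be passed to tar quoted, for example:
#   tar -cpPzf "$BKDIR/$d.tar.gz" "${EXARR[@]}" /
printf '%s\n' "${EXARR[@]}"
```

Both produce the same ten options; the difference only shows up once a path contains whitespace.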

    # Mail the report out...ssmtp for GMail accounts, otherwise change for
    #  the appropriate MTA
    /usr/sbin/ssmtp -t < "$BKLOG"

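This is the last part of the backup script: after writing out some generic details, it uses ssmtp to send the report. The -t flag tells ssmtp to read the recipients from the message headers (the "To:" line we wrote at the start of the log), so nothing else needs to be passed on the command line.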