Linux Backup – Poor man’s solution


How important are backups? It differs from case to case, but it seems that 95% of people do not realize how important backups are until their hard drive dies and their data is gone. In the past ten days three people complained to me about losing their data: two hard drive failures and one accidental Shift-Delete followed by clicking Yes without reading the warning message. Backing up on Linux is a pain unless you are at least a demigod of Linux administration. Linux really lacks a simple backup solution for regular users.

Read on if you want to know how to perform a simple backup in Linux, what backups are and how to handle them.

DISCLAIMER: These instructions and scripts were written and tested on Ubuntu Linux Server 9.10, Ubuntu Linux Server 8.04 LTS and Ubuntu Linux Desktop 9.10, but they might not work for you. You will need to install openssh-client, openssh-server and rsync if you have not used them before. Test these scripts before relying on them for backup; we take no responsibility if you backed something up and then lost it.
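On Ubuntu the three packages can be installed with apt-get. The sketch below is one way to check whether the tools are already there; check_tools is a name made up for this example and is not part of the scripts that follow.

```bash
#!/bin/bash
# check_tools is an illustrative helper, not part of the original scripts.
# It reports which of the given commands cannot be found in PATH.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_tools ssh rsync || sudo apt-get install openssh-client openssh-server rsync
```

The function returns non-zero when at least one tool is missing, so the install command in the example only runs when it is needed.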

No, that is not a backup

RAID arrays, software or hardware, are not backup solutions; they only protect against hardware failure. If your hard disk dies, a RAID array will save you the trouble and the time you would spend restoring the system from backup. A RAID array will never save you from human error. If you often read mail really fast [1], the files are deleted from all the disk drives in the array at the same time, and the RAID array provides no protection against this kind of error. RAID should supplement backups, not replace them.

Yes, that is a backup

If you have an external hard drive and you copy your important data onto it, then you have at least a partial backup. If you have another computer in the house and you copy important data onto it every now and then, then you also have a backup. Keep in mind that external drives should be disconnected and unplugged after you finish the file transfer. A lightning strike can destroy your computer and all the external devices connected to it, so your external drive has to be disconnected from your computer if you want to call it a backup drive.

Hardware failures

We have all had them and we will all have them again. In my fifteen years of experience with hardware I think I have seen every single brand of hard drive fail. If there are no faults in the production process [2], it does not really matter which brand of hard drive you buy. They all die, they all have very similar MTBF ratings, and I have yet to see any relevant statistics proving that one brand is better than another. Make sure that you check your disks on a regular basis; Linux provides many tools for that.

How to back up

Creating backups can be complicated and difficult to set up. I will not bother you with incremental or differential backups, compression, daily backups, weekly rotation and other similar stuff used in the corporate world where the big boys play. We will create a couple of shell scripts that will help you back up your data to a remote server.

The scripts and the tools

As we said already, Linux backup solutions are not really worthy of the name, which is why we need to make our own. Let us assume that you have a web server with data in a MySQL database and you want to perform daily backups of it. We will use a couple of simple cron scripts placed in /etc/cron.daily on your web server and on your backup server. The script below goes on your web server, or on any other remote computer that needs to perform some tasks before it is backed up.

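#!/bin/bash

# Keep the previous dump as a fallback in case today's dump fails.
mv /srv/www/database.sql.bz2 /srv/www/database.sql.bz2.old
# Dump all databases (password is read from /root/.my.cnf) and compress the output.
/usr/bin/mysqldump --defaults-extra-file=/root/.my.cnf --all-databases -u root | /bin/bzip2 > /srv/www/database.sql.bz2
# Reset ownership and group permissions on the web root.
chown -R www-data:www-data /srv/www
find /srv/www -type d -exec chmod g+rwx {} \;
find /srv/www -type f -exec chmod g+rw {} \;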

First we move the existing database dump to a new file, overwriting the previous old dump. This way at least one database backup is available if the SQL dump fails for some reason. Then we use mysqldump to dump the content of all the databases, pipe it through bzip2 to compress it, and write the output to the file database.sql.bz2 in the /srv/www directory. For a 100 megabyte dump this takes approximately 30 seconds. The MySQL password is stored in the /root/.my.cnf file, which is passed to mysqldump with --defaults-extra-file. The password could be supplied on the command line with the -p parameter, but that would be very insecure because command lines are visible to other users in the process list. Because a lot of things can happen on the web server during the day, we also make sure that all permissions and ownerships are set correctly.
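If /root/.my.cnf does not exist yet, it has to be created before the script can run. The sketch below writes a minimal option file with a [client] section and restricts it to its owner. The helper name write_mycnf and the placeholder password are assumptions for this example; adjust the user if you do not dump as root.

```bash
#!/bin/bash
# write_mycnf is an illustrative helper, not part of PrepBackup.
# Arguments: path of the option file, MySQL password to store in it.
write_mycnf() {
    local path="$1" password="$2"
    (
        # Create the file with owner-only permissions before the password is written.
        umask 077
        cat > "$path" <<EOF
[client]
user=root
password=$password
EOF
    )
    # Tighten permissions in case the file already existed with looser ones.
    chmod 600 "$path"
}

# Example (run as root):
#   write_mycnf /root/.my.cnf 'your-mysql-root-password'
```

Keeping the file readable by root only is the point of the exercise; a world-readable option file would be no better than a password on the command line.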

We will call this script PrepBackup, place it in the /etc/cron.daily directory and make it executable. Finally the script needs testing, so we execute it.

$ sudo cp PrepBackup /etc/cron.daily/
$ cd /etc/cron.daily
$ sudo chmod 755 PrepBackup
$ sudo ./PrepBackup
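While testing, it is worth thinking about what happens when mysqldump itself fails: bzip2 still writes a (nearly empty) file, and the next run would move that file over the last good dump. A more defensive variant, sketched below, writes to a temporary file and rotates only when the dump succeeded. This is not what PrepBackup does; safe_dump and the COMPRESSOR variable are hypothetical names for this sketch.

```bash
#!/bin/bash
# safe_dump and COMPRESSOR are illustrative names, not part of PrepBackup.
# Usage: safe_dump <target-file> <dump-command> [args...]
COMPRESSOR="${COMPRESSOR:-bzip2}"

safe_dump() {
    local target="$1"; shift
    local tmp="${target}.tmp"

    # pipefail makes the pipeline fail when the dump command fails,
    # not only when the compressor does.
    if ( set -o pipefail; "$@" | "$COMPRESSOR" > "$tmp" ); then
        # Keep the previous dump as .old, then move the new dump into place.
        [ -f "$target" ] && mv -f "$target" "${target}.old"
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"
        echo "dump failed, keeping existing $target" >&2
        return 1
    fi
}

# Example:
#   safe_dump /srv/www/database.sql.bz2 \
#       /usr/bin/mysqldump --defaults-extra-file=/root/.my.cnf --all-databases -u root
```

With this variant a failed dump leaves both database.sql.bz2 and database.sql.bz2.old untouched.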

Now we need to take care of a timing issue. By default all Ubuntu servers execute their cron.daily jobs at the same time. We need to run our web server script before we run the actual backup script on the backup server. Cron entries for the /etc/cron.* directories are located in /etc/crontab, so we need to edit this file. Look for a line that is similar to this:

25 5	* * *	root	test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )

Notice the first two numbers on the line. The first number represents minutes and the second number represents hours. Reduce the hour by one and your daily cron scripts will run an hour earlier. Now that we have taken care of the web server, we move on to the backup server.
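Editing the line by hand is the simplest route. For illustration, the sketch below does the same change with awk: it reads crontab text on standard input and prints it with the hour of the cron.daily line reduced by one. The function name earlier_daily_cron is made up for this example. Note that awk rewrites the changed line with single spaces instead of tabs, which cron accepts.

```bash
#!/bin/bash
# earlier_daily_cron is an illustrative helper, not part of the original scripts.
# It reads crontab text on stdin and prints it with the hour of the
# cron.daily line reduced by one; all other lines pass through unchanged.
earlier_daily_cron() {
    awk '
        /run-parts/ && /\/etc\/cron\.daily/ && $1 !~ /^#/ {
            # Field 2 is the hour; an hour of 0 wraps around to 23.
            $2 = ($2 + 23) % 24
        }
        { print }
    '
}

# Example: write the result to a scratch file and review it before
# copying it over /etc/crontab.
#   earlier_daily_cron < /etc/crontab > /tmp/crontab.new
```

With the sample line from above, the output starts with 25 4 instead of 25 5.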

The backup is performed with rsync as an unattended process, so rsync must be able to log in without user intervention. Create an SSH key for the root user on the backup server.

$ sudo ssh-keygen

The default filename should be /root/.ssh/id_rsa; leave the passphrase empty. Now we need to add the public key from the backup server to the authorized keys on the web server. Do this on the web server as a user who has access to the /srv/www directory:

$ mkdir ~/.ssh
$ vim ~/.ssh/authorized_keys

Paste the content of the id_rsa.pub file into the authorized_keys file. Pico, Nano or any other editor can be used instead of vim; just make sure that the whole key is on one single line of text.
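Pasting by hand works, but sshd may ignore the key when ~/.ssh or authorized_keys is writable by other users. A scripted alternative is sketched below, assuming the public key has already been copied to the web server as a file. The function name add_authorized_key is hypothetical; where it is installed, ssh-copy-id from the OpenSSH client package does a similar job.

```bash
#!/bin/bash
# add_authorized_key is an illustrative helper, run as the web server user.
# Arguments: path to the public key file copied from the backup server,
# and optionally the home directory to install it into (default: $HOME).
add_authorized_key() {
    local pubkey_file="$1" home_dir="${2:-$HOME}"
    local ssh_dir="$home_dir/.ssh"
    local auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    touch "$auth_file"
    # Restrict both to the owner; sshd can reject keys kept in files
    # that other users are able to modify.
    chmod 700 "$ssh_dir"
    chmod 600 "$auth_file"

    # Append the key only if this exact line is not present yet.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}

# Example:
#   add_authorized_key /tmp/id_rsa.pub
```

Because the key is appended from a file rather than pasted, it cannot be broken across several lines by the editor or the terminal.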

Now back on the backup server, test whether this works by logging in to the web server.

$ sudo ssh user@web.server.net

If ssh did not ask for a password then we are in business.

The second script that we need is the one that actually takes care of the backup. We will be backing up the /srv/www directory on the web server.

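#!/bin/bash
# Set HOME explicitly; under cron it may not be what we expect.
HOME=/root

# Pull /srv/www from the web server over ssh; -a preserves permissions, times and ownership.
rsync -av -e ssh user@web.server.net:/srv/www/ /srv/Backup/www/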

First we set $HOME explicitly; otherwise we cannot be sure which user runs the cron job and where that user's home is. Save this script as PerformBackup and put it in /etc/cron.daily on the backup server. It is wise to test everything manually before using it in the production environment.

$ sudo cp PerformBackup /etc/cron.daily/
$ sudo chmod 755 /etc/cron.daily/PerformBackup
$ sudo -s
# cd /etc/cron.daily
# ./PerformBackup
# exit

If both scripts worked, the content of /srv/www on the web server should now be on the backup server in the /srv/Backup/www directory, together with your database dump. Try changing a few files on the web server and run the second script again. It will take much less time to run and all the changed files will be backed up.
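If you later change the paths in PerformBackup, rsync's dry-run mode is a cheap way to see what would be transferred before anything is copied. The wrapper below is only a sketch; preview_backup is a made-up name, and the host and paths in the example are the ones used in this article.

```bash
#!/bin/bash
# preview_backup is an illustrative helper, not part of PerformBackup.
# It runs rsync with -n (--dry-run): the file list is printed,
# but nothing is copied to the destination.
preview_backup() {
    local src="$1" dest="$2"
    rsync -avn -e ssh "$src" "$dest"
}

# Example:
#   preview_backup user@web.server.net:/srv/www/ /srv/Backup/www/
```

Mind the trailing slash on the source: with it rsync copies the content of the directory, without it rsync creates a www directory inside the destination.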

History?

The downside of this approach is that you don’t have any history of your backups, so you cannot restore a file as it was three days ago. The PerformBackup script could be modified by inserting a simple cp command in front of the rsync.

cp -a /srv/Backup/www /srv/Backup/www_$(date +%d-%m-%Y)

This copies the backup directory to www_ followed by the current date as a suffix. Crude, slow and disk space inefficient, but it works if you really need that history and do not want to invest time and/or money in a different, more advanced solution.
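A slightly more careful version of the same idea is sketched below. It differs from the one-liner in two ways that are my own choices, not part of the original script: the suffix is year-first so that a plain ls lists the snapshots in chronological order, and an existing snapshot for the same day is left alone. The name snapshot_backup is hypothetical.

```bash
#!/bin/bash
# snapshot_backup is an illustrative helper, not part of PerformBackup.
# It copies <dir> to <dir>_YYYY-MM-DD and prints the snapshot path.
# If a snapshot for today already exists, it is left untouched.
snapshot_backup() {
    local src="${1%/}"    # drop a trailing slash, if any
    local dest="${src}_$(date +%Y-%m-%d)"

    if [ -e "$dest" ]; then
        echo "snapshot $dest already exists, skipping" >&2
        return 0
    fi
    # -a copies recursively and preserves permissions, ownership and timestamps.
    cp -a "$src" "$dest"
    echo "$dest"
}

# Example, placed before the rsync line in PerformBackup:
#   snapshot_backup /srv/Backup/www
```

This is still a full copy per day, so the disk space problem remains; old snapshots have to be removed by hand or by another cron job.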



Footnotes:
  1. Using the command rm -rf, which removes the target and all its subdirectories.
  2. Remember IBM’s DeathStar disk series?



