Using tar for full and incremental backups

There is a lot of confusion about using tar for incremental backups. Plenty of utilities attempt to automate backup and restoration, but in trying to make things simpler they often make life harder.

The tar manual has everything we need. Use it if my notes below are too pragmatic.

Here is how we can use tar to create a simple backup strategy.

We'll use a little test file system, created with a 'make_base_root' script:

#!/bin/bash
# Make (or remake) base root folder and files for testing

# Remove existing test folders and files
if [ -d root ]; then
        rm -R root
fi

# Make new test folders and files
mkdir root
cd root
mkdir foo
mkdir bar
echo 'a base content' > a
cd foo
echo 'b base content' > b
mkdir baz
cd ../bar
echo 'c base content' > c
cd ../foo/baz
echo 'd base content' > d

There are two kinds of backup. The simplest type is a 'full' or level 0 backup. This is a compressed copy of all of the files from the target. We use tar to create a full compressed backup thus:

~/wk/tar $ tar -czpf root_backup.tar.gz root
~/wk/tar $ 
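Before relying on the archive, it can be worth listing what went into it. Here is a minimal sketch, assuming GNU tar. It works in a throwaway directory from mktemp rather than ~/wk/tar and builds a cut-down copy of the test tree; the 'work' and 'listing' names are just for illustration.

```shell
#!/bin/bash
# Sketch: list a full backup to confirm what it holds.
set -e
work=$(mktemp -d)            # scratch directory, assumed for this example
cd "$work"

# A cut-down version of the test tree
mkdir -p root/foo/baz root/bar
echo 'a base content' > root/a
echo 'd base content' > root/foo/baz/d

# Same flags as the full backup in the text
tar -czpf root_backup.tar.gz root

# -t lists the archive members without extracting anything
listing=$(tar -tzf root_backup.tar.gz)
echo "$listing"
```

Every folder and file from the tree should appear in the listing, with the folders shown with a trailing slash.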

To test our backup, we'll create a little 'change_root' script to create a new version of our base folder:

#!/bin/bash
#Make some arbitrary changes to root (made with make_base_root)

# make base root if not present
if [ ! -d root ]; then
        make_base_root
fi

# Make some changes
cd root
echo 'e new file content' > e
cd foo
echo 'b added content' >> b
rm -R baz
cd ../bar
mkdir new
cd new
echo 'f new file content' > f

Now we can delete our corrupt or badly changed root and extract our tar to restore root to its backed up state:

~/wk/tar $ rm -R root
~/wk/tar $ tar -xzf root_backup.tar.gz 

Note the deletion: it's essential if we are to make sure none of the post-backup additions are left hanging around.
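To see why the deletion matters, here is a small sketch that restores the same archive twice, once over the top of a changed tree and once after deleting it. It runs in a scratch directory from mktemp; the variable names are hypothetical.

```shell
#!/bin/bash
# Sketch: why the old tree is deleted before restoring.
set -e
work=$(mktemp -d)            # scratch directory, assumed for this example
cd "$work"

mkdir -p root/foo
echo 'b base content' > root/foo/b
tar -czpf root_backup.tar.gz root

# A change made after the backup
echo 'e new file content' > root/e

# Restore over the top: the file added after the backup survives
tar -xzf root_backup.tar.gz
if [ -e root/e ]; then stray_without_delete=yes; else stray_without_delete=no; fi

# Delete first, then restore: the tree matches the backup
rm -R root
tar -xzf root_backup.tar.gz
if [ -e root/e ]; then stray_with_delete=yes; else stray_with_delete=no; fi

echo "e present without delete: $stray_without_delete"
echo "e present with delete:    $stray_with_delete"
```

Plain extraction only ever adds or overwrites files, so anything created since the backup stays put unless we clear the tree first.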

This is good, but we are copying all our files every time we back up. If you are backing up to somewhere offline (or elsewhere online, such as S3), the time taken and space used can quickly become large enough to cause prohibitive cost and performance issues.

This is where the second kind of backup, the 'incremental' backup, can become essential.

An incremental backup considers some known backup and just stores the changes relative to that backup. A 'level 1' incremental backup is a backup based on a full ('level 0') backup. I could create a 'level 2' incremental backup based on a 'level 1' incremental backup if space is really at a premium. It will save even more space, but it's probably overkill for a desktop PC and most web servers because it makes restoration painful: each level of incremental tar must be restored in order.

To understand this issue, consider a weekly backup. If I create a full backup (level 0) on Monday and daily incrementals through to the following Sunday, I could use either

  • simple case: repeated level 1 incrementals or
  • complex case: level 1 to level 6 incrementals.

In the simple case, a file created on Tuesday would be copied in each of the subsequent level 1 incremental backups. In the complex case it would only be stored in Tuesday's (level 1) backup, so the complex case can potentially save quite a bit of space. The catch is that in the complex case, restoring a folder to Sunday's state requires the sequential restoration of 7 tars, Saturday's state requires 6, and so on down to just one for Monday. For large systems or systems using automated restoration this is worth the effort, especially if the file system is changing a lot. For simple systems (like a typical desktop PC) it's easier to use the simple case.
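The restoration cost of the two schemes can be tabulated with a few lines of bash. This is just the arithmetic from the weekly example (full backup on Monday, incrementals from Tuesday to Sunday), not anything tar does for you; the variable names are mine.

```shell
#!/bin/bash
# Sketch: archives needed to restore each day's state,
# with a full backup on Monday and incrementals Tuesday to Sunday.
days=(Mon Tue Wed Thu Fri Sat Sun)
for i in "${!days[@]}"; do
    # Simple case: the full backup plus at most one level 1 archive
    if [ "$i" -eq 0 ]; then simple=1; else simple=2; fi
    # Complex case: the full backup plus every level up to that day
    complex=$((i + 1))
    printf '%s  simple=%d  complex=%d\n' "${days[$i]}" "$simple" "$complex"
done
```

The simple case never needs more than two archives, while the complex case grows by one archive per day.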

To keep it simple, I create a series of level 1 incremental backups. This allows a two-step restoration process at the cost of some redundant data storage.

To use incremental backups, we have to first start over with our full backup because tar needs to create a 'snapshot' file along with the full backup to facilitate the subsequent incremental backup:

~/wk/tar $ rm *~
~/wk/tar $ rm root_backup.tar.gz 
~/wk/tar $ rm -r root
~/wk/tar $ make_base_root
~/wk/tar $ # Now we are back where we started
~/wk/tar $ tar --listed-incremental root_backup.snar -czpf root_backup_full.tar.gz root
~/wk/tar $ 

This tar call creates our full backup exactly like the normal one (it restores the same way too). It also creates the root_backup.snar file, which tells subsequent runs of tar what is in the full backup. We have to be a little bit careful with this. If root_backup.snar had already existed, tar would have made us an incremental backup one level above the backup described in the snar (read that out loud :). As there was no file present, we got a level 0 ('full') backup.
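The way the snapshot file drives the backup level is easier to see in a tiny experiment. The sketch below assumes GNU tar, runs in a scratch directory from mktemp, and uses made-up names (demo.snar, level0 to level2). The one-second pauses are my own precaution so that each change is clearly newer than the previous snapshot.

```shell
#!/bin/bash
# Sketch: reusing one snapshot file raises the level on every run.
set -e
work=$(mktemp -d)            # scratch directory, assumed for this example
cd "$work"

mkdir root
echo 'first' > root/first

# No snapshot file yet, so this is a level 0 (full) backup
tar --listed-incremental demo.snar -czpf level0.tar.gz root

sleep 1                      # precaution: make the next change clearly newer
echo 'second' > root/second
# demo.snar now exists, so this run is level 1 and tar updates the file
tar --listed-incremental demo.snar -czpf level1.tar.gz root

sleep 1
echo 'third' > root/third
# Same file again: level 2, relative to the level 1 run
tar --listed-incremental demo.snar -czpf level2.tar.gz root

# Show which files each archive actually carries
for archive in level0 level1 level2; do
    echo "== $archive"
    tar -tzf "$archive.tar.gz"
done
```

Each archive should carry only the file added since the run before it, which is exactly the multi-level chain described earlier.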

So let's make our level 1 incremental backup:

~/wk/tar $ change_root
~/wk/tar $ cp root_backup.snar root_backup.snar.bak
~/wk/tar $ # root_backup.snar.bak preserves the level 0 snapshot; I will
~/wk/tar $ # use a copy of it for any subsequent incremental backups
~/wk/tar $ tar --listed-incremental root_backup.snar -czpf root_backup_incremental.tar.gz root
~/wk/tar $ 

OK, so now I have my full backup and an incremental backup. I can make more level 1 incremental backups so long as I use a copy of my root_backup.snar.bak file as their --listed-incremental file. The copy is necessary because tar updates the snar file to reflect its work. If I reused that file I would get a level 2 backup, then a level 3 backup, and so on.
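Here is one way the repeated level 1 backups might be scripted. It is a sketch under assumptions: GNU tar, a scratch directory from mktemp, and a hypothetical level1_backup helper that takes a label (tue, wed) for the archive name. The one-second pauses are only there to keep the changes clearly newer than the snapshot.

```shell
#!/bin/bash
# Sketch: repeated level 1 backups from a preserved level 0 snapshot.
set -e
work=$(mktemp -d)            # scratch directory, assumed for this example
cd "$work"

mkdir -p root/foo
echo 'b base content' > root/foo/b

# Level 0 backup, then keep a pristine copy of its snapshot file
tar --listed-incremental root_backup.snar -czpf root_backup_full.tar.gz root
cp root_backup.snar root_backup.snar.bak

# Hypothetical helper: $1 is a label used in the archive name
level1_backup() {
    # Work on a throwaway copy so the .bak file always describes level 0
    cp root_backup.snar.bak "root_backup.snar.$1"
    tar --listed-incremental "root_backup.snar.$1" \
        -czpf "root_backup_incremental_$1.tar.gz" root
    rm "root_backup.snar.$1"
}

sleep 1                      # precaution: make the changes clearly newer
echo 'e new file content' > root/e
level1_backup tue

sleep 1
echo 'g new file content' > root/g
level1_backup wed

# Wednesday's archive repeats Tuesday's file: both are new since level 0
tar -tzf root_backup_incremental_wed.tar.gz
```

Because every run starts from the level 0 snapshot, each archive is self-sufficient alongside the full backup, at the price of storing Tuesday's file twice.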

Restoring the full backup is as above. To restore the incremental backup I have to first restore the full backup and then restore the incremental backup over it. If I do this naively it will 'kind of' work but the files will just be added:

~/wk/tar $ rm -r root
~/wk/tar $ tar -xzpf root_backup_full.tar.gz
~/wk/tar $ tar -xzpf root_backup_incremental.tar.gz 
~/wk/tar $ ls root/foo
b  baz

Notice that baz is still present even though the folder was deleted in the changes. If I want my restoration to really reflect the state of the folder when I made the incremental backup, I must tell tar that this is what I want. Proper restoration of any level 1 incremental backup is done this way:

~/wk/tar $ rm -r root
~/wk/tar $ tar -xzpf root_backup_full.tar.gz 
~/wk/tar $ tar --incremental -xzpf root_backup_incremental.tar.gz 
~/wk/tar $ ls root/foo
b
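The two-step restore above can be wrapped in a small helper. This is a sketch, not a finished tool: it assumes GNU tar, works in a scratch directory from mktemp, and the restore_level1 function name and its two arguments are my own invention. The rm in the helper is as dangerous as it looks, so only ever run something like this where you mean to.

```shell
#!/bin/bash
# Sketch: restore a full backup plus one level 1 incremental.
set -e
work=$(mktemp -d)            # scratch directory, assumed for this example
cd "$work"

# Cut-down test tree and its level 0 backup
mkdir -p root/foo/baz
echo 'b base content' > root/foo/b
echo 'd base content' > root/foo/baz/d
tar --listed-incremental root_backup.snar -czpf root_backup_full.tar.gz root

sleep 1                      # precaution: make the changes clearly newer
rm -R root/foo/baz
echo 'e new file content' > root/e
tar --listed-incremental root_backup.snar -czpf root_backup_incremental.tar.gz root

# Hypothetical helper: $1 = full archive, $2 = level 1 archive
restore_level1() {
    rm -rf root                      # clear the damaged tree first
    tar -xzpf "$1"                   # step 1: the full backup
    tar --incremental -xzpf "$2"     # step 2: replay changes, including deletions
}

restore_level1 root_backup_full.tar.gz root_backup_incremental.tar.gz
ls -R root
```

After the restore, root/foo/baz should be gone and root/e present, matching the state when the incremental backup was taken.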

So when we tell tar that we're restoring an incremental backup, it deletes any files that should not be there. Pretty smart of it. If you want to see what is really in the incremental tar (archive managers won't show you the stuff it's going to delete for you), you can use the --incremental and two verbose flags thus:

~/wk/tar $ tar --incremental -tvvzpf root_backup_incremental.tar.gz
drwxr-xr-x paul/paul	45 2010-10-26 16:26 root/
N a
D bar
Y e
D foo
R root/foo/baz
T root/bar/new

drwxr-xr-x paul/paul    9 2010-10-26 16:26 root/bar/
N c
D new

drwxr-xr-x paul/paul    4 2010-10-26 16:26 root/bar/new/
Y f

drwxr-xr-x paul/paul    4 2010-10-26 16:26 root/foo/
Y b

-rw-r--r-- paul/paul    19 2010-10-26 16:26 root/e
-rw-r--r-- paul/paul    19 2010-10-26 16:26 root/bar/new/f
-rw-r--r-- paul/paul    31 2010-10-26 16:26 root/foo/b
~/wk/tar $ 

This output is meant to be human readable. 'N' tells you "the file is Not in the archive", 'Y' tells you "Yes it is in the archive" and 'D' marks a directory. The 'R' and 'T' lines record a rename, giving its source and target. There is a full explanation of this 'dumpdir' stuff in the tar manual.

This gives you the bare bones to create a full backup and an arbitrary number of level 1 incremental backups using tar. You can use tools such as Amanda or backup-manager to automate this process, but in attempting to simplify they introduce complexities of their own that you may or may not want. Sometimes it's just more convenient to write your own scripts!


Comments

Jack 4 years, 1 month ago

Possible security risk in the first script
Hey,

if some mindless admin with root account access will run the first script in / he will have a really bad time...

Paul Whipp 4 years, 1 month ago

He certainly would! However, if he is that self destructive he'd probably have never made it that far ;)

Seriously, anyone running as root should know better. If they don't, what is to stop them typing 'rm *' when in /?

Dave 4 years ago

Thanks, very helpful guide, whipped up scripts for full and incremental personal backups this PM while watching the Patriots lose.

Dave Stadnick 3 years, 11 months ago

Thank you - very nicely done & explained.
