There is a lot of confusion about using tar for incremental backups. Plenty of utilities attempt to automate backup and restoration, but in trying to make things simpler they often make life harder.
The tar manual has everything we need. Use it if my notes below are too pragmatic.
Here is how we can use tar to create a simple backup strategy.
We'll work on a little test file system, created with a 'make_base_root' script:
#!/bin/bash
# Make (or remake) base root folder and files for testing

# Remove existing test folders and files
if [ -d root ]; then
    rm -R root
fi

# Make new test folders and files
mkdir root
cd root
mkdir foo
mkdir bar
echo 'a base content' > a
cd foo
echo 'b base content' > b
mkdir baz
cd ../bar
echo 'c base content' > c
cd ../foo/baz
echo 'd base content' > d
There are two kinds of backup. The simplest type is a 'full' or level 0 backup. This is a compressed copy of all of the files from the target. We use tar to create a full compressed backup thus:
~/wk/tar $ tar -czpf root_backup.tar.gz root
~/wk/tar $
To test our backup, we'll create a little 'change_root' script to create a new version of our base folder:
#!/bin/bash
# Make some arbitrary changes to root (made with make_base_root)

# Make base root if not present
if [ ! -d root ]; then
    make_base_root
fi

# Make some changes
cd root
echo 'e new file content' > e
cd foo
echo 'b added content' >> b
rm -R baz
cd ../bar
mkdir new
cd new
echo 'f new file content' > f
Now we can delete our corrupt or badly changed root and extract our tar to restore root to its backed up state:
~/wk/tar $ rm -R root
~/wk/tar $ tar -xzf root_backup.tar.gz
Note the deletion. It's essential if we are to make sure none of the post-backup additions get left hanging around.
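One way to convince yourself that the delete-then-extract step really returns the folder to its backed-up state is to compare the restored tree against an untouched copy. The sketch below is only an illustration: the scratch directory, the cut-down test tree and the 'reference' copy are my own assumptions, not part of the scripts above.

```shell
#!/bin/bash
# Sketch: back up a tree, damage it, restore it, and check the result.
set -euo pipefail

work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"

# A cut-down stand-in for the test tree
mkdir -p root/foo
echo 'a base content' > root/a
echo 'b base content' > root/foo/b

# Keep a plain copy so there is something to compare against
cp -a root reference

tar -czpf root_backup.tar.gz root

# Simulate unwanted changes made after the backup
echo 'e new file content' > root/e
echo 'b added content' >> root/foo/b

# Delete first, then extract, so post-backup additions do not survive
rm -R root
tar -xzpf root_backup.tar.gz

# diff -r exits non-zero if the two trees differ
if diff -r reference root; then
    echo "restore matches"
fi
```

If the `rm -R root` line is left out, the stray file e survives the extraction and `diff -r` reports it.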
This is good, but we are copying all our files every time we back up. If you are backing up to somewhere offline (or elsewhere online, such as S3), the time taken and space used can quickly become large enough to cause prohibitive cost and performance issues.
This is where the second kind of backup, the 'incremental' backup, can become essential.
An incremental backup considers some known backup and stores only the changes relative to that backup. A 'level 1' incremental backup is based on a full ('level 0') backup. I could create a 'level 2' incremental backup based on a 'level 1' incremental backup if space is really at a premium. It will save even more space, but it's probably overkill for a desktop PC and most web servers because it can make restoration painful: each level of incremental tar has to be restored in order.
To understand this issue, consider a weekly backup. If I create a full backup (level 0) on Monday and daily incrementals through to the following Sunday, I could use either of two schemes. In the simple case, every daily incremental is a level 1 backup based directly on Monday's full backup. In the complex case, each daily incremental is based on the previous day's backup, so Tuesday's is level 1, Wednesday's is level 2 and so on.
In the simple case, a file created on Tuesday would be copied in each of the subsequent level 1 incremental backups. In the complex case it would only be stored in Tuesday's (level 1) backup, so the complex case can potentially save quite a bit of space. The catch is that in the complex case, restoring a folder to Sunday's state requires the sequential restoration of 7 tars, 6 tars for Saturday and so on down to just one for Monday. For large systems or systems using automated restoration this is worth the effort, especially if the file system is changing a lot. For simple systems (like a typical desktop PC) it's easier to use the simple case.
To keep it simple, I create a series of level 1 incremental backups. This allows a two step restoration process at the cost of some redundant data storage.
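The restore cost of the two schemes can be tabulated with a few lines of shell. This is just arithmetic on the Monday-to-Sunday example, not something tar does for you; the day labels and variable names are mine.

```shell
#!/bin/bash
# Count the archives that must be extracted to restore each day's state.
# Index 0 is Monday's full backup; indexes 1-6 are the daily incrementals.
days=(Mon Tue Wed Thu Fri Sat Sun)

for i in "${!days[@]}"; do
    # Simple case: the full backup plus, at most, one level 1 archive
    if [ "$i" -eq 0 ]; then
        simple=1
    else
        simple=2
    fi
    # Complex case: the full backup plus every incremental up to that day
    complex=$((i + 1))
    printf '%s  simple=%d  complex=%d\n' "${days[$i]}" "$simple" "$complex"
done
```

The simple case never needs more than two archives, while the complex case grows by one archive per day.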
To use incremental backups, we have to first start over with our full backup because tar needs to create a 'snapshot' file along with the full backup to facilitate the subsequent incremental backup:
~/wk/tar $ rm *~
~/wk/tar $ rm root_backup.tar.gz
~/wk/tar $ rm -r root
~/wk/tar $ make_base_root
~/wk/tar $ # Now we are back where we started
~/wk/tar $ tar --listed-incremental root_backup.snar -czpf root_backup_full.tar.gz root
~/wk/tar $
This tar call creates our full backup exactly like the normal one (it restores the same way too). It also creates the root_backup.snar file, which tells our subsequent uses of tar what is in the full backup. We have to be a little bit careful with this. If the root_backup.snar file had already existed, tar would have made us an incremental backup one level above the tar described in the snar (read that out loud :). As there was no file present we got a level 0 = 'full' backup.
So let's make our level 1 incremental backup:
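Because a leftover snapshot file silently turns a would-be full backup into an incremental one, a script can check for the file first. The helper below is a hypothetical sketch (the name full_backup and its arguments are my own) and assumes GNU tar.

```shell
#!/bin/bash
# Sketch: refuse to make a "full" backup if the snapshot file already exists.
set -euo pipefail

work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"
mkdir root
echo 'a base content' > root/a

# full_backup is a hypothetical helper: snapshot file, archive name, target
full_backup() {
    local snar=$1 archive=$2 target=$3
    if [ -e "$snar" ]; then
        echo "refusing: $snar already exists, the result would not be level 0" >&2
        return 1
    fi
    tar --listed-incremental "$snar" -czpf "$archive" "$target"
}

full_backup root_backup.snar root_backup_full.tar.gz root

# A second call is rejected because the snapshot file is now present
if ! full_backup root_backup.snar root_backup_full_again.tar.gz root; then
    echo "second call rejected"
fi
```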
~/wk/tar $ change_root
~/wk/tar $ cp root_backup.snar root_backup.snar.bak
~/wk/tar $ # Now I will use a copy of root_backup.snar.bak
~/wk/tar $ # for any subsequent incremental backups
~/wk/tar $ tar --listed-incremental root_backup.snar -czpf root_backup_incremental.tar.gz root
~/wk/tar $
OK, so now I have my full backup and an incremental backup. I can make more level 1 incremental backups so long as I use a copy of my root_backup.snar.bak file as their --listed-incremental file. The copy is necessary because tar changes the snar file to reflect its work. If I reused that file I would get a level 2 backup, then a level 3 backup...
Restoring the full backup is as above. To restore the incremental backup I first have to restore the full backup and then restore the incremental backup over it. If I do this naively it will 'kind of' work, but the files will just be added:
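The copy-the-snapshot routine is easy to wrap in a small function. The sketch below assumes GNU tar; the helper name make_level1, the '.work' suffix and the 'tue'/'wed' labels are hypothetical. The `sleep 1` is only there so that changes made by a fast script get timestamps later than the snapshot.

```shell
#!/bin/bash
# Sketch: repeated level 1 backups against one pristine level 0 snapshot.
set -euo pipefail

work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"
mkdir -p root/foo
echo 'a base content' > root/a

# Level 0: the snapshot file does not exist yet, so this is a full backup
tar --listed-incremental root_backup.snar -czpf root_backup_full.tar.gz root
cp root_backup.snar root_backup.snar.bak    # pristine level 0 snapshot

# make_level1 is a hypothetical helper; its argument labels the archive
make_level1() {
    local label=$1
    # Work on a throwaway copy so the pristine snapshot is never modified
    cp root_backup.snar.bak root_backup.snar.work
    tar --listed-incremental root_backup.snar.work \
        -czpf "root_backup_incremental_${label}.tar.gz" root
    rm root_backup.snar.work
}

sleep 1    # make sure later changes are newer than the snapshot
echo 'e new file content' > root/e
make_level1 tue

echo 'g new file content' > root/foo/g
make_level1 wed

# Both archives are level 1, so the second one still carries the first change
listing=$(tar -tzf root_backup_incremental_wed.tar.gz)
echo "$listing"
```

Because each call starts from the same level 0 snapshot, either archive can be restored directly on top of the full backup.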
~/wk/tar $ rm -r root
~/wk/tar $ tar -xzpf root_backup_full.tar.gz
~/wk/tar $ tar -xzpf root_backup_incremental.tar.gz
~/wk/tar $ ls root/foo
b baz
Notice that baz is still present even though the folder was deleted in the changes. If I want my restoration to really reflect the state of the folder when I made the incremental backup, I must tell tar that this is what I want. Proper restoration of any level 1 incremental backup is done this way:
~/wk/tar $ rm -r root
~/wk/tar $ tar -xzpf root_backup_full.tar.gz
~/wk/tar $ tar --incremental -xzpf root_backup_incremental.tar.gz
~/wk/tar $ ls root/foo
b
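The two-step restore can be sketched as one function. As before this assumes GNU tar; the helper name restore_level1, its arguments and the cut-down test tree are my own.

```shell
#!/bin/bash
# Sketch: two-step restore of a full backup plus one level 1 incremental.
set -euo pipefail

work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"

# A cut-down version of the test tree
mkdir -p root/foo/baz
echo 'b base content' > root/foo/b
echo 'd base content' > root/foo/baz/d

# Level 0 backup, then some changes, then a level 1 backup
tar --listed-incremental root_backup.snar -czpf root_backup_full.tar.gz root
sleep 1    # make sure the changes are newer than the snapshot
rm -R root/foo/baz
echo 'b added content' >> root/foo/b
tar --listed-incremental root_backup.snar -czpf root_backup_incremental.tar.gz root

# restore_level1 is a hypothetical helper: full archive, incremental, target
restore_level1() {
    local full=$1 incremental=$2 target=$3
    rm -rf -- "$target"                       # start from a clean slate
    tar -xzpf "$full"                         # step 1: the full backup
    tar --incremental -xzpf "$incremental"    # step 2: apply changes and deletions
}

restore_level1 root_backup_full.tar.gz root_backup_incremental.tar.gz root
ls root/foo
```

Dropping `--incremental` from step 2 reproduces the naive result, with baz left behind.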
So when we tell tar that we're restoring an incremental backup, it deletes any files that should not be there. Pretty smart of it. If you want to see what is really in the incremental tar (archive managers won't show you the stuff it's going to delete for you), you can use the --incremental flag and two verbose flags thus:
~/wk/tar $ tar --incremental -tvvzpf root_backup_incremental.tar.gz
drwxr-xr-x paul/paul 45 2010-10-26 16:26 root/
N a
D bar
Y e
D foo
R root/foo/baz
T root/bar/new
drwxr-xr-x paul/paul 9 2010-10-26 16:26 root/bar/
N c
D new
drwxr-xr-x paul/paul 4 2010-10-26 16:26 root/bar/new/
Y f
drwxr-xr-x paul/paul 4 2010-10-26 16:26 root/foo/
Y b
-rw-r--r-- paul/paul 19 2010-10-26 16:26 root/e
-rw-r--r-- paul/paul 19 2010-10-26 16:26 root/bar/new/f
-rw-r--r-- paul/paul 31 2010-10-26 16:26 root/foo/b
~/wk/tar $
This output is meant to be human readable. 'N' tells you "the file is Not in the archive", 'Y' tells you "Yes it is in the archive". 'D' marks a directory, and an 'R' and 'T' pair records what tar treated as a rename (source and target). There is a full explanation of this 'dumpdir' stuff in the tar manual.
This gives you the bare bones to create a full backup and an arbitrary number of level 1 incremental backups using tar. You can use tools such as Amanda or backup-manager to automate this process, but in attempting to simplify they introduce complexities of their own that you may or may not want. Sometimes it's just more convenient to write your own scripts!