This script provides a simple but often sufficient backup mechanism. Presuming it's called by crond once a day, on the first of every month it makes a complete backup of the directory tree given as the last commandline argument (--dirs name; see Commandline Options below), replacing last year's backup for the same month (if it exists, that is). Similarly, every Sunday a full backup is made as well, replacing last Sunday's backup file. On weekdays, however, only those files that changed (or were added) since the last full backup (i.e. last Sunday or the first of the month, whichever was later) are saved. These incremental backup files are automatically removed after a new weekly backup has been successfully created.
So, for every directory you back up this way, you'll usually get these files:
If the first day of a month happens to be a Sunday, there will physically be one file fewer: in this special case (which is not that rare) the month's backup is hard-linked to the Sunday file, saving processing time and disc space (instead of creating two separate files with identical contents for the month and the Sunday).
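The hard-linking itself needs nothing more than a plain `ln`. The helper name `link_monthly` below is hypothetical and not taken from the script; it is just a minimal sketch of the idea.

```shell
#!/bin/sh
# Hypothetical helper: make the monthly archive name a hard link to the
# freshly written weekly archive instead of storing the same data twice.
# Usage: link_monthly WEEKLY_FILE MONTHLY_FILE
link_monthly() {
    weekly=$1 monthly=$2
    # -f replaces last year's monthly file of the same name, if any
    ln -f "$weekly" "$monthly"
}
```

Keep in mind that both names then share the same data: next Sunday's weekly archive has to be written to a fresh file (for example written under a temporary name and renamed), not truncated in place with `>`, or the monthly copy would be overwritten as well.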
Incremental, by the way, is not meant on a true day-by-day basis here, but relative to the last full backup (i.e. last Sunday or the first of the month, whichever was later). While this may use a little more disc space, it has the benefit that in case of a disaster (and thus the need to restore your data) you only need at most two files: the last full and the last incremental backup. Presuming your machine crashes on a Friday, you'd have to restore the Sunday (full) backup and the Thursday (incremental) backup, in that order. You would not need the Monday/Tuesday/Wednesday files. If your machine crashed on a Monday, however, you'd only have to deal with the Sunday (full) backup.
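A minimal restore sketch for this two-file case follows. The function name `restore_backup` is made up for illustration; it assumes the archives unpack relative to the target directory, which holds for archives created by GNU tar with its default of stripping a leading `/` from member names.

```shell
#!/bin/sh
# Hypothetical helper: restore a directory from the last full backup plus
# at most one incremental archive, in that order.
# Usage: restore_backup TARGET_DIR FULL_ARCHIVE [INCREMENTAL_ARCHIVE]
restore_backup() {
    target=$1 full=$2 incr=${3:-}
    mkdir -p "$target" || return 1
    # 1. the full (weekly or monthly) archive first ...
    tar -xzf "$full" -C "$target" || return 1
    # 2. ... then the newest incremental, which overwrites changed files
    if [ -n "$incr" ]; then
        tar -xzf "$incr" -C "$target" || return 1
    fi
}
```

For the Friday crash (assuming d4 is the Thursday file, i.e. d1 counts as Monday): `restore_backup /tmp/restore ~/backup/Castle.home.users.jane-w00.tar.gz ~/backup/Castle.home.users.jane-d4.tar.gz`. Since such incrementals only carry files by modification time, files deleted after Sunday will reappear after the restore.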
The following sections discuss in more detail the handling of the created backup archives, the naming conventions and the various optional commandline switches and how they change the normal behaviour of this script as outlined before.
Let's assume you're called Jane (at least as far as the computer world is concerned), your hostname is Castle.do.main, and you're going to back up your personal home directory, which happens to be /home/users/jane. The destination of the backup file is a directory exported by some other host within your LAN. I won't go into detail here about how to (auto)mount such a remote directory; it's enough to say that Miss Ruth kindly provided a sym-link into your home as ~jane/backup/. Further assuming this script is reachable through your PATH setting, you'd call it in your personal crontab file like this:
30 3 * * * incBackup -b ~/backup -d ~
This runs the script at half past three every morning, producing the following files over the course of a year:
/home/users/jane/backup/Castle.home.users.jane-d1.tar.gz
/home/users/jane/backup/Castle.home.users.jane-d2.tar.gz
/home/users/jane/backup/Castle.home.users.jane-d3.tar.gz
/home/users/jane/backup/Castle.home.users.jane-d4.tar.gz
/home/users/jane/backup/Castle.home.users.jane-d5.tar.gz
/home/users/jane/backup/Castle.home.users.jane-d6.tar.gz
/home/users/jane/backup/Castle.home.users.jane-w00.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m011.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m021.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m031.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m041.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m051.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m061.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m071.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m081.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m091.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m101.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m111.tar.gz
/home/users/jane/backup/Castle.home.users.jane-m121.tar.gz
In case that seems too much for your server's disc (or Miss Ruth) to handle and you know what you're doing, you may use the --shortmonth commandline switch (see below), which leaves the month number out, causing each monthly full backup (on the respective first of the month) to overwrite the one of the month before. While this saves the disc space of 11 full backups, you won't be able to recover lost data if you only notice the loss after the month has changed (since the current month's full backup wouldn't contain that data anymore). Another option for you could be the --trueincremental switch (see the section Commandline Options below).
As can be seen above, the full path/filenames of the backup archives show a consistent pattern:
/backupdir/hostname.saveddir-WD.tar.gz
where saveddir is the saved directory with its slashes replaced by dots (e.g. home.users.jane), W (when) stands for either m (a monthly full backup), w (a weekly/Sunday full backup) or d (a daily incremental backup), and D (day) stands for either 00 (weekly), 1-6 (daily) or MMD (monthly), with MM indicating the month and D the day of the month. The hostname, as you see, is used in its short form, since a FQDN would only make the filenames much longer without any benefit; and storing the backups from hosts of different domains in one and the same server directory would be a very bad idea anyway. So it shouldn't be too difficult to figure out the meaning/relevance of a given file in your backup directory.
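The naming rule can be expressed as a small shell function. This is a sketch under assumptions, not the script's own code: `archive_name` is a hypothetical helper, GNU date's `-d` option is assumed for parsing the date, the mapping of d1..d6 to Monday..Saturday is inferred from the file list above, and the --shortmonth and --weeknumber variants are left out. On a first of month that is also a Sunday it returns the monthly name (the weekly name is then the hard link).

```shell
#!/bin/sh
# Hypothetical helper: compute the archive name for a given day using the
# default naming scheme (no --shortmonth, no --weeknumber).
# Usage: archive_name BACKUP_DIR HOSTNAME SAVED_DIR YYYY-MM-DD
archive_name() {
    bdir=$1 host=$2 sdir=$3 day=$4
    # /home/users/jane -> home.users.jane (trailing and leading slashes dropped)
    stem=$(printf '%s' "$sdir" | sed 's|/*$||; s|^/*||; s|/|.|g')
    dom=$(date -d "$day" +%d)   # day of month, 01..31 (GNU date)
    mon=$(date -d "$day" +%m)   # month, 01..12
    dow=$(date -d "$day" +%w)   # weekday, 0 = Sunday .. 6 = Saturday
    if [ "$dom" = 01 ]; then
        when="m${mon}1"         # monthly full backup, e.g. m121
    elif [ "$dow" = 0 ]; then
        when=w00                # weekly full backup
    else
        when="d$dow"            # daily incremental, d1 (Mon) .. d6 (Sat)
    fi
    # the short hostname: everything before the first dot
    printf '%s/%s.%s-%s.tar.gz\n' "$bdir" "${host%%.*}" "$stem" "$when"
}
```

For example, `archive_name /home/users/jane/backup Castle.do.main /home/users/jane 2002-11-28` yields the Thursday file ending in `-d4.tar.gz`.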
If you think about it, this procedure looks quite safe, doesn't it? But, alas, there's still a timeframe where there's a chance to lose data (in case of something silently corrupting your data, that is). Consider a scenario where everything's all right until, say, the third of December. On that very day you discover (not without lengthy investigation) that your hard disc seems to have some serious problems with the remapping of bad blocks, and the worst of it: either the disc failed to report its problem to your operating system (the hard disc device drivers) or the latter didn't care very much. Whatever the technical reason was, the result is that several of your files happen to be, well, deranged, but (unfortunately in this case) still readable as far as the operating system is concerned (otherwise you would have noticed the problems earlier). "Well", you might think while Miss Ruth is replacing the faulty disc, "bad luck, but I've got all those backup files on my server, so it's only a question of tar xzf filename to restore the whole stuff."
Let's see which files are there for you to start with: the Sunday/weekly full backup, as it turns out, already contains some corrupted files (which also renders the incrementals unusable). Since that very Sunday happens to be the first of the month (in 2002), the same applies to the monthly backup. The weekly backup before that (which was created on 2002-11-24 and might still have had uncorrupted files) is gone (replaced by the one of 2002-12-01). So you'll have to go back to the November full backup (as of Friday 2002-11-01), which means that in effect the changes of a whole month are lost. Not that good at all, wouldn't you say?
This is where the --weeknumber switch comes in: it changes the name of the Sunday backups from .../Castle.home.users.jane-w00.tar.gz to, in this case, .../Castle.home.users.jane-w48.tar.gz, which in turn means that there's also a file from the Sunday before named .../Castle.home.users.jane-w47.tar.gz, and respective ones for all the weeks before that as well. So now (with our made-up scenario) how far back you have to go no longer depends on the way the backup files are rotated (replaced) by newer ones, but only on how long your faulty disc silently corrupted your files. Probably you'll have to go back just one week, maybe several weeks. In any case, the span of lost data is much smaller when going back week by week than it would be when going back month by month.
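Going back week by week can be partly automated. The sketch below (hypothetical helper `member_history`; it assumes GNU tar for `-O`, i.e. extract to stdout, and backup filenames without whitespace) prints, newest archive first, a checksum of one file as stored in each weekly archive, so you can spot the week in which its contents changed.

```shell
#!/bin/sh
# Hypothetical helper: show how one archived file changed from week to week.
# Usage: member_history BACKUP_DIR ARCHIVE_STEM MEMBER
#   e.g. member_history ~/backup Castle.home.users.jane ./notes.txt
# MEMBER must be spelled exactly as "tar -tzf ARCHIVE" lists it.
member_history() {
    # ls -t sorts by modification time, newest first, which stays correct
    # across the year boundary (w01 of the new year follows w52)
    for f in $(ls -t "$1/$2"-w[0-9][0-9].tar.gz 2>/dev/null); do
        if tar -tzf "$f" "$3" >/dev/null 2>&1; then
            # cksum prints "CRC SIZE" of the member's contents
            sum=$(tar -xzOf "$f" "$3" | cksum)
        else
            sum=missing
        fi
        printf '%s %s\n' "$(basename "$f")" "$sum"
    done
}
```

Once the checksum stops changing the way it should (or the file size suddenly looks odd), the archive just below that line is a good candidate to restore from.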
But, as you have probably figured out by now, there's a price to pay for this additional safety: more disc space usage on your backup server (hopefully not the one with the faulty disc). To keep this additional disc space as low as possible, the --weeknumber switch has another side-effect: the first of the month is handled like any other day, thus disabling the monthly full backups, which wouldn't make much sense anyway since we've got all the Sunday full backups already.
So, in short, you'll usually get up to 13 full backups per directory (12 first-of-month files and last Sunday's weekly backup). With the --weeknumber switch you'll instead get up to 52 full backups (one for every Sunday in a year), IOW: four times the usual number. That's the reason why the --weeknumber option has to be given explicitly on the commandline. Of course, you (or Miss Ruth) could deal with the disc space issue, e.g. by using the tmpwatch utility (see man 8 tmpwatch for details) or tmpreaper (see man 8 tmpreaper) to automatically remove backups older than, say, 100 days (tmpwatch actually takes an hours argument, but I presume you know the formula for converting days to hours), or you could use find (see man 1 find).
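With find, the 100-day cleanup could look like the following sketch. The helper name `prune_backups` is hypothetical, and `-maxdepth` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical helper: delete backup archives not modified for more than
# DAYS days (tmpwatch would want DAYS * 24 hours instead).
# Usage: prune_backups BACKUP_DIR DAYS
prune_backups() {
    # only *.tar.gz files directly in BACKUP_DIR are removed; the hidden
    # .*.lasttime flag files are left alone on purpose
    find "$1" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$2" -exec rm -f {} +
}
```

`prune_backups /path/to/backups 100` roughly corresponds to `tmpwatch --mtime 2400 /path/to/backups`.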
About cronjob times: if you happen to live in an area with daylight saving time (summer time / winter time), you should make sure that the chosen time (03:30 above) does not fall into the period that passes twice (summer to winter) or not at all (winter to summer). In Europe, for example, these are the hours between 01:00 and 03:00 local time (I guess it will be similar elsewhere).
As an alternative to setting up a personal cronjob like the one shown above, you could ask Miss Ruth to put a script like the following in the system's
#!/bin/sh
# remove backups older than three weeks:
/usr/sbin/tmpwatch --nodirs --fuser --mtime 504 /path/to/backups/
# backup several system and user directories:
/opt/bin/incBackup -b /path/to/backups -d /etc /root /var/www/html \
    /home/users/* /home/helpers/* /home/admins/* /whatever/else
#EoF
The various user directories, you see, are given separately instead of just saying /home. The latter works well, of course. It would, however, create one huge backup file containing the data of all the subdirectories under /home (i.e. the user home dirs). On the other hand, giving the directory names as a wildcard (with the shell expanding it to a number of names) results in separate backup files for each user home directory. That's what you'd most probably want, isn't it?
As you can see, Miss Ruth keeps her backup archives for at most 504 hours (guess how many days that is). I'd say one could even reduce this to ~300 hours: someone who, within two weeks, neither notices that something's wrong with his/her data nor copies the backup files somewhere else for safety suffers not only from technical faults but has some, er, serious, say, mental problems as well.
To have a backup permanently available, it's not enough to just store it on some other machine: what if that one crashes as well? So it's rather likely that you will have to do something more with the backup archives. Storing them on tape or burning them on CD comes to mind. Whatever you're up to, you should watch the file sizes of the produced full (i.e. monthly/weekly) backup archives and, if necessary, adjust the directories (passed with the --dirs argument) so that the backups won't become larger than whatever fits on your chosen storage media. (And see the section about Exclusions below as well.)
Let's assume a CD-ROM (i.e. ~600MB) for permanent storage of your archived data. Considering that most of the files in your home directory are (kind of) text and thus compress well (say, to ~60% of their original size), these numbers mean that about 1GB of raw data (600MB / 0.6) would fit into a backup archive writable on CD. Of course, this highly depends on the files actually stored in your directories. If there are lots of, say, sound or video files that can't be compressed much further, you might even end up with a 1:1 ratio. Redesigning your directory structure and calling this script not just once for the whole structure but for several subdirectories instead, combined with well-chosen Exclusions (see below), might help in such cases. Anyway, at least you've got the idea, haven't you?
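To keep an eye on the archive sizes, something like the following sketch could be run after the backups. The helper `warn_oversized` is hypothetical; it relies on the m/w naming convention described above to pick out the full backups.

```shell
#!/bin/sh
# Hypothetical helper: list full backups (monthly "m..." and weekly "w..."
# archives) that are larger than LIMIT_BYTES, e.g. a CD's capacity.
# Usage: warn_oversized BACKUP_DIR LIMIT_BYTES
warn_oversized() {
    for f in "$1"/*-[mw][0-9]*.tar.gz; do
        [ -f "$f" ] || continue            # the glob matched nothing
        size=$(wc -c < "$f" | tr -d ' ')    # some wc versions pad with spaces
        if [ "$size" -gt "$2" ]; then
            printf '%s %s\n' "$size" "$f"
        fi
    done
}
```

For the CD case above: `warn_oversized ~/backup $((600 * 1024 * 1024))`.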
Just to summarize it finally: The usual call would be
incBackup -b /destination/dir -d /some/dir/to/save
with up to 12 monthly full backups per year, one weekly full backup as of last Sunday and six daily incremental files with changes Sunday-to-weekday. The least safe (and least disc-space-consuming) call would look like
incBackup -b /destination/dir -s -T -d /some/dir/to/save
with only one monthly full backup at a time (each month overwriting the previous one), one weekly full backup and six daily incremental files with day-by-day changes. In contrast, the safest (and most disc-space-consuming) call would look like
incBackup -b /destination/dir -w -d /some/dir/to/save
with up to 52 weekly full backups per year and six daily incremental files (holding Sunday-to-weekday changes). In between would be calls like
incBackup -b /destination/dir -T -d /some/dir/to/save (12 monthly, 1 weekly full backup, six day-by-day incrementals) or
incBackup -b /destination/dir -T -w -d /some/dir/to/save (52 weekly full backups, six day-by-day incrementals; this is, in fact, what I use in my personal crontab to back up my home dir). It's completely up to you to use the options that best match your very personal balance of comfort, disc usage and safety.
As mentioned (or indicated) before, there are some commandline switches in both short and long variants which must ([m]andatory) or may ([o]ptional) be used. The long option names may be shortened to as few as three letters; the phrase prints out means that the output is written to stdout (i.e. usually your display/screen), so that you can redirect it to a file or pipe it through some other tool (e.g. lpr). In alphabetical order:
bash does not depend on indentation), the --html and --info options, of course, won't print anything (all the fine docs are gone!), and you will have a hard time reading and understanding the source (since all the structure/indentation and comments are gone as well). The only benefit of such an operation is the size reduction from ~40KB to ~6KB (from ~970 to ~250 lines) and hence, theoretically, a faster startup. So unless you're really very short of disc space, just ignore this option. I once needed it for a client's embedded device where space was scarce, and left it here in case someone else finds it useful. (See also the --gzexe option below.)
incBackup -F -Z) which results in a compressed script of ~15KB in size. Please note that in either case the compressed script relies on gzip and some other utilities (tail, chmod, rm) to run.
To make a long story, er, list short: during normal usage you'll only need the -b and -d options (and possibly -w or -T). Everything else is a luxury or potentially dangerous (i.e. less safe) in one aspect or another. As an additional hint, the short commandline switches meant to be given less frequently (or never at all) use uppercase letters.
If a directory given with the --dirs option contains (at its top level) a dot-file named .nobackup, that file is assumed (without any testing) to be suitable for passing to tar as a list of patterns for files not (repeat: not) to save (see info tar for more details about this). For a user's home directory such an exclusion file might look like:
*~
*.bak
*/cache/*
*/Cache/*
*/cache?/*
*/devel/*.o
This suppresses backup files made by some editors as well as files in cache directories (e.g. of web browsers or news readers); the last line says that we don't want a backup of the object files in our personal development area (after all, we have the sources there). Generally speaking, you should put patterns into the exclusions file for all those files that are temporary in one way or another, or can be restored (possibly better) from other sources such as install discs. Following this rule keeps your backup archives as small and sensible as possible.
If no such .nobackup file exists, however, everything gets stored in the backup archive regardless of whether it makes sense or not, provided there's enough space in the chosen destination directory. You should consider carefully what to exclude; the example above should give you an idea of which kinds of files would only waste backup space without any real benefit in case of a disaster.
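Before relying on a new .nobackup file, you can check what it actually excludes. This sketch (hypothetical helper `preview_backup`, assuming GNU tar, whose `-X` reads exclusion patterns from a file) archives into a pipe and lists the member names, so no archive is written to disk.

```shell
#!/bin/sh
# Hypothetical helper: list the member names an archive of DIR would get,
# honouring DIR/.nobackup as a GNU tar exclude file (-X).
# Usage: preview_backup DIR
preview_backup() {
    if [ -f "$1/.nobackup" ]; then
        tar -cf - -X "$1/.nobackup" -C "$1" . | tar -tf -
    else
        tar -cf - -C "$1" . | tar -tf -
    fi
}
```

Piping the output through `grep` (e.g. `preview_backup ~ | grep -i cache`) quickly shows whether a pattern catches what you intended.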
tar gets called with the --one-file-system option, causing it not to back up files and directories on filesystems other than the one the start directory resides on. This might look like a drawback, since the backup archive might seem incomplete (compared to the real directory structure in use). But it allows, e.g., sym-linked directories from other filesystems to be part of the directory structure without having the files therein saved twice (once as part of your backup and another time when the original directory is saved). And for the cases where real mount points happen to be within the directory to back up: just call this script with a --dirs argument naming that very mount point and you're done.
Another option passed to tar is the --ignore-failed-read switch, which makes sure that the backup archive gets created even when an unreadable file is encountered. This way you'll have at least all readable files saved instead of none at all.
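The tar options described above combine into a full-backup call roughly like the following. This is a sketch, not the script's actual code: the helper `full_backup`, the `.tmp` suffix and the rename step are assumptions of mine. The archive is written through stdout with a shell redirection (the workaround for the 2GB problem discussed on this page) and only renamed into place when tar did not fail.

```shell
#!/bin/sh
# Hypothetical helper: full backup of SRC_DIR into OUT_FILE.
# Usage: full_backup SRC_DIR OUT_FILE
full_backup() {
    src=$1 out=$2
    # stay on one filesystem, and don't give up on unreadable files
    set -- --one-file-system --ignore-failed-read
    # honour an exclusion file at the top of SRC_DIR, if there is one
    if [ -f "$src/.nobackup" ]; then
        set -- "$@" -X "$src/.nobackup"
    fi
    # write to stdout and let the shell redirect into a temporary file
    tar -czf - "$@" -C "$src" . > "$out.tmp"
    status=$?
    # GNU tar exits with 1 if a file changed while it was being read; the
    # archive is still usable then, so only higher codes count as failure
    if [ "$status" -le 1 ]; then
        mv -f "$out.tmp" "$out"
    else
        rm -f "$out.tmp"
        return 1
    fi
}
```

Renaming a finished archive into place (instead of truncating the old file with `>`) also means that a monthly archive hard-linked to last week's file keeps its contents.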
For storing the time of the very last full backup, a (hidden) 0-byte dot-file named .Host.sub.dir.lasttime is maintained. This file (i.e. its last-modified date) is used by tar to determine which files should be put into the usual incremental backup archive. Removing that file would result in a complete/full backup the next time this script is run. See the --force commandline argument above for a better way of initiating a full backup. In case the --trueincremental switch is given, the date of the last backup before the current day (i.e. usually yesterday's file) is used to check for (and back up) files newer than that. If no such file from yesterday (or the day before yesterday) can be found, the date of the last full backup is used, and if that one doesn't exist either, a new full backup is actually made (although the filename remains that of an incremental backup). You don't have to worry about this; I explained it just to assure you that your backups will be saved even if someone deleted one or the other file by mistake.
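The flag-file mechanism maps naturally onto GNU tar's `--newer-mtime` option which, when its argument starts with `/` or `.`, uses the modification time of that file as the threshold. Whether the script uses exactly this option is an assumption; the helper `incremental_backup` below is hypothetical and, as described above, falls back to a full backup when the flag file is missing.

```shell
#!/bin/sh
# Hypothetical helper: incremental backup of SRC_DIR into OUT_FILE,
# containing only files modified after FLAG_FILE's modification time.
# Assumes GNU tar; FLAG_FILE must begin with "/" or "." so that
# --newer-mtime treats it as a file name and not as a date string.
# Usage: incremental_backup SRC_DIR OUT_FILE FLAG_FILE
incremental_backup() {
    src=$1 out=$2 flag=$3
    set -- --one-file-system --ignore-failed-read
    if [ -f "$flag" ]; then
        set -- "$@" --newer-mtime="$flag"
    fi
    # without a flag file this is simply a full backup under the daily name
    tar -czf - "$@" -C "$src" . > "$out"
}
```

The flag would be refreshed after each successful full backup, e.g. with `touch "$flag"`. Note that a file modified while that full backup was running then has an older modification time than the flag and is skipped by the next incremental, so recording the start time first is the safer variant.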
Over the years of usage, those backup files not meant for storage on CD became bigger and bigger, eventually crossing the 2GB border and resulting in corrupted (artificially truncated, thus incomplete) archives. Since I didn't have that much time to investigate, and after experimenting with switching between NFS- and SMB-mounted destination directories, I finally kind of, er, "resolved" the problem by replacing the tar commandline argument -f filename with output redirection. Consequently, neither tar nor gzip will ever call lseek() (which is limited to 2GB) but only write to stdout, leaving the issue completely to the host/OS receiving the data stream, where the 2GB border may be crossed during write() calls but without lseek(). Of course, I'm well aware that this "solution" is just a workaround (and far from bullet-proof) until all userspace programs (including tar etc.) and filesystems (e.g. SMB and NFS) are fixed to fully use 64-bit addressing schemes. But hey, as long as the hack works, why not use it? And again: it's better, of course, to keep the backups smaller anyway (see the discussion on Archive Files above).
The following utilities are used by this script and are assumed to be reachable in PATH (/sbin:/bin:/usr/sbin:/usr/bin:/opt/bin by default):
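A quick check that the needed tools can actually be found in that default PATH may be useful on a new machine. In this sketch the helper `require_tools` is hypothetical, and the tool names in the usage example are merely ones mentioned elsewhere on this page, not a complete list.

```shell
#!/bin/sh
# the default search path given on this page
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/bin
export PATH

# Hypothetical helper: complain about every named tool not found in PATH
# and return non-zero if at least one is missing.
# Usage: require_tools NAME...
require_tools() {
    missing=0
    for t in "$@"; do
        if ! command -v "$t" >/dev/null 2>&1; then
            echo "missing: $t" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `require_tools tar gzip tail chmod rm nice || exit 1`.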
$Log: incBackup,v $
Revision 1.8 2007/11/18 08:35:13 matthias
* added mentioning of "tmpreaper";
# fixed a "sed" expression;
Revision 1.7 2007/06/05 07:30:26 matthias
+ implemented use of nice tool;
* in case of missing (removed) time flag file a weekly backup is done possibly overwriting the last weekly backup file;
* the time flag file is written by echo redirection to avoid problems with shells not implementing the redir operator w/o command;
Revision 1.6 2004/08/09 11:38:05 matthias
* added a test for the last-time flag for cases where it was removed by accident (a complete/weekly backup is made in this case);
* updated sed scripts and CSS;
Revision 1.5 2003/01/06 20:50:07 matthias
+ added -V|-W|-Z options;
# fixed a problem when 1st'o'month was linked to daily instead of weekly archive file if that day was a sunday;
* updated/enhanced docs, especially HTML output (which got a linked ToC);
Revision 1.4 2002/09/02 15:04:43 matthias
+ added/implemented -T option (incl. docs);
Revision 1.3 2001/01/26 18:30:05 matthias
+ added -H|-I|-S options and a lot more documentation;
* tar now uses output redirection (instead of --file) to avoid (?) possible problems with archives gt 2GB;
Revision 1.2 1999/09/12 20:13:53 matthias
# modified screen output on --help;
Revision 1.1 1997/03/14 22:02:32 matthias
+ (long delayed) initial CVS checkin;
*) In case you're wondering who that famous Miss Ruth might be: in German the first name Ruth and the English word root sound exactly the same.
Disclaimer: No bits or bytes were harmed and no harddisk destroyed in order to create this page.
All letters and digits on this page are strictly virtual and any resemblance to real letters or digits (monospaced, serif or sans-serif) is purely coincidental.