Why Patch files matter to web developers... and how to use them.

When developing web services and their related web sites, transferring code is easy. It is normally handled via git, which effectively works out and transfers only the changes (much like a patch) behind the scenes, making the process fast and efficient.

However, few websites are complete without substantial collections of images (file sets) that are often, quite reasonably, kept separate from the code repository. Sometimes you need to make lots of minor changes to these collections, and the process may involve several very large transfers as the image collection is first downloaded and then re-uploaded with the changes. Judicious use of rsync can mitigate this to some extent, but where file ownership or other metadata changes, rsync often gets confused and does little better than the tried and trusted process of creating a tarball and transferring the collection by hand.

I generally try to get the relevant files/folders made into a git repo, but sometimes this is not possible (git slows down with fifty thousand plus scanned book files, for example), so I'll assume a git repo is not an option here.

If you are working on the file set for the first time, create a tarball or use rsync to get a local copy. Once you have a local copy (`~/original`), make a copy of it (`~/updated`) to work on (normally, of course, that would be a git branch).
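The tarball route can be sketched roughly as below. It's a minimal, locally runnable stand-in rather than a recipe: `server/images` plays the part of the file set on the server, and the ssh line in the comment uses a hypothetical host and path (`user@example.com`, `/srv/www`) to show what the same tar pipe might look like over the network.

```shell
# Throwaway working area so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical stand-in for the file set on the server.
mkdir -p server/images/a
echo 'original content' > server/images/a/foo

# Over the network this might look like (hypothetical host and path):
#   ssh user@example.com 'tar -cf - -C /srv/www images' | tar -xf -
# Locally, the same tar pipe simply copies one directory tree to another.
tar -cf - -C server images | tar -xf -
mv images original

# The working copy you edit; original stays untouched as the base for the patch.
cp -R original updated
```

Keeping `original` pristine matters: it's the "before" side of every patch you make later.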

For fun, I'll just create an original with some files in it:

~ $ mkdir -p original/a/aa
~ $ mkdir -p original/b
~ $ echo 'original content' > original/a/aa/foo
~ $ echo 'original content' > original/b/bar

Here's some fictitious work on `~/updated`:

~ $ cp -R original updated
~ $ rm updated/b/bar
~ $ echo 'more content' >> updated/a/aa/foo
~ $ echo 'new file' > updated/b/mumble
~ $ mkdir -p updated/c/cc
~ $ echo 'new file in new folder' > updated/c/cc/grumble

Now we make our patch. This will be a nice small file, easily transferred back to the server.

~ $ diff -urNa original updated > my_patch

Diff options (these are probably all you'll need):

  • -u output 3 lines of unified context (use -U NUM for a different amount)
  • -r recursively compare any subdirectories found
  • -N treat absent files as empty
  • -a treat all files as text (don't send the output to your terminal when using this ;)

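The -a switch is important if you are patching anything involving binaries (such as file sets containing images). Without it, diff just reports that the binary files differ and leaves those changes out of the patch.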
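To see the difference -a makes, here's a small self-contained sketch that builds two throwaway trees in a temporary directory (the names `orig`, `upd` and `logo.bin` are made up for illustration) and diffs a file containing a NUL byte with and without -a. It assumes GNU diff; other diff implementations may word the binary message differently.

```shell
# Throwaway trees in a temp dir, so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p orig upd

# A NUL byte is enough for GNU diff to treat the file as binary.
printf 'GIF\000old' > orig/logo.bin
printf 'GIF\000new' > upd/logo.bin

# Without -a: diff only reports that the files differ; no hunks are written.
diff -urN orig upd > without_a.patch
# With -a: the file is compared as text, so the change lands in the patch.
diff -urNa orig upd > with_a.patch

cat without_a.patch
wc -c without_a.patch with_a.patch
```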

To test the patch we'll create a copy of the original (just in case) and patch it:

~ $ cp -R original target
~ $ patch -p1 -d target -i ../my_patch
patching file a/aa/foo
patching file b/bar
patching file c/cc/grumble

Let's see how it did:

~ $ diff -urNa updated target
~ $ diff -urNa original target
diff -urNa original/a/aa/foo target/a/aa/foo
--- original/a/aa/foo 2014-09-16 15:57:49.124409329 +1000
+++ target/a/aa/foo 2014-09-16 16:30:56.984435375 +1000
@@ -1 +1,2 @@
 original content
+more content
diff -urNa original/b/bar target/b/bar
--- original/b/bar 2014-09-16 15:58:01.700409494 +1000
+++ target/b/bar 1970-01-01 10:00:00.000000000 +1000
@@ -1 +0,0 @@
-original content
diff -urNa original/c/cc/grumble target/c/cc/grumble
--- original/c/cc/grumble 1970-01-01 10:00:00.000000000 +1000
+++ target/c/cc/grumble 2014-09-16 16:30:56.984435375 +1000
@@ -0,0 +1 @@
+new file in new folder

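Perfect. The usual gotcha is patch's -p option. If you're worried, run patch with --dry-run first and look at the file names in the patch. -pN strips the first N leading folders from each file name, so increase N by one for each leading folder that doesn't exist inside the target directory (here every name starts with original/ or updated/, so -p1).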
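A minimal sketch of that dry-run check, again in a throwaway temp directory with a cut-down version of the example trees. It assumes GNU patch (for `--dry-run` and `-d`); the point is simply to rehearse the patch and only apply it for real if the rehearsal succeeds.

```shell
# Throwaway working area with a tiny version of the example trees.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p original/a/aa
echo 'original content' > original/a/aa/foo
cp -R original updated
echo 'more content' >> updated/a/aa/foo
diff -urNa original updated > my_patch

# The names patch sees: original/a/aa/foo and updated/a/aa/foo.
grep -e '^--- ' -e '^+++ ' my_patch

cp -R original target
# Rehearse first: --dry-run reports what would happen without changing any files.
# -p1 strips the leading "original/" or "updated/", leaving a/aa/foo inside target.
if patch --dry-run -p1 -d target -i ../my_patch; then
    patch -p1 -d target -i ../my_patch
fi

# diff prints nothing (and exits 0) when target now matches updated.
diff -urNa updated target && echo 'target matches updated'
```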

The result of all this? That 256GB file transfer only ever needs doing once. Thereafter, patching means you only transfer the changes. This can save heaps of time, although I still aim to turn any file set where this is going on into a git repo first, if at all possible: git makes life so much easier by doing all the patching for you.

If others on the server are changing the file set (or perhaps the service itself changes it), make sure you copy the original on the server before transferring it locally. That way you can later diff that copy against the live file set to make patches carrying the updates others have made.
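Here's one way that might look, simulated locally so it can be run safely. `server_images` stands in for the live file set on the server and `server_images.snapshot` for the copy taken before downloading (both names are hypothetical). Diffing the snapshot against the live set later gives a patch of only the changes others have made, which applies to your local `original` with the same -p1 convention as before.

```shell
# Throwaway working area so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# "Server" side: the live file set, plus a snapshot taken before downloading.
mkdir -p server_images/b
echo 'original content' > server_images/b/bar
cp -R server_images server_images.snapshot

# Your local copy (fetched by tarball or rsync in real life).
cp -R server_images original

# Meanwhile, someone else changes the live file set on the server.
echo 'edited on the server' >> server_images/b/bar
echo 'uploaded by a colleague' > server_images/b/new_upload

# On the server: a patch holding only the changes made since the snapshot.
diff -urNa server_images.snapshot server_images > server_changes.patch

# Locally: bring your copy up to date (-p1 strips the top-level folder name).
patch -p1 -d original -i ../server_changes.patch
```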
