Explaining Compression, Encryption, and Archive

[Figure: Venn diagram with algorithms]

One of the hardest videos I’ve done recently for ScreenCasts Online is the one just published for an app called Keka. Keka is a sweet little $2 app designed to compress and expand files. Keka itself is simple and easy to use, but understanding the technologies behind it was where the real effort came in.

In researching this project, I felt like my dad when he would try to build something in the garage. He would start a project, then realize he had to fix a tool to do the project, then find that the bench the tool was on had a wobbly leg, and then discover that the concrete under the bench was actually messed up. This was one of those projects. I thought it might be informative to teach you what I learned behind the scenes in order to explain Keka.

There are three basic concepts involved: archive, compression, and encryption. Learning the differences between them is where things got interesting. I’m sure you’ve heard of file formats like gzip, zip, tar, ISO and DMG, but did you know that they don’t all perform all three of these functions? Let’s walk through each concept and talk about the differences.

Archive

An archive is a set of files or a folder that has been packaged up into a single file. By definition, it’s not necessarily compressed, nor is it by definition encrypted. It’s just piled all together into that single file.

In the old days, we used to back up files onto tape, and thus tape archive, aka tar, was born. tar is nothing more than the files bundled together, with no compression, no encryption, so it’s a true archive format.
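To see that tar really is "just the files bundled together", here is a minimal sketch using Python's standard `tarfile` module. The file names and contents are made up for illustration, and everything is done in memory so nothing is written to disk.

```python
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Bundle in-memory files into a plain, uncompressed tar archive."""
    buffer = io.BytesIO()
    # Mode "w" means plain tar: no compression, no encryption.
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical example files, not from the article.
files = {"notes.txt": b"hello " * 100, "todo.txt": b"buy milk\n" * 50}
archive_bytes = make_tar(files)
total_input = sum(len(data) for data in files.values())

print("input bytes:  ", total_input)
print("archive bytes:", len(archive_bytes))
# The original content sits inside the archive byte for byte.
print("stored verbatim:", files["notes.txt"] in archive_bytes)
```

The archive comes out larger than the files it holds, because tar adds a header for each file and pads everything out to fixed-size blocks.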

Another archive format with which you might be familiar is ISO. When you install a virtual machine on your Mac, Linux or Windows PC, you’ll often get the new operating system as an ISO file. ISO files look like a CD or DVD to the virtual machine, but to you, they look like a single file. Again, one file that is neither compressed nor encrypted.

Compression

Compressed files are just what it says on the tin: they have been squashed down to take up less space. You can compress a single file without it being an archive, and you don’t have to encrypt it. Some files compress better than others. For example, documents, text files, bitmap images, and uncompressed audio formats such as WAV compress very well.

Other file types such as JPEG images and MP3 audio files do not compress at all well because they have already been compressed. Compressing them again can actually make the files increase in size.
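A quick way to see this difference for yourself is the sketch below, which uses Python's standard `gzip` module. I'm assuming random bytes are a fair stand-in for already-compressed data like JPEG or MP3, since well-compressed data looks close to random. The sample text is made up.

```python
import gzip
import os

# Highly repetitive text, the kind of data that compresses very well.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 1000
# Assumption: random bytes stand in for already-compressed files (JPEG, MP3).
random_like = os.urandom(len(text_like))

for label, data in [("repetitive text", text_like), ("random bytes", random_like)]:
    squeezed = gzip.compress(data)
    print(f"{label}: {len(data)} -> {len(squeezed)} bytes")
```

The repetitive text shrinks dramatically, while the random bytes come out slightly larger than they went in, because gzip still has to add its own header and bookkeeping.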

Two examples of compression formats are Gzip and the newer bzip2, which usually produces smaller files at the cost of speed.
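Both formats are available in Python's standard library, so here is a small sketch that squashes the same (made-up) data with each and expands it again. It also peeks at the first few bytes of the output, which is how tools can tell the two formats apart regardless of the file extension.

```python
import bz2
import gzip

original = b"compress me, please. " * 500

gz_bytes = gzip.compress(original)
bz_bytes = bz2.compress(original)

print("original:", len(original), "bytes")
print("gzip:    ", len(gz_bytes), "bytes, starts with", gz_bytes[:2])
print("bzip2:   ", len(bz_bytes), "bytes, starts with", bz_bytes[:3])

# Both formats are lossless: expanding gives back exactly what went in.
print("gzip round trip ok: ", gzip.decompress(gz_bytes) == original)
print("bzip2 round trip ok:", bz2.decompress(bz_bytes) == original)
```

Which of the two wins on size depends on the data, so treat the numbers from this toy input as an illustration rather than a benchmark.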

Compression and Archive

Compression alone isn’t that useful, so it’s often combined with archive. Let’s say you have a pile of files and you want to package them up and make the resulting file smaller. A common way to do this is to apply both algorithms together.

To archive the files together, we often use our friend tar and then compress with something like Gzip or bzip2. When both compression and archive are applied together, the resulting file gets an extension with a combination of the letters: tar + Gzip = tgz (also written tar.gz) and tar + bzip2 = tbz2 (also written tar.bz2).
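The layering is easy to demonstrate with Python's standard `tarfile` module, which can apply Gzip or bzip2 on top of tar in one step. This is a sketch with made-up file names, done in memory. The last part peels the Gzip layer off a tgz by hand to show that there is an ordinary tar archive inside.

```python
import gzip
import io
import tarfile


def make_archive(files: dict[str, bytes], mode: str) -> bytes:
    """Build an archive in memory; mode is "w", "w:gz", or "w:bz2"."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode=mode) as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical example files, not from the article.
files = {"notes.txt": b"hello " * 100, "todo.txt": b"buy milk\n" * 50}

plain_tar = make_archive(files, "w")    # archive only
tgz = make_archive(files, "w:gz")       # tar + Gzip
tbz2 = make_archive(files, "w:bz2")     # tar + bzip2
print("tar:", len(plain_tar), "tgz:", len(tgz), "tbz2:", len(tbz2))

# Remove the Gzip layer by hand; what is left opens as a plain tar archive.
inner_tar = gzip.decompress(tgz)
with tarfile.open(fileobj=io.BytesIO(inner_tar), mode="r:") as archive:
    names = archive.getnames()
print("files inside the tgz:", names)
```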

We talked about ISO as an archive format, and it doesn’t actually get to play in the compression or encryption game.

Encryption

The final piece of the puzzle is encryption. This one is pretty obvious: it’s where you password protect the file from being opened (hopefully using strong AES-256 encryption). In some formats, you can even encrypt the names of the files within the resulting file.

I studied three formats that support encryption: zip, 7-Zip and our old friend DMG. It turns out that these three formats can do all three things: they can archive, compress and encrypt. They don’t need to be combined with another format to do all three. Encryption isn’t mandatory with these formats, but you can encrypt with them all. Note that Gzip, bzip2, tar, and ISO do not support encryption.
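Here is a sketch of zip doing archive and compression in a single format, using Python's standard `zipfile` module with made-up file names. One caveat: Python's `zipfile` can read password-protected zips but cannot create them, so this example stops short of the encryption step. It does check the flag inside each entry that marks whether it is encrypted.

```python
import io
import zipfile

# Hypothetical example files, not from the article.
files = {"notes.txt": b"hello " * 100, "todo.txt": b"buy milk\n" * 50}

buffer = io.BytesIO()
# One format, two jobs: bundle the files and compress each one (Deflate).
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, data in files.items():
        archive.writestr(name, data)

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    infos = archive.infolist()

for info in infos:
    # Bit 0 of a zip entry's general-purpose flags marks it as encrypted.
    encrypted = bool(info.flag_bits & 0x1)
    print(info.filename, info.file_size, "->", info.compress_size,
          "bytes, encrypted:", encrypted)
```

To actually create an encrypted zip you would reach for an app like Keka or a third-party library, neither of which is shown here.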

But there are so many more

Lest you think that I’ve explored every algorithm and defined them all here, let me set your mind at ease. There are positively pages and pages of “standards” (please note the quotes) written up in Wikipedia under archive, compression, and encryption (example: Comparison of archive formats on Wikipedia).

The contributing authors have valiantly tried to explain which technologies only do archive, which only do compression, and which ones do some of both. I spent days lost in the mire reading these pages and trying to sort these things out!

If you’d like to see this all laid out in a nice video form, and then learn about the super secret hidden app on your Mac called Archive Utility and learn about how janky it really is, and then learn how easy Keka makes your life for only $2, check out ScreenCasts Online show #745. ScreenCasts Online is a subscription tutorial service but you can get a free trial and watch this show and the entire back archive (not that kind of archive).
