For the past year or so we’ve made an effort here at Made by Many to start embracing more tried and tested approaches to releasing software. After having a long debate around what toolchain we’d like to use, Debian packaging was suggested as a possible unit of deployment. It’s a platform that’s been used for ages, is well tested, and Ubuntu, our operating system of choice, uses it by default.
When we took a deeper look into what a .deb file actually does when installed, we found it had some properties that we really liked. It:
- bundles files into an archive to unpack on the filesystem directly
- allows you to execute scripts pre- and post-install
- allows you to declare dependencies that must be satisfied before said package can be installed
- has an eventing system so you can have packages listen for updates to other packages and react to them during install
- supports multiple delivery protocols by default: HTTPS, HTTP, FTP, SSH
- allows you to sign files against a GPG key for verification.
So this train of thought started me on my journey of understanding the dark world of Debian packaging. After consulting the Debian packaging policy (which I have to say is quite a read), I realised that our objectives are different from those of package maintainers, so there’s a lot that isn’t really applicable to the way we work.
What I needed to do was glean the bits we wanted from this tome, and build a pipeline from it. We currently aren’t interested in maintaining OS packages; we just want to get stuff done in a testable, repeatable, more atomic way.
I’m going to share with you the things I’ve learnt whilst on this journey of discovery, and hopefully remove some stigma around packaging being hard. Along the way I’ll introduce you to some tools and techniques that will hopefully help you understand the process, and help you package up your own software more easily. So without further delay, let’s get started with the first part of the puzzle: what a package actually looks like, as understanding the format is the first hurdle!
If all a package consists of is files then we should be able to unpack one, right? Interestingly enough, a .deb file isn’t actually that special. Think of it as a glorified tarball, and, like tarballs, we have some tools to help us manipulate them. One of these tools is the not-too-unfamiliar apt-get: yes, the command you use to install stuff. But it’s not exclusively used for installation; it can do so much more! One thing I use it for is interrogating system packages, as looking at how other people package software gives you a good base for your own packages.
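Note: apt-get requires elevated privileges for anything that changes the system, such as install, as it’s doing very root-based things! Read-only operations such as download work fine as a normal user.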
So first, let’s download the default redis-server package from the Ubuntu repositories so we can take a look inside. I’m currently using Ubuntu 14.04, and as of today the version of redis-server in the default repositories is 2.8.4. Run the following command.
apt-get download redis-server
This will put a downloaded version of the redis-server package into your current working directory; this is the package that would be unpacked onto your system when installed through apt-get, if you were to run the following:
apt-get install -y redis-server
So now that we’ve got this downloaded version, what we want to do is take a look inside. Whilst apt-get is a great tool for managing packages, you can’t do anything with it in terms of prying them open. But fear not, in the world of UNIX there must be “a tool for that”… and sure enough there is. It’s called dpkg.
As a quick intro to dpkg: it’s actually used under the hood by apt-get to do the package extraction, so think of them as complementary rather than competing tools. apt-get handles the higher-level dependency management whilst dpkg is more of a low-level grunt worker. So let’s unpack the contents of the redis-server package.
dpkg -x redis-server_2%3a2.8.4-2_amd64.deb redis
This will -x (for extract) the file system contents of the package into a directory called redis. I’ve stressed file system here because if we were “just” unpacking files, we’d just use tar as a format. If you change directory into the newly created redis folder you’ll see some familiar faces.
If you think they look like folders from the root of an Ubuntu filesystem, you’d be right! When dpkg/apt-get installs something, it takes the output of dpkg -x and places it directly into the root of the Ubuntu filesystem. Now that’s useful! This is also why apt-get install needs to run as root, as it can alter anything in the root file system space.
If you change directory into usr/bin inside the redis folder, you’ll see the compiled version of redis-server available. You’ll also see that it already has executable permissions. File permissions are preserved from pack-up to extraction, but file ownership will persist only if the user exists on both systems: the system it was packed up on, and the system it was unpacked to. As you’d expect, the Redis config is placed in /etc/redis/ and data is placed in /var in the form of log and lib directories.
So nothing too out of the ordinary with that command. But now we get onto the more useful part of the Debian package format. Every Debian package has a special folder packed into the root of the archive. So if dpkg -x were to extract all the contents rather than just the file system folders, you’d see an extra directory in that unpacked “redis” folder. This is the all-important DEBIAN folder (all caps), and it houses all the package metadata. This is the reason we don’t just use tar as a format: we can do extra stuff in here! To extract this special folder we need to run the following command.
dpkg -e redis-server_2%3a2.8.4-2_amd64.deb redis/DEBIAN
This will extract all the package information contained in the DEBIAN package directory into a folder called DEBIAN inside our unpacked directory redis. So once this command completes let’s go take a look at this special directory.
Now this looks a little more interesting. If you list the contents of this directory you should see the following files.
Now here are some of the files you can include in a package that can do some magic apt-get stuff. It’s not the full list, but it’s enough to install a package, and add some additional logic on install.
The most important one in this directory is the control file. This houses all the package metadata, like what it’s called, what dependencies it has, what CPU architecture it can be installed on etc. Let's take a look:
So let’s explain some of these fields in a bit more detail.
Package: The name used to uniquely identify the package. Make sure you don’t clash with existing package names! When you apt-get install my-package, the name will be looked up in this field.
Source: The name of the source package this binary package was built from. Source packages let you build the software yourself rather than installing precompiled binaries. We do not use source packages when packaging up Rails projects, so we don’t use this field.
Version: The current version of your package. It’s worth checking out the policy manual for version numbers, as they aren’t just lexicographically sorted.
Architecture: The target architecture this package has been built for. If it has compiled code then the architecture needs to be specified, as the compiler targets the chip architecture. Common values are i386, amd64, all, any, and if a source package is specified, source.
Depends: This is where you specify which packages this package depends on. You can tie it to specific versions, or a version greater than X. Dependencies will be installed before the package is installed. In the redis-server package, libc6 (>= 2.14) specifies that the installed version of libc6 must be greater than or equal to 2.14.
Section: This is the “category” for the software. Mostly this is defaulted to main, but sometimes categories like “security”, “database” etc. are specified here.
Priority: This is used for setting the priority of the package. It is more applicable to OS packages that have really high priority, i.e. the OS wouldn’t function without them. Generally you would use “optional” for this, unless your software is supercritical.
Homepage: The project page URL, where to get info on the package. This should be a URL.
Description: A description explaining the package. The first line is a short summary and any further lines are indented by a space. If you want a blank line to appear when querying information through apt-cache, put a line containing just a space and a “.” where the blank line should go.
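To make the fields above concrete, here is a minimal sketch that writes a control file for a hypothetical package and reads a couple of fields back out. The package name my-package, the version, the maintainer address and the homepage URL are all made up for illustration, and the Maintainer field isn’t covered above; it’s included on the assumption that you’ll want one in any package you build yourself.

```shell
# Write a control file for a hypothetical package called my-package.
pkg="$(mktemp -d)/my-package"
mkdir -p "$pkg/DEBIAN"
cat > "$pkg/DEBIAN/control" <<'EOF'
Package: my-package
Version: 1.0.0-1
Architecture: amd64
Maintainer: Example Maintainer <maintainer@example.com>
Depends: libc6 (>= 2.14)
Section: database
Priority: optional
Homepage: https://example.com/my-package
Description: Short one-line summary of the package
 Longer description lines are indented by one space.
 .
 A line holding only a space and a full stop renders as a blank line.
EOF

# Read a single field back out, the way a build script might.
field() {
    sed -n "s/^$1: //p" "$pkg/DEBIAN/control"
}
echo "name=$(field Package) version=$(field Version)"
```

The field helper is just a convenience for this sketch; it only handles single-line fields, which is enough for names and versions.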
There are other fields you can add to this manifest file; full documentation can be found here.
Here’s an explanation of the other files located in the redis-server package.
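Since version numbers aren’t simply sorted as strings, it can help to ask dpkg directly how two versions compare. The sketch below uses dpkg --compare-versions, which applies the same ordering rules as relations like libc6 (>= 2.14). The check helper is just a wrapper written for this example. As an aside, the %3a in the downloaded redis-server filename is a URL-encoded colon, so that package’s version is 2:2.8.4-2, where the leading “2:” is an epoch.

```shell
# check A OP B: print whether dpkg considers the relation to hold.
check() {
    if dpkg --compare-versions "$1" "$2" "$3"; then
        echo "$1 $2 $3: true"
    else
        echo "$1 $2 $3: false"
    fi
}

check 2.14 gt 2.8            # numeric parts compare as numbers, not as strings
check 1.0~rc1 lt 1.0         # a tilde sorts before everything, even the end of the version
check 2:2.8.4-2 gt 3.0.0-1   # the epoch ("2:") outranks the rest of the version
```

All three relations hold under dpkg’s rules, even though a plain string sort would disagree on the first and third.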
conffiles: This file tells dpkg that you don’t wish to replace these files on every install/upgrade. So if you would like a configuration file to persist between upgrades you should add its path to this file.
postinst: This is executed after the package is installed/upgraded. It’s probably the most useful file, as it’s parameterised when called: the first argument passed to it is the type of “install” that is happening, so you can switch on it to execute different logic under different conditions. The values are configure, abort-upgrade, abort-remove and abort-deconfigure.
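As a rough sketch of what that looks like in practice: the conffiles file is a plain list with one absolute path per line, written as the path will appear once installed. The package name and config path below are hypothetical, and the loop at the end is just a sanity check I’ve added to confirm every listed file really exists in the staged tree.

```shell
# Stage a hypothetical package tree in a temporary directory.
pkg="$(mktemp -d)/my-package"
mkdir -p "$pkg/DEBIAN" "$pkg/etc/my-package"
echo "port 6379" > "$pkg/etc/my-package/my-package.conf"

# conffiles lists one absolute path per line, as installed on the target system.
printf '%s\n' /etc/my-package/my-package.conf > "$pkg/DEBIAN/conffiles"

# Sanity check: every listed conffile should exist inside the staged tree.
missing=0
while IFS= read -r path; do
    [ -f "$pkg$path" ] || { echo "missing: $path"; missing=1; }
done < "$pkg/DEBIAN/conffiles"
echo "missing=$missing"
```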
Here’s an example bash script that uses a switch case to handle different conditions.
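What follows is a minimal sketch rather than the script from any real package. The logic is wrapped in a function so you can try it by hand, and the echo lines are placeholders for whatever your package actually needs to do. It assumes the arguments described in the Debian policy manual: configure (with the previously configured version as the second argument, empty on a fresh install) plus the three abort-* cases.

```shell
# Hypothetical postinst logic, wrapped in a function so it can be tried by hand.
postinst() {
    case "$1" in
        configure)
            # $2 is the previously configured version; it is empty on a fresh install.
            if [ -z "${2:-}" ]; then
                echo "fresh install"
            else
                echo "upgrade from $2"
            fi
            ;;
        abort-upgrade|abort-remove|abort-deconfigure)
            # dpkg is unwinding a failed operation; put things back as they were.
            echo "rolling back after $1"
            ;;
        *)
            echo "postinst called with unknown argument: $1" >&2
            return 1
            ;;
    esac
}

# Try the two common cases.
postinst configure
postinst configure 2.8.4-1
```

In a real DEBIAN/postinst you would start the file with a shebang line and end it by calling the function with "$@" so dpkg’s arguments are passed straight through.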
So that script would be executed once the package is installed, and would be passed various parameters under various conditions.
preinst: Preinstall is fired before the package is installed, so the logic in it should be common to all installation types.
prerm: Before you remove something you may need to stop some services etc. This would be a good place to do that.
postrm: You may need to clean up some files/directories that were made in the postinst script. This is the best place to do it, as before removal a process might still have a lock on them.
Called after the package is configured; you may need to start or restart some services here.
For a full description on how these scripts are called and what parameters are passed to them under installation/failure look here. It’s pretty much a giant state machine around the state of the package.
These four scripts allow you a lot of flexibility in how the package is installed/configured. When placing these scripts into a package’s “DEBIAN” directory you must make sure they’re executable, otherwise dpkg-deb will reject them when the package is built. There are also failure conditions which allow you to handle partially installed or completely failed installs.
md5sums: A list of the package’s files and their MD5 hashes. It lets you check that the installed files still match what the package shipped. Though MD5 is a bit old-hat, it’s good enough for checking a bit hasn’t been flipped by those pesky cosmic rays.
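As a sketch of the removal side, here is what a postrm might look like. It assumes the arguments listed in the Debian policy manual (remove, purge, upgrade and a handful of failure cases), and DATA_DIR is a made-up location standing in for wherever your package writes its runtime data. The demo at the bottom points it at a temporary directory so nothing real gets deleted.

```shell
# DATA_DIR is a made-up location; a real script might use /var/lib/my-package.
DATA_DIR="$(mktemp -d)/my-package"
mkdir -p "$DATA_DIR"

postrm() {
    case "$1" in
        purge)
            # Only a purge should delete data the package created at runtime.
            rm -rf "$DATA_DIR"
            echo "purged $DATA_DIR"
            ;;
        remove|upgrade|failed-upgrade|abort-install|abort-upgrade|disappear)
            echo "nothing to clean up for $1"
            ;;
        *)
            echo "postrm called with unknown argument: $1" >&2
            return 1
            ;;
    esac
}

postrm remove   # leaves the data directory alone
postrm purge    # removes it
```

Saved as DEBIAN/postrm (with a shebang line and a final call passing "$@"), it would also need chmod 0755 to satisfy the executable requirement mentioned above.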
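If you’re assembling a package by hand, a file in this format can be generated with standard tools. The sketch below stages a small hypothetical tree, hashes every file outside DEBIAN using paths relative to the package root, and then checks the tree against the list. It assumes GNU-style md5sum and filenames without spaces; packaging helpers normally generate this file for you.

```shell
# Stage a small, hypothetical package tree.
pkg="$(mktemp -d)/my-package"
mkdir -p "$pkg/DEBIAN" "$pkg/usr/bin" "$pkg/usr/share/doc/my-package"
printf '#!/bin/sh\necho hello\n' > "$pkg/usr/bin/my-package"
echo "Example docs" > "$pkg/usr/share/doc/my-package/README"

# Hash every regular file outside DEBIAN, with paths relative to the package root.
(
    cd "$pkg" &&
    find . -type f ! -path './DEBIAN/*' | sed 's|^\./||' | sort | xargs md5sum > DEBIAN/md5sums
)
cat "$pkg/DEBIAN/md5sums"

# Check the staged files against the list we just wrote.
if (cd "$pkg" && md5sum -c DEBIAN/md5sums >/dev/null); then
    verified=yes
else
    verified=no
fi
echo "verified=$verified"
```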
When we package things up here we mostly use the control and postinst files. You can also include a changelog, where you could list the differences between package versions. Details on the changelog format can be found here.
Now that we’ve taken a closer look at this directory, we can do the inverse of what we’ve just done and repack the redis-server package back into a workable “.deb” package. If you change directory back to your working directory, where you originally downloaded the package, you can run the following command.
dpkg-deb --build redis redis-server.deb
This will create a new .deb file called redis-server.deb from the files located in the redis directory. If you had a package repository to push this to then you could install this on another compatible machine that has said repository configured.
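The same round trip works for a package built from scratch. Here is a minimal sketch, assuming dpkg-deb is available (it is on Debian and Ubuntu). The package name, version and maintainer are hypothetical. It stages a tiny tree, builds it, reads the metadata back without installing, and then unpacks it again with --raw-extract, which does the job of dpkg -x and dpkg -e in one step.

```shell
# Stage a minimal, hypothetical package.
work="$(mktemp -d)"
pkg="$work/my-package"
deb="$work/my-package_1.0.0-1_all.deb"
mkdir -p "$pkg/DEBIAN" "$pkg/usr/bin"
chmod 0755 "$pkg" "$pkg/DEBIAN"   # dpkg-deb is strict about DEBIAN's permissions
printf '#!/bin/sh\necho "hello from my-package"\n' > "$pkg/usr/bin/my-package"
chmod 0755 "$pkg/usr/bin/my-package"
cat > "$pkg/DEBIAN/control" <<'EOF'
Package: my-package
Version: 1.0.0-1
Architecture: all
Maintainer: Example Maintainer <maintainer@example.com>
Description: Minimal example package
EOF

# Build the archive, then inspect it without installing it.
dpkg-deb --build "$pkg" "$deb"
dpkg-deb --info "$deb"
dpkg-deb --contents "$deb"
built_name="$(dpkg-deb -f "$deb" Package)"
echo "built package: $built_name"

# Unpack the file system tree and the DEBIAN directory in one go.
dpkg-deb --raw-extract "$deb" "$work/unpacked"
ls "$work/unpacked/DEBIAN"
```

Because this is built as a normal user, the files inside the archive are owned by that user rather than root, which is fine for experimenting but worth sorting out before you ship anything.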
So hopefully this gives you a good basis on how to unpack and interrogate existing Debian packages, and an explanation of the internals of a package and how they’re installed.
In the next post I’ll explain how you can get this package into a repository, and how Debian repositories work with the magic of “apt-get update” to make your software installable on Ubuntu.
I hope you found this useful!