Comparing Linux/UNIX Binary Package Formats

This is a comparison of the deb, rpm, tgz, slp, and pkg package formats. The first four are used in the Debian, Red Hat, Slackware, and Stampede Linux distributions, respectively; pkg is the SVr4 package format, used in Solaris. I've had some experience with each of the package formats, both in building packages and, later, in my work on the Alien package conversion program.

I've tried to keep this comparison unbiased; for the record, however, I'm a fan of the deb format and a Debian developer. If you discover any bias or inaccuracy in this comparison, or any important features of a package format I have left out, please mail me so I can correct it. Several people have already done so. I'm also looking for data to fill in the places marked by `?'.

This comparison deals only with the package formats, not with the various tools (dpkg, rpm, etc.) that are used to handle and install the packages. It also does not deal with source packages, only binary packages.

Package format comparison table.

feature                                   deb       rpm       tgz       slp       pkg

Security, authentication, and verification
  signed packages                         yes [1]   yes       no        no        no
  checksums                               yes       yes       no        no        yes
  permissions, owners, etc                yes       yes       yes       yes       yes

Usability by standard linux tools
  recognizable by file                    yes       yes       no        no        yes
  data unpackable by standard tools       yes [3]   no [4]    yes       yes [5]   no [6]
  metadata accessible by standard tools   yes       no        N/A       no        no [7]
  creatable by standard tools             yes       no        yes       no        no

Metadata
  dependencies                            yes       yes       no        yes       yes
  recommendations                         yes       no        no        no        no
  suggestions                             yes       no        no        no        no
  conflicts                               yes       yes       no        yes       yes
  virtual packages and provides           yes       yes       no        ??        no
  versioned dependencies and conflicts    yes       yes       no        ??        yes
  boolean package relationships           yes       no [9]    no        no        no
  file dependencies                       no        yes       no        no        no
  copyright info                          no [11]   yes       no        yes       yes
  grouping                                yes       yes       no        no        yes
  priority                                yes       no        no        yes       no

Special files
  config files                            yes       yes       no        yes       yes
  documentation files                     no        yes       no        no        yes [12]
  ghost files                             no        yes       no        no        no

Package programs
  binary programs allowed                 yes       no        ??        yes       no
  pre-install program                     yes       yes       no [13]   no        yes
  post-install program                    yes       yes       yes       yes       yes
  pre-remove program                      yes       yes       no [13]   no        yes
  post-remove program                     yes       yes       yes [13]  no        yes
  verify program                          no        yes       no        no        no
  triggers                                no        yes       no        no        no

Scalability
  no hard-coded limits                    yes       yes [14]  yes       no        no [7]
  new metadata                            yes       yes [15]  N/A       no        no [7]
  new section                             yes       no        no        no        no [7]
  format version data                     yes       yes       no        yes       no [7]

What is compared.

Security, authentication, and verification.

This section deals with ensuring that you know who created the package, and that you can check the package installed on your system to see if the files in it have been modified since you installed it.
signed packages
Does the package format contain internal support for a GPG or PGP signature that can be used to verify who created it?
checksums
Are checksums available for all the files in the package?
permissions, owners, etc
Is information on the files in the package, their proper permissions, sizes, owners, groups, major and minor numbers (for devices), etc, available?
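
To make the checksum criterion concrete, here is a minimal Python sketch of how a tool might check installed files against per-file checksums. The manifest layout (one "hexdigest  relative/path" line per file), the choice of SHA-256, and the function names are all assumptions made for illustration; they are not taken from any of the formats compared here.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse 'hexdigest  relative/path' lines (hypothetical layout) into a dict."""
    manifest = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, path = line.split(None, 1)
        manifest[path.strip()] = digest.lower()
    return manifest


def verify_files(manifest, root):
    """Return {path: status}; status is 'ok', 'modified', or 'missing'."""
    results = {}
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif file_digest(target) == expected:
            results[rel_path] = "ok"
        else:
            results[rel_path] = "modified"
    return results
```

A format that answers "no" in the checksums row gives a tool nothing to build such a manifest from, so modified files cannot be detected after installation.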

Usability by standard linux tools.

Recognising that it's important sometimes to be able to peer inside packages without using their package managers, this section compares how the various packages can be processed with tools available on any linux system [2].
recognizable by file
Is the package format able to be recognized by file?
data unpackable by standard tools
Can an experienced user, when presented with a package in this format, extract its payload using only tools that will be on any linux system? They can remember a few facts to help them deal with the format, but remembering file offsets and stuff like that is too hard.
metadata accessible by standard tools
If the package has some sort of metadata (e.g., package name, description, version) contained in it, can this data be accessed by standard tools, without too much difficulty?
creatable by standard tools
Can a package be created using standard tools, without too much difficulty?
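
As a rough illustration of the criteria above, the following Python sketch recognizes a few formats by their leading bytes and walks the members of an ar archive, which is the container a deb uses (see footnote 3). The magic numbers and the 60-byte ar member header are given as I understand them; the function names are hypothetical, and the member names used in the usage note follow the traditional deb layout, which newer packages may vary.

```python
AR_MAGIC = b"!<arch>\n"          # global header of an ar archive
RPM_MAGIC = b"\xed\xab\xee\xdb"  # first four bytes of the rpm lead
GZIP_MAGIC = b"\x1f\x8b"         # gzip stream, e.g. a .tgz


def ar_members(data):
    """Yield (name, payload) for each member of an ar archive held in bytes."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + 60 <= len(data):
        header = data[offset:offset + 60]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header")
        # Name is 16 bytes, space padded; GNU ar also appends a '/'.
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii"))
        start = offset + 60
        yield name, data[start:start + size]
        offset = start + size + (size % 2)  # members are padded to even length


def build_ar(members):
    """Build an ar archive from (name, payload) pairs with names <= 16 chars."""
    out = bytearray(AR_MAGIC)
    for name, payload in members:
        header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
            name, 0, 0, 0, "100644", len(payload))
        out += header.encode("ascii") + payload
        if len(payload) % 2:
            out += b"\n"
    return bytes(out)


def sniff(data):
    """Guess the package format from the first bytes of a file."""
    if data.startswith(AR_MAGIC):
        first = next(ar_members(data), (None, None))[0]
        return "deb" if first == "debian-binary" else "ar"
    if data.startswith(RPM_MAGIC):
        return "rpm"
    if data.startswith(GZIP_MAGIC):
        return "gzip"
    return "unknown"
```

For example, `build_ar([("debian-binary", b"2.0\n"), ("control.tar.gz", ...), ("data.tar.gz", ...)])` produces something `sniff` reports as "deb", and the two tarball members can then be handed to any tar implementation.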

Metadata.

Metadata is my term for the information about a package contained in the package. This includes things like the package name, description, and version number.
dependencies
A dependency says a package needs another package to be installed for the first package to work properly.
recommendations
A recommendation says a package will almost always need to have another package installed.
suggestions
A suggestion says a package may sometimes work better if another package is installed. The user can just be informed of this as an FYI.
conflicts
A conflict names a package that cannot be installed while this package is installed. One common reason is that the two packages both contain the same files.
virtual packages and provides
This means that there are so-called "virtual packages", such as a web browser or a mail delivery system. Packages can say they provide those virtual packages, while other packages can depend on the virtual packages.
versioned dependencies and conflicts
A package can depend on or conflict with (or recommend, etc.), a specific version of a package, or all versions > or < a given version.
boolean package relationships
This means that a package can depend, conflict, etc on a package AND (another package OR a third package). Any boolean expression must be representable, no matter how complex. [8]
file dependencies
This means a package can require that some other package - any other package - be installed that contains a given file (like /bin/sh) [10].
copyright info
The package's metadata contains basic copyright information. This is useful for automatic copyright sorting, etc.
grouping
The package can be assigned to a group (e.g., web browsers, libraries), which might be used to group the packages when viewing a list of available packages, etc. This makes it easier to deal with large groups of packages.
priority
The package can be assigned a priority, which says how important this package is to the system. For example, packages with high priority should be looked at carefully when you are setting up a system, but you can skip installing all the packages with low priority and still know you'll get a functional unix system.
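
To show how versioned dependencies, boolean relationships, and provides fit together, here is a minimal Python sketch that evaluates a relationship string against a set of installed packages. It assumes a Debian-style syntax in which commas mean AND and `|` means OR (an AND of ORs, which is why footnote 8 mentions factoring). The version comparison is deliberately simplified to dotted integers and is not what dpkg does; all function names, and the shape of the `provides` mapping, are hypothetical.

```python
import re

OPERATORS = {
    "<<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "=": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    ">>": lambda a, b: a > b,
}

TERM = re.compile(
    r"^\s*([A-Za-z0-9.+-]+)\s*(?:\(\s*(<<|<=|=|>=|>>)\s*([0-9.]+)\s*\))?\s*$")


def version_key(version):
    """Simplified ordering: dotted integers only (an assumption for this sketch)."""
    return tuple(int(part) for part in version.split("."))


def parse_relationship(text):
    """Parse 'a, b (>= 1.0) | c' into a list (AND) of lists (OR) of terms."""
    clauses = []
    for clause in text.split(","):
        alternatives = []
        for term in clause.split("|"):
            match = TERM.match(term)
            if match is None:
                raise ValueError(f"cannot parse term: {term!r}")
            alternatives.append(match.groups())
        clauses.append(alternatives)
    return clauses


def term_satisfied(term, installed, provides):
    """installed maps name -> version; provides maps virtual name -> providers."""
    name, operator, version = term
    if name in installed:
        if operator is None:
            return True
        return OPERATORS[operator](version_key(installed[name]),
                                   version_key(version))
    # In this sketch a virtual package has no version, so it can only
    # satisfy an unversioned term.
    return operator is None and any(
        provider in installed for provider in provides.get(name, ()))


def satisfied(text, installed, provides=None):
    """True if every comma-separated clause has at least one satisfied term."""
    provides = provides or {}
    return all(
        any(term_satisfied(term, installed, provides) for term in clause)
        for clause in parse_relationship(text))
```

With this, the relationship from footnote 9, `foo (<< 1.1) | foo (>> 2.0)`, is satisfied by foo 1.0 or foo 2.1 but not by foo 1.5, which is exactly the case that provides alone cannot express.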

Special files.

The ability to categorize files depending on what they are used for, so they can be dealt with in special ways.
config files
Are config files supported? These are files that the user will typically want to edit, so when a new version of a package is installed, the package manager should be able to know to leave them alone, or do something smart like prompt the user for what to do if they have modified the files, or at least make backups of the user's changes before overwriting them. (Maybe I need more granularity here?)
documentation files
Can documentation files be specially marked? This could be useful to help a user find documentation.
ghost files
Ghost files are files that are not actually present in the package, but are listed as being a part of it once the package is installed. This is useful for log files.
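
The config file behaviour described above can be sketched as a small decision function. This is one plausible policy written in Python for illustration, assuming the package manager can obtain a checksum of the file as shipped in the old package, as shipped in the new package, and as it currently exists on disk. The names are hypothetical, and it is not meant to describe what any particular package manager does.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep the file on disk"
    REPLACE = "install the packaged version"
    PROMPT = "ask the user"


def conffile_action(old_packaged, new_packaged, on_disk):
    """Decide what to do with one config file during an upgrade.

    Each argument is a checksum string (or any comparable fingerprint).
    """
    user_changed = on_disk != old_packaged
    package_changed = new_packaged != old_packaged
    if not package_changed:
        return Action.KEEP      # nothing new to install; local edits survive
    if not user_changed:
        return Action.REPLACE   # user never touched it; safe to upgrade
    if on_disk == new_packaged:
        return Action.KEEP      # user already made the same change
    return Action.PROMPT        # both sides changed; let the user decide
```

A format with no notion of config files cannot support a policy like this, because the package manager has no way to tell which files deserve the special treatment.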

Package programs.

These are programs that are contained in the package, to be run by the package manager when the package is installed, or uninstalled, or at other times.
binary programs allowed
Must these programs be scripts, or can compiled binaries be used as well?
pre-install program
A program to be run by the package manager before the package is installed on the system.
post-install program
A program to be run by the package manager after the package is installed on the system.
pre-remove program
A program to be run by the package manager before the package is removed.
post-remove program
A program to be run by the package manager after the package is removed.
verify program
A program to be run by the package manager when the state of the installed package is being verified.
triggers
This is a whole set of programs that are run not when this package changes state, but when another package changes state.
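
The order in which these programs run can be illustrated with a toy package manager. The Python sketch below is hypothetical throughout: the `Package` and `Manager` classes, the hook names, and the `watches` list used to model triggers are invented for this example, and real package managers have many more states and error paths.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Hook = Callable[[], None]


@dataclass
class Package:
    name: str
    hooks: Dict[str, Hook] = field(default_factory=dict)
    # Other packages whose state changes should fire our "trigger" hook.
    watches: List[str] = field(default_factory=list)


class Manager:
    def __init__(self):
        self.installed: Dict[str, Package] = {}
        self.log: List[str] = []

    def _run(self, package, hook_name):
        """Run a hook if the package ships one, and record that it ran."""
        hook = package.hooks.get(hook_name)
        if hook is not None:
            self.log.append(f"{package.name}:{hook_name}")
            hook()

    def _fire_triggers(self, changed):
        """Run the trigger hook of every installed package watching `changed`."""
        for other in list(self.installed.values()):
            if changed in other.watches:
                self._run(other, "trigger")

    def install(self, package):
        self._run(package, "pre-install")
        self.installed[package.name] = package
        self.log.append(f"{package.name}:unpack")
        self._run(package, "post-install")
        self._fire_triggers(package.name)

    def remove(self, name):
        package = self.installed[name]
        self._run(package, "pre-remove")
        del self.installed[name]
        self.log.append(f"{name}:delete-files")
        self._run(package, "post-remove")
        self._fire_triggers(name)
```

Installing a "menu" package that watches "editor", then installing and removing "editor", leaves a log in which menu's trigger runs after each change to editor, even though menu itself never changed state.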

Scalability.

How well the package format is able to grow to meet future needs. This is of great importance. Many of the comparisons above have little value in the face of this section, because new package programs, new metadata fields, etc can all be added to a scalable package format with little difficulty.
no hard-coded limits
Are there no limits hard-coded into the package format, that might prevent it from expanding to meet future needs? For example, are package names or versions of unlimited size?
new metadata
Can new information (text, binary data, whatever) be added to the metadata easily, without changing the package format?
new section
Can whole new sections be added to packages without changing the package format? For example, could the package format be expanded to have a PGP signature attached at the end, or to have a second set of data files, compiled for a different architecture or with different optimizations, attached at the end? This is the ultimate test of how flexible the format is. I'm basically asking: was it designed to cope with unforeseen new requirements?
format version data
Is there some way to look at a package and tell which version of the package format it is using? In extreme cases, this means the whole package format can be thrown out and redesigned, but old tools will still be able to read enough of the packages to know they can't deal with them.
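
A minimal Python sketch of how a reader might use format version data and tolerate new sections follows. It assumes a "major.minor" version string in which only a change of major number signals an incompatible redesign, and it uses invented section names; both are assumptions for illustration rather than a description of any one format.

```python
# Sections this hypothetical reader understands.
KNOWN_SECTIONS = {"format-version", "metadata", "payload"}


def parse_format_version(text):
    """Split a 'major.minor' string such as '2.0' into two integers."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def can_read(version_text, supported_major=2):
    """True if a reader built for `supported_major` should attempt this package."""
    major, _minor = parse_format_version(version_text)
    return major == supported_major


def split_sections(section_names):
    """Separate sections the reader understands from ones it should skip."""
    known = [name for name in section_names if name in KNOWN_SECTIONS]
    extra = [name for name in section_names if name not in KNOWN_SECTIONS]
    return known, extra
```

An old tool written this way reads a package with an added "signature" section by skipping the part it does not know, and refuses a package whose major version has moved on instead of misreading it.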

Todo.

Footnotes.

1. Not yet widely used though.
2. Why standard linux tools, not unix tools in general? It's been pointed out that eg, gzip is not at all standard on all the unix systems out there.
3. The admin would only have to remember that a deb is an ar archive, containing some tarballs.
4. rpm2cpio can do it, but it's not a standard tool, except on rpm-based systems. Some fairly short programs can do it, but none of them are something you'd want to memorize.
5. Assuming that bunzip2 is a standard linux tool, or that the package uses gzip compression instead. You need only remember that the package starts with its payload; the metadata is tacked on the end and will be ignored.
6. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes. For the datastream format, a pkgtrans program is available on systems using the pkg format, but not quite standard enough for the purposes of this question.
7. Most repositories use a specific "datastream" format, while some others simply use tarballs. In the case of tarballs, yes.
8. Though you might have to do some factoring.
9. An rpm may depend on a list of packages, but boolean OR is not supported. You can often get the same effect using virtual packages and provides. This isn't quite the same, since it does require more coordination between packagers, and the following relationship cannot be expressed with provides: foo (<< 1.1) | foo (>> 2.0)
10. Some people consider file dependencies a gross misfeature.
11. Copyright info is included in debian packages, but not in an easily extractable format.
12. Fields exist, but there is no standard way to use them.
13. Supported by a version of this package format used at one time by SuSE Linux.
14. Technically, the rpm "lead" contains hard-coded limits on the package name, but the lead is no longer really used by anything except file.
15. To be useful, you need to get a tag number assigned to your new piece of metadata, which implies modifying the rpm program.

Copyright 1998-2003 by Joey Hess under the terms of the GNU GPL, either version 2 or at your option, any later version.
Last modified at Mon Jul 21 09:50:01 2003; generated from this source XML by this program.