Packaging RFC

Overview

Many different people and organizations are trying to use Globus, and succeeding. The Globus project wants to satisfy the needs of its users, but the reality is that different communities have different needs. Ideally, a tailored distribution could be constructed to meet the specific needs of an individual community. Right now, Globus offers little support for customization, and what support exists is difficult to work with. The goal of the packaging effort is to construct a framework and a set of Globus packages which can be used to create tailored Globus distributions that fit the specific needs of the organizations using Globus. Another goal of the packaging effort is to assist the process of developing the Globus toolkit, by allowing individual pieces of what is now a monolithic Globus distribution to be released on separate schedules.

The current monolithic Globus toolkit, which consists of many components, is inflexible in that it is difficult to distribute subsets of Globus. To alleviate this problem, the Globus components will be divided into packages. Using the new packaging framework, organizations will be able to construct both source and binary distributions of just the packages they are interested in. Users will no longer be forced to build and configure components which are of no interest to them.

Under the new packaging framework, it will be possible to release updates to an individual component without necessarily releasing updates to all other components. Additionally, an individual component can be built and tested without configuring and building unrelated components. Thus the packaging framework substantially increases the efficiency of the development and release process.

This paper is an attempt to lay out the design of a packaging framework that can be used to enable the packaging of Globus components. We will introduce packaging concepts, and discuss strategies and policies for dividing large pieces of software into logical packages (think Globus, but what we say is applicable in general). We will describe the different types of binary packages, as well as the distinctions between source packages and binary packages. We will also describe the mechanism by which the packaging system manages complex build environments, and what it means to have a Globus installation in the context of the packaging system. Finally, we will describe the metadata needed by the packaging system, for each package type, to allow the system to effectively manage the packages.

Packaging Concepts

We plan to provide a simple portable framework for creating and managing Globus packages. Here is a list of the features that will be supported.

Install / Uninstall / Upgrade

The installation management of binary packages is the responsibility of a tool (or tools, collectively) known as the Package Manager. The Package Manager is the interface through which the person installing a package will install, uninstall, or upgrade it. An example of a package manager that is probably known to many of our readers is RPM, the RedHat Package Manager. On a RedHat Linux system, one typically uses RPM to install, uninstall, or upgrade any binary package.

Dependency tracking

Versioning

Whenever two or more separately released pieces of software need to interact, a need arises to ensure that the software, as installed, is interoperable. The time-tested way of doing this is to assign a version number to each release of each piece of software, so that any given version is readily identifiable. This allows version numbers to be encoded into package dependencies; for example, you may know that your package foo 2.11 requires package bar 2.0 or greater.
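As a small illustration of the "bar 2.0 or greater" kind of dependency, the sketch below compares dotted version numbers field by field rather than as text. The function names are hypothetical and are not part of any Globus tool; it assumes purely numeric fields.

```python
def parse_version(text):
    """Split a dotted version string such as "2.11" into a tuple of ints."""
    return tuple(int(field) for field in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


# Package foo 2.11 requires bar 2.0 or greater.
print(satisfies_minimum("2.3", "2.0"))   # bar 2.3 is acceptable
print(satisfies_minimum("1.9", "2.0"))   # bar 1.9 is too old

# Field-wise comparison treats 2.11 as newer than 2.9;
# a plain string comparison would get this wrong.
print(parse_version("2.11") > parse_version("2.9"))
```

Comparing integer tuples avoids the classic mistake of ordering "2.11" before "2.9" because of the leading character.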

Flavored binaries

Certain compile-time options used when creating a library must also be used when linking a program that uses that library; otherwise, linking errors will occur. For example, it can be important that the same compiler be used, or that the same threads package be used. It also matters whether we are compiling for 32 bit or 64 bit. We refer to such sets of compile-time options as flavors.

Relocatable binaries

It is important for ease of installation that the binaries not be tied to specific directories (e.g. a program should not insist on being installed in /usr/local/bin). However, given large sets of inter-package dependencies, especially with scripts calling programs, it is a relatively hard problem to enable the easy installation of packages into totally arbitrary locations. A reasonable compromise, which we make for Globus Toolkit packaging, is to insist that all packages comprising a given installation be installed into a single installation tree, but that tree can be rooted anywhere desired.

External programs and libraries

When distributing any software onto systems where the underlying operating system is not packaged using the same packaging system (i.e. every system onto which Globus is installed, unless someone makes a Linux or BSD distribution using our packaging system sometime in the future), there will be programs and libraries that do not have packaging system metadata associated with them. There are, in general, two ways of dealing with such external programs and libraries. One is to make special exceptions for them in the tools that check dependencies. The other, which is by no means mutually exclusive with the first, is to create "virtual packages" which consist of appropriate metadata for the external "package", and, if necessary, links from the install tree to the actual location of the external programs and/or libraries.

Compatible with existing package managers

The packaging system described in this document will have its own accompanying package manager, but the system has been designed with compatibility in mind. That is, it should be relatively simple to take a set of binary packages created for this system and convert them into binary RPMs, for example. We are providing our own package manager so that we can ensure that a package manager is available on all platforms that we support, but organizations that will be creating distributions of Globus packages may wish to use their own package manager, for ease of installation management.

(A distribution of Globus packages is a set of Globus packages, along with the runtime configuration files (or tools to generate the runtime configuration files) needed for a particular set of users. RedHat Linux is a good example of the distinction between packages and a distribution; each piece of software is a package, in an RPM, but the RedHat Linux Distribution includes tools (such as their install tool/bootdisk) that set up the system for a particular configuration.)

Runtime Configuration files (vs. static data files) and who manages them (RPM vs. Linuxconf)

Programs, scripts, and possibly libraries, may require some information provided to them at runtime, per machine or per user, in order to function as desired. We will refer to files containing such information as runtime configuration files (we will always use this term instead of simply using "configuration files" to avoid confusing runtime configuration files with files used by configure when building a package). In this packaging system, binary packages may require that some runtime configuration files exist in order to function, but the package itself shall not install the actual files. This allows organizations that wish to create personalized distributions to create runtime configuration packages that can be installed and managed separately from the packages they relate to. This makes the process of upgrading a package without changing its runtime configuration files extremely easy--it is the default.

We can look to the Linux world for an example of this division. Linuxconf is a wizard that allows the user to manipulate, relatively easily, the runtime configuration files for the various packages that might be installed on a Linux system. However, since RPM allows packages to install and manipulate their own runtime configuration files, there is no consistent method for ensuring that you retain your old runtime configuration when upgrading a package.

To illustrate this, say that you have a Globus package foo which has a runtime configuration file "foo.conf" that the user modifies using a Linuxconf-like GUI wizard, "gui_fee". For this illustration, ownership is defined as follows: when a package is responsible for installing, uninstalling, or updating a file, it "owns" the file.

If foo.conf is owned by package foo, then several problems arise. First, any modifications made by the user through gui_fee are lost whenever package foo is uninstalled, reinstalled, or updated. Second, foo.conf will be re-installed every time a new version of package foo is released, even though the format of foo.conf most likely did not change. Some packagers have tried to resolve these problems by introducing pre- and post-install/uninstall scripts that are run during an action on package foo, but this introduces an unacceptable amount of complexity to our packaging framework design.

The same problems occur when foo.conf is owned by the package gui_fee. In addition, gui_fee probably manages the runtime configurations of several packages, not all of which have to be installed. Finally, not all Globus installations will be able to run a GUI wizard, but they will still need to have foo.conf.

The only acceptable solution is to have foo.conf in its own package freeing it from the actions needed for the other packages.
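The ownership argument can be sketched with a toy model of an installation tree. Everything here is an assumption made for illustration: the InstallTree class, the package name foo_conf, the file paths, and the file contents are hypothetical and do not describe the actual package manager.

```python
class InstallTree:
    """Toy model of an installation: file contents plus the package owning each file."""

    def __init__(self):
        self.files = {}   # path -> contents
        self.owner = {}   # path -> name of the owning package

    def install(self, package, payload):
        """Install (or upgrade) a package, replacing every file it owns."""
        # Remove the files owned by a previously installed version.
        for path in [p for p, o in self.owner.items() if o == package]:
            del self.files[path]
            del self.owner[path]
        for path, contents in payload.items():
            self.files[path] = contents
            self.owner[path] = package


tree = InstallTree()
tree.install("foo", {"bin/foo": "foo 1.0"})
# The runtime configuration file lives in its own (hypothetical) package.
tree.install("foo_conf", {"etc/foo.conf": "setting=default"})

# The user edits the runtime configuration, for example through gui_fee.
tree.files["etc/foo.conf"] = "setting=edited-by-user"

# Upgrading foo only touches the files that foo owns.
tree.install("foo", {"bin/foo": "foo 1.1"})
print(tree.files["etc/foo.conf"])
```

Because foo.conf is not owned by foo, the upgrade has no way to overwrite it; preserving the user's edits is the default behavior rather than something a script must arrange.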

Decisions Regarding How Globus is Split into Packages

    The Globus Toolkit has some requirements that dictate what the strategy for splitting up the toolkit will be:

  1. The Toolkit has a complex network of dependencies between its various components. Many of these dependencies are circular unless Globus is split into atomic-sized packages (i.e. 1 package = 1 library). This network of dependencies also dictates that the toolkit has to be treated as a distribution of packages rather than a loose collection of packages.
  2. The development portion of the toolkit is built with a number of compiler flavors. A compiler flavor is defined as the set of build environment variables (compiler choice, linker choice, compiler/linker flags etc.) which every binary that will be linked together has to use. Flavors cannot be mixed, which means that for any given source dependency tree there is an equivalent binary dependency tree for every flavor.
  3. A significant portion of Globus is not compiled at all. Instead it consists of scripts and data files which are not flavor specific.
  4. The installed files that are the product of Globus source code can be used in a number of different ways as discussed in the Overview. These uses do not all require the same set of installed files. For example, consider a source code package that installs the program foo, the static and shared versions of the library libfoo, and the header foo.h. Program foo is needed at runtime. The shared library libfoo.so is needed at runtime if other programs which link to it are installed. Both the static and shared libraries libfoo.a and libfoo.so, as well as the header file, are needed when the package foo is used in a development tree.
  5. Several Globus components use files that have to be modified after installation. These files have to be treated specially so that the Globus user does not lose the modifications when packages are re-installed or updated. (These files are by definition runtime configuration files, as mentioned above.)
  6. Because of the complex dependencies anticipated between globus packages a flexible versioning scheme is needed. This scheme needs to allow both the package maintainer and the package users to individually express compatibility among the package versions. For example, the maintainer for package foo is releasing an update. This maintainer needs to be able to express that the update can be safely used in place of the previous version. Conversely, the maintainer needs to be able to express that the update is incompatible with previous versions of foo. In the same way, the users of package foo need to be able to express which versions of foo their package depends on.
The following design decisions were made to satisfy these requirements.

Small Packages

The idea behind package types is that smaller is better because it provides flexibility. For example, dependency checking between packages becomes simple if the contents of a particular package serve one consistent purpose. If dependency checking is simple, then it can be automated through package manager convenience tools. From this, users can manage a large number of packages in groups rather than individually.

Small Source Packages

The source code contained in a source code package shall always be released together. There is no provision in our current packaging framework for different binary package types generated from the same source package to have different version numbers. In other words all of the source code in a source code package uses the same name and version number. Thus, putting the source to a library, and to programs using that library into the same source package implies that you will never want to release a new version of the library without releasing a new version of the programs, and vice versa.

Multiple Binary Package Generation

One source code package shall generate several binary packages. For example, if the package generates binaries, then the binaries built with each build flavor would need to be in a separate binary package. These packages will use the name and version of the source package and add their own unique extension. This allows dependency checking to be tailored to a particular user requirement (i.e. run-time vs. development tree), which will simplify the checking.

Binary Package Types

All of the installed files of a given source code package build shall be contained in several different binary package types depending on how the files are used. No file shall belong to more than one binary package. Consider the example discussed in requirement 4 in the previous section.

Here is how the installed files would be divided into separate binary packages:

  1. Dynamically linked program foo. This is used if the runtime environment supports shared libraries.
  2. Statically linked program foo. This is used if the runtime environment does not support shared libraries.
  3. Shared library libfoo.so. This is used when other dynamically linked programs are linked with this library.
  4. Static library libfoo.a and foo.h. Used in a development tree.

Flavored Binary Packages

Any binary package that contains compiled code or files configured by flavor shall be tagged with a flavor name. Any binary package that does not contain these types of files will be tagged as a "noflavor" package.

Expressing Dependency data in packages

Each package shall store only its direct dependencies. For example, if the source code in package foo has an include statement which references a header file in package fum, then package foo has a direct dependency on package fum. On the other hand, if the header file in fum includes headers from other packages, those dependencies belong to package fum, not package foo. A packaging system will have to examine all of the dependent packages in order to obtain the entire dependency tree.
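One way a packaging tool could recover the entire dependency tree from per-package direct dependencies is a simple graph walk. This is a minimal sketch, not the framework's actual algorithm; the function name, the dictionary layout, and the packages fee and fie are assumptions made for illustration.

```python
def full_dependencies(package, direct):
    """Collect every package reachable from `package` through direct dependencies."""
    seen = set()
    stack = list(direct.get(package, ()))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Each package contributes only its own direct dependencies.
        stack.extend(direct.get(name, ()))
    return seen


# Each package records only its direct dependencies.
direct = {
    "foo": ["fum"],          # foo includes a header from fum
    "fum": ["fee", "fie"],   # fum's headers include headers from fee and fie
    "fee": [],
    "fie": [],
}
print(sorted(full_dependencies("foo", direct)))
```

Package foo never mentions fee or fie in its own metadata, yet the walk finds them by following fum's entry.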

Circular Dependencies

A circular dependency is defined by the situation where dependencies between two or more packages can only be accommodated if all of the packages are installed simultaneously. For example if packages foo and fum depend on each other then a PM install cannot install foo before fum. Nor can it install fum before foo. This situation shall be resolved by splitting up foo and fum in such a way that the dependency tree becomes a directed acyclic graph.
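Detecting a circular dependency, and computing an install order once the graph is acyclic, can be sketched with the Python standard library's graphlib module (Python 3.9 or later). The install_order function and the package fum_common are hypothetical; how foo and fum would really be split is a packaging decision this sketch does not make.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(direct):
    """Return an order in which every package follows its dependencies.

    Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(direct).static_order())


# foo and fum depend on each other, so no valid install order exists.
circular = {"foo": ["fum"], "fum": ["foo"]}
try:
    install_order(circular)
except CycleError:
    print("circular dependency detected")

# After moving the shared part into a separate (hypothetical) package,
# the graph is acyclic and an order can be computed.
split = {"foo": ["fum_common"], "fum": ["fum_common", "foo"], "fum_common": []}
print(install_order(split))
```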

Runtime Configuration Files

Files required by a Globus package after installation will not be included in the package as defined by this framework. Globus distributions are encouraged to distribute these files in separate packages.

Libtool Versioning Scheme

We have adopted a variation of libtool's version numbering scheme. In libtool's scheme, each version number consists of three fields: a major version number, a minor version number, and a "compatibility range" number. The major version number is bumped for any interface change, the minor version number is bumped for bug fixes and similar changes within a given interface, and the third number gives the range over which the major version is backwardly compatible. For example, 5.4.3 is backwardly compatible with 2.x.x and up (through 5), since 5-3=2. We will refer to this versioning scheme as "aging version".
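The arithmetic behind an aging version can be sketched as follows. The AgingVersion class and its method names are hypothetical; only the three fields and the 5.4.3 example come from the text above.

```python
from typing import NamedTuple


class AgingVersion(NamedTuple):
    major: int   # bumped for any interface change
    minor: int   # bumped for bug fixes within an interface
    age: int     # how many earlier major versions are still supported

    @classmethod
    def parse(cls, text):
        """Parse a string of the form "major.minor.age"."""
        major, minor, age = (int(field) for field in text.split("."))
        return cls(major, minor, age)

    def oldest_compatible(self):
        """Oldest major version this release can stand in for."""
        return self.major - self.age

    def provides(self, major):
        """True if this release is backwardly compatible with `major`."""
        return self.oldest_compatible() <= major <= self.major


v = AgingVersion.parse("5.4.3")
print(v.oldest_compatible())          # 5 - 3
print(v.provides(2), v.provides(1))   # inside and outside the range
```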

Dependency Version Specifications

To provide flexibility in specifying versions as part of a package dependency, the following shall be done. A packager will be able to specify the version of a dependency as either a single version number or a range. If a single version is specified, then the framework will use the compatibility range mentioned previously to determine whether the dependency is met. If a range is specified, then the packaging framework will look for versions only within that range. In the source package, a packager will be able to specify a list of ranges and versions. This list shall be in order of preference. We will refer to this list as a "version specification".

As an example consider the installed package foo which has a version number of 5.3. As was mentioned in the previous section, this specifies a compatibility range of 2 to 5. Now we want to install package fum which depends on foo. The following table shows how the versioning works:

Version specification for dependency foo to fum    Dependency is met?
-----------------------------------------------    ------------------
1                                                  No
1 to 4                                             No
4 to 4                                             No
3                                                  Yes
3 to 6                                             Yes

Binary packages will record only the first version specification that was met when the source package was built. So, for the specifications listed in the example, binary package foo would have the version specification of 2.3.
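The table in the example can be reproduced with a short check. This sketch rests on an assumed reading of the example: installed foo has major version 5 and is compatible back to major version 2; a single-number specification is met when it falls inside that compatibility range; and a range specification is met when the installed major version falls inside the range. The function name and the tuple encoding of ranges are hypothetical.

```python
def spec_is_met(spec, installed_major, installed_age):
    """Check one entry of a version specification against an installed package.

    `spec` is either a single major version (int) or a (low, high) range.
    """
    if isinstance(spec, tuple):
        low, high = spec
        # A range only accepts installed versions that fall inside it.
        return low <= installed_major <= high
    # A single version is met if the installed release is compatible with it.
    return installed_major - installed_age <= spec <= installed_major


# Installed foo: major version 5, compatible back to 2 (5 - 3).
rows = [1, (1, 4), (4, 4), 3, (3, 6)]
for spec in rows:
    print(spec, "Yes" if spec_is_met(spec, 5, 3) else "No")
```

Under these assumptions the loop prints the same Yes/No column as the table.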

Globus Package Types

Supported Package Types:

Source Package
Dynamically Linked Program Binary Package
Statically Linked Program Binary Package
Non-Flavored Headers Package
Development Binary Package
Runtime Library Binary Package
Data Binary Package
Document Binary Package

Source Package

This package consists of source code, scripts, and documents which are configured and built to produce binary packages. One source package will produce one or more binary packages, each of which is a different package type. Source packages have two sources of dependencies to consider. The first is the compile and link dependencies that are present when the source code is being built. The second is the run-time dependencies that need to be stored in the binary packages when they are generated.

Source packages are different from all of the other package types, in that they are not managed by the package manager. Source packages are not installed into the installation tree, so they do not need to include metadata for the purpose of their own uninstallation. Rather, the metadata included in a source package is necessary for ensuring that the compile and link dependencies are satisfied when building the binary packages, and for generating the metadata necessary for each binary package being produced. The metadata can also be used by a convenience tool which builds, installs, and generates binary packages from multiple source packages, ordered by their dependencies.

Source packages shall have the following metadata:

Dynamically Linked Program Binary Package (pgm)

This package contains dynamically linked executables and scripts. It will always be generated from a source package and will share the source package's name and version. If the package contains executables it shall also have a flavor as part of its identity. If the package contains only scripts then it can be designated as "noflavor".

Program packages can have run-time dependencies if their executables and scripts call executables and scripts in other program packages. They could also have runtime dependencies on data files and documents.

A program package can also have runtime linking dependencies if its executables are linked with libraries from rtl packages. For example, say that an executable links with libfoo in package fum. If the executable is linked to the shared library libfoo.so then the linking dependency translates to the fum_rtl package which will have to be installed before the program package.

Dynamically linked program binary package metadata:

Statically Linked Program Binary Package (pgm_static)

This package contains statically linked executables. It will always be generated from a source package and will share the source package's name and version. The package shall also have a flavor as part of its identity.

Static program packages can have run-time dependencies if their executables call executables and scripts in other program packages. They could also have runtime dependencies on data files and documents. In addition these packages absorb the runtime dependencies of the static libraries they are linked with. For example, consider a program foo that statically links with a library libfee.a that has a system call to still another program fum. The library libfee has a runtime dependency to the program fum. The program foo will have to absorb this dependency so that program fum is installed before program foo is installed.
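The absorption of runtime dependencies can be sketched as a set union. The function name and the dictionary layout are assumptions for illustration; the sketch looks only one level deep and does not follow static libraries that themselves link other static libraries.

```python
def static_program_runtime_deps(own_deps, static_libs, lib_runtime_deps):
    """Runtime dependencies of a pgm_static package, including absorbed ones."""
    deps = set(own_deps)
    for lib in static_libs:
        # The program absorbs whatever the linked static library needs at runtime.
        deps |= set(lib_runtime_deps.get(lib, ()))
    return deps


# Program foo links statically with libfee.a, and libfee calls program fum.
lib_runtime_deps = {"libfee": ["fum"]}
foo_deps = static_program_runtime_deps([], ["libfee"], lib_runtime_deps)
print(sorted(foo_deps))
```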

A program package can also have regeneration dependencies if its executables are linked with libraries from other packages. For example, say that an executable links with libfoo in package fum. If the executable was linked statically to libfoo.a, then the dependency is translated to the fum_dev package. In this case the program package will have to be regenerated any time fum_dev is updated. A build number will be updated to reflect the regeneration.

None of the executables in a program package shall ever be built with a mixture of static and shared package libraries because this complicates the compatibility checks needed at runtime to make sure that all of the libraries are compatible.

Statically linked program binary package metadata:

Development Binary Package (dev)

This package contains flavored header files, static libraries, and libtool library files. It will always be generated from a source package and will share the source package's name and version. The package shall always have a flavor as part of its identity.

Development packages can have run-time dependencies if their libraries call executables and scripts in other program packages. They could also have runtime dependencies on data files and documents.

Even though development packages are not installed for run-time they can still have run-time dependencies with other pgm, data, and doc packages if the libraries access files or programs in these packages. The run-time dependencies of a static library will have to be absorbed by a pgm_static package if it contains an executable that was linked with the library.

A development package can have a compile dependency to another dev package if it contains a header file that includes headers from the other package.

A development package can also have linking dependencies if its libraries use symbols from libraries contained in other packages. These dependencies are contained here so that the dependency tree for an executable (from a pgm_static) can be recursively extracted when the executable is built.

Development binary package metadata:

Non-Flavored Headers Package (hdr)

This package contains header files. It will always be generated from a source package and will share the source package's name and version. The package contains only header files which are not configured for a flavor and so is assumed to be "noflavor".

A non-flavored headers package can have a compile dependency to another dev package if it contains a header file that includes headers from the other package.

Non-Flavored Headers binary package metadata:

Runtime Library Binary Package (rtl)

This package contains libraries used at run-time by programs and scripts. It will always be generated from a source package and will share the source package's name and version. If the package contains binaries it shall also have a flavor as part of its identity. Otherwise it is a noflavor.

Runtime packages can have run-time dependencies if their libraries call executables and scripts in other program packages. They could also have runtime dependencies on data files and documents.

Runtime packages have linking dependencies which are needed at runtime. For example when a program using shared library foo starts execution, it needs to load libfoo.so as well as all of the shared libraries that libfoo depends on for symbols.

Runtime library binary package metadata:

Data Binary Package (data)

This package contains data files which cannot be modified by users. It will always be generated from a source package and will share the source package's name and version. The package shall also have a flavor as part of its identity if any of its data files are configured for a flavor. Otherwise it will be noflavored.

Data packages have run-time dependencies if data files include files from other data packages.

Data binary package metadata:

Document Binary Package (doc)

This package contains documents. It will always be generated from a source package and will share the source package's name and version. It will always be noflavored.

Document packages have run-time dependencies if document files include files from other doc packages.

Document binary package metadata:

The Globus Packaging Framework

Globus Installations

A globus installation is defined as a set of packages installed in one location whose dependencies are completely resolved with other packages installed in the same location.

A platform can (should) have multiple globus installations. The packages in any one of these installations shall have no knowledge of the other installations.

Users shall be able to switch between installations by using the environment variable $GLOBUS_INSTALL_PATH.

Only one version of a package shall be installed in any particular globus installation.

Multiple flavors and binary package types for a particular package can be installed in the same globus installation.

Only one flavor of a pgm or pgm_static package can be installed. This is because executables are not tagged with the flavor name.
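The installation rules above could be checked before a package is installed, roughly as follows. This is a sketch under assumptions: the dictionary layout, the function names, and the flavor name "sour" are hypothetical, and the program-flavor rule is read as "no two program packages of the same name with different flavors".

```python
PROGRAM_TYPES = ("pgm", "pgm_static")


def can_install(installed, candidate):
    """Check a candidate binary package against the rules for one installation.

    Returns a (ok, reason) pair.
    """
    for existing in installed:
        if existing["name"] != candidate["name"]:
            continue
        if existing["version"] != candidate["version"]:
            return False, "a different version of this package is already installed"
        if (existing["type"] in PROGRAM_TYPES
                and candidate["type"] in PROGRAM_TYPES
                and existing["flavor"] != candidate["flavor"]):
            return False, "a program package of another flavor is already installed"
    return True, "ok"


def make_pkg(name, version, flavor, kind):
    """Build the dictionary used to describe a binary package in this sketch."""
    return {"name": name, "version": version, "flavor": flavor, "type": kind}


installed = [make_pkg("foo", "1.0", "sweet", "rtl"),
             make_pkg("foo", "1.0", "sweet", "pgm")]
print(can_install(installed, make_pkg("foo", "1.0", "sour", "rtl")))   # second library flavor
print(can_install(installed, make_pkg("foo", "1.0", "sour", "pgm")))   # second program flavor
print(can_install(installed, make_pkg("foo", "1.1", "sweet", "dev")))  # second version
```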

Other than programs, any installed file that is configured for a flavor shall have the flavor name appended to its filename. For example libfoo.so compiled with the flavor sweet shall be installed as libfoo_sweet.so. The exception is flavored header files which keep their name but are installed in a flavored subdirectory. In other words a header foo.h which is configured for the flavor sweet is installed as $GLOBUS_INSTALL_PATH/include/sweet/foo.h.
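The naming rule for flavored files can be sketched with pathlib. The function name is hypothetical, paths are taken relative to $GLOBUS_INSTALL_PATH, and the sketch assumes a single file extension (a name such as libfoo.so.1 would need extra handling). It is not meant to be applied to programs, which are not tagged with a flavor.

```python
from pathlib import PurePosixPath


def flavored_install_path(relative_path, flavor):
    """Return where a flavor-configured file is installed, relative to the tree root."""
    path = PurePosixPath(relative_path)
    if path.suffix == ".h":
        # Headers keep their name but move into a per-flavor subdirectory.
        return path.parent / flavor / path.name
    # Other files get the flavor appended to the name, before the extension.
    return path.with_name(f"{path.stem}_{flavor}{path.suffix}")


print(flavored_install_path("lib/libfoo.so", "sweet"))
print(flavored_install_path("include/foo.h", "sweet"))
```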

Every package shall install packaging metadata in $GLOBUS_INSTALL_PATH/etc so that the data can be used for package management tasks, for building other globus packages, and for building applications which use globus components.

Globus Core

Globus core shall be used to define the build environment for a particular flavor. The flavor is defined when the core is configured. Specifics for a flavor can be passed into configure using --with-* and --enable-* options. The flavor name is specified using the --with-flavor= option.

Globus core shall install a flavor specific header file and a flavor specific script initializer which shall be used to build all other packages for that particular flavor.

Globus core treats the name of a flavor as an arbitrary label as long as it can be used in directory and file names.

The Globus group shall establish a flavor naming policy that allows packages from different globus distributions to be interoperable. This policy shall not be part of globus core.

Globus core shall provide a script which will create a run time configuration file that locates various scripting tools needed to run globus scripts and build globus components.