Extending Debian with customizable packages

In my previous post I drafted a rough idea of how to add a feature to Debian that would bridge the gap between Debian as a binary distribution and any source distribution. The feature in question is giving users the chance to build customized Debian packages for specific feature sets. The possible use cases are simple:

Let’s say Anthony User needs super-duper-tool with postgresql support. However, this is a not-so-common use case, so the archive only has a build with mysql support. If he now wants to build a custom package, the process is „as simple“ as:

apt-get source super-duper-tool
cd super-duper-tool-*
sudo apt-get build-dep super-duper-tool   (or: sudo mk-build-deps -r -i)
(possibly try out which ./configure options are supported by super-duper-tool)
vi debian/rules   (adjust the ./configure options)
dpkg-buildpackage -us -uc
sudo debi

That is, to be honest, a lot of work. And the biggest problem with it is that you need to know all this, which is not really user knowledge.

The basic idea is that Anthony can do something like
export DEB_BUILD_OPTIONS=nomysql,postgresql
deb-build-tool build super-duper-tool

and is done.
So the question is how could this be achieved?

  1. Define a set of common options that every package should support if the package in question supports the feature. The options should be consistent across all source packages, otherwise this does not make much sense. This includes flags like mysql, postgresql and ldap, and their no-equivalents (nomysql, nopostgresql) to negate them.
  2. Enable the use of these common options in the package build process, and that is really the hardest part. Defining a lot of ifneq..findstring constructs for 2-10 options per source package in every available source package is… not… a good approach. Luckily a lot of packages use autoconf, where enabling and disabling features is as easy as adding a --with-foo or --enable-foo option to the ./configure parameters. So we could write a wrapper that would handle these options. The debhelper scripts already have dh_auto_configure: maybe this could be enhanced.
  3. If it does not exist: write a tool to auto-build packages with DEB_BUILD_OPTIONS set by the user. It should automatically get the source (optionally: set up/update a pbuilder environment), build it and optionally install/upload it (to a user repository, for example).
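The translation in step 2 can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name feature_configure_args and the KNOWN_FEATURES list are made up here, and it assumes that every flag foo maps to --with-foo and nofoo to --without-foo, which a real package may well spell differently.

```shell
#!/bin/sh
# Hypothetical, illustrative list of Debian-wide feature flags.
# No such agreed list exists; it is an assumption of this sketch.
KNOWN_FEATURES="mysql postgresql ldap ssl"

# Print the ./configure arguments implied by DEB_BUILD_OPTIONS.
# Assumption: flag "foo" means --with-foo, "nofoo" means --without-foo.
# Options that are not known features (noopt, nostrip, ...) are ignored.
feature_configure_args() (
    args=
    # Accept both commas and whitespace as separators.
    for opt in $(printf '%s' "${DEB_BUILD_OPTIONS:-}" | tr ',' ' '); do
        for feature in $KNOWN_FEATURES; do
            case $opt in
                "$feature")   args="$args --with-$feature" ;;
                "no$feature") args="$args --without-$feature" ;;
            esac
        done
    done
    # Strip the leading space left over from building the list.
    printf '%s\n' "${args# }"
)
```

A debian/rules file could then pass the result on after the -- separator that dh_auto_configure already accepts for extra ./configure parameters. Packages that need --enable-foo instead of --with-foo would still need their own mapping.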

Still, the idea is only a rough draft. There are still a lot of open questions/issues, for example:

  • How to handle these feature flags in packages that don’t use autoconf?
  • How to enable these feature flags for packages that don’t use debhelper?
  • Shall a central place for setting DEB_BUILD_OPTIONS exist (like it does for Gentoo, FreeBSD etc.) and if so: where?
  • How should packages define new build options if those defined Debian-wide aren’t enough?
  • What about support? Do we support such custom builds of packages, or will this be a support-on-good-will thing?

Is anybody interested in pushing this idea forward? I’m very interested in it and I know people who like the idea, too. But I can’t push this forward all on my own. Certainly the work won’t affect Lenny anymore, but considering the time frame that the release team has planned for Lenny+1, this could be a feature for Lenny+1.

Re: Gentoo destroying earth?

I fully agree that working with Gentoo is no fun at all, as Julian points out in his post.
My impression when I tried Gentoo for the first (and last) time was the same. That’s not because it is badly made. In fact, things in Gentoo aren’t made that badly. There is good documentation, portage is quite nice (with its USE flags and the like), but in the end I’m not really satisfied.
After all Gentoo uses a copied concept, which itself is good. The concept is derived from the BSD-world and is just the concept of source-based systems. This concept has its advantages over binary distributions, because it allows a flexibility that is not really possible with a binary distribution. That really is the only appreciable advantage of these systems. Users of these systems (including FreeBSD and alike) tend to give other arguments as well: Newest Software, Highest Performance and even Security is a point they give.

Usually the „newest software“ argument is brought up with a rant against Debian. I keep hearing statements like „Debian Etch is totally outdated and Testing is broken“. That statement in itself is just wrong, but the important thing is: what do you need such new software for? On a server it’s preferable not to upgrade to each and every major release, for certain reasons. It’s also a bad thing to build on production systems, so you need to take further measures to administer systems like that. On a desktop I can understand the logic, but then again: no. Why would I want a system which changes often and which I have to compile from scratch and again on each upgrade (wasting a lot of time, power etc.)?

The performance argument is the dumbest of all. Binary distributions usually build binaries for various platforms. And while those binaries can’t be optimized for a very specific processor or trimmed to a very specific feature set (which reduces binary size), they usually perform well enough that the difference between a self-compiled Gentoo system and a foreign-compiled Debian is not noticed by the user and is sometimes not even measurable. So the time you save during the lifetime of your builds (which is not very long on a desktop, is it?) thanks to the enhanced performance is used up a hundred times over by the time you waste compiling all the software just for yourself. That holds even if some of the software performs noticeably better (e.g. video processing is said to benefit a lot from an optimized platform).

What about the security argument? It has been said that you get your updates earlier than users of binary distributions. That is partly true, because in a binary distribution the update needs to be built first, before it gets to the end user (however, everything before that is the same for a binary or a source distribution). But do you really follow security issues so closely that you benefit a lot from this? If you do, you indeed save some hours, but only if both distributions have a solution for a security issue at the same time.

Anyway. The flexibility argument can’t be argued away. It’s the argument that makes Ports in BSD systems attractive. You can have exactly the features you want, with the build-time options you want, in a comfortable manner. That is in contrast to binary distributions, where we package developers need to guess which features might be needed (or wanted) by the users, with mixed results. It is good if they can decide for themselves. Still, I don’t see the reason for compiling the whole system if I need one or two customized components. So I go with Debian and customize packages if I really need or want to. The only difference is that it takes a lot more effort.

I would love to see something done in Debian to reach a compromise: making the rebuilding of packages with different options very comfortable. We have DEB_BUILD_OPTIONS; maybe we should enhance it to support a lot more options than noopt, nostrip or nodoc. Possibly it would be a good thing to standardize on some use flags (e.g. [no]ssl, [no]ldap, etc.) and support them in the debian/rules file. This way building customized packages would be as easy as setting sensible DEB_BUILD_OPTIONS and running dpkg-buildpackage on the source. This could be eased further by providing a tool to download the source and build a given package (IIRC such a tool already exists). How does that sound?
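To make the last point a bit more concrete, here is a rough sketch of what such a download-and-build helper would boil down to. The name deb_build_plan is hypothetical, and the function only prints the commands instead of running them, so the result can be reviewed first. It assumes the package honours the feature flags passed in DEB_BUILD_OPTIONS, which nothing in Debian guarantees today.

```shell
#!/bin/sh
# Hypothetical helper: print the steps for a customized rebuild of a package.
#   $1  source package name
#   $2  value for DEB_BUILD_OPTIONS (may be empty)
deb_build_plan() (
    pkg=$1
    options=${2:-}
    # Fetch and unpack the source package into the current directory.
    printf 'apt-get source %s\n' "$pkg"
    # Enter the unpacked tree (its name carries the upstream version).
    printf 'cd %s-*/\n' "$pkg"
    # Install the build dependencies declared by the package.
    printf 'sudo apt-get build-dep %s\n' "$pkg"
    # Build unsigned packages with the requested options.
    printf 'DEB_BUILD_OPTIONS="%s" dpkg-buildpackage -us -uc\n' "$options"
)

deb_build_plan super-duper-tool "nomysql,postgresql"
```

Once the printed plan looks right, it could be piped to sh -e to actually run it.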

Where to find help for commands?

Today a discussion came up in #debian-devel on the OFTC IRC network. It started because someone pointed out bug report #501318, which is, to summarize it, just a user mistake: someone obviously read the man page (time(1)) for the /usr/bin/time command, while time is also a shell builtin (which does not accept the same arguments as the time command). Certainly this bug report appears funny at first sight, but on the other hand, there is a suboptimal situation that leads to this.

  1. There are some builtin commands that also have binary equivalents (like time, printf, echo). It’s quite easy to tell whether the one or the other is used by asking the shell with its type command (the external which command only searches the PATH and so cannot see builtins), but that’s not really a realistic workflow, so it’s better to know this. The difference between these commands often causes problems which we have to cope with. For example, bash’s builtin echo behaves differently from /bin/echo, and people who use bash as their default shell tend to use what bash provides, which in turn causes problems if other people who use a different default shell try to work with these scripts. But this is another problem, because…
  2. … every program, utility and function in Debian has to provide a manpage, as our policy says: „Each program, utility, and function should have an associated manual page included in the same package.“ I guess the rationale behind that is that ‚man‘ is a very common tool in the Un*x world, which is widespread and whose usage is strongly recommended for finding out how specific tools behave. It is always referred to in documentation, whether it is Debian-specific or not, e.g. in books.

    The time package, which is of priority standard and which includes /usr/bin/time, conforms to that policy by providing a manpage for the time command. The various shells don’t do that, because they usually don’t have a man page for each and every builtin. Usually they have a more generic manpage which includes the builtins or, in some rare cases, a special manpage for such things (as zsh does with zshmisc(1)). It is also not that easy, because the man command cannot (AFAIK) distinguish between the user joey_average calling ‚man time‘ in bash, the user schoenfeld calling ‚man time‘ in zsh, and joey_foobar calling ‚man time‘ in a shell which does not have a builtin time command and uses /usr/bin/time instead.
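To check which variant a given shell would actually run, the shell itself has to be asked. A minimal sketch, assuming a POSIX shell: command -v prints a path for external programs and just the bare name for builtins, keywords and functions. The helper name cmd_kind is made up for this example.

```shell
#!/bin/sh
# Classify a command name as seen by the *current* shell:
#   external  an executable found via PATH
#   shell     a builtin, keyword, function or alias of this shell
#   missing   not known to the shell at all
cmd_kind() (
    resolved=$(command -v "$1" 2>/dev/null) || resolved=
    case $resolved in
        '')  echo missing ;;
        /*)  echo external ;;
        *)   echo shell ;;
    esac
)

# The answers depend on the shell running this script; that is the point.
for name in time printf echo; do
    printf '%s: %s\n' "$name" "$(cmd_kind "$name")"
done
```

Running the same script under bash and under dash should give different answers for time, which is exactly the situation the bug reporter ran into.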

So what’s wrong with this situation? Would you say that users who expect ‚man time‘ (or similar examples) to do the right thing are making wrong assumptions? I disagree. It’s what they’ve always been told to do. And if it does not show anything, they eventually run info, or look at HTML documentation or whatever else. But they haven’t been prepared for the case where the manpage does show something, just the wrong thing. The good thing is that the manpage for time includes a sentence:

„Users of the bash shell need to use an explicit path in order to run the external time command and not the shell builtin variant.“

The bad thing about it is that it only appears about 70-80% of the way into the manpage.

Clint, the maintainer of zsh, mentioned run-help, which seems to be part of zsh but not of any other shell, and does more or less the right thing (at least for the builtins), but not for external commands and not even for itself (it opens the code of the function instead of something user-readable like a manpage). I guess „one tool for a specific need“ is a good maxim, but is it really a good maxim for finding documentation?

But how could the situation be improved? I could think of a wrapper for man, similar to run-help but as a more generic solution. Any ideas for it? Is it the right way at all to improve the user’s experience? Any other ideas? Other opinions?
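Just to give the discussion something to chew on, here is a rough sketch of the decision such a wrapper would have to make. The function name doc_command is hypothetical, and it only prints the command it would run. It assumes that bash users want the help builtin, that zsh users have run-help set up, and that every other shell documents its builtins in its own manpage (shown here simply as man sh).

```shell
#!/bin/sh
# Hypothetical man wrapper: print the documentation command that fits
# the way the current shell resolves NAME.
doc_command() (
    name=$1
    resolved=$(command -v "$name" 2>/dev/null) || resolved=
    case $resolved in
        ''|/*)
            # External program, or unknown to the shell: plain man
            # (which also reports a missing page by itself).
            printf 'man %s\n' "$name"
            ;;
        *)
            # Builtin, keyword or function: use the shell's own docs.
            if [ -n "${BASH_VERSION:-}" ]; then
                printf 'help %s\n' "$name"
            elif [ -n "${ZSH_VERSION:-}" ]; then
                printf 'run-help %s\n' "$name"
            else
                # Assumption: the builtins are described in the
                # shell's generic manpage.
                printf 'man sh\n'
            fi
            ;;
    esac
)

doc_command time
```

The catch is that this has to live inside the shell, as a function or builtin, since an external wrapper cannot see what the calling shell would do with the name.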