Improving package managers

debian debtags eng pdo

I noticed two posts on improving package managers, neither of which mentions Debtags.

Daniel Burrows mentions various issues:

the current sections in Synaptic are useless
there are better keyword search technologies than strstr()
we could use popularity contest data to sort results
it would be cool to do amazon-like things using popcon data

David Nusinov mentions that the ideal package manager should look like Google, where you search for things using just a simple one-line text entry and pick what you want to install from the results.

I should probably give a bit of a recap of the things that have been going on.

I'll go through that list again:

The current sections in Synaptic are useless

Agreed. There used to be a bug about this, which was closed by Debtags more than one year ago. We now have much more useful category data for about 73% of the archive (including experimental), but what we lack is software using it.

Here's a quick trick to try:

install debtags, and this gives you an easy to read text file in /var/lib/debtags/package-tags.
from that file, pick packages that have the tags role::program, scope::application and interface::x11.
display the results, and use the tags works-with::* and use::* to navigate the results.
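The trick above can be sketched in a few lines of Python. This is only a sketch: it assumes each line of the file looks like "package: tag, tag, ..." (possibly with several comma-separated package names before the colon), and the sample lines and function names are made up for illustration.

```python
# Sketch only: the line format "name[, name...]: tag, tag, ..." is an assumption,
# and SAMPLE_LINES is made-up data standing in for /var/lib/debtags/package-tags.
SAMPLE_LINES = [
    "gimp: interface::x11, role::program, scope::application, use::editing, works-with::image",
    "mutt: interface::text-mode, role::program, scope::application, use::editing, works-with::mail",
    "xpdf: interface::x11, role::program, scope::application, use::viewing, works-with::text",
    "libfoo1: role::shared-lib",
]

WANTED = {"role::program", "scope::application", "interface::x11"}


def parse_package_tags(lines):
    """Map each package name to its set of tags."""
    tags_by_package = {}
    for line in lines:
        # Split on the first ": " only: tags contain "::" but never ": ".
        names, sep, tags = line.strip().partition(": ")
        if not sep:
            continue  # blank line, or a package without tags
        tag_set = {t.strip() for t in tags.split(",") if t.strip()}
        for name in names.split(","):
            tags_by_package.setdefault(name.strip(), set()).update(tag_set)
    return tags_by_package


def select_applications(tags_by_package, wanted=WANTED):
    """Keep only the packages that carry every wanted tag."""
    return {name: tags for name, tags in tags_by_package.items() if wanted <= tags}


def navigation_facets(selection, prefixes=("works-with::", "use::")):
    """Group the selected packages under each works-with::* and use::* tag."""
    facets = {}
    for name, tags in selection.items():
        for tag in tags:
            if tag.startswith(prefixes):
                facets.setdefault(tag, set()).add(name)
    return facets


chosen = select_applications(parse_package_tags(SAMPLE_LINES))
for tag, names in sorted(navigation_facets(chosen).items()):
    print(tag, sorted(names))
```

To try it on real data, pass an open file object for /var/lib/debtags/package-tags to parse_package_tags() instead of the sample list, after checking that the file really uses this line format.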

There is a python-debian package in experimental that has a debtags module you could play with.

Why is it that so far no one has written a simple package manager just for gamers, which uses only the game::* tags?

Do you think Debtags gives you too many tags? Then check out:

The Debtags smart search, and especially how it does not show you all the tags, but it is able to infer the tags you want from your google-like query (hi David!).
The Debtags tag editor, and especially the search-as-you-type feature on all the tags and the tag search (analogous to the Debtags smart search, but it only searches tags).
The Debtags tag cloud, and if you don't like that one try to make your own: there are countless ways of generating tag clouds from Debtags data.

To summarise so far, not only do we have better categories, but also a number of cool algorithms to use them, and some interface prototypes as well. Just don't expect me to write a package manager as well: that's a job that so far I have decided to leave to someone else. adept gave it a try, with positive results.

there are better keyword search technologies than strstr()

Indeed, Xapian for example. I use it as part of the backend of the Debtags smart search, and here's our Xapian-powered, keyword-based package search interface, which does stemming, indexing and everything else you would ask of a serious full text index.

In that page you don't see all the nice features of Xapian, but only the ones that I needed for my Debtags evil plans. Have a look at the documentation and give it a try.

Here is a way to see Xapian's similarity matching in action:

go to the Go tagging! page
click on a random untagged package
the system gives you a rather relevant selection of tags
look at it again: the package was untagged: how could the web engine possibly figure those tags out?

What is happening behind the scenes is this:

I ask Xapian: "what packages are similar to this one?".
I aggregate the tags of the resulting packages.
I rank the tags by how many resulting packages have them.
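The last two steps are easy to sketch in Python. The similarity query itself is left out here: the list of similar packages is assumed to come from Xapian (or anything else), and the function name and sample data are hypothetical.

```python
from collections import Counter


def suggest_tags(similar_packages, tags_by_package, limit=10):
    """Rank tags by how many of the similar packages carry them."""
    counts = Counter()
    for name in similar_packages:
        # Each package contributes at most one vote per tag.
        counts.update(set(tags_by_package.get(name, ())))
    # Most common first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


# Made-up data: the packages a similarity query might return, with their tags.
tags_by_package = {
    "a": {"use::editing", "works-with::image"},
    "b": {"use::editing"},
    "c": {"use::editing", "works-with::image", "interface::x11"},
}
print(suggest_tags(["a", "b", "c"], tags_by_package))
```

Tags shared by most of the similar packages float to the top, which is why the suggestions for an untagged package tend to be relevant.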

While we are on this topic, why don't we decide that we maintain a Xapian index of our package descriptions in, for example, /var/lib/apt/fulltext/, so that various applications can share it?

we could use popularity contest data to sort results

Indeed. Would anyone like to implement this little "popcon" tool? Having the data easily accessible locally can encourage people to use them.

The Debtags Go tagging! page already uses popcon data to show the most common untagged packages at the top, for two reasons: it shows packages that more people are likely to know (and are therefore likely to be able to categorise), and it pushes for the most common packages to be tagged more urgently.
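Once the data is available locally, the sorting itself is tiny. Here is a sketch in Python, assuming the popcon data has already been loaded into a dictionary mapping package names to install counts; the loading step, the function name and the numbers are all made up.

```python
def sort_by_popularity(results, installs):
    """Order search results by install count, most installed first.

    Packages missing from the popcon data count as zero installs;
    ties are broken alphabetically so the order is stable.
    """
    return sorted(results, key=lambda name: (-installs.get(name, 0), name))


# Made-up install counts, standing in for real popcon data.
installs = {"vim": 900, "nano": 950, "joe": 40}
print(sort_by_popularity(["joe", "vim", "zile", "nano"], installs))
```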

it would be cool to do amazon-like things using popcon data

Indeed. Does anyone volunteer to implement a prototype? The full unaggregated (but anonymised) popcon data are accessible to every Debian Developer on the host gluck.debian.org in the directory /org/popcon.debian.org/popcon-mail/popcon-entries.

Ideally one can do many interesting things with this concept: besides tag suggestions, one could identify the packages that are most representative of an installed system, and also offer negative suggestions like: "people who have packages like yours usually don't have this package: would you like to remove it?".
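A toy version of the positive and negative suggestions could look like this in Python. It assumes each popcon submission has already been reduced to a set of package names; the parsing is not shown, and the function names, thresholds and sample systems are invented for the sketch.

```python
from collections import Counter


def jaccard(a, b):
    """Similarity of two package sets: shared packages over all packages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggestions(installed, systems, min_similarity=0.3, common=0.5, rare=0.2):
    """Return (packages to suggest installing, packages to suggest removing)."""
    # Peers are the submitted systems that look like ours.
    peers = [s for s in systems if jaccard(installed, s) >= min_similarity]
    if not peers:
        return [], []
    counts = Counter()
    for system in peers:
        counts.update(system)
    share = {pkg: n / len(peers) for pkg, n in counts.items()}
    # Positive: most peers have it, we do not.
    to_add = sorted(p for p, f in share.items() if p not in installed and f >= common)
    # Negative: we have it, hardly any peer does.
    to_remove = sorted(p for p in installed if share.get(p, 0.0) < rare)
    return to_add, to_remove


# Made-up systems standing in for anonymised popcon entries.
systems = [
    {"vim", "mutt", "gcc", "make"},
    {"vim", "mutt", "gcc", "make", "gdb"},
    {"vim", "mutt", "gcc", "make"},
    {"openoffice", "gnome", "xmms"},
]
print(suggestions({"vim", "mutt", "gcc", "xmms"}, systems))
```

The thresholds are arbitrary; a real prototype would need to tune them against the actual popcon data.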

There is more than all this that could be done. Recently, almost by accident, I had the idea of querying packages by example, like pointing to a file and finding packages that can work with it. I've asked Jeroen to have Mole collect info on all files that could possibly get installed in /usr/lib/mime/packages/ (as suggested by Bernhard R. Link), to see if that prototype can be made more accurate.

Query by similarity would be nice: I don't like this program, but what else do we have that does the same job? This is best implemented using Debtags data, since it directly maps to semantic properties. Note that you don't have to show a single tag to the user to implement this kind of interface. Do we have a way to point at the X window of an application and get the name of the package that installed it? Wouldn't it be about time to have it?
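The query-by-similarity idea can be sketched in Python without showing a single tag to the user. This is one possible scoring, not how any existing Debtags tool does it: shared tags are weighted so that rare tags count more than tags that almost every package has. The function name and the sample tag data are made up.

```python
import math
from collections import Counter


def alternatives(package, tags_by_package, limit=5):
    """Rank other packages by the rarity-weighted tags they share with `package`."""
    target = tags_by_package[package]
    total = len(tags_by_package)
    # How many packages carry each tag.
    frequency = Counter(tag for tags in tags_by_package.values() for tag in tags)
    scored = []
    for name, tags in tags_by_package.items():
        if name == package:
            continue
        # A tag carried by every package has weight log(1) = 0.
        score = sum(math.log(total / frequency[tag]) for tag in target & tags)
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Made-up tag data for illustration.
tags_by_package = {
    "xmms": {"role::program", "interface::x11", "works-with::audio", "use::playing"},
    "rhythmbox": {"role::program", "interface::x11", "works-with::audio", "use::playing"},
    "mpg123": {"role::program", "interface::commandline", "works-with::audio", "use::playing"},
    "gimp": {"role::program", "interface::x11", "works-with::image", "use::editing"},
}
print(alternatives("xmms", tags_by_package))
```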

Why don't we have a system updater utility that shows the Debian weather?

Why aren't more people playing with semantic web?

But more generally, the problem with package managers is that we seem to be irrationally compulsive in wanting to make the one and only big easy and complete interface for everyone. Other more reasonable people would tell you that if you have two very different kinds of users you may want to consider having two different user interfaces.

Ubuntu for example installs by default 3 package manager interfaces: Synaptic; the thing that you access from the application menu to add applications to it; and the update manager. Does it sound like a waste? To me it makes lots of sense.

We have lots of interesting, usable metadata; we have algorithms; we have prototypes; we have ideas for lots of cool, implementable features. The question is, are we able to write applications that combine just what is needed from all this treasure to provide the right interface(s) for our base(s) of users?

Even if my English in 2004 wasn't easy to understand, a read here might still be useful.

There is so much really cool stuff to be written, just within reach.