While answering a long message on the debtags-devel mailing list, I accidentally put together the pieces of a fun idea.
This is the part of the message I was answering:
- It would be very useful if the means for indicating the supported data formats was more comprehensive. This could mean a lot of expanding in the "works-with-format" section of the vocabulary, which doesn't even include formats such as gif or mpg at the moment. I don't know how feasible it is to alter underlying debtags functionality, but perhaps it would be the easiest to make "works-with-format" a special case tag which allows for formats not listed in the vocabulary.
This is my answer:
Good point. The idea of listing supported MIME types in the package metadata has come up in the past, so that one could point at a file and get a list of all the packages that can work with it.
I'm not sure it's a good idea to encode MIME types in debtags, and I'd rather see something purpose-built for them. In the meantime, works-with-format is the best we can do, but we should limit it to the most common formats.
This is the fun idea: if works-with-format is the best we can do, what can we do with it?
Earlier today I worked on resurrecting some old code of mine that extends Zack's ls2rss with Dublin Core metadata extracted from the files, so the MIME type scanner was ready for action.
Some imports:
```python
import sys
# Requires python-extractor, python-magic, python-apt
# and an unreleased python-debtags from http://bzr.debian.org/bzr/pkg-python-debian/trunk/
import extractor
import magic
from debian_bundle import debtags
import re
from optparse import OptionParser
import apt
```
A tentative mapping between MIME types and debtags tags:
```python
# Each pattern is matched against the start of the MIME type; several
# patterns can match the same type, and their tags are merged.
# Dots are escaped so that they match a literal '.'.
mime_map = (
    (r'text/html\b', ("works-with::text", "works-with-format::html")),
    (r'text/plain\b', ("works-with::text", "works-with-format::plaintext")),
    (r'text/troff\b', ("works-with::text", "works-with-format::man")),
    (r'image/', ("works-with::image",)),
    (r'image/jpeg\b', ("works-with::image:raster", "works-with-format::jpg")),
    (r'image/png\b', ("works-with::image:raster", "works-with-format::png")),
    (r'application/pdf\b', ("works-with::text", "works-with-format::pdf")),
    (r'application/postscript\b', ("works-with::text", "works-with-format::postscript")),
    (r'application/x-iso9660\b', ("works-with-format::iso9660",)),
    (r'application/zip\b', ("works-with::archive", "works-with-format::zip")),
    (r'application/x-tar\b', ("works-with::archive", "works-with-format::tar")),
    (r'audio/', ("works-with::audio",)),
    (r'audio/mpeg\b', ("works-with-format::mp3",)),
    (r'audio/x-wav\b', ("works-with-format::wav",)),
    (r'message/rfc822\b', ("works-with::mail",)),
    (r'video/', ("works-with::video",)),
    (r'application/x-debian-package\b', ("works-with::software:package",)),
    (r'application/vnd\.oasis\.opendocument\.text\b', ("works-with::text",)),
    (r'application/vnd\.oasis\.opendocument\.graphics\b', ("works-with::image:vector",)),
    (r'application/vnd\.oasis\.opendocument\.spreadsheet\b', ("works-with::spreadsheet",)),
    (r'application/vnd\.sun\.xml\.base\b', ("works-with::db",)),
    (r'application/rtf\b', ("works-with::text",)),
    (r'application/x-dbm\b', ("works-with::db",)),
)
```
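The script applies these patterns with a regular expression match anchored at the start of the string, so broad entries like image/ act as prefix tests and the more specific entries add to them: a PNG picks up tags from both. The trailing \b also lets a libmagic answer such as "text/plain; charset=us-ascii" match. Here is a minimal, dependency-free sketch of that lookup in Python 3, using a trimmed copy of the table; tags_for and MIME_MAP are illustrative names, not part of the script.

```python
import re

# A trimmed, illustrative copy of a few mime_map entries
MIME_MAP = (
    (r'text/plain\b', ("works-with::text", "works-with-format::plaintext")),
    (r'image/', ("works-with::image",)),
    (r'image/png\b', ("works-with::image:raster", "works-with-format::png")),
    (r'audio/', ("works-with::audio",)),
)

def tags_for(mime_type, mime_map=MIME_MAP):
    """Return the union of the tags of every pattern that matches mime_type."""
    found = set()
    for pattern, tags in mime_map:
        # re.match anchors at the start, so 'image/' works as a prefix test
        if re.match(pattern, mime_type):
            found.update(tags)
    return found

print(sorted(tags_for("image/png")))
# ['works-with-format::png', 'works-with::image', 'works-with::image:raster']
```

Order in the table does not matter here, because every matching entry contributes its tags to the same set.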
Code that does its best to detect a file's MIME type:
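```python
# Set up both detectors once; the new names avoid shadowing the modules
ext = extractor.Extractor()
ms = magic.open(magic.MAGIC_MIME)
ms.load()

def mimetype(fname):
    # Collect libextractor's keywords into a dict of lists
    xkeys = {}
    for k, v in ext.extract(fname):
        xkeys.setdefault(k, []).append(v)
    # Ask libmagic about the file, and about its first 4096 bytes
    namemagic = ms.file(fname)
    with open(fname, "rb") as f:
        contentmagic = ms.buffer(f.read(4096))
    # Prefer libextractor's answer, then the buffer check, then the file check
    if xkeys.get("mimetype"):
        return xkeys["mimetype"][0]
    return contentmagic or namemagic
```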
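When python-extractor or python-magic is not at hand, a rough stand-in can be assembled from the standard library. This Python 3 sketch is a swapped-in technique, not what the script does: it compares the first bytes of the file against a tiny, hypothetical table of signatures and then falls back to mimetypes.guess_type, which looks only at the file name. guess_mimetype and SIGNATURES are illustrative names, and the table knows far fewer formats than libmagic.

```python
import mimetypes

# A few well-known file signatures ("magic numbers"); libmagic knows
# thousands, this illustrative table only a handful
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
)

def guess_mimetype(fname):
    """Guess a MIME type from the file's first bytes, then from its name."""
    with open(fname, "rb") as f:
        head = f.read(4096)
    for signature, mime in SIGNATURES:
        if head.startswith(signature):
            return mime
    # mimetypes only looks at the extension, never at the content
    guessed, _encoding = mimetypes.guess_type(fname)
    return guessed or "application/octet-stream"
```

The content check runs first because a renamed file keeps its bytes but not its extension.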
Command line parser:
```python
VERSION = "0.1"  # placeholder: VERSION is used below but never defined in the original snippet

parser = OptionParser(usage="usage: %prog [options] filename",
                      version="%prog " + VERSION,
                      description="search Debian packages that can handle a given file")
parser.add_option("--tagdb", default="/var/lib/debtags/package-tags",
                  help="Tag database to use (default: %default)")
parser.add_option("--action", default=None,
                  help="Show the packages that allow the given action on the file (default: %default)")
(options, args) = parser.parse_args()
if len(args) == 0:
    parser.error("Please provide the name of a file to scan")
```
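For comparison, here is roughly the same interface written with argparse, the command-line parser most current Python 3 code uses. It is a sketch: make_parser is an illustrative name and the VERSION string is a made-up placeholder. Declaring the file name as a positional argument lets argparse report its absence, which replaces the manual len(args) check.

```python
import argparse

VERSION = "0.1"  # hypothetical version string; the original never defines one

def make_parser():
    parser = argparse.ArgumentParser(
        description="search Debian packages that can handle a given file")
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + VERSION)
    parser.add_argument("--tagdb", default="/var/lib/debtags/package-tags",
                        help="Tag database to use (default: %(default)s)")
    parser.add_argument("--action", default=None,
                        help="Show the packages that allow the given action on the file")
    # A required positional argument: argparse errors out if it is missing
    parser.add_argument("filename", help="file to scan")
    return parser

opts = make_parser().parse_args(["--action", "viewing", "photo.png"])
print(opts.filename, opts.action, opts.tagdb)
# photo.png viewing /var/lib/debtags/package-tags
```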
Here is where the fun starts: first we load the debtags data:
```python
# Read the full database, skipping special::* and *::TODO tags
fullcoll = debtags.DB()
tagFilter = re.compile(r"^special::.+$|^.+::TODO$")
fullcoll.read(open(options.tagdb, "r"), lambda x: not tagFilter.match(x))
```
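To see what that filter keeps, here is a small Python 3 sketch that reads tag data from a list of lines and applies the same regular expression. It assumes a simple "package: tag, tag, ..." line format and uses made-up package names and tags; read_tagdb is a hypothetical helper, not the debtags API.

```python
import re

# Same filter as in the script: drop special::* tags and *::TODO tags
TAG_FILTER = re.compile(r"^special::.+$|^.+::TODO$")

def read_tagdb(lines):
    """Parse 'package: tag1, tag2, ...' lines into {package: set of tags}.

    Assumes one package per line in this simple form; a sketch, not a
    complete parser for the real debtags file format.
    """
    db = {}
    for line in lines:
        # Debian package names contain no colons, so the first one ends the name
        pkg, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        tags = (t.strip() for t in rest.split(","))
        db[pkg] = {t for t in tags if t and not TAG_FILTER.match(t)}
    return db

# Hypothetical sample lines, not real package data
SAMPLE = [
    "viewer-a: role::program, use::viewing, works-with::image, special::not-yet-tagged",
    "editor-b: role::program, use::editing, works-with::image, implemented-in::TODO",
]
db = read_tagdb(SAMPLE)
print(sorted(db["editor-b"]))
# ['role::program', 'use::editing', 'works-with::image']
```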
Then we detect the MIME type of the file and look up its tags in the mime_map above:
```python
mtype = mimetype(args[0])
#print >>sys.stderr, "Mime type:", mtype
# Collect the tags of every mime_map pattern that matches
found = set()
for pattern, tags in mime_map:
    if re.match(pattern, mtype):
        found.update(tags)
if len(found) == 0:
    print >>sys.stderr, "Unhandled mime type:", mtype
else:
```
If the user only gave the file name, let's show what Debian can do with that file:
```python
    if options.action is None:
        print "Debtags query:", " && ".join(found)
        # Only consider programs that have all the tags we found
        query = found.copy()
        query.add("role::program")
        subcoll = fullcoll.filterPackagesTags(lambda pt: query.issubset(pt[1]))
        # Their use::* tags are the actions available on this file
        uses = [t[5:] for t in subcoll.iterTags() if t.startswith("use::")]
        print "Available actions:", ", ".join(uses)
```
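The query is a plain subset test: a package qualifies if its tags include every tag in the query, and the actions on offer are whatever use:: tags those packages carry. This Python 3 sketch runs the same logic over an ordinary dict; available_actions and the sample data are hypothetical.

```python
def available_actions(db, found):
    """Sorted use:: actions offered by programs that carry every tag in found."""
    query = set(found) | {"role::program"}
    actions = set()
    for tags in db.values():
        if query <= tags:  # the same test as query.issubset(pt[1])
            actions.update(t[len("use::"):] for t in tags if t.startswith("use::"))
    return sorted(actions)

# Hypothetical tag data standing in for the debtags database
DB = {
    "viewer-a": {"role::program", "use::viewing", "works-with::image", "works-with-format::png"},
    "editor-b": {"role::program", "use::editing", "use::viewing", "works-with::image", "works-with-format::png"},
    "libpng-c": {"role::shared-lib", "works-with::image", "works-with-format::png"},
    "player-d": {"role::program", "use::playing", "works-with::audio"},
}
print(available_actions(DB, {"works-with::image", "works-with-format::png"}))
# ['editing', 'viewing']
```

The library entry is skipped because it lacks role::program, which is why the script adds that tag to the query.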
If the user picked one of the available actions, let's show the packages that do it:
```python
    else:
        aptCache = apt.Cache()
        # Same query as above, narrowed to the requested action
        query = found.copy()
        query.add("role::program")
        query.add("use::" + options.action)
        print "Debtags query:", " && ".join(query)
        subcoll = fullcoll.filterPackagesTags(lambda pt: query.issubset(pt[1]))
        for i in subcoll.iterPackages():
            # The tag database may list packages the local apt cache lacks
            if i not in aptCache:
                continue
            aptpkg = aptCache[i]
            desc = aptpkg.rawDescription.split("\n")[0]
            print i, "-", desc
```
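Adding use::<action> to the query narrows the same subset test down to the programs that perform that action. In this Python 3 sketch a dict of made-up one-line summaries stands in for the apt cache; packages_for_action and the sample data are hypothetical. With recent python-apt, the one-line summary of a package is available as aptCache[name].candidate.summary.

```python
def packages_for_action(db, found, action):
    """Sorted names of the programs carrying every tag in found plus use::<action>."""
    query = set(found) | {"role::program", "use::" + action}
    return sorted(pkg for pkg, tags in db.items() if query <= tags)

# Hypothetical tag data and summaries standing in for debtags and the apt cache
DB = {
    "viewer-a": {"role::program", "use::viewing", "works-with::image"},
    "editor-b": {"role::program", "use::editing", "use::viewing", "works-with::image"},
    "player-d": {"role::program", "use::playing", "works-with::audio"},
}
SUMMARIES = {"viewer-a": "simple image viewer", "editor-b": "image editor"}

for name in packages_for_action(DB, {"works-with::image"}, "viewing"):
    print(name, "-", SUMMARIES.get(name, "(no description)"))
# editor-b - image editor
# viewer-a - simple image viewer
```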
\o/
The moral of the story:
- Debian is lots of fun
- We have amazing technology just waiting for good ideas.
- I'd love to see more little scripts like this getting written.