Filtering planet entries

Here is how to set up liferea so that it hides some entries in Planet Debian:

  1. Create a script that reads the RSS from stdin, removes the entries you don't want and then writes the RSS to stdout;
  2. From the feed properties in liferea, choose the source tab, enable the conversion filter and point that at your script.

Now you just need a simple script that filters the RSS. Here is mine:

#!/usr/bin/python

# Copyright (C) 2007 Enrico Zini <enrico@debian.org>
# This software is licensed under the terms of the GNU General Public
# License, version 2 or later.

import libxml2, re

# What links we should filter out
unwanted = re.compile(r"^(http://feed1\.example\.com|http://feed2\.example\.com)")

doc = libxml2.parseFile("-")
root = doc.getRootElement()

# Create an xpath context and register the namespaces
xpc = doc.xpathNewContext()
for d in root.nsDefs():
    if d.name is None:
        xpc.xpathRegisterNs("rss", d.content)
    else:
        xpc.xpathRegisterNs(d.name, d.content)

# Remove unwanted items from the channel list
for x in xpc.xpathEval("/rdf:RDF/rss:channel/rss:items/rdf:Seq/rdf:li"):
    res = x.nsProp("resource", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
    if res and unwanted.match(res):
        x.unlinkNode()
        x.freeNode()

# Remove unwanted items from the item list
for x in xpc.xpathEval("/rdf:RDF/rss:item"):
    res = x.nsProp("about", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
    if res and unwanted.match(res):
        x.unlinkNode()
        x.freeNode()

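# Serialize the result to stdout ("-"). saveFormatFile writes the document
# itself and returns a byte count, so its return value must not be printed.
doc.saveFormatFile("-", True)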
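
For comparison, here is a rough sketch of the same filtering idea using only the Python 3 standard library (xml.etree.ElementTree) instead of libxml2. This is not the script I run: the function names filter_rss and main are made up for illustration, the feed prefixes are the same placeholders as above, and it assumes the feed is RSS 1.0 with the usual http://purl.org/rss/1.0/ default namespace.

```python
import re
import sys
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"  # assumed default namespace of the feed
NS = {"rdf": RDF, "rss": RSS}

# Placeholder prefixes of the links to filter out
UNWANTED = re.compile(r"^(http://feed1\.example\.com|http://feed2\.example\.com)")


def filter_rss(xml_text, unwanted=UNWANTED):
    """Return the RSS 1.0 document with the unwanted entries removed."""
    # Keep the usual prefixes when serializing (this is a global setting)
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("", RSS)
    root = ET.fromstring(xml_text)

    # Remove unwanted references from the channel's item list
    for seq in root.findall("rss:channel/rss:items/rdf:Seq", NS):
        for li in list(seq.findall("rdf:li", NS)):
            res = li.get("{%s}resource" % RDF) or li.get("resource") or ""
            if unwanted.match(res):
                seq.remove(li)

    # Remove the unwanted items themselves
    for item in list(root.findall("rss:item", NS)):
        about = item.get("{%s}about" % RDF, "")
        if unwanted.match(about):
            root.remove(item)

    return ET.tostring(root, encoding="unicode")


def main():
    # Filter contract expected by liferea: RSS in on stdin, RSS out on stdout
    sys.stdout.write(filter_rss(sys.stdin.buffer.read()))
```

To use it as a conversion filter you would still need to call main() at the bottom of the file. Keeping the filtering in a function makes it easy to try on a small sample feed before pointing liferea at it.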

Now, getting to this simple script took some spitting blood. Basically, in Debian we seem to have lots of simple libraries for:

I tried, in order:

Update: Nemui Ailin told me that with the most recent upstream version it works. I've reported the bug.