Posts containing useful tips.
Temporarily disabling file caching
Does it happen to you that you cp a big, big file (say, similar in order of magnitude to the amount of RAM) and the system becomes rather unusable?
It looks like Linux is saying "let's cache this", and as you copy it will try to free more and more RAM in order to cache the big file you're copying. In the end, all the RAM is full of file data that you are not going to need.
This varies according to how /proc/sys/vm/swappiness is set.
I learnt about posix_fadvise and I tried to play with it. The result is this
preloadable library that
hooks into open(2) and fadvises everything as POSIX_FADV_DONTNEED.
It is all rather awkward. fadvise in that way will discard existing cache pages if the file is already cached, which is too much. Ideally one would like to say "don't cache this because of me" without stepping on the feet of other system activities.
Also, I found I need to also hook into write(2) and run fadvise after every
single write, because you can't fadvise a file to be written in its entirety,
unless you pass fadvise the file size in advance. But the size of the output
file cannot be known by the preloaded library, so meh.
So, now I can run: nocache cp bigfile someplace/ without trashing the
existing caches. I can also run nocache tar zxf foo.tar.gz and so on.
I wish, of course, that there were no need to do so in the first place.
Here is the nocache library source code, for reference:
    /*
     * nocache - LD_PRELOAD library to fadvise written files to not be cached
     *
     * Copyright (C) 2009--2010 Enrico Zini <enrico@enricozini.org>
     *
     * This program is free software; you can redistribute it and/or modify
     * it under the terms of the GNU General Public License as published by
     * the Free Software Foundation; either version 2 of the License, or
     * (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     * GNU General Public License for more details.
     *
     * You should have received a copy of the GNU General Public License
     * along with this program; if not, write to the Free Software
     * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
     */
    #define _GNU_SOURCE /* glibc only defines RTLD_NEXT with this */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <dlfcn.h>
    #include <stdarg.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    typedef int (*open_t)(const char*, int, ...);
    typedef ssize_t (*write_t)(int fd, const void *buf, size_t count);

    int open(const char *pathname, int flags, ...)
    {
        static open_t func = 0;
        int res;
        if (!func)
            func = (open_t)dlsym(RTLD_NEXT, "open");

        // Note: I wanted to add O_DIRECT, but it imposes restriction on buffer
        // alignment
        if (flags & O_CREAT)
        {
            va_list ap;
            va_start(ap, flags);
            mode_t mode = va_arg(ap, mode_t);
            res = func(pathname, flags, mode);
            va_end(ap);
        } else
            res = func(pathname, flags);

        if (res >= 0)
        {
            int saved_errno = errno;
            // posix_fadvise returns the error number and does not set errno
            int z = posix_fadvise(res, 0, 0, POSIX_FADV_DONTNEED);
            if (z != 0)
                fprintf(stderr, "Cannot fadvise on %s: %s\n", pathname, strerror(z));
            errno = saved_errno;
        }

        return res;
    }

    // write(2) returns ssize_t, not int
    ssize_t write(int fd, const void *buf, size_t count)
    {
        static write_t func = 0;
        ssize_t res;
        if (!func)
            func = (write_t)dlsym(RTLD_NEXT, "write");

        res = func(fd, buf, count);

        if (res > 0)
        {
            int saved_errno = errno;
            int z = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
            if (z != 0)
                fprintf(stderr, "Cannot fadvise during write: %s\n", strerror(z));
            errno = saved_errno;
        }

        return res;
    }
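For comparison, here is a minimal Python 3 sketch of the same idea applied to a single copy, without any preloading. It is not the nocache library: the function name `copy_without_caching` and the chunk size are made up for this example, it relies on `os.posix_fadvise` (only available on some Unix platforms), and it advises once at the end instead of after every write, so it only makes sense for files that still fit in memory.

```python
import os


def copy_without_caching(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst, then ask the kernel to drop the pages we touched."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)

        # os.posix_fadvise does not exist on every platform
        if hasattr(os, "posix_fadvise"):
            # Dirty pages cannot be dropped, so push them to disk first
            fout.flush()
            os.fsync(fout.fileno())
            try:
                # offset=0 and len=0 mean "the whole file"
                os.posix_fadvise(fout.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
                # This also evicts src if someone else had it cached,
                # which is the awkward side effect described in the post
                os.posix_fadvise(fin.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                # The advice is best effort: ignore filesystems that refuse it
                pass
```

Calling `copy_without_caching("bigfile", "someplace/bigfile")` copies the data like a plain copy would; whether the cache is actually dropped is up to the kernel.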
Cropping images with GDAL
I am working to get a better integration between Meteosatlib and GDAL.
A nice aspect of GDAL is that it lets you create read/write drivers around just two
functions: Open and CreateCopy.
Open opens a dataset read-only, and to implement it all you have to do is
implement read access to your data using the
GDALDataset interface.
To implement CreateCopy, all you have to do is to create a new file reading
information from a GDALDataset; then call Open on it. This means that there
is no need to support incremental updates, and that all the data required to
create a new file is readily available. This simplifies matters a lot.
GDAL provides some interesting image manipulation functions that can work over just these two calls. It does so by exploiting the concept of a virtual dataset, which wraps an existing dataset but changes some parameters on the fly.
This means that you can wrap a read only dataset with a virtual dataset that
transforms it somehow, and then pass the virtual dataset to CreateCopy to
save the transformed image in a format of your choice.
On top of that, for many transformations you do not need to create your own virtual datasets, but you can use the functions provided by the VRT GDAL driver.
One can learn a lot about how to use VRTDataset by reading the source code of
gdal_translate. By doing that I came up with this code for cropping an
image, which is an interesting VRTDataset example.
    #include <cstring>
    #include <gdal_priv.h>
    #include <vrtdataset.h>

    // MD_DOMAIN_MSAT is the name of my own metadata domain

    GDALDataset* crop(GDALDataset* poDS, int xoff, int yoff, int xsize, int ysize)
    {
        VRTDataset *poVDS = (VRTDataset*)VRTCreate(xsize, ysize);

        // Copy dataset info
        const char* pszProjection = poDS->GetProjectionRef();
        if (pszProjection != NULL && strlen(pszProjection) > 0)
            poVDS->SetProjection(pszProjection);

        double adfGeoTransform[6];
        if (poDS->GetGeoTransform(adfGeoTransform) == CE_None)
        {
            // Adapt the geotransform matrix to the subarea
            adfGeoTransform[0] += xoff * adfGeoTransform[1]
                                + yoff * adfGeoTransform[2];
            adfGeoTransform[3] += xoff * adfGeoTransform[4]
                                + yoff * adfGeoTransform[5];
            poVDS->SetGeoTransform(adfGeoTransform);
        }

        poVDS->SetMetadata(poDS->GetMetadata());

        // Here I also copy metadata from my own domain
        char **papszMD;
        papszMD = poDS->GetMetadata(MD_DOMAIN_MSAT);
        if (papszMD != NULL)
            poVDS->SetMetadata(papszMD, MD_DOMAIN_MSAT);

        for (int i = 0; i < poDS->GetRasterCount(); ++i)
        {
            GDALRasterBand* poSrcBand = poDS->GetRasterBand(i + 1);
            GDALDataType eBandType = poSrcBand->GetRasterDataType();
            poVDS->AddBand(eBandType, NULL);
            VRTSourcedRasterBand* poVRTBand = (VRTSourcedRasterBand*)poVDS->GetRasterBand(i + 1);
            poVRTBand->AddSimpleSource(poSrcBand,
                                       xoff, yoff, xsize, ysize,
                                       0, 0, xsize, ysize);
            poVRTBand->CopyCommonInfoFrom(poSrcBand);

            // Again, I copy my own metadata
            papszMD = poSrcBand->GetMetadata(MD_DOMAIN_MSAT);
            if (papszMD != NULL)
                poVRTBand->SetMetadata(papszMD, MD_DOMAIN_MSAT);
        }

        return poVDS;
    }
This function wraps a dataset with a virtual dataset that crops it. Just pass
the resulting dataset to GDALCreateCopy to save it in the format that you
need.
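The only arithmetic in the crop function is the geotransform adjustment, and it is easy to check in isolation. This is a small pure-Python sketch of that step only, with no GDAL involved; `crop_geotransform` is a name invented for this example, and the six-element tuple follows the usual GDAL layout (origin x, pixel width, row rotation, origin y, column rotation, pixel height).

```python
def crop_geotransform(gt, xoff, yoff):
    """Return the geotransform of a window whose top-left pixel is (xoff, yoff).

    The pixel sizes and rotation terms are unchanged: only the georeferenced
    position of the top-left corner moves.
    """
    return (
        gt[0] + xoff * gt[1] + yoff * gt[2],
        gt[1],
        gt[2],
        gt[3] + xoff * gt[4] + yoff * gt[5],
        gt[4],
        gt[5],
    )


# A north-up image with 10-unit pixels and its top-left corner at (100, 500)
full = (100.0, 10.0, 0.0, 500.0, 0.0, -10.0)
# Crop starting 3 pixels to the right and 2 pixels down
window = crop_geotransform(full, 3, 2)
```

Here `window` is `(130.0, 10.0, 0.0, 480.0, 0.0, -10.0)`: the origin moved 30 units east and 20 units south, and nothing else changed.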
Custom function decorators with TurboGears 2
I am exposing some library functions using a TurboGears2 controller (see web-api-with-turbogears2). It turns out that some functions return a dict, some a list, some a string, and TurboGears 2 only allows JSON serialisation for dicts.
A simple work-around for this is to wrap the function result into a dict, something like this:
    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return dict(r=res)
It would be nice, however, to have an @webapi() decorator that
automatically wraps the function result with the dict:
    def webapi(func):
        def dict_wrap(*args, **kw):
            return dict(r=func(*args, **kw))
        return dict_wrap

    # ...in the controller...

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    @webapi
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
This works, as long as @webapi appears last in the list of decorators.
This is because if it appears last it will be the first to wrap the function,
and so it will not interfere with the tg.decorators machinery.
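The ordering rule is plain Python and can be seen without TurboGears at all. In this sketch `tag` is a made-up stand-in for a framework decorator that attaches an attribute to whatever function it receives; it is not part of tg.decorators.

```python
import functools


def webapi(func):
    """Wrap the return value of func in a dict under the key 'r'."""
    @functools.wraps(func)
    def dict_wrap(*args, **kw):
        return dict(r=func(*args, **kw))
    return dict_wrap


def tag(name):
    """Hypothetical framework decorator: records that it has been applied."""
    def decorator(func):
        func.applied = getattr(func, "applied", []) + [name]
        return func
    return decorator


@tag("expose")
@tag("validate")
@webapi
def list_colours():
    return ["red", "green"]


result = list_colours()
order = list_colours.applied
```

Decorators are applied bottom-up, so `webapi` wraps the bare function first and the two `tag` decorators then annotate the wrapper: `result` is `{'r': ['red', 'green']}` and `order` is `['validate', 'expose']`. If `webapi` came first in the list instead, it would wrap the already annotated function, and with a wrapper like the one above that does not use `functools.wraps` the annotations would stay on the inner function, where the framework no longer looks.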
Would it be possible to create a decorator that can be put anywhere among the decorator list? Yes, it is possible but tricky, and it gives me the feeling that it may break in any future version of TurboGears:
    class webapi(object):
        def __call__(self, func):
            def dict_wrap(*args, **kw):
                return dict(r=func(*args, **kw))
            # Migrate the decoration attribute to our new function
            if hasattr(func, 'decoration'):
                dict_wrap.decoration = func.decoration
                dict_wrap.decoration.controller = dict_wrap
                delattr(func, 'decoration')
            return dict_wrap

    # ...in the controller...

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    @webapi()  # webapi is now a class: it must be instantiated
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
As a convenience, TurboGears 2 offers, in the decorators module, a way to
build decorator "hooks":
    class before_validate(_hook_decorator):
        '''A list of callables to be run before validation is performed'''
        hook_name = 'before_validate'

    class before_call(_hook_decorator):
        '''A list of callables to be run before the controller method is called'''
        hook_name = 'before_call'

    class before_render(_hook_decorator):
        '''A list of callables to be run before the template is rendered'''
        hook_name = 'before_render'

    class after_render(_hook_decorator):
        '''A list of callables to be run after the template is rendered.

        Will be run before it is returned returned up the WSGI stack'''
        hook_name = 'after_render'
The way these are invoked can be found in the _perform_call function in
tg/controllers.py.
To show an example use of those hooks, let's add some polygen wisdom to every data structure we return:
    class wisdom(decorators.before_render):
        def __init__(self, grammar):
            super(wisdom, self).__init__(self.add_wisdom)
            self.grammar = grammar
        def add_wisdom(self, remainder, params, output):
            from subprocess import Popen, PIPE
            output["wisdom"] = Popen(["polyrun", self.grammar], stdout=PIPE).communicate()[0]

    # ...in the controller...

    @wisdom("genius")
    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
These hooks, however, cannot be used for what I need, that is, to wrap the result inside a dict. The reason is that they are called in this way:
    controller.decoration.run_hooks('before_render', remainder, params, output)
and not in this way:
    output = controller.decoration.run_hooks('before_render', remainder, params, output)
So it is possible to modify the output (if it is a mutable structure) but not to exchange it with something else.
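The difference between modifying and replacing is easy to reproduce in a few lines. This is only an illustration of the calling convention described above: `run_hooks` here is a toy function written for this example, not the TurboGears implementation.

```python
def run_hooks(hooks, output):
    """Call every hook and ignore what it returns, as in the first form above."""
    for hook in hooks:
        hook(output)


def add_wisdom(output):
    # Mutation: the caller holds the same dict, so it sees the new key
    output["wisdom"] = "42"


def wrap_in_dict(output):
    # Rebinding: this only changes the local name...
    output = dict(r=output)
    # ...and the return value is thrown away by run_hooks
    return output


out = {"colours": ["red"]}
run_hooks([add_wisdom, wrap_in_dict], out)
```

After the call, `out` is `{'colours': ['red'], 'wisdom': '42'}`: the added key survived, the wrapping did not.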
Can we do even better? Sure we can. We can assimilate @expose and @validate
inside @webapi to avoid repeating those same many decorator lines over and
over again:
    class webapi(object):
        def __init__(self, error_handler = None):
            self.error_handler = error_handler

        def __call__(self, func):
            def dict_wrap(*args, **kw):
                return dict(r=func(*args, **kw))
            res = expose("json")(dict_wrap)
            res = validate(validator_dispatcher, error_handler=self.error_handler)(res)
            return res

    # ...in the controller...

    @expose("json")
    def api_validation_error(self, **kw):
        pylons.response.status = "400 Error"
        return dict(e="validation error on input fields", form_errors=pylons.c.form_errors)

    @webapi(error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
This got rid of @expose and @validate, and provides almost all the
default values that I need. Unfortunately I could not find out how to access
api_validation_error from the decorator so that I can pass it to the
validator, therefore I remain with the inconvenience of having to explicitly
pass it every time.
Building a web-based API with Turbogears2
I am using TurboGears2 to export a Python API over the web. Every API method is wrapped by a controller method that validates the parameters and returns the results encoded in JSON.
The basic idea is this:
    @expose("json")
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
To validate the parameters we can use forms, it's their job after all:
    class ListColoursForm(TableForm):
        fields = [
            # One field per parameter
            twf.TextField("filter",
                help_text="Please enter the string to use as a filter"),
            twf.TextField("productID",
                help_text="Please enter the product ID"),
            twf.TextField("maxResults", validator=twfv.Int(min=0), default=200, size=5,
                help_text="Please enter the maximum number of results"),
        ]
    list_colours_form=ListColoursForm()

    #...

    @expose("json")
    @validate(list_colours_form, error_handler=list_colours_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Parameter validation is done by the form

        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
All straightforward so far. However, this means that we need two exposed methods for every API call: one for the API call and one error handler. For every API call we have to type the name several times, which is error-prone and risks mixing things up.
We can however have a single error handler for all methods:
    def get_method():
        '''
        The method name is the first url component after the controller name that
        does not start with 'test'
        '''
        found_controller = False
        for name in pylons.c.url.split("/"):
            if not found_controller and name == "controllername":
                found_controller = True
                continue
            if name.startswith("test"):
                continue
            if found_controller:
                return name
        return None

    class ValidatorDispatcher:
        '''
        Validate using the right form according to the value of the "method" field
        '''
        def validate(self, args, state):
            method = args.get("method", None)
            # Extract the method from the URL if it is missing
            if method is None:
                method = get_method()
                args["method"] = method
            return forms[method].validate(args, state)

    validator_dispatcher = ValidatorDispatcher()
This validator will try to find the method name, either as a form field
or by parsing the URL. It will then use the method name to find the form to use
for validation, and pass control to the validate method of that form.
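The URL parsing is the part most worth trying out on its own. Below is a standalone sketch of it that takes the URL as a parameter instead of reading it from pylons; the function name `method_from_url` and the `controller` argument are made up for the example, and unlike the version above it also skips empty path components, so that a trailing slash yields None rather than an empty string.

```python
def method_from_url(url, controller="controllername"):
    """Return the first path component after `controller` that does not
    start with 'test', or None if there is none."""
    found_controller = False
    for name in url.split("/"):
        if not found_controller:
            # Keep skipping components until we have seen the controller name
            found_controller = (name == controller)
            continue
        if not name or name.startswith("test"):
            # Skip empty components and the test/testform prefixes
            continue
        return name
    return None


direct = method_from_url("/controllername/list_colours")
via_test = method_from_url("/controllername/test/list_colours")
missing = method_from_url("/controllername/")
```

Both `direct` and `via_test` are `'list_colours'`, so the same form validates the API call and its test page, and `missing` is `None`.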
We then need to add an extra "method" field to our forms, and arrange the forms inside a dictionary:
    class ListColoursForm(TableForm):
        fields = [
            # One hidden field to have a place for the method name
            twf.HiddenField("method"),
            # One field per parameter
            twf.TextField("filter",
                help_text="Please enter the string to use as a filter"),
            #...
        ]

    forms["list_colours"] = ListColoursForm()
And now our methods become much nicer to write:
    @expose("json")
    def api_validation_error(self, **kw):
        pylons.response.status = "400 Error"
        return dict(form_errors=pylons.c.form_errors)

    @expose("json")
    @validate(validator_dispatcher, error_handler=api_validation_error)
    def list_colours(self, filter=None, productID=None, maxResults=100, **kw):
        # Parameter validation is done by the form

        # Call API
        res = self.engine.list_colours(filter, productID, maxResults)

        # Return result
        return res
api_validation_error is interesting: it returns a proper HTTP error status,
and a JSON body with the details of the error, taken straight from the form
validators. It took me a while to find out that the form errors are in
pylons.c.form_errors (and for reference, the form values are in
pylons.c.form_values). pylons.response is a WebOb Response that we can play with.
So now our client side is able to call the API methods, and get a proper error if it calls them wrong.
But now that we have the forms ready, it doesn't take much to display them in web pages as well:
    def _describe(self, method):
        "Return a dict describing an API method"
        ldesc = getattr(self.engine, method).__doc__.strip()
        sdesc = ldesc.split("\n")[0]
        return dict(name=method, sdesc = sdesc, ldesc = ldesc)

    @expose("myappserver.templates.myappapi")
    def index(self):
        '''
        Show an index of exported API methods
        '''
        methods = dict()
        for m in forms.keys():
            methods[m] = self._describe(m)
        return dict(methods=methods)

    @expose('myappserver.templates.testform')
    def testform(self, method, **kw):
        '''
        Show a form with the parameters of an API method
        '''
        kw["method"] = method
        return dict(method=method, action="/myapp/test/"+method, value=kw, info=self._describe(method), form=forms[method])

    @expose(content_type="text/plain")
    @validate(validator_dispatcher, error_handler=testform)
    def test(self, method, **kw):
        '''
        Run an API method and show its prettyprinted result
        '''
        res = getattr(self, str(method))(**kw)
        return pprint.pformat(res)
In a few lines, we have all we need: an index of the API methods (including their documentation taken from the docstrings!), and for each method a form to invoke it and a page to see the results.
Make the forms children of AjaxForm, and you can even see the results together with the form.
Lessons learnt from Oracle
Lesson number 1: "how to handle temporary files".
    $ rm tp*
    $ proc code=cpp lines=yes dba_vm.pc
    Pro*C/C++: Release 10.2.0.1.0 - Production on Wed Sep 30 11:10:00 2009
    Copyright (c) 1982, 2005, Oracle. All rights reserved.
    System default option values taken from: /usr/local/oracle/10.2.01/db_1/precomp/admin/pcscfg.cfg
    $ echo $?
    0
    $ ls tp*
    tpoHRjc8 tpqkU4Cp tpY5Eo4G
    $
Today we learn this: for every successful compilation of a single source file, it is a Good Thing to leave three temporary files around as a tribute to the Holy Trinity who bestowed upon us the grace of seeing our prayers fulfilled and our work rewarded.
For the illiterate among us who do not want to learn from the Market Leaders,
and do not want to put a rm -f tp* in their makefiles only to see, some day,
hours of work on tpl_support.cc disappear at the first
invocation of make, here's the Yokel's Wrapper:
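    #!/bin/sh -ue

    if [ $# -ne 2 ]
    then
        echo "Usage: $0 infile outfile" >&2
        exit 1
    fi

    # Work in a scratch directory, so that the temporary files die with it
    DIR=$(mktemp -d)

    cleanup()
    {
        rm -rf "$DIR"
    }
    trap cleanup EXIT

    cp "$1" "$DIR"

    (
        cd "$DIR"
        # The copy lives in $DIR under its base name, whatever path $1 had
        proc code=cpp lines=yes "$(basename "$1")"
        mv "$(basename "$1" .pc).c" "$2"
    )

    mv "$DIR/$2" .

    exit 0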
Getting dbus signatures right from Vala
I am trying to play a bit with Vala on the FreeRunner.
The freesmartphone.org stack on the OpenMoko is heavily based on DBus. Using DBus from Vala is rather simple, if mostly undocumented: you get a few examples in the Vala wiki and you make do with those.
All works fine with simple methods. But what about providing callbacks to
signals that have complex nested structures in their signatures, like aa{sv}?
You try, and if you don't get the method signature right, the signal is
just silently not delivered, because it does not match the method signature.
So this is how to provide a callback to
org.freesmartphone.Usage.ResourceChanged, with signature sba{sv}:
    public void on_resourcechanged(dynamic DBus.Object pos,
                                   string name,
                                   bool state,
                                   HashTable<string, Value?> attributes)
    {
        stderr.printf("Resource %s changed\n", name);
    }
And this is how to provide a callback to
org.freesmartphone.GPS.UBX.DebugPacket, with signature siaa{sv}:
    protected void on_ubxdebug_packet(dynamic DBus.Object ubx,
                                      string clid,
                                      int length,
                                      HashTable<string, Value?>[] wrongdata)
    {
        stderr.printf("Received UBX debug packet");

        // Ugly ugly work-around
        PtrArray< HashTable<string, Value?> >* data = (PtrArray< HashTable<string, Value?> >)wrongdata;

        stderr.printf("%u elements received", data->len);
    }
What is happening here is that the only method signature that I found matching the dbus signature is this one. However, the unmarshaller for some reason gets it wrong, and passes a PtrArray instead of a HashTable array. So you need to cast it back to what you've actually been passed.
Figuring all this out took several long hours and was definitely not fun.
Creating pipelines with subprocess
It is possible to create process pipelines using subprocess.Popen, by just
using stdout=subprocess.PIPE and stdin=otherproc.stdout.
Almost.
In a pipeline created in this way, the stdout of all processes except the last is opened twice: once in the script that has run the subprocess and another time in the standard input of the next process in the pipeline.
This is a problem because if a process closes its stdin, the previous process
in the pipeline does not get SIGPIPE when trying to write to its stdout,
because that pipe is still open on the caller process. If this happens, a wait
on that process will hang forever: the child process waits for the parent to
read its stdout, the parent process waits for the child process to exit.
The trick is to close the stdout of each process in the pipeline except the last just after creating them:
    #!/usr/bin/python
    # coding=utf-8

    import subprocess

    def pipe(*args):
        '''
        Takes as parameters several dicts, each with the same
        parameters passed to popen.

        Runs the various processes in a pipeline, connecting
        the stdout of every process except the last with the
        stdin of the next process.
        '''
        if len(args) < 2:
            # This spelling of raise works in both Python 2 and Python 3
            raise ValueError("pipe needs at least 2 processes")
        # Set stdout=PIPE in every subprocess except the last
        for i in args[:-1]:
            i["stdout"] = subprocess.PIPE

        # Runs all subprocesses connecting stdins and stdouts to create the
        # pipeline. Closes stdouts to avoid deadlocks.
        popens = [subprocess.Popen(**args[0])]
        for i in range(1,len(args)):
            args[i]["stdin"] = popens[i-1].stdout
            popens.append(subprocess.Popen(**args[i]))
            popens[i-1].stdout.close()

        # Returns the array of subprocesses just created
        return popens
At this point, it's nice to write a function that waits for the whole pipeline to terminate and returns an array of result codes:
    def pipe_wait(popens):
        '''
        Given an array of Popen objects returned by the
        pipe method, wait for all processes to terminate
        and return the array with their return values.
        '''
        results = [0] * len(popens)
        while popens:
            last = popens.pop(-1)
            results[len(popens)] = last.wait()
        return results
And, lo and behold, we can now easily run a pipeline and get the return codes of every single process in it:
    process1 = dict(args='sleep 1; grep line2 testfile', shell=True)
    process2 = dict(args='awk \'{print $3}\'', shell=True)
    process3 = dict(args='true', shell=True)
    popens = pipe(process1, process2, process3)
    result = pipe_wait(popens)
    print(result)
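To see the effect of closing the parent's copy of the pipe, here is a small self-contained Python 3 sketch with a producer that never stops writing and a consumer that reads one line and exits. The two child programs are invented for the example and run with the current interpreter, so no external commands are assumed.

```python
import subprocess
import sys

# Producer: writes lines forever. Consumer: reads one line, then exits.
PRODUCER = "import sys\nwhile True:\n    sys.stdout.write('y\\n')\n"
CONSUMER = "import sys\nsys.stdin.readline()\n"

producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL)
consumer = subprocess.Popen([sys.executable, "-c", CONSUMER],
                            stdin=producer.stdout)

# Drop the parent's copy of the read end of the pipe: now the consumer
# holds the only one, and the producer's writes fail once it has exited
producer.stdout.close()

consumer_rc = consumer.wait(timeout=60)
producer_rc = producer.wait(timeout=60)
```

The consumer exits with status 0 and the producer then dies on a broken pipe with a non-zero status. Without the `producer.stdout.close()` line the producer would fill the pipe buffer and block, and the final wait would only end when its timeout expires.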
Tips on using python's datetime module
Python's datetime module is one of those bits of code that tend not to do what one would expect them to do.
I have come to adopt some extra usage guidelines in order to preserve my sanity:
- Avoid using str(datetime_object) or isoformat to serialize a datetime: there is no function in the library that can parse all its possible outputs
- datetime.strptime silently throws away all timezone information. If you look very closely, it even says so in its documentation
- Timezones do not exist, all datetime objects have to be naive. Aware means broken.
- datetime objects must always contain UTC information
- datetime.now() is never to be used. Always use datetime.utcnow()
- Be careful of 3rd party python modules: people have a dangerous tendency to use datetime.now()
- If a conversion to some local time is needed, it shall be done via either some ugly thing like time.localtime(int(dt.strftime("%s"))) or via the pytz module
- pytz must be used directly, and never via timezone aware datetime objects, because datetime objects fail in querying pytz:
That’s right, the datetime object created by a call to datetime.datetime constructor now seems to think that Finland uses the ancient “Helsinki Mean Time” which was obsoleted in the 1920s. The reason for this behaviour is clearly documented on the pytz page: it seems the Python datetime implementation never asks the tzinfo object what the offset to UTC on the given date would be. And without knowing it pytz seems to default to the first historical definition. Now, some of you fellow readers could insist on the problem going away simply by defaulting to the latest time zone definition. However, the problem would still persist: For example, Venezuela switched to GMT-04:30 on 9th December, 2007, causing the datetime objects representing dates either before, or after the change to become invalid.
- Timezone-aware datetime objects have other bugs: for example, they fail to compute Unix timestamps correctly. The following example shows two timezone-aware objects that represent the same instant but produce two different timestamps.
    >>> import datetime as dt
    >>> import pytz
    >>> utc = pytz.timezone("UTC")
    >>> italy = pytz.timezone("Europe/Rome")
    >>> a = dt.datetime(2008, 7, 6, 5, 4, 3, tzinfo=utc)
    >>> b = a.astimezone(italy)
    >>> str(a)
    '2008-07-06 05:04:03+00:00'
    >>> a.strftime("%s")
    '1215291843'
    >>> str(b)
    '2008-07-06 07:04:03+02:00'
    >>> b.strftime("%s")
    '1215299043'
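One way around strftime("%s"), which goes through the C library and the local timezone, is to compute the timestamp from the UTC time tuple. This is a Python 3 sketch using only the standard library; `unix_timestamp` is a name chosen for the example, and the fixed +02:00 offset is an assumption standing in for Europe/Rome in July, so that pytz is not needed.

```python
import calendar
import datetime as dt


def unix_timestamp(d):
    """Seconds since the epoch for a naive-UTC or timezone-aware datetime."""
    # utctimetuple() converts aware objects to UTC and leaves naive ones
    # as they are; timegm() then reads the tuple as UTC, never as local time
    return calendar.timegm(d.utctimetuple())


# The same instant as in the session above, in two representations
naive_utc = dt.datetime(2008, 7, 6, 5, 4, 3)
rome_summer = dt.timezone(dt.timedelta(hours=2))
aware = dt.datetime(2008, 7, 6, 7, 4, 3, tzinfo=rome_summer)

ts_naive = unix_timestamp(naive_utc)
ts_aware = unix_timestamp(aware)
```

Both `ts_naive` and `ts_aware` are 1215320643, whatever timezone the machine is set to.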
Python, the Bignami (pocket crib) version
Data types
Classic:
- No value: None.
- Logical values
- Numbers
- Strings
Functions and classes:
Containers:
Most common classes:
Operations
- Writing a Python source file
- Error handling
- Comparisons
- Iteration
- Working with dates
- Input and print
- Running programs
- Writing tests
- Shell-script-like functions
- Connecting to an SQL database
- Calling Fortran routines
Extra modules
Links
Mapping using the Openmoko FreeRunner headset
The FreeRunner has a headset which includes a microphone and a button. When doing OpenStreetMap mapping, it would be very useful to be able to keep tangogps on the display and be able to mark waypoints using the headset button, and to record an audio track using the headset microphone.
In this way, I can use tangogps to see where I need to go, where it's already mapped and where it isn't, and then I can use the headset to mark waypoints corresponding to the audio track, so that later I can take advantage of JOSM's audio mapping features.
Enter audiomap:
$ audiomap --help
Usage: audiomap [options]
Create a GPX and audio trackFind the times in the wav file when there is clear
voice among the noise
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-v, --verbose verbose mode
-m, --monitor only keep the GPS on and monitor satellite status
-l, --levels only show input levels
If called without parameters, or with -v which is suggested, it will:
- Fix the mixer settings so that it can record from the headset and detect headset button presses.
- Show a monitor of GPS satellite information until it gets a fix.
- Synchronize the system time with the GPS time so that the timestamps of the files that are created afterwards are accurate.
- Start recording a GPX track.
- Start recording audio.
- Record a GPX waypoint for every headset button press.
When you are done, you stop audiomap with ^C and it will properly close the
.wav file, close the tags in the GPX waypoint and track files and restore the
mixer settings.
You can unplug the headset and record using the handset microphone, but then you will not be able to set waypoints until you plug the headset back in.
After you stop audiomap, you will have a track, waypoints and .wav file
ready to be loaded in JOSM.
Big thanks go to Luca Capello for finding out how to detect headset button presses.