## 2013-12-01

### Binary search in line-sorted text files

This blog post announces pts-line-bisect, a C program and an equivalent Python script and library for doing binary seach in line-sorted text files, and it also explains some of the design details of the Python implementation.

Let's suppose you have a sorted text file, in which each line is lexicographically larger than the previous one, and you want to find a specific range of lines, or all lines with a specific prefix.

I've written the program pts-line-bisect for that recently. I've also written the article titled Evolution of a binary search implementation for line-sorted text files about the topic, containing the problem statement, the possible pitfalls, an analysis of some incorrect solutions available on the web as code example, a detailed specification and explanation of my solution (including a proof), disk seek and speed analysis, a set of speedup ideas and their implementation, and further notes about the speedups in the C implementation.

As a teaser, here is an incorrect solution in Python:

def bisect_left_incorrect(f, x):
"""... Warning: Incorrect implementation with corner case bugs!"""
x = x.rstrip('\n')
f.seek(0, 2)  # Seek to EOF.
lo, hi = 0, f.tell()
while lo < hi:
mid = (lo + hi) >> 1
f.seek(mid)
f.readline()  # Ignore previous line, find our line.
hi = mid
else:
lo = mid + 1
return lo

Can you spot the all the 3 bugs?

Read the article for all the details and the solution.

As a reference, here is the correct implementation of the same algorithm for finding the start of the interval in a sorted list or other sequences (based on the bisect module):

def bisect_left(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.

The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x.  So if x already appears in the list, a.insert(x) will
insert just before the leftmost x already there.

Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo + hi) >> 1
if x <= a[mid]:  # Change <=' to <', and you get bisect_right.
hi = mid
else:
lo = mid + 1
return lo

A typical real-word use case for such a binary search tool is retrieving lines corresponding to a particular time range in log files (or time based measurement records). These files text files with variable-length lines, with the log timestamp in the beginning of the line, and they are generated in increasing timestamp order. Unfortunately the lines are not lexicographically sorted, so the timestamp has to be decoded first for the comparison. The bsearch tool does that, it also supports parsing arbitrary, user-specifiable datetime formats, and it can binary search in gzip(1)ped files as well (by building an index). It's also of high performance and low overhead, partially because it is written in C++. So bsearch is practical tool with lots of useful features. If you need anything more complicated than a lexicographic binary search, use it instead.

Before getting too excited about binary search, please note that there are much faster alternatives for data on disk. In an external (file-based) binary search the slowest operation is the disk seek needed for each bisection. Most of the software and hardware components would be waiting for the hard disk to move the reading head. (Except, of course, that seek times are negligible when non-spinning storage hardware such as SSD or memory card is used.)

An out-of-the box solution would be adding the data to more disk-efficient key-value store. There are several programs providing such stores. Most of them are based on a B-tree, B*-tree or B+-tree data structure if sorted iteration and range searches have to be supported, or disk-based hashtables otherwise. Some of the high-performance single-machine key-value stores: cdb (read-only), Tokyo Cabinet, Kyoto Cabinet, LevelDB; see more in the NoSQL software list.

The fundamental speed difference between a B-tree search and a binary search in a sorted list stems from the fact that B-trees have a branching factor larger than 2 (possibly 100s or 1000s), thus each seeking step in a B-tree search reduces possible input size by a factor larger than 2, while in a binary search each step reduces the the input size by a factor 2 only (i.e. we keep either the bottom half or the top half). So both kinds of searches are logarithmic, but the base of the logarithm is different, and this causes a constant factor difference in the disk seek count. By careful tunig of the constant in B-trees it's usual to have only 2 or 3 disk seeks for each search even for 100GB of data, while a binary search in a such a large file with 50 bytes per record would need 31 disk seeks. By taking 10ms as the seek time (see more info about typical hard disk seek times), a typical B-tree search takes 0.03 second, and a typical binary search takes 0.31 second.

Have fun binary searching, but don't forget to sort your files first!

## 2013-11-25

### How to apply a macro to all arguments of a C preprocessor macro

This blog post explains how to iterate over the variable number of arguments of a C preprocessor macro, and call another macro for each argument.

Let's assume we want automate generating the following strings:

const char import0[] = "";
const char import1[] = "import os\n";
const char import2[] = "import os\nimport os.path\n";
const char import3[] = "import os\nimport os.path\nimport sys\n";
const char import4[] = "import os\nimport os.path\nimport sys\nimport re\n";
const char import5[] = "import os\nimport os.path\nimport sys\nimport re\n"
"import select\n";
const char import6[] = "import os\nimport os.path\nimport sys\nimport re\n"
"import select\nimport struct\n";

A simple option is to create a macro which uses stringification:

#define IMPORT(n) "import " #n "\n"

const char import0[] = "";
const char import1[] = IMPORT(os);
const char import2[] = IMPORT(os) IMPORT(os.path);
const char import3[] = IMPORT(os) IMPORT(os.path) IMPORT(sys);
const char import4[] = IMPORT(os) IMPORT(os.path) IMPORT(sys) IMPORT(re);
const char import5[] = IMPORT(os) IMPORT(os.path) IMPORT(sys) IMPORT(re)
IMPORT(select);
const char import6[] = IMPORT(os) IMPORT(os.path) IMPORT(sys) IMPORT(re)
IMPORT(select) IMPORT(struct);

Would it be possible to create a variadic macro which would call IMPORT for all its arguments? We want the following to work:

const char import0[] = IMPORT_ALL();
const char import1[] = IMPORT_ALL(os);
const char import2[] = IMPORT_ALL(os, os.path);
const char import3[] = IMPORT_ALL(os, os.path, sys);
const char import4[] = IMPORT_ALL(os, os.path, sys, re);
const char import5[] = IMPORT_ALL(os, os.path, sys, re, select);
const char import6[] = IMPORT_ALL(os, os.path, sys, re, select, struct);

Indeed, it's possible, define the IMPORT_ALL macro like this:

#define IMPORT_ALL(...) "" APPLY_ALL(IMPORT, __VA_ARGS__)

The tricky part is defining the APPLY_ALL macro. Here is a possible implementation which works with gcc and clang, and it supports up to 6 arguments. (It's easy to increase the number of supported arguments.)

#define APPLY0(t, dummy)
#define APPLY1(t, a) t(a)
#define APPLY2(t, a, b) t(a) t(b)
#define APPLY3(t, a, b, c) t(a) t(b) t(c)
#define APPLY4(t, a, b, c, d) t(a) t(b) t(c) t(d)
#define APPLY5(t, a, b, c, d, e) t(a) t(b) t(c) t(d) t(e)
#define APPLY6(t, a, b, c, d, e, f) t(a) t(b) t(c) t(d) t(e) t(f)

#define NUM_ARGS_H1(dummy, x6, x5, x4, x3, x2, x1, x0, ...) x0
#define NUM_ARGS(...) NUM_ARGS_H1(dummy, ##__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
#define APPLY_ALL_H3(t, n, ...) APPLY##n(t, __VA_ARGS__)
#define APPLY_ALL_H2(t, n, ...) APPLY_ALL_H3(t, n, __VA_ARGS__)
#define APPLY_ALL(t, ...) APPLY_ALL_H2(t, NUM_ARGS(__VA_ARGS__), __VA_ARGS__)

#define IMPORT_ALL(...) "" APPLY_ALL(IMPORT, __VA_ARGS__)
#define IMPORT(n) "import " #n "\n"

const char import0[] = IMPORT_ALL();
const char import1[] = IMPORT_ALL(os);
const char import2[] = IMPORT_ALL(os, os.path);
const char import3[] = IMPORT_ALL(os, os.path, sys);
const char import4[] = IMPORT_ALL(os, os.path, sys, re);
const char import5[] = IMPORT_ALL(os, os.path, sys, re, select);
const char import6[] = IMPORT_ALL(os, os.path, sys, re, select, struct);

See http://stackoverflow.com/questions/2632300/looping-through-macro-varargs-values and http://stackoverflow.com/questions/824639/variadic-recursive-preprocessor-macros-is-it-possible for inspiration and a more detailed explanation on NUM_ARGS and C macros with a variable number of arguments.

## 2013-11-11

### How to create Git commits dated based on last-modification time of involved files

This blog post explains how to create Git commits dated based on last-modification time of involved files.

git commit creates the commit with the date taken from the current system date. Sometimes, especially if old versions of files are added to git, it's preferable to create a commit whose date is based on last-modification time of involved (i.e. non-deleted) files. There is no built-in Git command for that. As a workaround, it's possible to specify git commit --date=... ... manually, but it would be nicer if the time was autodetected as the maximum of the last-modification times of the files involved.

$echo 'puts [tk inactive]; exit' | wish If you need more information about the screen saver state or you need better accuracy (milliseconds), then compile and run the following C program: #define DUMMY \ set -ex; \ gcc -s -O2 -W -Wall -o \ get_x11_screen_saver_info get_x11_screen_saver_info.c \ -L/usr/X11R6/lib -lX11 -lXext -lXss; \ exit 2 /* * get_x11_screen_saver_info: Get and print X11 screen saver state and info * by pts@fazekas.hu at Wed Oct 30 08:28:35 CET 2013 */ #include <stdio.h> #include <unistd.h> #include <X11/Xlib.h> #include <X11/Xutil.h> #include <X11/extensions/scrnsaver.h> int main(int argc, char **argv) { static XScreenSaverInfo *info; Display *display; int screen; (void)argc; (void)argv; info = XScreenSaverAllocInfo(); if ((display=XOpenDisplay(NULL)) == NULL) { fprintf(stderr, "error: can't open display\n"); return 2; } screen = DefaultScreen(display); if (!XScreenSaverQueryInfo(display, RootWindow(display, screen), info)) { fprintf(stderr, "error: cannot query screen saver info\n"); return 3; } printf("idle for %lu ms\n", (unsigned long)info->idle); if (info->state == ScreenSaverDisabled) { printf("state disabled\n"); } else if (info->state == ScreenSaverOn) { printf("state on for %lu ms\n", (unsigned long)info->til_or_since); } else if (info->state == ScreenSaverOff) { printf("state off until %lu ms\n", (unsigned long)info->til_or_since); } else { printf("state unknown\n"); } if (info->kind == ScreenSaverBlanked) { printf("kind blanked\n"); } else if (info->kind == ScreenSaverInternal) { printf("kind internal\n"); } else if (info->kind == ScreenSaverExternal) { printf("kind external\n"); } else { printf("kind unknown\n"); } XFree(info); XCloseDisplay(display); return 0; } Please note that it's possible to affect the screen saver with the xset s ... command, and it's possible to affect the monitor power saving (sleep) with the xset dpms ... command. There are no such widely available commands for querying the screen saver state. The xset ... commands work differently for different display drivers. You may want to do a sleep .1 && xset ... to put your monitor to sleep right away. ## 2013-10-23 ### How to fix ugly text when printing a Google document This blog post explains how to fix the ugliness of letter spacing in text printed from a Google document (text document, presentation etc.). An example ugly output (some letters are too close to each other): To fix it, export the document to PDF (in the File / Download as / PDF menu item), and print the PDF. Don't choose File / Print / Save as PDF, that produces the ugly output above. ## 2013-10-21 ### How to start a Unix process in the foreground instead This blog post explains how to run a Unix process in the foreground if it automatically puts itself into the background. TL;DR Run it like this: $ prog 3>&1 | cat

Let's suppose you have two shell scripts:

$cat >fore.sh <<'END' #! /bin/sh echo foo; sleep 1; echo bar END$ chmod +x fore.sh
$cat >back.sh <<'END' #! /bin/sh (echo foo; sleep 1; echo bar) & END$ _

When running ./fore.sh you have to wait a second for it to finish, because it runs in the foreground. If you want to run it in the background, run it as: ./fore.sh & . You'll get back your prompt instantly, and you'll get the bar message one second later. The & is a standard Bourne shell feature to start processes in the background.

Doing the other way round is a bit trickier. When running ./back.sh, you get your prompt back almost instantly, because the sleeping happens in the background. There is also a race condition: sometimes you get foo first, and then the prompt, sometimes the other way round, depending on whether the kernel schedules the interactive shells or the back.sh script first. If you want to run a command in the foreground, but you don't want to change it in the implementation, a simple solution is this:

$./back.sh | cat foo bar$ ./fore.sh | cat
foo
bar
$_ So appending | cat works no matter the program wants to run in the foreground or in the background. The reason why it works is that you don't get your prompt back until cat has finished, and | cat waits for an EOF on the program's stdout, and there is no EOF on a pipe until all processes writing to that pipe have closed it, i.e. (most of the time) until the entire program terminates. The | cat  solution above breaks if the program is smart and closes its stdout early. In this case you get the prompt back immediately. For example, below bar is displayed only about a second after the$ in front of it was displayed.

$cat >smart.sh <<'END' #! /bin/sh (exec >/dev/null; echo foo >&2; exec sh -c 'sleep 1; echo bar >&2') & END$ ./smart.sh | cat
foo
$bar _  To make it work for even such a program (in addition to back.sh and fore.sh), run it like this: $ ./smart.sh 3>&1 | cat

The reason why this works is that 3>&1 instructs the shell to make a copy of the file descriptor 1 (stdout) as file descriptor 3. (Any number between 3 and 9 would do. For smarter shells numbers larger than 9 also work.) So even when the program explicitly closes (or redirects) its own stdout, it will still have file descriptor 3 open until it exits, thus preventing the EOF cat is waiting for, thus making the shell continue waiting for cat, thus not giving back the prompt. Most programs tend not to bother explicitly closing file descriptors larger than 2. The kernel closes file descriptors with the close-on-exec bit automatically closed upon an exec* call. Fortunately that doesn't apply to our file descriptor 3, because the shell has created it using the dup2 system call, which creates file descriptor 3 with the close-on-exec flag off.

## 2013-08-18

### Non-recursive, fast, compact stable sort implementation in C++

This blog post discusses stable sort implementations in various languages, and it also presents a non-recursive, fast and compact mergesort implementation in C++. The implementation runs the original mergesort algorithm in an iterative way with some optimizations, so it is stable.

Please note that C++ STL has a stable sort built in: std::stable_sort in #include <algorithm>. If it's OK to use STL in your C++ code, use that, and you won't need any custom code from this blog post.

About compactness: when compiled statically with GCC 4.1.2 (g++ -fno-rtti -fno-exceptions -s -O2 -static) for i386, the mergesort in this blog post produces an executable 4084 bytes smaller than with std::stable_sort.

About speed: between 10% and 16% faster than std::stable_sort, see speed measurements below.

### The fast, non-recursive, stable sort implementation in C++

It's a non-recursive, stable mergesort implementation, with O(n*log(n)) worst-case speed.

#include <sys/types.h>
// Stable, non-recursive mergesort algorithm. Sort a[:n], using temporary
// space of size n starting at tmp. (Please note that mergesort Algorithm
// 5.2.4N and Algorithm 5.2.4S from Knuth's TAOCP are not stable. Please
// note that most implementations of mergesort found online are recursive,
// thus slower.)
//
// by pts@fazekas.hu at Sun Aug 18 19:23:36 CEST 2013
template <typename T, typename IsLess>
void mergesort(T *a, size_t n, T *tmp, IsLess is_less) {
for (size_t f = 1; f < n; f += 2) {  // Unfold the first pass for speedup.
if (is_less(a[f], a[f - 1])) {
T t = a[f];
a[f] = a[f - 1];
a[f - 1] = t;
}
}
bool s = false;
for (size_t p = 2; p != 0 && p < n; p <<= 1, s = !s) {
// Now all sublists of p are already sorted.
T *z = tmp;
for (size_t i = 0; i < n; i += p << 1) {
T *x = a + i;
T *y = x + p;
size_t xn = p < n - i ? p : n - i;
size_t yn = (p << 1) < n - i ? p : p < n - i ? n - p - i : 0;
if (xn > 0 && yn > 0 &&
is_less(*y, x[xn - 1])) {  // Optimization (S), Java 1.6 also has it.
for (;;) {
if (is_less(*y, *x)) {
*z++ = *y++;
if (--yn == 0) break;
} else {
*z++ = *x++;
if (--xn == 0) break;
}
}
}
while (xn > 0) {  // Copy from *x first because of (S).
*z++ = *x++;
--xn;
}
while (yn > 0) {
*z++ = *y++;
--yn;
}
}
z = a; a = tmp; tmp = z;
}
if (s) {  // Copy from tmp to result.
for (T *x = tmp, *y = a, * const x_end = tmp + n; x != x_end; ++x, ++y) {
*x = *y;
}
}
}

Some convenience functions to call it:

// Stable, non-recursive mergesort.
// To sort vector a', call mergesort(a.data(), a.data() + a.size(), is_less)'
// or use the convenience function below.
template <typename T, typename IsLess>
void mergesort(T *a, T *a_end, IsLess is_less) {
const size_t n = a_end - a;
if (n < 2) return;
// Creating ptr_deleter so tmp will be deleted even if is_less or
// mergesort throws an exception.
struct ptr_deleter {
ptr_deleter(T *p): p_(p) {}
~ptr_deleter() { delete p_; }
T *p_;
} tmp(new T[n]);
mergesort(a, n, tmp.p_, is_less);
}

// Convenience function to sort a range in a vector.
template <typename T, typename IsLess>
// Doesn't match:
//static void mergesort(const typename std::vector<T>::iterator &begin,
//                      const typename std::vector<T>::iterator &end,
//static void mergesort(const typename T::iterator &begin,
//                      const typename T::iterator &end,
void mergesort(const T &begin, const T &end, IsLess is_less) {
mergesort(&*begin, &*end, is_less);
}

// Stable, non-recursive mergesort.
// Resizes v to double size temporarily, and then changes it back. Memory may
// be wasted in b after the call because of that.
template <typename T, typename IsLess>
void mergesort_consecutive(std::vector<T> *v, IsLess is_less) {
const size_t n = v->size();
if (n < 2) return;
v->resize(n << 1);  // Allocate temporary space.
mergesort(v->data(), n, v->data() + n, is_less);
v->resize(n);
}

Example invocation:

inline bool int_mask_less(int a, int b) {
return (a & 15) < (b & 15);
}
...
std::vector<int> a;
...
mergesort(a.begin(), a.end(), int_mask_less);

### A slow, non-recursive stable sort implementation in C++

If you don't care about speed, an insertion sort would do: it's simple (short), stable and non-recursive. Unfortunately it's slow: O(n2) in worst case and average.

// Insertion sort: stable but slow (O(n^2)). Use mergesort instead if you need
// a stable sort.
template <typename T, typename IsLess>
static void insertion_sort(T *a, T *a_end, IsLess is_less) {
const size_t n = a_end - a;
for (size_t i = 1; i < n; ++i) {
if (is_less(a[i], a[i - 1])) {
T t = a[i];
size_t j = i - 1;  // TODO: Use pointers instead of indexes.
while (j > 0 && is_less(t, a[j - 1])) {
--j;
}
for (size_t k = i; k > j; --k) {
a[k] = a[k - 1];
}
a[j] = t;
}
}
}

// Convenience function to sort a range in a vector.
template <typename T, typename IsLess>
static void insertion_sort(const T &begin, const T &end, IsLess is_less) {
insertion_sort(&*begin, &*end, is_less);
}

### C++ stable sort speed measurements

10000 random int vectors (of random size smaller than 16384) were sorted in place in a C++ program running on Linux 2.6.35 running on an Intel Core 2 Duo CPU T6600 @ 2.20GHz. The total user time was measured, including the sort, the running of std::stable_sort, and the verification of the sort output (i.e. comparing it to the output of std::stable_sort. The insmax parameter below indicates the size of the sublists with were sorted using insertion sort before running the mergesort (the implementation above uses insmax=2, which is implemented in the Unfold... loop).

The raw time measurement results:

mask= 15  insmax= 1          18.805s

mask=255  insertion_sort    530.27s

It looks like that the mergesort algorithm proposed in this blog post (insmax=2 in the measurements above) is between 10% and 16% faster than std::stable_sort. It looks like that increasing insmax from 2 to 8 could yield an additional 1.18% speed increase.

### Stable sort in other programming languages

• Java java.util.Collections.sort and java.util.Arrays.sort are stable. In OpenJDK 1.6, for primitive types, it uses an optimized (tuned) quicksort, and for objects (with a comparator) it uses a recursive, stable mergesort. Even though quicksort is not stable, this doesn't matter for primitive types, because there is no comparator (so it's always in ascending order), and it's impossible to distinguish different instances of the same primitive type value.
• C qsort function is not stable, because it uses quicksort. FreeBSD, Mac OS X and other BSD systems have the mergesort function in their standard library (in libc, stdlib/merge.c), Linux glibc doesn't have it. As an alternative, it's straightforward to convert the mergesort implementation in this blog post from C++ to C.
• CPython's sorted and list.sort functions are implemented with a stable, recursive mergesort (and binary insertion sort for short inputs) with a sophisticated (complicated, long) code containing lots of optimizations: see the files Objects/listsort.txt and Objects/listobject.c in the Python source code for documentation and implementation. It looks like it's not recursive.
• Perl sort uses mergesort by default, so it's stable, but the default can change in future versions. Add use sort 'stable'; to the beginning of your code to get a guaranteed stable sort. See the sort pragma page for details.
• Ruby 2.0 Array#sort doesn't say if it's stable or not, but a quick search for the string qsort in array.c reveals that it uses quicksort, so it's not stable. (a bit slow) stable sorting can be added easily:
class Array
def stable_sort
n = 0
sort_by {|x| n+= 1; [x, n]}
end
end
... as explained here. The fundamental idea in this trick is to compare the indexes if the values are equal. This fundamental idea can be implemented in any programming language, although it may have considerable memory and/or time overhead.
• JavaScript's Array.prototype.sort tends to be stable in modern browsers (see the details), but unstable on old browsers.
• C# Array.Sort is not stable (because it uses quicksort), but Enumerable.OrderBy and Enumberable.ThenBy are stable (because they use mergesort. See more info here.

### Other stable sorting algorithms

Heapsort and quicksort are not stable. Some variations of mergesort (such as Algorithm 5.2.4N and Algorithm 5.2.4S in The Art of Computer programming, Volume 3 by Knuth) are not stable.

This page lists stable sorting algorithms known to mankind. Be careful with the implementations though: even if the algorithm itself is stable, some optimized, otherwise modified or buggy implementations might not be. The algorithms are:

• O(n*log(n)), stable, comparison-based: mergesort, cascade merge sort, oscillating merge sort.
• O(n2), stable, comparison-based: bubble sort, cocktail sort, insertion sort, binary insertion sort (where the insertion index is found using binary search), gnome sort, library sort, odd-even sort.
• O(n2), stable, not comparison-based: bucket sort, proxmap sort.
• faster than O(n2), stable, not comparison-based: counting sort, pigeonhole sort, radix sort.

So it looks like that we don't know of an alternative of mergesort for comparison-based sorting in O(n*log(n)) time, the others above are slower. Maybe Edsger Dijkstra[citation needed] can come up with an alternative.

## 2013-08-13

### How to send Unix file descriptors between processes?

This blog post explains how to send Unix file descriptors between processes. The example shown below sends from parent to child, but the general idea works the other way round as well, and it works even between unrelated processes.

As a dependency, the processes need to have the opposite ends of a Unix domain socket already. Once they have it, they can use sendfd to send any file descriptor over the existing Unix domain socket. Do it like this in Python:

#! /usr/bin/python

"""sendfd_demo.py: How to send Unix file descriptors between processes

for Python 2.x at Tue Aug 13 16:07:29 CEST 2013
"""

import _multiprocessing
import os
import socket
import sys
import traceback

def run_in_child(f, *args):
pid = os.fork()
if not pid:  # Child.
try:
child(*args)
except:
traceback.print_exc()
os._exit(1)
finally:
os._exit(0)
return pid

def child(sb):
assert sb.recv(1) == 'X'
# Be careful: there is no EOF handling in
# _multiprocessing.recvfd (because it's buggy). On EOF, it
# returns an arbitrary file descriptor number. We work it around by doing
# a regular sendall--recv pair first, so if there is an obvious reason for
# failure, they will fail properly.
f = os.fdopen(_multiprocessing.recvfd(sb.fileno()))
print f.tell()  # The file size.

def main(argv):
sa, sb = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid = run_in_child(child, sb)
# We open f after creating the child process, so it could not inherit f.
f = open('/etc/hosts')
sa.sendall('X')
_multiprocessing.sendfd(sa.fileno(), f.fileno())  # Send f to child.
assert (pid, 0) == os.waitpid(pid, 0)
# The file descriptor is shared between us and the child, so we get the file
# position where the child has left off (i.e. at the end).
print f.tell()
print 'Parent done.'

if __name__ == '__main__':
sys.exit(main(sys.argv))

It works even for unrelated processes. To try it, run the following program with a command-line argument in a terminal window (it will start the server), and in another terminal window run it several times without arguments (it will run the client). Each time the client is run, it connects to the server, and sends a file descriptor to it, which the server reads. The program:

#! /usr/bin/python2.7

"""sendfd_unrelated.py: Send Unix file descriptors between unrelated processes.

for Python 2.x at Tue Aug 13 16:26:27 CEST 2013
"""

import _multiprocessing
import errno
import os
import socket
import stat
import sys
import time

SOCKET_NAME = '/tmp/sendfd_socket'

def server():
print 'Server.'
ssock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
while 1:
try:
ssock.bind(SOCKET_NAME)
break
except socket.error, e:
raise
# TODO: Fail if the old server is still running.
print 'Removing old socket.'

ssock.listen(16)
while 1:
print 'Accepting.'
print 'Accepted.'
assert sock.recv(1) == 'X'
f = os.fdopen(_multiprocessing.recvfd(sock.fileno()))
print f.tell()  # The file size.
del sock, f

def client():
print 'Client.'
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
while 1:
try:
sock.connect(SOCKET_NAME)
break
except socket.error, e:
if e[0] not in (errno.ENOENT, errno.ECONNREFUSED):
raise
print 'Server not listening, trying again.'
time.sleep(1)
print 'Connected.'
f = open('/etc/hosts')
sock.sendall('X')
_multiprocessing.sendfd(sock.fileno(), f.fileno())  # Send f to server.
assert '' == sock.recv(1)  # Wait for the server to close the connection.
del sock
print f.tell()

def main(argv):
if len(argv) > 1:
server()
else:
client()
print 'Done.'

if __name__ == '__main__':
sys.exit(main(sys.argv))

There is no system call named sendfd or recvfd. They are implemented in terms of the sendmsg and recvmsg system calls. Passing the correct arguments is tricky and a bit awkward. Here is how to do it in C or C++:

#include <string.h>
#include <sys/socket.h>

int sendfd(int unix_domain_socket_fd, int fd) {
char dummy_char = 0;
char buf[CMSG_SPACE(sizeof(int))];
struct msghdr msg;
struct iovec dummy_iov;
struct cmsghdr *cmsg;
memset(&msg, 0, sizeof msg);
dummy_iov.iov_base = &dummy_char;
dummy_iov.iov_len = 1;
msg.msg_control = buf;
msg.msg_controllen = sizeof(buf);
msg.msg_iov = &dummy_iov;
msg.msg_iovlen = 1;
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(int));
msg.msg_controllen = cmsg->cmsg_len;
memcpy(CMSG_DATA(cmsg), &fd, sizeof fd);  /* int. */
return sendmsg(unix_domain_socket_fd, &msg, 0);  /* I/O error unless 1. */
}

int recvfd(int unix_domain_socket_fd) {
int res;
char dummy_char;
char buf[CMSG_SPACE(sizeof(int))];
struct msghdr msg;
struct iovec dummy_iov;
struct cmsghdr *cmsg;
memset(&msg, 0, sizeof msg);
dummy_iov.iov_base = &dummy_char;
dummy_iov.iov_len = 1;
msg.msg_control = buf;
msg.msg_controllen = sizeof(buf);
msg.msg_iov = &dummy_iov;
msg.msg_iovlen = 1;
cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN(sizeof(int));
msg.msg_controllen = cmsg->cmsg_len;
res = recvmsg(unix_domain_socket_fd, &msg, 0);
if (res == 0) return -2;  /* EOF. */
if (res < 0) return -1;  /* I/O error. */
memcpy(&res, CMSG_DATA(cmsg), sizeof res);  /* int. */
return res;  /* OK, new file descriptor is returned. */
}

## 2013-08-08

### webfind: How many programming languages can you speak ... at the same time?

When I wrote the following code which compiles and/or executes in 10 different programming languages (C, C++, Perl, TeX, LaTeX, PostScript, sh, bash, zsh and Prolog) and prints a different message in each of them, I think it was cool.

%:/*:if 0;"true" +s ||true<</;#|+q|*/include<stdio.h>/*\_/
{\if(%)}newpath/Times-Roman findfont 20 scalefont setfont(
%%)pop 72 72 moveto(Just another PostScript hacker,)show((
t)}. t:-write('Just another Prolog hacker,'),nl,halt. :-t.
:-initialization(t). end_of_file. %)pop pop showpage(-: */
int main(){return 0&printf("Just another C%s hacker,\n",1%
sizeof'2'*2+"++");}/*\fi}\csname @gobble\endcsname{\egroup
\let\LaTeX\TeX\ifx}\if00\documentclass{article}\begin{doc%
ument}\fi Just another \LaTeX\ hacker,\end{document}|if 0;
/(J.*)\$sh(.*)"/,print"$1Perl$2$/"if$_.=q # hack the lang! / sh=sh;test$BASH_VERSION &&sh=bash;test $POSIXLY_CORRECT&& sh=sh;test$ZSH_VERSION && sh=zsh;awk 'BEGIN{x="%c[A%c[K"
printf(x,27,27)}';echo "Just another $sh hacker," #)pop%*/ But now I know that the source of this is the the coolest ever (poly.html). I could make it work in 22 languages without changing the original file. Now, that's a feat, congratulations to the author! Good luck for the autodetect feature of your smart text editor to figure out the highlighting. The name of some languages is not explicitly mentioned in the file. Can you find them? There is another impressive one polyglot with 8 languages: COBOL, Pascal, Fortran, C, PostScript, sh, DOS .com file, Perl. ### webfind: Bootstrapping a simple compiler from nothing This document is a case study of writing and bootstrapping a compiler for a simple (but structured) language from scratch. It targets i386 Linux. It starts from a single hex-to-binary converter, and repeats this many times: write a little compiler for a little bit smarter language in the old language, compile it, then rewrite it to the new language. After about a dozen iterations a compiler for a structured language is created. See also the source code. Done in 2001. ### webfind: The smallest compiler (it compiles Brainfuck) I've found the smallest compiler so far. It's a Brainfuck compiler written in x86 assembly, and it compiles to an i386 Linux binary. Downloaded it from here and play with it on Linux (i386 or amd64): $ sudo apt-get install nasm
$wget -O bf.asm http://www.muppetlabs.com/~breadbox/software/tiny/bf.asm.txt$ nasm -f bin -o bf bf.asm && chmod +x bf
$cat >hello_world.bf <<'END' >+++++++++[<++++++++>-]<.>+++++++[<++++>-]<+.+++++++..+++.[-] >++++++++[<++++>-] <.>+++++++++++[<++++++++>-]<-.--------.+++ .------.--------.[-]>++++++++[<++++>- ]<+.[-]++++++++++. END$ ./bf hello_world && chmod +x hello_world
$./hello_world Hello world!  A 16-bit DOS version, which is even shorter (135 bytes) can be found here. There is a challange here for implementing Brainfuck interpreters, and the best so far is 106 bytes in Perl, but unfortunately the source code is not available. There is also a 98-byte interpreter for 16-bit DOS here. See a 160-byte (source code) interpreter in C here. Here is a similar page listing short interpreters for a small, Turing-complete language. ## 2013-05-22 ### How to create a PDF file with a single hyperlink in it This blog post explains how to create a PDF file with a single hyperlink in it. Such PDF files are useful as hyperlinks uploaded to prezi.com presentations, because as of 2014-05-22 prezi.com doesn't support hyperlinks with text different from the link target URL: it only supports hyperlinks whose URL appear visually in the presentation. By uploading a PDF with a hyperlink this limitation can be circumvented. Use this Python script to generate the PDF with a hyperlink. It will create a single page PDF containing green triangle with a hyperlink pointing to the URL specified in the command-line. Example usage on Linux: $ wget -O linkpdfgen.py \
$chmod +x linkpdfgen.py$ ./linkpdfgen.py http://example.org/ output.pdf`