This blog post is to announce uevalrun.
uevalrun is a self-contained computation sandbox for Linux, using
User-mode Linux for both compilation and execution of the program to be
sandboxed. The program can be written in C, C++, Python, Ruby, Perl or PHP.
uevanrun enforces memory limits, timeouts and output size limits in the
sandbox. The primary use case for uevalrun is evaluation of solution
programs submitted by contestants of programming contests: uevalrun compiles
the solution, runs it with the test input, compares its output against
the expected output, and writes a status report.
For your convenience, here is a (non-updated) copy of the documenation of uevalrun. See project home page for the most up-to-date version.
Installation
A 32-bit (x86, i386) or 64-bit (x86_64, amd64) Linux system is required
with enough RAM for both compilation and execution, plus 6 MB memory
overhead. The Linux distribution doesn't matter, because uevalrun uses
statically linked binaries, and it's self-contained: it contains all the
tools it needs, including the C and C++ compilers (from the uClibc gcc-4.1.2
toolchain).
uevalrun doesn't need root privileges: it runs as a simple user.
uevalrun needs about 160MB of disk space, most of which is consumed by
the C compiler (31MB extracted + 17MB compressed), the scripting language
interpreters (9MB compressed), and the virtual disk images (77MB,
uncompressed). After removing all temporary files, 80MB will be enough.
Download and compile:
$ svn co http://pts-mini-gpl.googlecode.com/svn/trunk/uevalrun uevalrun
$ cd uevalrun
$ ./make # This downloads some more files during compilation.
Try it (the italic parts are displayed by the program):
$ (echo '#! perl'; echo 'print "Hello"x2, "\n"') >foo.pl
$ echo HelloHello >expected.txt
$ ./uevalrun -M 10 -T 9 -E 9 -s foo.pl -t /dev/null -e expected.txt
...
@ result: pass
$ echo '/**/ int main() { return!printf("HelloHello\n"); }' >foo.c
$ ./uevalrun -M 10 -T 9 -E 9 -U 19 -N 32 -C 9 \
-s foo.c -t /dev/null -e expected.txt
...
@ result: pass
How to use
This section has not been written yet. If you have questions, don't hesitate to ask!
Requirements
Security is the most important requirement of uevalrun, followed by high
performance and usability.
The sandboxing must be secure, more specifically:
- Programs inside the sandbox must not be able to communicate with
the outside world, except for their stdin and stdout. So they don't have
access to the filesystem (except for some read-only access to some
whitelisted system libraries and binaries), and they can't do network
I/O. If uevalrun is used for programming contest submission evaluation,
this restriction prevents the program from finding and reading the file
containing the expected output.
- Sandboxed programs must not be able to use more system resources (memory,
disk, CPU) than what was allocated for them.
- Sandboxed programs must not be running for a longer period of time than
what was allocated.
- Even the compilation of programs to be run (executed) inside the sandbox
must be sandboxed (possibly with different system resource allocation),
to prevent the attacker from submitting a program source code which
exploits a bug in the compiler.
- The sandbox must be reverted to its initial state before each compilation
and execution, so sandboxed programs won't be able to gather information
about previous compilations and executions. For programming contests,
this restriction prevents the program from reading other contestants'
submissions or their output.
- Timeouts (both real time and user time) must be enforced outside the
sandbox, so the program will be reliably killed if it runs for too long
time, even if it tampers with time measurement inside the sandbox.
- Sanboxed programs must not be able to lie about their success, their
performance or the correctness of their output by writing special marker
characters to their stdout or stderr.
The sandbox must be fast and waste few resources, more specifically:
- Reverting the sandbox to its initial, empty state must be very fast
(preferably faster than 1 second).
- Starting a program inside the sandbox must be very fast (preferably
faster than 1 second).
- Running a program inside the sandbox must not be much slower than running
the same program outside the sandbox. If the program is CPU-intensive,
it may run 1.1 times slower (i.e. 10% slower) inside than outside. If
the program is I/O-intensive, it may run 10 times slower inside than
outside. This requirement is intentionally quite loose on I/O
virtualization performance.
- Running a program inside the sandbox must not need more than 10MB more
memory than running the same program outside.
- Running a program inside the sandbox must not need any disk space,
except for the disk space needed by the program binary, the test input
and the expected output, all stored as files outside the sandbox.
- Compilation inside the sandbox must require only a reasonable amount of
temporary disk space (for the file to be compiled, the temporary files,
and the output binary).
The sandbox must be easy to use and versatile, more specifically:
- Sandboxed programs must be able to accept any 8-bit binary input (stdin).
- Sandboxed programs must be able to write any 8-bit binary output to
their stdout.
- Multiple sandboxed programs must be able to run at the same time on the
same host system, without affecting each other.
- If a sandboxed program fails (e.g. because it writes to its stdout
different from what was expected, or it does a memory access violation,
it reaches a timeout, it exceeds its allocated memory etc.), the proper
failure reason has to be reported (e.g. ``wrong answer'' must not be
reported if the program times out or vice versa).
- The sandboxing software must have as little few system dependencies as
possible. It must be able to run in a restricted (e.g. chroot)
environment, it must not depend on system libraries or configuration.
It must work on 32-bit (x86, i386) and 64-bit (x86_64, amd64) Linux
systems, on any Linux distirubution.
- Sandboxed programs can be written in C, C++, Python, Ruby, Perl and PHP,
or in any language whose compiler can produce a 32-bit (i386) Linux
statically linked executable binary.
Design
To fullfill the requirements, the following fundamental building blocks
are used.
User-mode Linux
(UML) is used for virutalization: both compilation and execution is
performed in a UML guest, reading data from the host system using virtual
block devices (ubd
), and writing its output to its
/dev/tty1
(con0
), which can be read by a process
on the host. A UML guest kernel (currently 2.6.31.1) tailered to the
sandboxing requirements (security and performance) is configured and
compiled. Some kernel patches are applied to increase security, reliability
and performance. (Please note that these patches apply to the UML guest
kernel only, so the host remains unpatched, and rebooting is not needed
either.) Networking and the virtual file system are disabled in the UML
guest kernel for increased security. All unnecessary drivers and features
are removed from the UML guest kernel to get fast boot times and to reduce
the overhead.
The underlying virtualization technology used by UML is
ptrace(), which doesn't need
root privileges or a kernel module or kernel modifications in the host. As
an alternative to UML, Seccomp could be used for
sendboxing, but that's quite cumbersome, because the sandboxed process
cannot allocate memory for itself (see the Google Chrome Seccomp
sandbox for a generic solution), but it's still prohibitively cumbersome
to sandbox GCC this way (with its requirements for forking and creating
temporary files). Most sandboxing approaches on Linux require root
privileges for a short amount of time (for chroot, PID namespaces (see
clone(2)) and other
namespaces). Most virtualization approaches (like KVM, Xen, VirtualBox, except possibly for
QEMU) need loading of kernel modules or
patching the kernel, and support only slow guest boot times.
With UML, it's possibly to boot the UML guest, run a hello-world binary,
and halt the UML guest in less than 0.02 second (sustained). For that speed,
one needs a (guest) kernel patch (included and automatically applied in
uevalrun) which skips the Calibrating delay loop kernel startup step,
which usually takes 0.33 second. The memory overhead of UML is also quite
low (6 MB with a stripped-down kernel).
All software running inside and outside UML is written in C (for high
performance) and compiled with uClibc
and linked statically to avoid depending on the libc on the host, and to
avoid the need for installing a libc in the guest. (The total size of these
custom-built binaries is less than 100kB.) There is no Linux distribution
installed the guest – only a few custom binaries are copied, like a
custom /sbin/init
, which mounts /proc
, sets up
file descriptors and resource limits, and starts the sandboxed program, and
then halts the guest. The /lib
directory in the guest is empty,
except for compilation, where /lib
contains
libc.a
, libstdc++.a
etc.
BusyBox (statically linked with
uClibc) is used as a Swiss Army Knife tool for scripting and automation (of
the build and the setup process), both inside and outside UML. Please note,
however, that once all the binaries are built and the filesystem images are
created, BusyBox is not used anymore for running sandboxed programs and
evaluating their output (except for temporary filesystem creation) because
of performance reasons, but work is done by the small and fast custom C
tools just compiled.
Sandboxed program sources can be written in C (using a large subset of
uClibc as the system library), C++
(using uClibc and
libstdc++), Python (using
all the built-in Python modules), Ruby (using just the built-in Ruby
modules and classes which are implemented in C), Perl (using only the
few dozen most common modules), or PHP (using just the built-in PHP
functions which are implemented in C).
For C and C++ compilation, a GCC 4.1.2
toolchain is used which is based on
uClibc and produces statically linked
executables.
Interpreters for Python, Ruby, Perl and PHP are provided as precompiled,
self-contained, single, statically linked, 32-bit (i386) Linux executables.
It's easy to copy them to the /bin
directory of the UML guest.
All tools used for building the software are either shipped with the
software, or the software downloads them during compilation. All binary
tools are statically linked, 32-bit (i386) Linux executables. This provides
maximum host system compatibility and reproducible behavior across systems.
In our speed measurements, CPU-intensive programs running inside UML run
1% slower than outside UML, but I/O-intensive programs (such as C++
compilation with GCC) can be about 6 times slower.
The
Minix filesystem
is used in the UML guests for both read-only filesystems (e.g. the
root filesystem with the scripting language interpreters) and read-write
filesystems (such as /tmp
for the temporary assembly files
created by GCC). The Minix filesystem was chosen instead of ext2 and other
usual Linux filesystem, because it has less size overhead, it's codebase is
smaller, and it has equally good performance for the small and mostly
read-only filesystems it is used for.
License
uevanrun has been released under the GNU GPL v2.
Bugs? Problems? Contact the author. Send feedback!
Please add your feedback to the issue tracker.
All feedbacks are welcome.