2009-11-01

Tiny, self-contained C compiler using TCC + uClibc

This post describes how I combined a small C compiler with a small libc on Linux. I took TCC (the Tiny C Compiler) 0.9.25 and uClibc 0.9.26, and compressed the result with UPX 3.03 to get a tiny, self-contained C compiler for Linux i386. You don't need any external files (apart from the compiler executable) to compile (or interpret) C code, and the compiled executable will be self-contained as well.

Here is how to download it and use it for compilation on a Linux x86 or x86_64 system:

$ uname
Linux
$ wget -O pts-tcc http://pts-mini-gpl.googlecode.com/svn/trunk/pts-tcc/pts-tcc-0.9.25
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxrwxr-- 1 pts pts 241728 Nov 1 13:07 pts-tcc
$ wget -O example1.c http://pts-mini-gpl.googlecode.com/svn/trunk/pts-tcc/example1.c
$ cat example1.c
#! ./pts-tcc -run
int printf(char const*fmt, ...);
double sqrt(double x);
int main() {
printf("Hello, World!\n");
return sqrt(36) * 7;
}
$ ./pts-tcc example1.c
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$ ls -l a.out
-rwxrwxr-x 1 pts pts 17124 Nov 1 13:17 a.out
$ ./a.out; echo "$?"
Hello, World!
42
$ strace -e open ./pts-tcc example1.c
open("/proc/self/mem", O_RDONLY) = 3
open("/proc/self/mem", O_RDONLY) = 3
open("example1.c", O_RDONLY) = 3
open("/proc/self/mem", O_RDONLY) = 3
open("/proc/self/mem", O_RDONLY) = 3
open("a.out", O_WRONLY|O_CREAT|O_TRUNC, 0777) = 3
$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 47 vars */]) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "Hello, World!\n", 14Hello, World!
) = 14
_exit(42) = ?
As you can see above, my version of the compiler is less than 250k in size. It doesn't need any library files (such as libc.so) for compilation, as the strace output shows, and the compiled binary doesn't need library files either.

TCC can run the C program without creating any binaries:
$ rm -f a.out
$ ./pts-tcc -run example1.c; echo "$?"
Hello, World!
42
$ head -1 example1.c
#! ./pts-tcc -run
$ chmod +x example1.c
$ ./example1.c; echo "$?"
Hello, World!
42
TCC is a very fast compiler (see the speed comparisons on its web site) and it is versatile (it can compile the Linux kernel), but it doesn't optimize much: the code it produces is usually larger and slower than the output of a traditional, optimizing C compiler such as gcc -O2.

The shell script http://pts-mini-gpl.googlecode.com/svn/trunk/pts-tcc/pts-tcc-0.9.25-compile.sh
builds TCC from source, adds uClibc, and compresses the result with UPX.

Update: TCC 0.9.25 and uClibc 0.9.30.1

$ wget -O pts-tcc http://pts-mini-gpl.googlecode.com/svn/trunk/pts-tcc/pts-tcc-0.9.25-uclibc-0.9.30.1
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxr-xr-x 1 pts eng 336756 Apr  8 16:53 pts-tcc

Update: TCC 0.9.26 and uClibc 0.9.30.1

$ wget -O pts-tcc http://pts-mini-gpl.googlecode.com/svn/trunk/pts-tcc/pts-tcc-0.9.26-uclibc-0.9.30.1
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxr-xr-x 1 pts eng 349640 Apr  8 16:53 pts-tcc

6 comments:

Porcupine Saul said...

This looks like a cool tool, man. Thanks for sticking it out there.

bifferos said...

I tried unsuccessfully to get this working on a build of OpenWrt. Not sure what I've done wrong. It would be nice to understand what configuration I need to do to transfer the resultant tcc exe to an embedded system and make it work.

See here for the error.
https://forum.openwrt.org/viewtopic.php?id=26000

thunder9861 said...

I really really want to get this working, and I have tried many different things, however I cannot seem to replicate the setup you have.

When I use your wrapper, I get the following:

+ echo 'int printf(char*fmt,...);main(){return 1>printf("he110\n");}'
+ ./i386-uclibc-gcc -static -o test test.c
+ ./test
+ grep -l he110 test.out
test.out
+ ./i386-uclibc-gcc -o test test.c
/usr/bin/ld: cannot find /lib//libc.so.0
/usr/bin/ld: cannot find /usr/lib//uclibc_nonshared.a
/usr/bin/ld: cannot find /lib//ld-uClibc.so.0
collect2: ld returned 1 exit status

It seems to work statically, however I have something wrong with ld (the double slashes), and I cannot seem to find where in the script or wrapper (or the proper environment variable) I can fix this. I should note that I have also built the uclibc toolchain using buildroot, and have it stored in /usr/local/uclibc, but I cannot seem to get the uclibc ld to run instead of the (/usr/bin/ld) one.

Furthermore, by using my compiler instead of the wrapper, I am able to compile pts-tcc and it displays its help output, however when trying to use it to compile I get:

user@ubuntu:~/Downloads/Tcc$ ./pts-tcc test.c
tcc: undefined symbol '__uClibc_main'
tcc: undefined symbol 'printf'

So if you could please give a little more details as to how to set up your environment, and where things need to be located, I would appreciate it.

Also, if you could tell me what is going on with this perl line:

for F in tcclibc.a "$UCLIBC_USR"/lib/{crt1.o,crti.o,crtn.o}; do
G="${F##*/}"
export NAME="data_${G%.*}"
perl -e '$_=join("",<STDIN>); my$L=length; s@([^-+/\w])@sprintf"\\%03o",ord$1@ge; print".globl $ENV{NAME}\n.section .data\n.align 4\n.size $ENV{NAME},$L\n$ENV{NAME}:\n.string \"$_\\001\"\n"' <"$F"
done >tcc-0.9.25/libcdata.s

Thank you so much, I would really like to get this working.

(On a side note, would there be a binary copy already lying around, for the i386 architecture?)

thunder9861 said...

Oops, found the binary, and it works great. I just cannot figure out how you did it.

Darren said...

I -_love_ this - it's solved a big problem for me - trouble is I just found a bug in tcc*.25, that's fixed in '26. I went to try to modify your build script, but I don't have all the right versions of things... any chance of an update?

(thx)
Darren

pts said...

@Darren: I've updated pts-tcc so a version using TCC 0.9.26 is available. Please see it in the updated blog post.