How to make smaller C and C++ binaries

This blog post presents several techniques to make the binaries resulting from C or C++ compilation smaller with GCC (or Clang). Please note that almost all techniques are tradeoffs, i.e. a smaller binary can be slower and harder to debug. So don't use the techniques blindly before understanding the tradeoffs.

The recommended GCC (and Clang) flags:

  • Use -s to strip debug info from the binary (and don't use -g).
  • Use -Os to optimize for output file size. (This will make the code run slower than with -O2 or -O3.)
  • Use -m32 to compile a 32-bit binary. 32-bit binaries are smaller than 64-bit binaries because pointers are shorter.
  • In C++, use -fno-exceptions if your code doesn't use exceptions.
  • In C++, use -fno-rtti if your code doesn't use RTTI (run-time type identification) or dynamic_cast.
  • In C++, use -fvtable-gc to let the linker know about and remove unused virtual method tables. (Recent GCC versions no longer accept this flag.)
  • Use -fno-stack-protector .
  • Use -fomit-frame-pointer (this may make the code larger on amd64).
  • Use -ffunction-sections -fdata-sections -Wl,--gc-sections . Without this all code from each needed .o file will be included. With this only the needed code will be included.
  • For i386, use -mpreferred-stack-boundary=2 .
  • For i386, use -falign-functions=1 -falign-jumps=1 -falign-loops=1 .
  • In C, use -fno-unwind-tables -fno-asynchronous-unwind-tables . Out of these, -fno-asynchronous-unwind-tables makes the larger difference (can be several kilobytes).
  • Use -fno-math-errno, and don't check the errno after calling math functions.
  • Try -fno-unroll-loops, sometimes it makes the file smaller.
  • Use -fmerge-all-constants.
  • Use -fno-ident, this will prevent the generation of the .ident assembler directive, which adds an identification of the compiler to the binary.
  • Use -mfpmath=387 -mfancy-math-387 to make floating point computations shorter.
  • If you don't need double precision, and float precision is enough, use -fshort-double -fsingle-precision-constant .
  • If you don't need IEEE-conformant floating point calculations, use -ffast-math .
  • Use -Wl,-z,norelro for linking, which is equivalent to ld -z norelro .
  • Use -Wl,--hash-style=gnu for linking, which is equivalent to ld --hash-style=gnu . You may also try =sysv instead of =gnu, sometimes it's smaller by a couple of bytes. The goal here is to avoid =both, which is the default on some systems.
  • Use -Wl,--build-id=none for linking, which is equivalent to ld --build-id=none .
  • Get more flags from the Os list in diet.c of diet libc, for about 15 architectures.
  • Don't use these flags: -pie, -fpie, -fPIE, -fpic, -fPIC. Some of these are useful in shared libraries, so enable them only when compiling shared libraries.
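
To illustrate what -ffunction-sections -fdata-sections -Wl,--gc-sections does, here is a minimal sketch in C. The file name and all function names are made up for illustration. Assume a program whose main calls only used_checksum: with the three flags each function and each array gets its own section, so the linker can drop unused_table_lookup and big_unused_table. Without the flags they stay in the binary, because they live in the same .o file as the needed code.

```c
/* sections_demo.c: hypothetical file name, hypothetical function names.
 * A possible build line, using only flags from the list above:
 *   gcc -Os -ffunction-sections -fdata-sections -Wl,--gc-sections ...
 */
#include <stddef.h>

/* Referenced only by unused_table_lookup below. With -fdata-sections
 * this array is placed in a section of its own. */
const unsigned char big_unused_table[4096] = {1};

/* Never called by the program. With -ffunction-sections it is placed in
 * a section of its own, so --gc-sections can remove it together with
 * the table it references. */
unsigned unused_table_lookup(size_t i) {
  return big_unused_table[i % sizeof big_unused_table];
}

/* The only function the program is assumed to call. */
unsigned used_checksum(const unsigned char *data, size_t size) {
  unsigned sum = 0;
  size_t i;
  for (i = 0; i < size; ++i) sum = sum * 31 + data[i];
  return sum;
}
```

Comparing the output of size on the binary built with and without the three flags shows whether the unused function and the table were really dropped.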

Other ways to reduce the binary size:

  • Run strip -S --strip-unneeded --remove-section=.note.gnu.gold-version --remove-section=.comment --remove-section=.note --remove-section=.note.gnu.build-id --remove-section=.note.ABI-tag on the resulting binary to strip even more unneeded parts. This replaces the gcc -s flag with even more aggressive stripping.
  • If you are using uClibc or diet libc, then additionally run strip --remove-section=.jcr --remove-section=.got.plt on the resulting binary.
  • If you are using uClibc or diet libc with C or C++ with -fno-exceptions, then additionally run strip --remove-section=.eh_frame --remove-section=.eh_frame_ptr on the resulting binary.
  • After running strip ... above, also run sstrip on the binary. Download sstrip from ELF Kickers, and compile it for yourself. Or get the 3.0a binary from here.
  • In C++, avoid STL. Use C library functions instead.
  • In C++, use as few template types as possible (i.e. code with vector<int> and vector<unsigned> is twice as long as the code with vector<int> only).
  • In C++, give each of your non-POD (plain old data) classes an explicit constructor, destructor, copy-constructor and assignment operator, and implement them outside the class, in the .c file.
  • In C++, move constructor, destructor and method bodies outside the class, in the .c file.
  • In C++, use fewer virtual methods.
  • Compress the binary using UPX. For small binaries, use upx --brute or upx --ultra-brute . For large binaries, use upx --lzma . If you have large initialized arrays in your code, make sure you declare them const, otherwise UPX won't compress them.
  • Compress the used libraries using UPX.
  • If you use static linking (e.g. gcc -static), use uClibc (most convenient way: pts-xstatic) or diet libc (most convenient way: the included diet tool) or musl (most convenient way: the included musl-gcc tool) instead of glibc (GNU C library).
  • Make every function static, create a .c file which includes all other .c files, and compile that with gcc -W -Wall. Remove all code which the compiler reports as unused. Last time this saved about 9.2 bytes per function for me.
  • Don't use __attribute__((regparm(3))) on functions, it tends to make the code larger.
  • If you have several binaries and shared libraries, consider unifying the binaries into a single one (using symlinks and distinguishing in main with argv[0]), and moving the library code to the binary. This is useful, because the shared libraries use position-independent code (PIC), which is larger.
  • If it's feasible, rewrite your C++ code as C. Once it's C, it doesn't matter if you compile it with gcc or g++.
  • If your binary is already less than 10 kilobytes, consider rewriting it in assembly, and generating the ELF headers manually, see the tiny ELF page for inspiration.
  • If your binary is already less than 10 kilobytes, and you don't use any libc functions, use a linker script to generate tiny ELF headers. See the tarball with the linker script.
  • Drop the --hash-style=... flag passed to ld by gcc. To do so, pass the -Bmydir flag to gcc, and create the executable mydir/ld, which drops these flags and calls the real ld.
  • See more flags and ideas in this answer.
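
The idea of unifying several binaries into one and distinguishing them by argv[0] can be sketched in C as follows. This is only an illustration under assumptions: the tool names hello and count, and the function names hello_main, count_main, base_name and multicall_main, are all hypothetical. The tool functions are static, in line with the advice above to make every function static.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tools that used to be separate binaries. */
static int hello_main(int argc, char **argv) {
  (void)argc; (void)argv;
  puts("hello");
  return 0;
}

static int count_main(int argc, char **argv) {
  (void)argv;
  printf("%d\n", argc - 1);  /* Number of command-line arguments. */
  return 0;
}

/* Returns the part of path after the last '/', or path itself. */
static const char *base_name(const char *path) {
  const char *slash = strrchr(path, '/');
  return slash ? slash + 1 : path;
}

/* Picks the tool by the name the binary was invoked as (argv[0]). */
int multicall_main(int argc, char **argv) {
  const char *name = argc > 0 ? base_name(argv[0]) : "";
  if (strcmp(name, "hello") == 0) return hello_main(argc, argv);
  if (strcmp(name, "count") == 0) return count_main(argc, argv);
  fprintf(stderr, "unknown tool name: %s\n", name);
  return 2;
}

/* In the real program, main would just forward:
 *   int main(int argc, char **argv) { return multicall_main(argc, argv); }
 */
```

With symlinks named hello and count pointing at the single binary, each name runs its own tool, and the shared code is linked in only once.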

1 comment:

Dawid Ciężarkiewicz said...

In C++ you can override the default demangle function to save like 100KB:

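#include <stddef.h>  // for size_t

// Stub that replaces the C++ runtime's demangler; it always reports failure.
extern "C" char* __cxa_demangle(const char* mangled_name, char* buf, size_t* n, int* status) {
  if (status)
    *status = -1;  // Any nonzero status means the name was not demangled.
  return nullptr;
}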