2017-12-06

How to restrict an SSH user to file transfers

This blog post explains how a user on a Unix server can be restricted to file transfers only over SSH. The restriction is implemented by specifying a login shell which imposes a whitelist of allowed commands (e.g. rsync, sftp-server, scp, mkdir), and Unix permissions are used to restrict which files can be read and/or written by these commands.

Implementation using a custom login shell

First install Python 2 (as /usr/bin/python), then create a custom login shell binary, and save it to e.g. /usr/local/bin/transfer_shell. The contents of /usr/local/bin/transfer_shell should be:

#! /usr/bin/python
# by pts@fazekas.hu at Wed Dec  6 15:46:18 CET 2017

"""Login shell in Python 2 for SSH service restricted to data copying.

Use normal Unix permissions to restrict what files can be accessed.
"""

import os
import stat
import sys

if os.access(__file__, os.W_OK) or os.access(
    os.path.dirname(__file__), os.W_OK):
  sys.stderr.write('error: copy shell not safe\n')
if os.getenv('SSH_ORIGINAL_COMMAND', ''):
  sys.stderr.write('error: bad command= config\n')
  sys.exit(1)

#cmd = os.getenv('SSH_ORIGINAL_COMMAND', '').split()
#print >>sys.stderr, sys.argv
cs = (len(sys.argv) == 3 and sys.argv[1] == '-c' and sys.argv[2]) or ''
if cs == '/bin/sh .ssh/rc':
  sys.exit(0)
cmd = cs.split()
# cmd0 will be '' for interactive shells, thus it will be disallowed.
cmd0 = (cmd or ('',))[0]
#print >>sys.stderr, sorted(os.environ)
if cmd0 not in ('ls', 'pwd', 'id', 'cat', 'echo', 'cp', 'mv', 'rm',
                'mkdir', 'rmdir',
                'rsync', 'scp', '/usr/lib/openssh/sftp-server'):
  # In case of sftp, we can't write to stderr.
  sys.stderr.write('error: command not allowed: %s\n' % cmd0)
  sys.exit(1)
def is_scp_unsafe(cmd):
  has_tf = False
  for i in xrange(1, len(cmd) - 1):
    arg = cmd[i]
    if arg == '--' or not arg.startswith('-'):
      break
    elif arg in ('-t', '-f'):  # Flags indicating remote operation.
      has_tf = True
    elif arg not in ('-v', '-r', '-p', '-d'):
      return True
  return not has_tf
if ((cmd0 == 'rsync' and (len(cmd) < 2 or cmd[1] != '--server')) or
    cmd0 == 'scp' and is_scp_unsafe(cmd)):
  # This is to disallow arbitrary command execution with rsync -e and
  # scp -S.
  sys.stderr.write('error: command-line not allowed: %s\n' % cs)
  sys.exit(1)
os.environ['PATH'] = '/bin:/usr/bin'
os.environ.pop('DISPLAY', '')  # Disable X11.
os.environ.pop('XDG_SESSION_COOKIE', '')
os.environ.pop('XAUTHORITY', '')
try:
  os.chdir('data')
except OSError:
  sys.stderr.write('error: data dir not found\n')
  sys.exit(1)
try:
  # This is insecure: os.execl('/bin/sh', 'sh', '-c', cmd)
  os.execvp(cmd0, cmd)
except OSError:
  sys.stderr.write('error: command not found: %s\n' % cmd0)
  sys.exit(1)

Run these commands as root (without the leading #) to set the permissions transfer_shell:

# chown root.root /usr/local/bin/transfer_shell
# chmod 755       /usr/local/bin/transfer_shell

To set up restrictions for a new user

  1. Create the Unix user if not already created.
  2. Set up Unix groups and permissions on the system so the user doesn't have access to more files than he should have.
  3. Optionally, set up SSH public keys in ~/.ssh/authorized_keys for the user. No need to specify command="..." or other restrictions in that line.
  4. To change the login shell of the user, run this command as root (substituting USER with the login name of the user): chsh -s /usr/local/bin/transfer_shell USER
  5. Create a symlink named data in the home directory of the user. It should point to the default directory for file transfers.
  6. It's strongly recommended that you make the home directory and its contents unwritable by the user. Example command (run it as root, substitute USER): chown root.root ~USER ~USER/.ssh ~USER/.ssh/authorized_keys

Alternatives considered

  • Using a restrictive login shell and setting Unix file permissions. (This is implemented above, and also in scponly and rssh/) The disadvantage is that by accident the Unix permissions may be set up incorrectly (i.e. they are too permissive), and the user has access to too many files. Another disadvantage is that the custom login shell implementation may be vulnerable or hard to audit (example exploits for running arbitrary commands with rsync and scp: https://www.exploit-db.com/exploits/24795/).
  • Using a restrictive command="..." in ~/.ssh/authorized_keys. This is insecure, because OpenSSH sshd still runs ~/.bashrc and ~/.ssh/rc as shell scripts, and a malicious user could upload their own version of these files, or trigger some command execution in /etc/bash.bashrc. Any of these could lead to the user being able to execute arbitrary shell commands, which we don't want for this user.
  • Running a restrictive, custom SSH server implementation on a different port (while OpenSSH sshd is still running on port 22). This comes with its own risk of possible security bugs, and needs to be upgraded regularly. Also it can be complex to understand and set up correctly.
  • See some more alternatives here: https://serverfault.com/questions/83856/allow-scp-but-not-actual-login-using-ssh.

2017-10-06

Comparison of encrypted Git remote (remote repository) implementations

This blog post is a comparison of encrypted Git remote implementations. A Git remote is a combination of storage space on a remote server, remote server software and local software working together. An encrypted Git remote is a Git remote which makes sure that the storage space on the remote server contains the Git objects encrypted. It is useful if the Git repository contains sensitive information (e.g. passwords, bank account details), and the remote server is not trusted to keep such information hidden from unauthorized readers.

See the recent Hacker News dicsussion Keybase launches encrypted Git for the encrypted, hosted cloud Git remote provided by Keybase.

Comparison

  • name of the Git remote software
    • grg: git-remote-gcrypt
    • git-gpg: git-gpg
    • keybase: git-remote-keybase, the encrypted, hosted cloud Git remote provided by Keybase
  • does it support collaboration (users with different keys pull and push)?
    • grg: yes
    • git-gpg: yes
    • keybase: yes
  • does it encrypt the local .git repository directory?
    • grg: no
    • git-gpg: no
    • keybase: no
  • does it encrypt any files in the local working tree?
    • grg: no
    • git-gpg: no
    • keybase: no
  • does it encrypt the remote repository users push to?
    • grg: yes, it encrypts locally before push
    • git-gpg: yes, it encrypts locally before push
    • keybase: yes, it encrypts locally before push
  • by looking at the remote files, can anyone learn the total the number of Git objects?
    • grg: no
    • git-gpg: yes
    • keybase: probably yes
  • can root on the remote server learn the list of contributors (users who do git pull and/or git push)?
    • grg: yes, by making sshd log which SSH public key was used
    • git-gpg: yes, by making sshd log which SSH public key was used
    • keybase: yes
  • by looking at the remote files, can anyone learn the list of contributors (users who do git pull and/or git push)?
    • grg: no
    • git-gpg: no
    • keybase: probably yes
  • by looking at the remote files, can anyone learn when data was pushed?
    • grg: yes
    • git-gpg: yes
    • keybase: probably yes
  • does it support hosting of encrypted remotes on your own server?
    • grg: yes
    • git-gpg: yes
    • keybase: no, at least not by default, and not documented
  • supported remote repository types
    • grg: rsync, local directory, sftp, git repo (local or remote)
    • git-gpg: rsync, local directory
    • keybase: custom, data is stored on KBFS (Keybase filesystem, an encrypted network filesystem)
  • required software on the remote server
    • grg: sshd, (rsync or sftp-server or git)
    • git-gpg: sshd, rsync
    • keybase: custom, the KBFS server, there are no official installation instructions
  • required local software
    • grg: git, gpg, ssh, (rsync or sftp), git-remote-gcrypt
    • git-gpg: git, gpg, ssh, rsync, Python (2.6 or 2.7), git-gpg
    • keybase: binaries provided by Keybase: keybase, git-remote-keybase, kbfsfuse (only for remote repository creation)
  • product URL with installation instructions
    • grg: https://git.spwhitton.name/git-remote-gcrypt/tree/README.rst
    • git-gpg: https://github.com/glassroom/git-gpg
    • keybase: https://keybase.io/blog/encrypted-git-for-everyone
  • source code URL
    • grg: https://git.spwhitton.name/git-remote-gcrypt/tree/git-remote-gcrypt
    • git-gpg: https://github.com/glassroom/git-gpg/blob/master/git-gpg
    • keybase: https://github.com/keybase/kbfs/blob/master/kbfsgit/git-remote-keybase/main.go
  • implementation language
    • grg: Unix shell (e.g. Bash), single file
    • git-gpg: Python 2.6 and 2.7, single file
    • keybase: Go
  • source code size, number of bytes, including comments
    • grg: 21 448 bytes
    • git-gpg: 19 702 bytes
    • keybase: 5 617 305 bytes (including client/go/libkb/**/*.go and kbfs/{env,kbfsgit,libfs,libgit,libkbfs}/**/*.go)
  • is the source code easy to understand?
    • grg: yes, but some developers reported it's less easy than git-gpg
    • git-gpg: yes
    • keybase: no, because it's huge; individual pieces are simple
  • encryption tool used
    • grg: gpg (works with old versions, e.g. 1.4.10 from 2008)
    • git-gpg: gpg (works with old versions, e.g. 1.4.10 from 2008)
    • keybase: custom, written in Go
  • is it implemented as a Git remote helper?
    • grg: yes, git push etc. works
    • git-gpg: no, it works as git gpg push instead of git push etc.
    • keybase: yes, git push etc. works
  • how much extra disk space does it use locally, per repository?
    • grg: less than 1000 bytes
    • git-gpg: stores 2 extra copies of the .git repository locally, one of them containing only loose objects (thus mostly uncompressed)
    • keybase: less than 1000 bytes
  • how much disk space does it use remotely, per repository?
    • grg: one encrypted packfile for each push, encryption has a small (constant) overhead, occasionally runs git repack (locally, and uploads the result), and right after repacking it stores only 1 packfile (plus a small metadata file) per repository
    • git-gpg: one encrypted file for each object, encryption has a small (constant) overhead, no packfiles (thus the remote repository will be large and contain a lot of files, because of the lack of diff compression supported by the packfiles)
    • probably one encrypted file for each object

2017-09-02

How to run Windows XP on Linux using QEMU and KVM

This blog post is a tutorial explaining how to run Windows XP as a guest operating system using QEMU and KVM on a Linux host. It should take less then 16 minutes, including installation.

Requirements: You need a recent Linux system (Ubuntu 14.04 LTS will work) with a GUI, 620 MB of free disk space and 550 MB of free memory. If you don't want to browse the web from Windows XP, then 300 MB of free memory is enough.

Software used:

  • The latest version of Hiren's BootCD (version 15.2) was released on 2012-11-09. It contains a live (no need to install) mini Windows XP system with a web browser (Opera). (Additionally, it contains hundreds of system rescue, data recovery, antivirus, backup, password recovery, hard disk diagnostics and system diagnostics tools. To see many of them with screenshot, look at this article about Hiren's BootCD, or click on the See CD Contents link on the official Hiren's BootCD download page.)
  • QEMU. It's a fully system emulator, which can emulate multiple architectures, and it can run multiple operating systems as a guest.
  • KVM. It's a fast virtualization (emulation) of guest operating systems on Linux. It's used by QEMU, and it lets QEMU execute the CPU-intensive operations on guest systems quickly, with only 10% or less overhead. (I/O-intensive operations can be much slower.)

Log in to the GUI, open a terminal window, and run the following command (without the leading $, copy-paste it as a single, big, multiline paste):

$ python -c'import os, struct, sys, zlib
def work(f):  # Extracts the .iso from the .zip on the fly.
 while 1:
  data, i, c, s = f.read(4), 0, 0, 0
  if data[:3] in ("PK\1", "PK\5", "PK\6"): return f.read()
  assert data[:4] == "PK\3\4", repr(data); data = f.read(26)
  _, _, mth, _, _, _, cs, us, fnl, efl = struct.unpack("<HHHHHlLLHH", data)
  fn = f.read(fnl); assert len(fn) == fnl
  ef = f.read(efl); assert len(ef) == efl
  if fn.endswith(".iso"): uf = open("hirens.iso", "wb")
  else: mth = -1
  if mth == 8: zd = zlib.decompressobj(-15)
  while i < cs:
   j = min(65536, cs - i); data = f.read(j); assert len(data) == j; i += j
   if mth == 8: data = zd.decompress(data)
   if mth != -1: uf.write(data)
  if mth == 8: uf.write(zd.flush())
work(os.popen("wget -nv -O- "
    "http://www.hirensbootcd.org/files/Hirens.BootCD.15.2.zip"))'

The command above downloads the Hiren's BootCD image and extracts it to the file hirens.iso. (Alternatively, you could download from your browser and extract the .iso manually. That would use more temporary disk space.)

Install QEMU. If you have a Debian or Ubuntu system, do it so by running the command (without the leading $):

$ sudo apt-get install qemu-system-x86

On other Linux systems, use your package manager to install QEMU with KVM support.

The only step in this tutorial which needs root access (and thus the root password) is the QEMU installation above.

Run the following command in your terminal window (without the leading $, copy-paste it):

$ SDL_VIDEO_X11_DGAMOUSE=0 qemu-system-i386 -m 512 -machine pc-1.0,accel=kvm \
    -cdrom hirens.iso -localtime -net nic -net user -smb "$HOME"

This command will start a virtual machine running Hiren's Boot CD, and it will display it in a window (of size 800x600). The command will not exit until you close the window (and thus abort the virtual machine).

The virtual machine will use 512 MB of memory (as specified by -mem 512 above. It's possible for the mini Windows XP to use less memory, e.g. you if you specify -mem 256 instead, then it will still work, but web browsing (with Opera) won't work, and you will have to click OK on the Your system is low on vitual memory. dialog later.

In a few seconds, the boot menu of Hiren's BootCD is displayed in the QEMU window:

Press the down arrow key and press Enter to choose Mini Windows Xp. Then wait about 1 minute for Windows XP to start. It will look like this:

To use the mouse within the QEMU window, click on the window. To release your mouse (to be used in other windows), press Ctrl and Alt at the same time.

Networking (such as web and file sharing) is not enabled by default. To enable it, click on the Network Setup icon in the QEMU window desktop, and wait about 20 seconds. The IP address of the guest Windows XP is 10.0.2.15, and the IP address of host Linux system is 10.0.2.2. Because of the user mode networking emulation provided by QEMU, external TCP connections can also be made from Windows XP (e.g. you can browse the web). Please note that ping won't work (because QEMU doesn't emulate that).

To browse the web, click on the Internet icon in the QEMU Windows desktop. It will start the Opera browser. Web browsing will be quite slow, so better try some fast sites such as google.com or whatismyip.com.

To use the command line, click on the Command prompt icon in the QEMU Windows desktop. There is a useful command to type to that window: net use s: \\10.0.2.4\qemu (press Enter after typing it). That will make your Linux home folder available as drive S: in Windows XP, for reading and writing. (You can change which folder to make available by specifying it after -smb when starting QEMU.)

Copy-pasting between Linux and Windows XP clipboards doesn't work.

You can make the QEMU window larger by changing Start menu / Settings / Control Panel / Display / Settings / Screen resoluton to 1024 by 768 pixels. The 1024x768 shortcut on the QEMU Windows desktop doesn't work,

Because of efficient CPU virtualization by KVM, an idle Windows XP in a QEMU window doesn't use more than 10% CPU on the host Linux system.

Hiren's boot CD contains hundrends of Windows apps. Only a fraction of the apps are available from the Windows XP start menu. To see all apps, click on the HBCD Menu icon in the QEMU Windows desktop, and then click on the Browser Folder button.

2017-02-28

How to avoid unnecessary copies when appending to a C++ vector

This blog post explains how to avoid unnecessary copies when appending to a C++ std::vector, and recommends the fast_vector_append helper library, which eliminates most copies automatically.

TL;DR If you are using C++11, and your element classes have an efficient move constructor defined, then just use push_back to append, it won't do any unnecessary copies. In addition to that, if you are constructing the to-be-appended element, you can use emplace_back to append, which even avoids the (otherwise fast) call to the move constructor.

Copying is slow and needs a lot of (temporary memory) if the object contains lots of data. Such an object is a long std::string: the entire array of characters get copied to a new array. This hurts performance if the copy is unnecessary, e.g. if only a temporary copy is made. For example:

std::string create_long_string(int);

std::vector<std::string> v;
{
  // Case A.
  std::string s1 = create_long_string(1);
  std::string s2 = create_long_string(2);
  std::string s3 = create_long_string(3);
  // Case B.
  v.push_back(s1);
  std::cout << s1;
}
// Case C.
v.push_back("foo");
// Case D, from C++11.
v.emplace_back("foo");
// Case E.
v.push_back(create_long_string(4));
// Case F.
v.push_back(std::string()); v.back().swap(s2);
// Case G, from C++11.
v.push_back(std::move(s3));

In Case A, return value optimization prevents the unnecessary copying: the string built in the function body of create_long_string is placed directly to s1.

In Case B, a copy has to be made (there is no way around it), because v is still valid after s1 is destroyed, thus it cannot reuse the data in s1.

Case C could work without a copy, but in C++98 an unnecessary copy is made. So first std::string("foo") is called (which makes a copy of the data), and then the copy constructor of std::string is called to create a new string (with a 2nd copy of the data), which gets added to v.

Case D avoids the 2nd (unnecessary) copy, but it works only from C++11. In earlier versions of C++ (such as C++98), std::vector doesn't have the emplace_back method.

In Case E, there is an unnecessary copy in C++98: create_long_string creates an std::string, and it gets copied to a new std::string within v. It would be better if create_long_string could create the std::string to its final location.

Case F shows the workaround in C++98 of adding s2 to an std::vector without a copy. It's a workaround because it's a bit ugly and it still involves some copying. Fortunately this copying is fast: it copies only the empty string. As a side effect, the value of s2 is lost, it will then be the empty string.

Case G shows the C++11 way of adding s3 to an std::vector without a copy. It doesn't work in C++98 (there is no std::move in C++98). The std::move(s3) visibly documents that the old value of s3 is lost.

C++11 (the version of C++ after C++98) introduces rvalue references, move constructors and move semantics to avoid unnecessary copies. This will fix both Case C and Case E. For this to work, new code needs to be added to the element class (in our case std::string) and to the container class (in our case std::vector) as well. Fortunately, the callers (including our code above and the body of create_long_string) can be kept unchanged. The following code has been added to the C++ standard library (STL) in C++11:

class string {
  ...
  // Copy constructor. C++98, C++11.
  string(const string& old) { ... }
  // Move constructor. Not in C++98, added in C++11.
  string(string&& old) { ... } ... }
};

template<typename T, ...>
class vector {
  ...
  // Takes a const reference. C++98, C++11.
  void push_back(const T& t);
  // Takes an rvalue reference. Not in C++98, added in C++11.
  void push_back(T&& t);
};

As soon as both of these are added, when v.push_back(...) will attempt to call the 2nd method (which takes the rvalue reference), which will call to move constructor of std::string instead of the copy constructor. This gives us the benefit of no copying, because typically the move constructor is fast, because it doesn't copy data. In general, the move constructor creates the new object with the data of the old object, and it can leave the old old object in an arbitrary but valid state. For std::string, it just copies the pointer to the data (which is fast, because it doesn't copy the data itself), and sets the pointer in the old std::string to nullptr. Thus Case C and Case E become fast in C++11. Case B is not affected (it still copies), and that's good, because we want to print s1 to cout below, so we want that data there. This happens automatically, because in the call v.push_back(s1), s1 is not an rvalue reference, thus the cost-reference push_back will be called, which does a copy. For more details about the magic to select the proper push_back, see this tutorial or this tutorial.

Guidelines to avoid unnecessary copies

Define your (element) classes like this:

  • Define the default constructor (C() { ... }).
  • Define the destructor (~C() { ... }).
  • Define the copy constructor (C(const C& c) { ... }).
  • It's a good practice to define operator=, but not needed here.
  • For C++11 classes, define a move constructor (e.g. C(C&& c) { ... }).
  • For C++11 classes, don't define a member swap method. If you must define it, then also define a method void shrink_to_fit() { ... }. It doesn't matter what the method does, you can just declare it. The fast_vector_append library detects shrink_to_fit, and will use the move constructor instead of the swap method, the former being slightly faster, although neither copies the data.
  • For C++98 classes, don't define a move constructor. In fact, C++98 doesn't support move constructors.
  • For C++98 classes, define a member swap method.

To append a new element to an std::vector without unnecessary copying, as fast as possible, follow this advice from top to bottom:

  • If it's C++11 mode, and the object is being constructed (not returned by a function!), use emplace_back without the element class name.
  • If it's C++11 mode, and the class has a move constructor, use push_back.
  • If it's C++11 mode, and the class has the member swap method, use: { C c(42); v.resize(v.size() + 1); v.back().swap(c); }
  • If the class has the member swap method, use: { C c(42); v.push_back(C()); v.back().swap(c); }
  • Use push_back. (This is the only case with a slow copy.)

Automating the avoidance of unnecessary copies when appending to a vector

It would be awesome if the compiler could guess the programmer's intentions, e.g. it would pick emplace_back if it is faster than push_back, and it will avoid the copy even in C++98 code, e.g. it will use swap if it's available, but the move constructor isn't. This is important because sometimes it's inconvenient to modify old parts of a codebase defining the element class, and it already has swap.

For automation, use fast_vector_append(v, ...) in the fast_vector_append library to append elements to an std::vector. It works in both C++98 and C++11, but it can avoid more copies in C++11. The example above looks like:

#include "fast_vector_append.h"
std::string create_long_string(int);

std::vector<std::string> v;
{
  // Case A. No copy.
  std::string s1 = create_long_string(1);
  std::string s2 = create_long_string(2);
  std::string s3 = create_long_string(3);
  // Case B. Copied.
  fast_vector_append(v, s1);
  std::cout << s1;
}
// Case C. Not copied.
fast_vector_append(v, "foo");
// Case D. Not copied.
fast_vector_append(v, "foo");
// Case E. Copied in C++98.
fast_vector_append(v, create_long_string(4));
{ std::string s4 = create_long_string(4);
  // Case E2. Not copied.
  fast_vector_append_move(v, s4);
}
// Case F. Not copied.
fast_vector_append_move(v, s2);
// Case G. Not copied.
fast_vector_append_move(v, s3);
// Case H. Copied in C++98.
fast_vector_append(v, std::string("foo"));

Autodetection of class features with SFINAE

The library fast_vector_append does some interesting SFINAE tricks to autodetect the features of the element class, so that it will be able to use the fastest way of appending supported by the class.

For example, this is how it detects whether to use the member swap method:

// Use swap iff: has swap, does't have std::get, doesn't have shrink_to_fit,
// doesn't have emplace, doesn't have remove_suffix. By doing so we match
// all C++11, C++14 and C++17 STL templates except for std::optional and
// std::any. Not matching a few of them is not a problem because then member
// .swap will be used on them, and that's good enough.
//
// Based on HAS_MEM_FUNC in http://stackoverflow.com/a/264088/97248 .  
// Based on decltype(...) in http://stackoverflow.com/a/6324863/97248 .
template<typename T>   
struct __aph_use_swap {
  template <typename U, U> struct type_check;
  // This also checks the return type of swap (void). The checks with
  // decltype below don't check the return type.
  template <typename B> static char (&chk_swap(type_check<void(B::*)(B&), &B::swap>*))[2];
  template <typename  > static char (&chk_swap(...))[1];
  template <typename B> static char (&chk_get(decltype(std::get<0>(*(B*)0), 0)))[1];  
// ^^^ C++11 only: std::pair, std::tuple, std::variant, std::array. template <typename > static char (&chk_get(...))[2]; template <typename B> static char (&chk_s2f(decltype(((B*)0)->shrink_to_fit(), 0)))[1];
// ^^^ C++11 only: std::vector, std::deque, std::string, std::basic_string. template <typename > static char (&chk_s2f(...))[2]; template <typename B> static char (&chk_empl(decltype(((B*)0)->emplace(), 0)))[1];
// ^^^ C++11 only: std::vector, std::deque, std::set, std::multiset, std::map, std::multimap, std::unordered_multiset, std::unordered_map, std::unordered_multimap, std::stack, std::queue, std::priority_queue. template <typename > static char (&chk_empl(...))[2]; template <typename B> static char (&chk_rsuf(decltype(((B*)0)->remove_suffix(0), 0)))[1];
// ^^^ C++17 only: std::string_view, std::basic_string_view. template <typename > static char (&chk_rsuf(...))[2]; static bool const value = sizeof(chk_swap<T>(0)) == 2 && sizeof(chk_get<T>(0)) == 2
&& sizeof(chk_s2f<T>(0)) == 2 && sizeof(chk_empl<T>(0)) == 2
&& sizeof(chk_rsuf<T>(0)) == 2; };

The autodetection is used like this, to select one of the 2 implementations (either with v.back().swap(t) or v.push_back(std::move(t))):

template<typename V, typename T> static inline
typename std::enable_if<std::is_same<typename V::value_type, T>::value &&
    __aph_use_swap<typename V::value_type>::value, void>::type
fast_vector_append(V& v, T&& t) { v.resize(v.size() + 1); v.back().swap(t); }                               

template<typename V, typename T> static inline
typename std::enable_if<std::is_same<typename V::value_type, T>::value &&
    !__aph_use_swap<typename V::value_type>::value, void>::type
fast_vector_append(V& v, T&& t) { v.push_back(std::move(t)); }

2017-02-24

How to back up your WhatsApp chats and photos safely on Android

This blog post explains how to make backups of your WhatsApp chats and photos safely on your Android device, and how to restore your backups. By safely we mean that you won't lose data unless you remove some backup files manually.

WhatsApp saves all chats to the WhatsApp/Databases folder on the phone's storage (sdcard), and it saves all photos and other media files to the WhatsApp/Media folder. (In fact, from the chats only the file WhatsApp/Databases/msgstore.db.cryptNNN may be needed, where NNN is an integer, currently 12.) If you make a copy of these folders, and copy them back to a new or reinstalled Android device before installing WhatsApp, then this effectively restores the backup, WhatsApp will recognize and use these files the first time it's installed (you need to tap on the Restore button within WhatsApp). See this FAQ entry for more information on restoring WhatsApp backups on Android.

WhatsApp supports creating backups to Google Drive (automatically, every day), and restoring those backups when the app is (re)installed. This sounds like convenient and safe, but it's not safe, you can still lose your chat history and photos (see below how). So if you care about your WhatsApp chat history and photos, and you need an automated backup for them, here is my recommendation: use the FolderSync Lite free Android app to make automatic backups to the cloud (it supports Google Drive, DropBox and more than 20 other cloud providers). To restore the backup, you can use FolderSync Lite, or you can download the files and copy them to your Android device manually.

Here is how to set up FolderSync Lite on Android for automatic backups of WhatsApp chats, photos and other media:

  1. Create a Google account, open Google Drive, create a folder named FolderSyncBackup-WhatsApp, and within it create subfolders Databases and Media (both case sensitive). It can also be done similarly on Dropbox instead, but this tutorial focuses on Google Drive.
  2. Install FolderSync Lite to your Android device.
  3. Add your Google account to FolderSync lite.
  4. Create a folderpair for backing up chats:
    • Account: your Google account
    • Unique name: wad
    • Sync type: To remote folder
    • Remote folder: /FolderSyncBackup-WhatsApp/Databases/
    • Local folder: .../WhatsApp/Databases/
    • Use scheduled sync: yes
    • Sync itnerval: Daily
    • Copy files to time-stamped folder: no
    • Sync subfolders: yes
    • Sync hidden files: yes
    • Delete source files after sync: no
    • Retry sync if failed: yes
    • Only resync source files if modified (ignore target deletion): yes
    • Sync deletions: no
    • Overwrite old files: Always
    • If conflicting modifications: Skip file
    • Use WiFi: yes
    • (Many settings below are fine, skipped here.)
  5. Save the folderpair, and do the first sync manually.
  6. Create a folderpair for backing up media files, including photos:
    • Account: your Google account
    • Unique name: wam
    • Sync type: To remote folder
    • Remote folder: /FolderSyncBackup-WhatsApp/Media/
    • Local folder: .../WhatsApp/Media/
    • (Subsequent settings are the same as above.)
  7. Save the folderpair, and do the first sync manually.
  8. Optionally, you can turn of WhatsApp's automatic backup to Google Drive in the WhatsApp app's chat settings.
  9. To remove the WhatsApp's automatic backup files from Google Drive, go to Google Drive, click on the gear icon (Settings), click on Settings, click on Manage Apps, find and click on the Options off WhatsApp Messenger, click on Delete app data, and then click on Disconnect from Drive.

If FolderSync starts failing consistently with the error message IllegalStateException: Expected BEGIN_OBJECT but was STRING, you can fix it by unlinking and reauthenticating the sync account on the Android device. To do so, open the FolderSync app on the device, tap Accounts, tap on your GoogleDrive account, tap on the black UNLINK ACCOUNT button, tap on the black RE-AUTHENTICATE ACCOUNT button, tap on Save, go back, tap on Folderpairs, and tap on the black Sync buttons with an error next to them.

WhatsApp saves all chat history so far to a new file every day (file name pattern: WhatsApp/Databases/msgstore-????-??-??.*.db.crypt*). These files will accumulate and fill up your Google Drive quote in a year or two, so you may want to remove old files. You can do it manually on the Google Drive web UI, just visit the FolderSyncBackup-WhatsApp/Databases folder, select old files (by date), and remove them. Alternatively, you can automate removal using an Apps Script. Here is how:

  1. Visit http://script.google.com/
  2. You see a script editor window. Remove the existing code (function myFunction() { ... }).
  3. In the File / Rename... menu, rename the project to script to remove old WhatsApp backups
  4. Copy-paste the following code to the script editor window:
    var FOLDER_NAME = 'FolderSyncBackup-WhatsApp';
    // Please note that msgstore.db.crypt12 will always be kept in addition to the dated files.
    var MINIMUM_DBS_TO_KEEP = 8;  // A value smaller than 8 doesn't make sense, WhatsApp keeps that much, and these files would be reuploaded by FolderSync.
    var USE_TRASH = true;
    var DO_DRY_RUN = false;
    
    function getSingleFolder(foldersIter) {
      var folder = null;
      while (foldersIter.hasNext()) {
        if (folder != null) throw new Error("Multiple folders found.");
        folder = foldersIter.next();
      }
      if (folder == null) throw new Error("Folder not found.");
      return folder;
    }
    
    function getAll(iter) {
      var result = [];
      while (iter.hasNext()) {
        result.push(iter.next());
      }
      return result;
    }
    
    function compareByName(a, b) {
      var an = a.getName(), bn = b.getName();
      return an < bn ? -1 : an == bn ? 0 : 1;
    }
    
    function removeOldWhatsAppBackups() {
      Logger.log('<removeOldWhatsAppBackups>');
      Logger.log('Config FOLDER_NAME=' + FOLDER_NAME);
      Logger.log('Config MINIMUM_DBS_TO_KEEP=' + MINIMUM_DBS_TO_KEEP);
      Logger.log('Config USE_TRASH=' + USE_TRASH);
      Logger.log('Config DO_DRY_RUN=' + DO_DRY_RUN);
      // var folders = DriveApp.getFolders();  // Also lists subfolders.
      var folder = getSingleFolder(getSingleFolder(DriveApp.getRootFolder().getFoldersByName(FOLDER_NAME)).getFoldersByName('Databases'));
      var files = getAll(folder.getFiles());
      var sortedFiles = [];
      var i;
      for (i = 0; i < files.length; ++i) {
        var file = files[i];
        var name = file.getName();
        // Logger.log(name + ': ' + file.getDateCreated());  // No last-modification time :-(.
        if (name.match(/^msgstore-\d\d\d\d-\d\d-\d\d[.]/)) {
          sortedFiles.push(file);
        }
      }
      sortedFiles.sort(compareByName);  // The name reflects the date. Earlier file first.
      var toKeep = MINIMUM_DBS_TO_KEEP;
      if (toKeep < 1) toKeep = 1;
      for (i = 0; i < sortedFiles.length - toKeep; ++i) {
        var file = sortedFiles[i];
        // Logger.log('?? ' + file.getName() + ': ' + file.getSize());
        // A 100-byte decrease in file size is tolerable, can be a compression artifact.
        if (file.getSize() - 100 < sortedFiles[i + 1].getSize()) {
          if (DO_DRY_RUN) {
          } else if (USE_TRASH) {
            file.setTrashed(true);
          } else {
            DriveApp.removeFile(file);
          }
          // Utilities.sleep(100); // Prevent exceeding rate limit (currently 10 requests per second). Is it still in effect?
          Logger.log('-- Removing: ' + file.getName() + ': ' + file.getSize());
        } else {
          Logger.log('Keeping: ' + file.getName() + ': ' + file.getSize());
        }
      }
      for (; i < sortedFiles.length; ++i) {
        var file = sortedFiles[i];
        Logger.log('Keeping recent: ' + file.getName() + ': ' + file.getSize());
      }
      Logger.log('</removeOldWhatsAppBackups>');
    }
  5. Click on the File / Save menu to save the script.
  6. Open the Resources / All your triggers menu window, choose Add a new trigger, add this: removeOldWhatsAppBackups, Time-driven, Day timer, 5am to 6am, also add a notification to yourself in e-mail, sent at 6am.
  7. Click on Save to save the triggers.
  8. You may wonder if this Apps Script deletes chat backups in a safe way. You have to decide this for yourself. It keeps the 8 last backups, and it also keeps all backups after which the backup file size has decreased. This latter condition prevents the data loss case described below (in the next section).

That's it, automatic and safe backup of WhatsApp chats, photos and other media files to Google Drive is now set up using FolderSync Lite.

Is WhatsApp's built-in backup to Google Drive safe?

No it isn't, you can lose all your data in some cases. The data loss happened to a friend in Feb 2017 in the following way:

  • The Android phone was lost, thus the WhatsApp folder on the phone wasn't available.
  • There was a working and recent backup on Google Drive.
  • When WhatsApp was installed to a new phone, it started restoring the backup from Google Drive.
  • Shortly after the start of the restore, the internet connection broke and the restore was aborted. Only a little part of the messages were restored.
  • The owner of the phone didn't notice that many chats are missing, and started using WhatsApp.
  • Within 24 hours, WhatsApp created a new backup to Google Drive, overwriting the old, full data with the new partial data. At this point the majority of the old chats got lost.
  • A couple of days later the owner of the phone noticed, but it was too late.

If WhatsApp's built-in backup to Google Drive had been written in a way that it never overwrites old backups, it would have been possible to reinstall WhatsApp and restore all the chats without data loss. Unfortunately the users of WhatsApp have no control on how WhatsApp does backups. But they can use an alternative backup method which never loses data (see the method with FolderSync Lite above).

Another way to safely restore WhatsApp's built-in backup from Google Drive would be downloading it from Google Drive first, and keeping a a copy until WhatsApp has successfully restored everything on the new Android device. Unfortunately this is impossible, because it's impossible for the user to download their own backup from Google Drive (!), because it is saved to a hidden app folder, which only the WhatsApp app can read and write, and the user is unable to access it (apart from deleting it). This StackOverflow question has an answer which describes a cumbersome and fragile way for downloading, but this can easily change in the future, so I don't recommend it for general use. I recommend the method with FolderSync Lite above instead, which makes it easy for the user to see and download their WhatsApp backup from Google Drive.

2017-02-07

Checking whether a Unix filename is safe in C

This blog post explains how to check whether a filename is safe for overwriting the file contents on a Unix system, and it also contains C code to do the checks.

The use case is that an archive (e.g. ZIP) extractor is extracting an untrusted archive and creating and overwriting files. The archive may be created by someone malicious, trying to trick the extractor to overwrite sensitive system files such as /etc/passwd. Of course this would only work if the process running the extractor has the permission to modify system files (which it normally doesn't have, because it's not running as root). The most basic protection the extractor can provide is refusing to write to a file with an absolute name (i.e. starting with /), so that even if it's running as root, it would do no harm if it's not running in the root directory.

However, checking whether the first character is a / isn't good enough, because the attacker may specify a file with name ../.././../../../.././../tmp/../etc/././passwd, which is equivalent to /etc/passwd if the current directory isn't deep enough. Enforcing the following conditions makes sure that such attacks are not possible:

  • The filename isn't be empty.
  • The filename doesn't start or end with a /.
  • The strings . and .. are not present as a pathname component (i.e. as an item when the filename is split on /).
  • The filename doesn't contain a double slash: //.

Here is how to check all these in C:

int is_filename_safe(const char *p) {
  const char *q;
  for (;;) {
    for (q = p; *p != '\0' && *p != '/'; ++p) {}
    /* In the first iteration: An empty filename is unsafe. */
    /* In the first iteration: A leading '/' is unsafe. */
    /* In subsequent iterations: A trailing '/' is unsafe. */
    /* In subsequent iterations: A "//" is unsafe. */
    if (p == q ||
    /* A pathname component "." is unsafe. */
    /* A pathname component ".." is unsafe. */
        (*q == '.' && (q + 1 == p || (q[1] == '.' && q + 2 == p)))) {
      return 0;  /* Unsafe. */
    }
    if (*p++ == '\0') return 1;  /* Safe. */
  }
}

An even better solution is running the extractor in an isolated sandbox (jail), thus protecting against malicious input and all kinds of software bugs.