2016-01-06

How to extract comments from a JPEG file

This blog post explains how to extract comments from a JPEG file. Each JPEG file consists of segments. Each segment describes parts of the image data or metadata. The comments are in segments with marker COM (0xfe), there can be any number of them, anywhere (usually before the SOS segment) in the file.

Use the rdjpgcom command-line tool to extract comments. The tool is part of libjpeg, and on Ubuntu and Debian systems it can be installed with (don't type the leading $):

$ sudo apt-get install libjpeg-progs

Once installed, use it like this to print all comments in the JPEG file, with a terminating newline added to each:

$ rdjpgcom file.jpg

If the file doesn't have any comment, the output of rdjpgcom is empty. Here is how to add comments:

$ wrjpgcom -c 'COMfoo' com0.jpg >com1.jpg
$ wrjpgcom -c 'COMbar' com1.jpg >com2.jpg
After adding the comments, it will look like this:
$ rdjpgcom com2.jpg
COMfoo
COMbar

If you also want to see the unprintable characters (unsafe on a terminal), pass the -raw flag:

$ rdjpgcom -raw file.jpg

If you need a library, use the following C code, which is a minimalistic reimplementation of rdjpgcom -a:

/*
 * getjpegcom.c: Get comments from JPEG files.
 * by pts@fazekas.hu at Wed Jan  6 20:48:07 CET 2016
 *
 *   $ gcc -W -Wall -Wextra -Werror -s -O2 -o getjpegcom getjpegcom.c &&
 *   $ ./getjpegcom <file.jpg
 */

#include <stdio.h>
#if defined(MSDOS) || defined(WIN32)
#include <fcntl.h>  /* setmode. */
#endif

/* Get and print all comments in a JPEG file. Comments are written to of,
 * with a newline appended as terminator.
 *
 * Returns 0 on success, or a negative number on error.
 */
static int get_jpeg_comments(FILE *f, FILE *of) {
  int c, m;
  unsigned ss;
#if defined(MSDOS) || defined(WIN32)
  setmode(fileno(f), O_BINARY);
  setmode(fileno(of), O_BINARY);
#endif
  if (ferror(f)) return -1;
  /* A typical JPEG file has markers in these order:
   *   d8 e0_JFIF e1 e1 e2 db db fe fe c0 c4 c4 c4 c4 da d9.
   *   The first fe marker (COM, comment) was near offset 30000.
   * A typical JPEG file after filtering through jpegtran:
   *   d8 e0_JFIF fe fe db db c0 c4 c4 c4 c4 da d9.
   *   The first fe marker (COM, comment) was at offset 20.
   */
  if ((c = getc(f)) < 0) return -2;  /* Truncated (empty). */
  if (c != 0xff) return -3;
  if ((c = getc(f)) < 0) return -2;  /* Truncated. */
  if (c != 0xd8) return -3;  /* Not a JPEG file, SOI expected. */
  for (;;) {
    /* printf("@%ld\n", ftell(f)); */
    if ((c = getc(f)) < 0) return -2;  /* Truncated. */
    if (c != 0xff) return -3;  /* Not a JPEG file, marker expected. */
    if ((m = getc(f)) < 0) return -2;  /* Truncated. */
    while (m == 0xff) {  /* Padding. */
      if ((m = getc(f)) < 0) return -2;  /* Truncated. */
    }
    if (m == 0xd8) return -4;  /* SOI unexpected. */
    if (m == 0xd9) break;  /* EOI. */
    if (m == 0xda) break;  /* SOS. Would need special escaping to process. */
    /* printf("MARKER 0x%02x\n", m); */
    if ((c = getc(f)) < 0) return -2;  /* Truncated. */
    ss = (c + 0U) << 8;
    if ((c = getc(f)) < 0) return -2;  /* Truncated. */
    ss += c;
    if (ss < 2) return -5;  /* Segment too short. */
    for (ss -= 2; ss > 0; --ss) {
      if ((c = getc(f)) < 0) return -2;  /* Truncated. */
      if (m == 0xfe) putc(c, of);  /* Emit comment char. */
    }
    if (m == 0xfe) putc('\n', of);  /* End of comment. */
  }
  return 0;
}

int main(int argc, char **argv) {
  (void)argc; (void)argv;
  return -get_jpeg_comments(stdin, stdout);
}

Here is how to compile and run it:

$ gcc -W -Wall -Wextra -Werror -s -O2 -o getjpegcom getjpegcom.c
$ ./getjpegcom com2.jpg
COMfoo
COMbar