pts.blog: 2012

2012-12-30

Hungarian spell checking, hyphenation and synonyms in LibreOffice 3 and OpenOffice 3

This blog post explains how to install the Hungarian spell checking, hyphenation and synonyms extensions to OpenOffice 3.x and LibreOffice 3.x.

Install OpenOffice 3.x or LibreOffice 3.x.
Download the most recent and relevant .oxt file from http://code.google.com/p/openscope/wiki/DictHuOxt with your web browser.
The web browser will start OpenOffice or LibreOffice for you, and it will install the extension.
Exit from OpenOffice and LibreOffice, and start it again.
To try the spell checker, type the word hejtelen into Writer, select it, and in Format / Character specify Hungarian as the language. The spell checker will mark it as incorrect (by underlining it with a wavy red line) and it will suggest helytelen as the fix.
To try the automatic hyphenation, type the word meggyültetvény, repeat it about a 100 times, select all, in Format / Character specify Hungarian as the language, and in Format / Paragraph enable automatic hyphenation. The hyphenator will hyphenate the word like meggy-ül-tet-vény.
There are many related extensions you can optionally install. Don't be afraid to install OpenOffice extensions to LibreOffice. They usually work.
- Hungarian grammar checker: http://extensions.libreoffice.org/extension-center/magyar-mondatellenorzo
- Text generation for numbers (Numbertext): http://extensions.libreoffice.org/extension-center/numbertext-1
- Linux Libertine G and Linux Biolinum G fonts: http://numbertext.org/linux/
- Typography toolbar: http://extensions.services.openoffice.org/en/project/typo
- Graphite font extension: http://www.thanlwinsoft.org/GraphiteOOoExt/

The most important contribution of this blog post is the http://code.google.com/p/openscope/wiki/DictHuOxt link above, which is very hard to find with web search engines.

2012-12-03

Announcing intalg: integer and number theory algorithms in pure Python 2.x

This is the initial release announcement for pybasealgo (fast implementation of basic algorithms in pure Python 2.x, minimum version:) and the first set of algorithms, intalg (integer and number theory algorithms). The algorithm implementations can be used in Project Euler and programming contest problem solving. These are free software under the GPL. Click on the links above to download the source.

intalg contains the following algorithms:

integer bit count (bit_count)
square root rounding down (sqrt_floor)
k^th root rounding down (root_floor)
bounded prime list builder using the sieve of Eratosthenes (primes_upto, first_primes_moremem, first_primes)
bounded prime generator using the sieve of Eratosthenes (yield_primes_upto, yield_first_primes)
unbounded prime generator using the sieve of Eratosthenes (yield_primes, yield_composites)
high precision upper bound for natural and base 2 logarithms (log2_256_more, log_more)
greatest common divisor using Euclidean algorithm (gcd)
primality test using the Rabin-Miller test with fast parameters (is_prime)
finding the next prime (next_prime)
computation of binary coefficients (choose)
integer factorization to prime divisors (factorize, yield_slow_factorize, finder_slow_factorize)
computation of the number of divisors (divisor_count)
computation of the sum of divisors (divisor_sum)
random number generation using the Mersenne twister MT19937, without using floating point numbers (MiniIntRandom)
Pollard's Rho algorithm aiding integer factorization (pollard)
Brent's algorithm aiding integer factorization (pollard)
simplification of fractions (simplify_nonneg)
RLE: run-length encoding (rle)
conversion of fraction of very large integers to float (fraction_to_float)

Design principles:

Pure Python: Provides full functionality using only built-in Python modules.
Fast: Fast implementations of fast algorithms, e.g. Newton iteration for square root and k^th root, sieve of Eratosthenes for prime number generation, Brent's algorithm for integer factorization. Typically faster than most of the sample code and other libraries found online. (If you find something faster, please let me know by posting a comment.) Not as fast as C code (e.g. pygmp).
Correct: Many Python code examples (e.g. Brent's algorithm) found on the net are buggy (especially in corner cases) or are too slow. pybasealgo strives to be correct. Production use (i.e. I use this library for solving Project Euler problems) and unit tests help this.
Simple: Maintains a balance between speed and implementation simplicity. It implements sophisticated and fast algorithms in a simple way, but it doesn't contain the fastest possible algorithm if it's very complex (e.g. for integer factorization, pyecm is 1472 lines long, our intalg contains Brent's algorithm, which is slower but much simpler).
No-float: Doesn't use floating point numbers, not even for temporary computations (e.g. math.log(...)). This is to avoid rounding errors, which make the correctness of an algorithm hard to prove and they make some computations nondeterministic on some architectures and systems. And also to support arbitrarily large numbers: floats have a maximum (e.g. all IEEE 754 64-bit are less than 2e308), so they cannot be used in calculations with very large numbers. Python uses temporary floating point numbers for random integer generation, so pybasealgo provides an integer-only replacement.
Arbitrarily large integers: All algorithms work with small integers (of type int) and arbitrarily large integers (of type long). There is no automatic truncation of the results.
Deterministic: Every function returns the same value for the same input, and it runs at the same speed. (Except if some return values are cached in memory, then subsequent calls in the same process will be faster.) This makes debugging and benchmarking a lot easier. Determinism is very important in a programming contest situation: if your program succeeds for you, you don't want it to fail or time out when the judges rerun it. Even those algorithms which use random numbers (e.g. Pollard's Rho algorithm and Brent's algorithm for integer factorization) are deterministic by default, because they use pseudorandom numbers with a seed which depends only on the input.

Have fun programming and Python and I wish you great success in Project Euler and in programming contests.

2012-11-15

How to start and kill a Unix process tree

This blog post explains how to start a (sub)process on Unix, and later kill it so that all its children and other descendants are also killed. The sample implementation is written in Python 2.x (≥ 2.4), but it works equally well in other programming languages. It has been tested on Linux, but it should also work on other Unix systems such as *BSD and the Mac OS X.

For example, the following doesn't work as expected:

import os, signal, subprocess, time
p = subprocess.Popen('cat /dev/zero | wc', shell=True)
time.sleep(.3)
os.kill(p.pid, signal.SIGTERM)

The intention here is to kill the long-running command after 0.3 second, but this will kill the shell only, and both cat /dev/zero (the long-running pipe source) and wc will continue running, both of them consuming 100% CPU until killed manually. The trick for killing the shell and all descendant processes is to put them into a process group first, and to kill all processes in the process group in a single step. Doing so is very easy:

import os, signal, subprocess, time
p = subprocess.Popen('sleep  53 & sleep  54 | sleep  56',
                     shell=True, preexec_fn=os.setpgrp)
time.sleep(.3)
print 'killing'
os.kill(-p.pid, signal.SIGTERM)
p.wait()

By specifying preexec_fn=os.setpgrp we ask the child to call the setpgrp() system call (after fork() but before exec...(...)ing the shell), which will create a new process group and put the child inside it. The ID of the process group (PGID) is the same as the process ID (PID) of the child. All descendant processes of the child will be added to this process group. By specifying a negative PID to os.kill (i.e. the kill(...) system call), we ask for all process group members to be killed in a single step. This will effectively kill all 3 sleep processes.

Of course if some of the descendant processes ignore or handle the TERM signal, they won't get killed. We could have sent the KILL signal to forcibly kill all process group members. Sending the INT signal is similar to the TERM signal, except that Bash explicitly sets up ignoring the INT signal for processes started in the background (sleep 53 in the example above), so those processes won't get killed unless they restore the signal handler to the default (by calling the system call signal(SIGINT, SIG_DFL)). sleep doesn't do that, so sleep 53 wouldn't get killed if the INT signal was sent.

Using shell=True above is not essential: the idea of setting up a process group and killing all process group members in a single step works with shell=False as well.

See The Linux kernel: Processes for a ground-up lesson about processes, process groups, sessions and controlling terminals on Linux. This is a must-read to gain basic understanding, because the relevant system call man pages (e.g. man 2 setpgrp) don't define or describe these basic concepts.

The same idea works through SSH, with a few modifications. There is no need for preexec_fn=os.setpgrp, because sshd sets up a new process group for each new connection. We need to do extra work though to get the remote PID, because p.pid would just return the local PID of the short-living first ssh process. Here is a working example:

import subprocess, time
ssh_target = '127.0.0.1'
command = 'sleep  53 & sleep  54 | sleep  56'
p = subprocess.Popen(('ssh', '-T', '--', ssh_target,
                      'echo $$; exec >/dev/null; (%s\n)&' % command),
                     stdout=subprocess.PIPE)
pid = p.communicate()[0]  # Read stdin (containing the remote PID).
assert not p.wait()
pid = int(pid)
time.sleep(.3)
print 'killing'
assert not subprocess.call(('ssh', '-T', '--', ssh_target,
                            'kill -TERM %d' % -pid))

In the SSH example above, kill is usually the built-in kill command of Bash, but it works equally well if it's the /bin/kill external program. The remote PID (same as the remote PGID) is written by echo $$.

The exec >/dev/null command is needed to redirect stdout. Without it the ssh client would wait for an EOF on stdout before closing the connection. This can be useful in some cases, but in our design p.wait() is called early, and the first SSH connection is expected to close early, not waiting for the command to finish.

2012-11-10

How to prevent GNU Screen from switching screens upon termination and detach

This blog post explains how to prevent GNU Screen from restoring the original terminal window contents upon a termination or a detach. The original terminal window contents is how the terminal window looked like before attaching. This is useful if a short-running command is running in Screen and you are interested in its final output.

Run this once:

(echo; echo 'termcapinfo * ti:te') >>~/.screenrc

Screens attached after running this won't restore the original terminal window contents.

Explanation: ti is the termcap equivalent of the terminfo entry smcup, and te is the termcap equivalent of the terminfo entry rmcup. They describe how to switch to the alternate screen and back. Setting them to empty will prevent Screen from switching to the alternate screen upon attach, and switching back to the normal screen upon termination and detach. Read more about these directives on http://chenyufei.info/blog/2011-12-15/prevent-vim-less-clear-screen-on-exit/.

Unfortunately, Screen still moves the cursor to the last line, and writes the [screen is terminating] message by scrolling down to the next line. This can't be disabled, it's hardwired to the GotoPos(0, D_height - 1); call in FinitTerm() function in display.c . Printing many newlines when starting the Screen session improves the situation a bit, because it moves the cursor down.

Bitwise tricks in Java

This blog post shows some tricky uses of bitwise operators in C, mostly to compute some functions quickly, processing many bits at a time in a parallel way.

See also Bitwise tricks in C for the same functions in C and C++.

See more tricks like this in the GNU Classpath source of java.lang.Integer and java.lang.Long. Check the methods bitCount, rotateLeft, rotateRight, highestOneBit, numberOfLeadingZeros, lowestOneBit, numberOfTrailingZeros, reverse.

See even better tricks in Chapter 2 of Hacker's Delight, in Bit Twiddling Hacks and in HAKMEM.

public class BitTricks {
  public static boolean is_power_of_two(int x) {
    return (x & (x - 1)) == 0;
  }

  public static boolean is_power_of_two(long x) {
    return (x & (x - 1)) == 0;
  }

  public static int bitcount(int x) {
    x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
    x = (x & 0x0000FFFF) + (x >>> 16);
    return x;
  }

  public static int bitcount(long x) {
    x = (x & 0x5555555555555555L) + ((x >>  1) & 0x5555555555555555L);
    x = (x & 0x3333333333333333L) + ((x >>  2) & 0x3333333333333333L);
    x = (x & 0x0F0F0F0F0F0F0F0FL) + ((x >>  4) & 0x0F0F0F0F0F0F0F0FL);
    x = (x & 0x00FF00FF00FF00FFL) + ((x >>  8) & 0x00FF00FF00FF00FFL);
    x = (x & 0x0000FFFF0000FFFFL) + ((x >> 16) & 0x0000FFFF0000FFFFL);
    return (int)x + (int)(x >>> 32);
  }

  public static int reverse(int x) {
    x = (x & 0x55555555) <<  1 | ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) <<  2 | ((x >> 2) & 0x33333333);
    x = (x & 0x0F0F0F0F) <<  4 | ((x >> 4) & 0x0F0F0F0F);
    x = (x & 0x00FF00FF) <<  8 | ((x >> 8) & 0x00FF00FF);
    x =                x << 16 | x >>> 16;
    return x;
  }

  public static long reverse(long x) {
    x = (x & 0x5555555555555555L) <<  1 | ((x >>  1) & 0x5555555555555555L);
    x = (x & 0x3333333333333333L) <<  2 | ((x >>  2) & 0x3333333333333333L);
    x = (x & 0x0F0F0F0F0F0F0F0FL) <<  4 | ((x >>  4) & 0x0F0F0F0F0F0F0F0FL);
    x = (x & 0x00FF00FF00FF00FFL) <<  8 | ((x >>  8) & 0x00FF00FF00FF00FFL);
    x = (x & 0x0000FFFF0000FFFFL) << 16 | ((x >> 16) & 0x0000FFFF0000FFFFL);
    x =                x << 32 | x >>> 32;
    return x;
  }

  public static int floor_log2(int x) {
    x |= x >>> 1;
    x |= x >>> 2;
    x |= x >>> 4;
    x |= x >>> 8;
    x |= x >>> 16;
    return bitcount(x) - 1;
  }

  public static int floor_log2(long x) {
    x |= x >>> 1;
    x |= x >>> 2;
    x |= x >>> 4;
    x |= x >>> 8;
    x |= x >>> 16;
    x |= x >>> 32;
    return bitcount(x) - 1;
  }
}

Bitwise tricks in C

This blog post shows some tricky uses of bitwise operators in C, mostly to compute some functions quickly, processing many bits at a time in a parallel way.

See also Bitwise tricks in Java for the same functions in Java.

#include <stdint.h>

See even better tricks in Chapter 2 of Hacker's Delight, in Bit Twiddling Hacks and in HAKMEM.

char is_power_of_two_32(uint32_t x) {
  return !(x & (x - 1));
}

char is_power_of_two_64(uint64_t x) {
  return !(x & (x - 1));
}

int bitcount_32(uint32_t x) {
  x = (x & 0x55555555) + ((x >> 1) & 0x55555555);
  x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
  x = (x & 0x0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F);
  x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF);
  x = (x & 0x0000FFFF) + (x >> 16);
  return x;
}

int bitcount_64(uint64_t x) {
  x = (x & 0x5555555555555555LL) + ((x >>  1) & 0x5555555555555555LL);
  x = (x & 0x3333333333333333LL) + ((x >>  2) & 0x3333333333333333LL);
  x = (x & 0x0F0F0F0F0F0F0F0FLL) + ((x >>  4) & 0x0F0F0F0F0F0F0F0FLL);
  x = (x & 0x00FF00FF00FF00FFLL) + ((x >>  8) & 0x00FF00FF00FF00FFLL);
  x = (x & 0x0000FFFF0000FFFFLL) + ((x >> 16) & 0x0000FFFF0000FFFFLL);
  x = (x & 0x00000000FFFFFFFFLL) + (x >> 32);
  return x;
}

uint32_t reverse_32(uint32_t x) {
  x = (x & 0x55555555) <<  1 | ((x >> 1) & 0x55555555);
  x = (x & 0x33333333) <<  2 | ((x >> 2) & 0x33333333);
  x = (x & 0x0F0F0F0F) <<  4 | ((x >> 4) & 0x0F0F0F0F);
  x = (x & 0x00FF00FF) <<  8 | ((x >> 8) & 0x00FF00FF);
  x =                x << 16 | x >> 16;
  return x;
}

uint64_t reverse_64(uint64_t x) {
  x = (x & 0x5555555555555555LL) <<  1 | ((x >>  1) & 0x5555555555555555LL);
  x = (x & 0x3333333333333333LL) <<  2 | ((x >>  2) & 0x3333333333333333LL);
  x = (x & 0x0F0F0F0F0F0F0F0FLL) <<  4 | ((x >>  4) & 0x0F0F0F0F0F0F0F0FLL);
  x = (x & 0x00FF00FF00FF00FFLL) <<  8 | ((x >>  8) & 0x00FF00FF00FF00FFLL);
  x = (x & 0x0000FFFF0000FFFFLL) << 16 | ((x >> 16) & 0x0000FFFF0000FFFFLL);
  x =                x << 32 | x >> 32;
  return x;
}

int floor_log2_32(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  return bitcount_32(x) - 1;
}

int floor_log2_64(uint64_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;
  x |= x >> 32;
  return bitcount_64(x) - 1;
}

2012-11-04

How to use Unicode accented characters in PuTTY in UTF-8 mode

This blog post explains how to set up PuTTY to connect to a Linux server in UTF-8 mode, so that all Unicode characters (including symbols and accented characters) will be transferred and interpreted correctly.

Follow these steps:

Download and install the newest PuTTY (0.62 or later).
Before connecting, configure PuTTY like this:
- Window → Translation → Remote character set: UTF-8
- Connection → Data → Environment variables: add Variable: LC_CTYPE, Value: en_US.UTF-8 , and click on Add.
- Save these settings in Session (e.g. in Session → Default Settings → Save).
Connect to the SSH server with Putty.
If everything (including typing, reading and copy-pasting non-ASCII Unicode characters) already works, stop here.

Run the following commands without the leading $ (you will need the root password). These commands set up the UTF-8 en_US locale on the server. The commands have been verified and found working on Debian and Ubuntu servers.

$ if test -f /etc/locale.gen; then sudo perl -pi -e 's@^# *(en_US.UTF-8 UTF-8)$@$1@' /etc/locale.gen; grep -qxF 'en_US.UTF-8 UTF-8' /etc/locale.gen || (echo; echo 'en_US.UTF-8 UTF-8') >>/etc/locale.gen; fi
$ if test -f /var/lib/locales/supported.d; then grep -qxF 'en_US.UTF-8 UTF-8' || (echo; echo 'en_US.UTF-8 UTF-8') >>/var/lib/locales/supported.d/en; fi
$ sudo perl -pi -e 's@^(?=LC_CTYPE=|LC_ALL=)@#@' /etc/environment
$ sudo /usr/sbin/locale-gen
$ sudo /usr/sbin/update-locale LC_CTYPE LC_ALL

Close the SSH connection window and open a new connection.
Verify that everything (including typing, reading and copy-pasting non-ASCII Unicode characters) works.

Contrary to the information found elsewhere on the net, just setting Window → Translation → Received data assumed to be in which character set or Window → Translation → Remote character set to UTF-8 is not always enough. Setting the server-side environment variables (e.g. setting LC_CTYPE to en_US.UTF-8 above) properly is also required (unless they are already correct). Generating the UTF-8 locale definitions (using locale-gen) on the server is also required (unless they are already generated).

2012-09-22

How to change the volume label to lowercase on an USB stick, memory card or any VFAT filesystem on Ubuntu Lucid

This blog post explains how to change the volume label to lower case on an USB stick, memory card or any other VFAT filesystem. The instructions given were verified for Ubuntu Lucid, but they may work on other Linux systems as well.

On the web some people claim it's not possible, but indeed it is. On the web there are instructions how to change the volume label with GParted or mlabel (part of Mtools) and palimpsest (Disk Utility), but none of them work, because these tools always convert lowercase to uppercase before setting the label. Nautilus, the default file manager for Ubuntu doesn't support changing the volume label at all.

mkdosfs (mkfs.vfat) lets the user specify the volume label using the -n flag, and lowercase letters are kept lowercase, but this tool recreates the filesystem, so all data will be lost.

The non-destructive solution below is a combination of the mlabel and dosfslabel command-line tools.

Connect the device to the computer if not already connected.
Open a terminal window.
Run sudo blkid | grep ' TYPE="vfat"' and </proc/mounts grep ' vfat ' to figure out the name of the device (e.g. /dev/sdb1). Look around in /media etc. to confirm you have picked the right device. If unsure, unplug it, run the commands again, see it disappear, plug it back, and run the commands again.
Unmount the device by running umount /dev/sdb1 (substituting /dev/sdb1 with the name of the device found above). If it was mounted, and the unmount failed, then close some windows, kill some programs (e.g. sudo fuser -m /dev/sdb1), and try unmounting again.
Run sudo env MTOOLS_SKIP_CHECK=1 mlabel -i /dev/sdb1 ::x (substituting /dev/sdb1 with the name of the device found above). If the system can't find mlabel, then install it by running sudo apt-get install mtools , and try again.
Run sudo dosfslabel /dev/sdb1 MyLabel (substituting MyLabel with the desired label and /dev/sdb1 with the name of the device found above). Ignore any warnings about boot sector differences. If the system can't find dosfslabel, then install it by running sudo apt-get install dosfstools , and try again.
Run sudo blkid | grep ' TYPE="vfat"' , and examine its output to verify that the label has been changed properly.
Optionally, unplug the device, and then plug it back in. The system will recognize it, and mount it under /media/MyLabel, without converting lowercase letters in the volume label to uppercase.

Please note that there is an 11 character limit on the length of a VFAT volume label. If you specify a longer label, it will be truncated. There is another restriction: the label can contain only (some) ASCII characters: accented letters etc. won't work.

2012-09-04

How to unlock an Android phone using adb if you know the password

This blog post explains how to unlock an Android phone using adb if you know the password. This can be useful if the touch screen of your phone is broken.

If you don't know the password, try any of the 9 methods in the article http://joyofandroid.com/how-to-unlock-android-phone/ instead.

Prerequisites:

You have a computer running Windows, Mac OS X or Linux.
You have a USB cable with which you can connect the phone to the computer.
USB debugging has been enabled on the phone. You can enable it in the Settings / Development menu (but you need a working touch screen for that).

Install the adb command-line tool. It's part of the Platform SDK Tools SDK package. First download the Android SDK, then run the tools/android GUI tool, select Platform SDK Tools and install. The adb binary will be downloaded to platform-tools/adb .
On Linux, follow these steps to make sure that your user has the permission to access the phone.
Don't connect your phone yet to the computer via USB.
Run adb devices and verify that it doesn't see the phone.
If not enabled yet, enable USB debugging on the phone, in the Settings / Development menu.
Connect the phone to your computer using an USB cable.
Run adb devices and verify that it sees your phone.
You will have to run the following two commands very quickly, i.e. faster than the screen blanking time on the phone.
Run adb shell input text PASSWORD, replacing PASSWORD with your Android unlock password.
Run adb shell input keyevent 66 to simulate pressing the Enter key. (See this page for event codes of other keys.)

2012-08-14

How to use duplicity on Ubuntu Lucid to make backups to Google Drive

This blog post is a tutorial explaining how to install duplicity to an Ubuntu Lucid system and how to make backups to Google Drive using duplicity.

Installation

Run these commands (without the leading $) in a terminal window:

$ sudo apt-get install python-setuptools python-dev gcc libc6-dev gnupg librsync-dev screen
$ sudo easy_install gdata
$ sudo easy_install http://code.launchpad.net/duplicity/0.6-series/0.6.19/+download/duplicity-0.6.19.tar.gz
$ duplicity --version
duplicity 0.6.19

Starting a backup

It's recommended to issue the following command from screen, so they don't get aborted if there is a problem with the terminal window.

Example terminal window interaction to start a backup:

$ duplicity ~/Documents gdocs://USERNAME@gmail.com/documents.backup
Password for 'USERNAME@gmail.com':
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
GnuPG passphrase:
Retype passphrase to confirm:
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
...
Errors 0
-------------------------------------------------

When duplicity asks you for your Google Account (@gmail.com) password, you have to type your regular password, i.e. the password you use to log in to Gmail and Google Drive, unless you are using 2-step authentication for logging in. In that case you need to generate an application-specific password (at the bottom of this page), and copy-paste it to duplicity.

2012-08-02

How to format a double with 5 digits of precision in pure Java

This blog post shows pure Java code to format a double with 5 digits of precision, i.e. non-scientific notation, rounded to at most 5 digits after the decimal point, can be very long in front of the decimal point.

Java's built-in NumberFormat class can be used (see its invocation in the check method below), however that class not available in all JVMs (e.g. Avian). Another option is to convert the double to a string (e.g. "" +d or Double.toString(d) or String.valueOf(d)), and manually analyze the string to convert the scientific notation (e.g. 123.456e78) to decimal notation (see the numberFormat5assumes method below). However, in some JVMS (e.g. Avian) Double.toString(d) returns only 5 digits at most in total, so it loses lots of precision. To work around this, we can convert the double to a long (but divide large doubles first so they would fit), convert the long to a String, and add the decimal point manually (see the numberFormat5 method in the code below). This last solution is inaccurate for large doubles (because of the divisions by powers of 10 involved).

For a more precise, but much more complicated implementation, see dtoa.java, see more on this StackOverflow page.

import java.text.NumberFormat;
import java.util.Locale;

public class nf {
  // This implementation assumes that Double.toString returns the most accurate
  // possible strings.
  public static String numberFormat5assumes(double d) {
    if (Double.isNaN(d)) return "\ufffd";
    if (Double.isInfinite(d)) return d < 0 ? "-\u221e" : "\u221e";
    if (-1 < d && d < 1) {
      boolean isNegative = d < 0 || (d == 0 && "-0.0".equals("" + d));
      if (isNegative) d = -d;
      String sa = "0." + ((100000 + (int)(.5 + d * 100000)) + "").substring(1);
      int i = sa.length();
      while (i > 0 && sa.charAt(i - 1) == '0') {
        --i;
      }
      if (i == 2) i = 1;
      sa = sa.substring(0, i);
      return isNegative ? "-" + sa : sa;
    } else {
      String s = "" + d;
      char c;
      int i = s.length() - 1;
      while (i > 0 && (c = s.charAt(i - 1)) != 'e' && c != 'E') {
        if ((c < '0' || c > '9') && c != '.' && c != '-') {
          throw new RuntimeException("Bad double: " + s);
        }
        --i;
      }
      int j = i;
      int e = 0;
      if (i > 0) {
        while (j < s.length()) {
          e = 10 * e + s.charAt(j++) - '0';
        }
        --i;
      } else {
        i = s.length();
      }
      char o[] = new char[s.length() + e];
      j = 0;
      int w = 0;
      if (s.charAt(0) == '-') {
        o[w++] = '-';
        ++j;
      }
      int t = j;
      while (j < i) {
        if ((c = s.charAt(j)) == '.') {
          t = j + 1;
        } else {
          o[w++] = c;
        }
        ++j;
      }
      if (j - t > e) {
        i = w++;
        while (j - t > e) {
          o[i] = o[i - 1];
          --i;
          ++t;
        }
        o[i++] = '.';
        if (w - i > 5) {
          w = i + 5;
          if (o[i + 5] >= '5') {  // Round up.  (Should be >5.)
            j = i + 5;
            while (j > 0) {
              --j;
              if (o[j] == '.') continue;
              if (o[j] == '-') break;
              if (o[j] != '9') { ++o[j]; j = -1; break; }
              o[j] = '0';
            }
            if (j == 0) {
              if (o[j] == '-') ++j;
              o[j++] = '1';
              while (j < w) {
                if (o[j] == '.') {
                  o[j++] = '0';
                  o[j++] = '.';
                  i = j;
                  break;
                } else {
                  o[j++] = '0';
                }
              }
              w = j;
            }
          }
        }
        while (w > i && o[w - 1] == '0') {
          --w;
        }
        if (w == i) {  // "." -> "".
          --w;
        }
      } else {
        while (j - t < e) {
          o[w++] = '0';
          --t;
        }
      }
      return new String(o, 0, w);
    }
  }

  // This implementation doesn't use Double.toString at all (but it uses the
  // `(long)aDouble' conversion). It's a bit less accurate (can be as few as
  // 16 correct digits out of 21) for very large doubles (abs(d)>=1e13).
  public static String numberFormat5(double d) {
    if (Double.isNaN(d)) return "\ufffd";
    if (Double.isInfinite(d)) return d < 0 ? "-\u221e" : "\u221e";
    boolean isNegative = d < 0 || (d == 0 &&
        "-0.0".equals("" + d) || "-0".equals("" + d));
    if (isNegative) d = -d;
    String s;
    if (d >= 10000000000000.0) {  // 13 zeros.
      // 9223372036854775807 == Long.MAX_VALUE.
      // 1000000000000000000 has 13+5 zeros.
      // TODO: Instead of 9.223e13, check for 9.223372036854775807e13.
      int c = 0;
      // These divisions below are a bit inaccurate, but doing them accurately
      // would need >1000 lines of code. Example inaccuracies:
      //
      // -5.555333333333333E20: the first 18 digits (out of 21) are correct.
      // 1.7976931348623157E308: the first 16 digits are correct.
      while (d >= 9.223e25)   { d /= 10000000000000.0; c += 13; }
      if (d >= 9.223e24)      { d /= 1000000000000.0; c += 12; }
      else if (d >= 9.223e23) { d /= 100000000000.0; c += 11; }
      else if (d >= 9.223e22) { d /= 10000000000.0; c += 10; }
      else if (d >= 9.223e21) { d /= 1000000000.0; c += 9; }
      else if (d >= 9.223e20) { d /= 100000000.0; c += 8; }
      else if (d >= 9.223e19) { d /= 10000000.0; c += 7; }
      else if (d >= 9.223e18) { d /= 1000000.0; c += 6; }
      else if (d >= 9.223e17) { d /= 100000.0; c += 5; }
      else if (d >= 9.223e16) { d /= 10000.0; c += 4; }
      else if (d >= 9.223e15) { d /= 1000.0; c += 3; }
      else if (d >= 9.223e14) { d /= 100.0; c += 2; }
      else if (d >= 9.223e13) { d /= 10.0; c += 1; }
      char cs[] = new char[c];
      while (c > 0) {
        cs[--c] = '0';
      }
      double e = d * 100000.0 + 0.5;
      s = (long)e + new String(cs);
    } else {
      // We have to introduce a temporary variable (e) here for i386 gcj-4.4
      // on Ubuntu Lucid (4.4.3-1ubuntu4.1), without optimization flags.
      // Without this temporary variable it would convert 0.834375 to 83437
      // instead of the correct 83438.
      //
      //   gcj-4.4 -o nf --main=nf nf.java && ./nf
      double e = d * 100000.0 + 0.5;
      s = (long)e + "";
    }
    int i = s.length();
    int j = s.length() - 5;
    while (i > j && i > 0 && s.charAt(i - 1) == '0') {
      --i;
    }
    if (i == 0) {
      s = "0";
    } else if (i == j) {  // Found an integer.
      s = s.substring(0, j);
    } else if (j <= 0) {  // Found a number between 0 and 1.
      s = "0.00000".substring(0, 2 - j) + s.substring(0, i);
    } else {
      s = s.substring(0, j) + "." + s.substring(j, i);
    }
    return isNegative ? "-" + s : s;
  }

  public static void check(double d) {
    NumberFormat nf = NumberFormat.getInstance(Locale.US);
    nf.setMinimumFractionDigits(0);
    nf.setMaximumFractionDigits(5);
    nf.setGroupingUsed(false);
    String a = nf.format(d);
    String b = numberFormat5(d);
    if (!(a.equals(b))) {
      System.err.println(d + ": " + a + " != " + b);
    }
    System.out.println("    check2(" + d + ", \"" + a + "\");");
  }

  public static void main(String[] args) {
    check(42.0);
    check(42.7);
    check(-42.7);
    check(-42.7654321);
    check(-555533333333333333342.7654321);  // numberFormat5 is inaccurate.
    check(Double.NaN);  // FYI gcj-4.4 NumberFormat emits "NaN', openjdk-6 emits "\ufffd".
    check(Double.MIN_VALUE);
    check(Double.MAX_VALUE);  // numberFormat5 is inaccurate.
    check(Double.NEGATIVE_INFINITY);
    check(Double.POSITIVE_INFINITY);
    check(-0.000000034);
    check(0.0);
    check(-0.0);  // FYI gcj-4.4 NumberFormat emits "0", openjdk-6 emits "-0".
    check(-0.7654321);
    check(-0.3456789);
    check(-0.34);
    check(-0.056);
    check(0.0078);
    check(123.456);
    check(-123.456);
    check(-123.456789);
    check(-123.450009);
    check(123.450005);  // NumberFormat is inaccurate: 123.45 != 123.45001.
    check(123.450006);
    check(123.499996);
    check(-123.450003);
    check(-99.999995);
    check(999.999995);
    check(-123.999999);
    check(-123.899999);
    check(0.834375);
    check(-0.834375);
  }
}

pts.blog

Flattr this blog

Blog Archive

Recommended