How to read a whole file to String in Java

Reading a whole file into a String in Java is tricky; one has to pay attention to several things:

  • Read with the proper character set (encoding).
  • Don't ignore the newline at the end of the file.
  • Don't waste CPU and memory by concatenating String objects in a loop (use a StringBuffer or an ArrayList<String> instead).
  • Don't waste memory (by line-buffering or double-buffering).
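
The first two points are easy to get wrong without noticing. The following is a minimal sketch, assuming Java 7 or later for StandardCharsets, that shows both failure modes on an in-memory byte array instead of a real file. The class and method names (ReadPitfalls, decode, readByLines) are made up for illustration and are not part of any library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are for illustration only.
class ReadPitfalls {
  // Decodes the bytes with an explicitly chosen charset.
  static String decode(byte[] data, Charset charset) {
    return new String(data, charset);
  }

  // Naive line-based reading: readLine() strips the line terminators, so
  // re-joining the lines cannot tell whether the input ended with a newline.
  static String readByLines(byte[] data) {
    BufferedReader reader = new BufferedReader(
        new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
    StringBuilder sb = new StringBuilder();
    try {
      String line;
      boolean first = true;
      while ((line = reader.readLine()) != null) {
        if (!first) sb.append('\n');
        sb.append(line);
        first = false;
      }
    } catch (IOException e) {
      // Not expected for an in-memory stream.
      throw new IllegalStateException(e);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // "cafe" with an acute accent on the e, followed by a newline: 6 bytes in UTF-8.
    byte[] data = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
    // Right charset: the text survives unchanged.
    System.out.println(decode(data, StandardCharsets.UTF_8).equals("caf\u00e9\n"));  // true
    // Wrong charset: every byte becomes one char, so the length is 6 instead of 5.
    System.out.println(decode(data, StandardCharsets.ISO_8859_1).length());  // 6
    // Line-based reading silently drops the trailing newline.
    System.out.println(readByLines(data).endsWith("\n"));  // false
  }
}
```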

See my solution at http://stackoverflow.com/questions/1656797/how-to-read-a-file-into-string-in-java/1708115#1708115

For your convenience, here is my code:

// charsetName can be null to use the default charset.
public static String readFileAsString(String fileName, String charsetName)
    throws java.io.IOException {
  java.io.InputStream is = new java.io.FileInputStream(fileName);
  try {
    final int bufsize = 4096;
    // available() is only a hint for the initial buffer size.
    int available = is.available();
    byte data[] = new byte[available < bufsize ? bufsize : available];
    int used = 0;
    while (true) {
      // Double the buffer when fewer than bufsize free bytes remain.
      if (data.length - used < bufsize) {
        byte newData[] = new byte[data.length << 1];
        System.arraycopy(data, 0, newData, 0, used);
        data = newData;
      }
      int got = is.read(data, used, data.length - used);
      if (got <= 0) break;  // End of file.
      used += got;
    }
    // Decode all bytes in one step.
    return charsetName != null ? new String(data, 0, used, charsetName)
                               : new String(data, 0, used);
  } finally {
    is.close();
  }
}
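
For comparison, on Java 7 or later the same "read all bytes, then decode once" shape can be sketched with java.nio.file.Files.readAllBytes. This is not the code from the post above, and it makes no claim about relative speed. The class name ReadAllBytesSketch and the roundTrip helper are made up for illustration, and UncheckedIOException requires Java 8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch class; the names are for illustration only.
class ReadAllBytesSketch {
  // Reads the whole file into a byte array, then decodes it in one step.
  static String readFileAsString(Path path, Charset charset) throws IOException {
    return new String(Files.readAllBytes(path), charset);
  }

  // Writes the text to a temporary file as UTF-8 and reads it back.
  static String roundTrip(String text) {
    try {
      Path tmp = Files.createTempFile("readfile-demo", ".txt");
      try {
        Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
        return readFileAsString(tmp, StandardCharsets.UTF_8);
      } finally {
        Files.delete(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String text = "caf\u00e9\nlast line\n";
    // The non-ASCII character and the trailing newline both survive.
    System.out.println(roundTrip(text).equals(text));  // true
  }
}
```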

1 comment:

pts said...

@müzso: Thanks for your comments. I agree that a larger block size might be faster.

Feel free to write your implementation, and to measure the speed difference.

I guess you mean FileReader instead of InputStreamReader -- never mind, it doesn't make much difference.

I'd never use a *Reader for reading the entire file, because there might be a UTF-8 multibyte sequence at the buffer boundary. Detecting and cutting that would make buffering inefficient.
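
The boundary problem is easiest to reproduce when each chunk of bytes is decoded on its own. The sketch below does not use a Reader; it decodes raw byte chunks with new String, which is an assumption made to keep the example small. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are for illustration only.
class ChunkBoundaryDemo {
  // Decodes the data chunk by chunk, each chunk independently of the others.
  static String decodeInChunks(byte[] data, int chunkSize) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < data.length; i += chunkSize) {
      int len = Math.min(chunkSize, data.length - i);
      sb.append(new String(data, i, len, StandardCharsets.UTF_8));
    }
    return sb.toString();
  }

  // Decodes all bytes in one step, so no sequence can be cut in half.
  static String decodeAtOnce(byte[] data) {
    return new String(data, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // 'h' is 1 byte and the accented e is 2 bytes in UTF-8, so a chunk
    // size of 2 cuts the 2-byte sequence in the middle.
    String text = "h\u00e9";
    byte[] data = text.getBytes(StandardCharsets.UTF_8);
    System.out.println(decodeAtOnce(data).equals(text));       // true
    System.out.println(decodeInChunks(data, 2).equals(text));  // false
  }
}
```

Decoding the whole byte array once, as in decodeAtOnce, avoids the issue because there is no boundary inside the data.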

I'd never use a StringBuilder for reading the entire file, because it involves copying the data unnecessarily.

I doubt that there is faster code for reading an entire file than the code in my blog post. If you can show otherwise, please give an example.