Reading a whole file into a String in Java is tricky; one has to pay attention to several aspects:
- Read with the proper character set (encoding).
- Don't ignore the newline at the end of the file.
- Don't waste CPU and memory by adding String objects in a loop (use a StringBuffer or an ArrayList<String> instead).
- Don't waste memory (by line-buffering or double-buffering).
See my solution at http://stackoverflow.com/questions/1656797/how-to-read-a-file-into-string-in-java/1708115#1708115
For your convenience, here is my code:
// charsetName can be null to use the default charset.
public static String readFileAsString(String fileName, String charsetName)
    throws java.io.IOException {
  java.io.InputStream is = new java.io.FileInputStream(fileName);
  try {
    final int bufsize = 4096;
    int available = is.available();
    byte data[] = new byte[available < bufsize ? bufsize : available];
    int used = 0;
    while (true) {
      if (data.length - used < bufsize) {
        byte newData[] = new byte[data.length << 1];
        System.arraycopy(data, 0, newData, 0, used);
        data = newData;
      }
      int got = is.read(data, used, data.length - used);
      if (got <= 0) break;
      used += got;
    }
    return charsetName != null ? new String(data, 0, used, charsetName)
                               : new String(data, 0, used);
  } finally {
    is.close();
  }
}
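For comparison, here is a minimal sketch of the same checklist on Java 7 or later, assuming java.nio.file.Files.readAllBytes is available. It is an alternative technique, not the method above: the class name ReadWholeFile and the Charset parameter are hypothetical choices for illustration, and no speed claim is made for it.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWholeFile {
    // Hypothetical helper, not the method from the post.
    // A null charset falls back to the platform default charset.
    public static String readFileAsString(String fileName, Charset charset)
            throws IOException {
        // Reads every byte of the file, including a trailing newline if present.
        byte[] data = Files.readAllBytes(Paths.get(fileName));
        // Decodes all bytes in one step, so no String objects are joined in a loop.
        return new String(data, charset != null ? charset : Charset.defaultCharset());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("read-whole-file", ".txt");
        try {
            // \u00e9 is a two-byte character in UTF-8.
            Files.write(tmp, "first line\ncaf\u00e9\n".getBytes(StandardCharsets.UTF_8));
            String text = readFileAsString(tmp.toString(), StandardCharsets.UTF_8);
            System.out.println("ends with newline: " + text.endsWith("\n"));
            System.out.println("characters read: " + text.length());
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The main method writes a small UTF-8 file to a temporary location and reads it back, so the trailing newline and the multibyte character can be checked by hand.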
@müzso: Thanks for your comments. I agree that a larger block size might be faster.
Feel free to write your implementation, and to measure the speed difference.
I guess you mean FileReader instead of InputStreamReader -- never mind, it doesn't make much difference.
I'd never use a *Reader for reading the entire file, because there might be a UTF-8 multibyte sequence at the buffer boundary. Detecting and cutting that would make buffering inefficient.
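To illustrate the buffer-boundary concern, here is a small sketch, assuming the bytes are split into fixed-size chunks and each chunk is decoded on its own. The class and method names (Utf8BoundaryDemo, decodeChunkByChunk, decodeAtOnce) are hypothetical. It shows only the hand-rolled case where chunks are decoded independently; it says nothing about how a particular Reader implementation handles the boundary.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BoundaryDemo {
    // Decodes each fixed-size chunk independently. A multibyte sequence
    // that straddles a chunk boundary is decoded as two malformed pieces.
    public static String decodeChunkByChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes the whole byte array in a single step, so no sequence is split.
    public static String decodeAtOnce(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "ab\u00e9cd";
        // In UTF-8, \u00e9 takes two bytes (C3 A9), so data has 6 bytes.
        byte[] data = original.getBytes(StandardCharsets.UTF_8);
        // With a chunk size of 3, C3 ends the first chunk and A9 starts the second.
        System.out.println("at once matches:  " + decodeAtOnce(data).equals(original));
        System.out.println("chunked matches:  " + decodeChunkByChunk(data, 3).equals(original));
    }
}
```

Collecting all bytes first and decoding once, as the code in the post does, sidesteps the question entirely.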
I'd never use a StringBuilder for reading the entire file, because it involves copying the data unnecessarily.
I doubt that there is any code for reading an entire file that is faster than the one in my blog post. If you can contradict that, please give an example.