Daily Archives: 2007-11-12

Reading the contents of a file in Java is a straightforward operation, and the Java Tutorial explains in detail how to read the different types of streams. However, when you read a UTF-8 encoded file, that is where the fight starts. Many UTF-8 and UTF-16 encoded files begin with a special character called the BOM (byte-order mark). The BOM is the character U+FEFF placed at the very start of the stream; it signals the byte order (which matters for UTF-16) and identifies the encoding of the stream. In UTF-8 it is stored as the three bytes EF BB BF. UTF-encoded files may or may not contain the BOM. The problem with Java comes when […]

Reading UTF-8 encoded documents in Java
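As a minimal sketch, and not necessarily the approach the full post takes, the code below shows one way to cope with a UTF-8 BOM. Java's standard UTF-8 decoder does not strip a leading BOM; it hands it back as an ordinary U+FEFF character. This sketch peeks at the first three bytes with a `PushbackInputStream`, drops them if they are exactly EF BB BF, and otherwise pushes them back before decoding. The class and method names (`Utf8BomReader`, `openUtf8`, `readUtf8`) are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: reads UTF-8 text and drops a single leading BOM if present.
public class Utf8BomReader {

    // The UTF-8 encoding of U+FEFF (the byte-order mark).
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Wraps the stream in a UTF-8 Reader, consuming a leading UTF-8 BOM if there is one. */
    public static Reader openUtf8(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until we have 3 bytes or hit EOF.
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            // Not a BOM: put the peeked bytes back so the decoder sees them unchanged.
            pin.unread(head, 0, n);
        }
        return new InputStreamReader(pin, StandardCharsets.UTF_8);
    }

    /** Reads the whole stream as UTF-8 text with any leading BOM removed. */
    public static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = openUtf8(in)) {
            char[] buf = new char[4096];
            int len;
            while ((len = reader.read(buf)) != -1) {
                sb.append(buf, 0, len);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        // Decoding with the stock UTF-8 charset keeps the BOM as a leading '\uFEFF' character.
        String plain = new String(withBom, StandardCharsets.UTF_8);
        System.out.println((int) plain.charAt(0)); // 65279
        System.out.println(readUtf8(new ByteArrayInputStream(withBom))); // hi
    }
}
```

The pushback approach keeps the check local to the start of the stream: only the first BOM is removed, and files without one are passed through byte for byte, so the same helper works whether or not the file carries the mark.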