StringBufferInputStream -> ByteArrayInputStream and other changes
Created by: sudharsh
StringBufferInputStream is a deprecated class that does not convert chars into bytes correctly: it keeps only the low eight bits of each character (see http://docs.oracle.com/javase/7/docs/api/java/io/StringBufferInputStream.html). This meant that parsing from strings returned wrong results (in many cases, empty content). In my case, using FileInputStream wasn't an option, and anything that passed the data through a String at some point would corrupt the raw bytes.
Therefore, I have replaced StringBufferInputStream with ByteArrayInputStream in the jcc args.
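To illustrate why the stream class matters, here is a minimal Python sketch of the byte-level difference. It is not code from this change: low_byte_bytes and utf8_bytes are hypothetical helper names, and the first one only mimics the documented behaviour of StringBufferInputStream (using the low eight bits of each char) rather than calling into Java.

```python
def low_byte_bytes(text: str) -> bytes:
    """Mimic StringBufferInputStream: keep the low 8 bits of each UTF-16 code unit.

    Hypothetical helper for illustration only.
    """
    units = text.encode("utf-16-be")  # two bytes per code unit: high, then low
    return units[1::2]                # keep every second byte (the low one)


def utf8_bytes(text: str) -> bytes:
    """Encode the text explicitly, as one would before wrapping it in a byte array stream."""
    return text.encode("utf-8")


if __name__ == "__main__":
    sample = "caf\u00e9 \u20ac5"
    lossy = low_byte_bytes(sample)
    exact = utf8_bytes(sample)
    print("low-byte stream:", lossy)
    print("utf-8 bytes:    ", exact)
    # Only the explicitly encoded bytes can be decoded back to the original text.
    print("round trip ok:  ", exact.decode("utf-8") == sample)
```

Plain ASCII survives either path, which is why the problem only shows up with non-ASCII or binary content.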
As you can see, I have reorganized the directory structure and bumped Tika up to 1.1. I have also added a new module called parser that exposes from_file and from_buffer functions for the lazy ones out there.
I have tested the changes on Mac and Linux.