StringBufferInputStream -> ByteArrayInputStream and other changes

decentral1se requested to merge cruns:master into master

Created by: sudharsh

StringBufferInputStream is a deprecated class that does not convert chars into bytes correctly (see http://docs.oracle.com/javase/7/docs/api/java/io/StringBufferInputStream.html). As a result, parsing from strings returned wrong results (in many cases, empty content). In my case, using FileInputStream wasn't an option, and anything that passed the data through a String at some point would corrupt the raw bytes.

Therefore, I have replaced StringBufferInputStream with ByteArrayInputStream in the jcc args.
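To illustrate the problem, here is a minimal standalone Java sketch. It is not code from this merge request: the class name `StreamComparison` and its helper methods are made up for the example, and it assumes Java 9+ for `readAllBytes`. It shows that StringBufferInputStream keeps only the low eight bits of each char, while ByteArrayInputStream hands back the original bytes untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringBufferInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the merged code.
public class StreamComparison {

    // Read everything the stream yields into a byte array (Java 9+).
    static byte[] drain(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    // Deprecated path: each char is truncated to its low eight bits.
    @SuppressWarnings("deprecation")
    static byte[] viaStringBuffer(String s) throws IOException {
        return drain(new StringBufferInputStream(s));
    }

    // Replacement path: the bytes come back exactly as supplied.
    static byte[] viaByteArray(byte[] raw) throws IOException {
        return drain(new ByteArrayInputStream(raw));
    }

    public static void main(String[] args) throws IOException {
        String text = "\u20AC"; // euro sign, U+20AC
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);

        // One byte survives (0xAC); the high byte 0x20 is dropped.
        System.out.println(Arrays.toString(viaStringBuffer(text)));

        // All three UTF-8 bytes (0xE2 0x82 0xAC) are preserved.
        System.out.println(Arrays.toString(viaByteArray(raw)));
    }
}
```

The same truncation applies to binary payloads that have been forced into a String, which is consistent with the corrupted or empty parse results described above.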

As you can see, I have reorganized the directory structure and bumped Tika to 1.1. I have also added a new module called parser that exposes from_file and from_buffer functions for the lazy ones out there.

Have tested the changes on Mac and Linux.