OCR Xpress for Linux Lets You Convert Text and Images to PDF
Richard Harris in Programming, Tuesday, September 15, 2015
Accusoft has released OCR Xpress for Linux, which offers text extraction and conversion. OCR Xpress gives developers a streamlined version of the company's OCR SDK that is accurate and easy to use, and that simplifies extracting text from images and documents into searchable PDFs or plain text.
OCR Xpress is Accusoft’s first Linux-based OCR offering. It lets developers integrate text recognition and extraction into their applications through a Linux C/C++ API. OCR Xpress can recognize and extract text from black-and-white or color images and convert those images to searchable PDFs or text for document indexing.
According to Accusoft, OCR Xpress is fast and accurate, reduces manual input, and provides a confidence value for each recognized character. Output is available in three forms: image-over-text PDF, plain text, or in-memory data structures.
Highlights of the platform include:
- Simple, straightforward setup, with a clean, easy-to-use API for quick integration into your applications. The high-level API lets developers convert an image to text or a searchable PDF with only nine lines of code. For more complex implementations, developers can access document information in data structures, such as page, paragraph, text line and character data.
- Convert images into text and reduce manual data entry with accuracy that Accusoft says meets or exceeds industry standards. A confidence value is returned with each character, enabling you to check your extraction results.
- Convert full-page and multi-page images to text output, or write the results to multi-page, searchable PDF files.
- Compatible with various page layouts, including text interspersed with photos and graphics. Image-over-text PDF output contains searchable text that aligns with the text on the image and adjusts for varying font sizes on the page.
- To make images searchable, export to image-over-text PDF documents. Export to text files to update metadata or tag data for image-based documents in your ECM system.
- Access results through an output structure that provides information about layout, content and recognition confidence. Developers can access the text at multiple levels, including a text line, a word within that line, or a character within that line or word.
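
To make the idea of a layered output structure with per-character confidence more concrete, here is a minimal C++ sketch. It does not use the OCR Xpress API, whose actual types and function names are not given in this article. The `Character`, `Word` and `TextLine` structs, the `lineText` and `wordsNeedingReview` helpers, and the assumed confidence range of 0.0 to 1.0 are all hypothetical and chosen purely for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; these are NOT the OCR Xpress API.
struct Character {
    char value;         // the recognized character
    double confidence;  // assumed range: 0.0 (no confidence) to 1.0 (certain)
};

struct Word {
    std::vector<Character> characters;
};

struct TextLine {
    std::vector<Word> words;
};

// Rebuild the text of a line by joining its words with single spaces.
std::string lineText(const TextLine& line) {
    std::string text;
    for (std::size_t i = 0; i < line.words.size(); ++i) {
        if (i > 0) {
            text += ' ';
        }
        for (const Character& c : line.words[i].characters) {
            text += c.value;
        }
    }
    return text;
}

// Return every word that contains at least one character whose confidence
// is below the threshold, so it can be routed to manual review.
std::vector<std::string> wordsNeedingReview(const TextLine& line,
                                            double threshold) {
    std::vector<std::string> flagged;
    for (const Word& word : line.words) {
        std::string text;
        bool lowConfidence = false;
        for (const Character& c : word.characters) {
            text += c.value;
            if (c.confidence < threshold) {
                lowConfidence = true;
            }
        }
        if (lowConfidence) {
            flagged.push_back(text);
        }
    }
    return flagged;
}
```

With a structure along these lines, an application could index the full line text while sending only the flagged words to a person for checking, which is one way per-character confidence values can reduce manual data entry.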
Accusoft is offering a free trial of the SDK to allow developers to test out the platform’s functionality.
Read more: https://www.accusoft.com/products/ocr-xpress/overv...