2021-11-22T18:00:00Z

What is your recommended RPA tool for complex document data extraction?


Hi peers,

I'm working as the VP of Business Development at a Tech Services Company. Which tool do you recommend for cross-industry use (the use case starts with accounts payable (AP))?

Also, what are the approximate purchasing costs associated with this tool?

Thanks

ITCS user
Guest
33 Answers

Top 20, Vendor

Hello George,


My tool recommendation is EMMA RPA (Robotic Process Automation - see https://www.wianco.com/en/emma... for further details). 


Why is it my recommendation? As Managing Director of WIANCO OTT Robotics, it may seem obvious why I am recommending our solution EMMA RPA, but I can also put you in contact with one of our "Big4" customers, who carried out extensive research and tool testing on this topic in order to extract data from energy passports for real estate properties in various scan qualities.


Their result was that EMMA RPA correctly extracted the data from 94% of the scanned documents that had very poor quality. For documents with good scan quality, the data was extracted correctly in 100% of cases.


Please let me know if you are interested in getting together with that contact. I can also send you the presentation slides of EMMA RPA that contain further interesting benchmarks that are important when evaluating such a solution.


Have a great week and kind regards,


Michael

2021-11-23T21:56:53Z
Expert, Moderator, Real User

Hi @George Bennett,


I have used Jiffy.ai at one of my customers' sites, with good feedback, and the project was closed as the end users expected.


Below are some of the features Jiffy.ai provides for document extraction:


Document Processing is the conversion of paper-based and electronic documents into digital information using a combination of Intelligent Character Recognition (ICR), Optical Character Recognition (OCR), Machine Learning (ML) algorithms, and any necessary manual intervention.


Types of Documents


The types of documents and the nodes which are used to process them are listed below.


  • Documents in PDF format:
    • Use Doc Reader node to process Structured and Semi-Structured documents.
    • JIFFY.ai does not currently handle unstructured documents.
  • Use Excel node to process documents in Excel format.


If the document contains images, install ABBYY FineReader to convert the images to editable text, then pass the result through the Doc Reader node to extract the data.
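To make the routing above concrete, here is a minimal sketch in Python. It is not JIFFY.ai code: the function name `choose_route`, the `contains_image` flag, and the file extensions are my own assumptions, used only to illustrate which node handles which kind of document.

```python
from pathlib import Path

# Hypothetical extension lists, chosen for illustration only;
# they are not taken from JIFFY.ai documentation.
EXCEL_SUFFIXES = {".xls", ".xlsx"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def choose_route(filename, contains_image=False):
    """Return the ordered processing steps for one document."""
    suffix = Path(filename).suffix.lower()
    if suffix in EXCEL_SUFFIXES:
        return ["Excel node"]
    if suffix in IMAGE_SUFFIXES or contains_image:
        # Image content is converted to editable text before extraction.
        return ["OCR (e.g. ABBYY FineReader)", "Doc Reader node"]
    if suffix == ".pdf":
        return ["Doc Reader node"]
    raise ValueError(f"unsupported document type: {filename}")


print(choose_route("invoice.pdf"))
print(choose_route("scan.pdf", contains_image=True))
```

A scanned PDF is flagged by the caller here, because the file extension alone does not say whether the pages are images.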


Out of the Box Capabilities


In JIFFY.ai, Invoice and Bill of Lading are provided as predefined schemas for ease of use. Invoice schema comes with thirty-five predefined fields and Bill of Lading schema with twelve predefined fields. Jiffy.ai automatically extracts information from these documents without any training and provides out-of-the-box machine learning models for these document types.


The model is already trained for the predefined schemas. When an Invoice or Bill of Lading is processed through the Doc Reader node, you do not need to train the ML. The data is extracted automatically from the documents using the built-in extraction modules.


For other documents, you may have to train the model using the point-and-click familiarization environment provided.


How is Document Processing Done in JIFFY.ai?


Document processing is achieved in four phases:



  • Create a Document Table with the required columns for the fields being extracted from the document. Document Table is the persistence layer to store, track and present extracted contents of the document being processed.
  • Design the task using the Doc Reader node to extract the fields from the document.
  • Execute the task to:
    • Categorize the documents: classify the document type and identify the classification group that the document falls in, based on the format of the document.
    • Populate the data into Document Table to store, track and present extracted contents of the document being processed. If Document Table is created using predefined schema, ML auto-extracts the required data and assigns a category based on the template of the document.
  • Familiarize the document: A user-friendly interface is provided to:
    • Point to the labels and data to be extracted from the document, thereby training the model for the category of document being processed.
    • Verify and approve the fields extracted by the model.



If Document Table is created using custom schema, the fields are auto-extracted based on the existing trained model.


In an Invoice Processing HyperApp:


  1. A Document Table with name InvoiceTable is created using Invoice schema.
  2. A Task is designed with Doc Reader node to extract the fields from the Invoice.
  3. The Task is executed to extract the fields.

The document is familiarized, saved, and approved to train the ML engine for the category of document being processed. The approved fields are populated into InvoiceTable for further processing.
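For readers who think in code, the invoice flow above can be modelled roughly as follows. This is only a sketch of the idea, not the JIFFY.ai API: `DocumentTable`, `categorize`, `approve`, the three field names, and the sample values are all hypothetical, and the real Invoice schema has thirty-five fields.

```python
from dataclasses import dataclass, field

# Illustrative subset of fields; names are assumptions, not the real schema.
INVOICE_FIELDS = ["invoice_number", "invoice_date", "total_amount"]


@dataclass
class DocumentTable:
    """Persistence layer with one column per extracted field."""
    name: str
    columns: list
    rows: list = field(default_factory=list)

    def populate(self, category, extracted):
        # Keep only the declared columns; fields not extracted stay None.
        row = {col: extracted.get(col) for col in self.columns}
        row["category"] = category
        row["approved"] = False
        self.rows.append(row)
        return row


def categorize(layout_signature, known_categories):
    """Group documents that share a format under one category."""
    if layout_signature not in known_categories:
        known_categories[layout_signature] = f"category-{len(known_categories) + 1}"
    return known_categories[layout_signature]


def approve(row, corrections=None):
    """Familiarization step: a reviewer corrects fields, then approves."""
    row.update(corrections or {})
    row["approved"] = True
    return row


# Made-up sample values, for illustration only.
table = DocumentTable("InvoiceTable", INVOICE_FIELDS)
categories = {}
cat = categorize("vendor-a-layout", categories)
row = table.populate(cat, {"invoice_number": "INV-001", "total_amount": "120.00"})
approve(row, {"invoice_date": "2021-11-01"})
print(table.rows)
```

The point of the sketch is the order of operations: the table exists first, extraction fills it per category, and nothing counts as final until a person has approved the row.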



2021-11-26T02:55:09Z
Top 5, Reseller

I would recommend evaluating HelpSystems' RPA product AutoMate! 

2021-11-24T16:29:22Z