Various estimates indicate that 80% of enterprise data is unstructured, and processes that depend on unstructured data are difficult to automate with traditional automation tools. Enterprises increasingly need to process large volumes of semi-structured and unstructured documents with greater accuracy and speed. While RPA enables automation of information from legacy, third-party, and web apps (i.e., surface automation), it is not a good fit for unstructured information sources (e.g., documents, emails, and attachments). Unstructured data, in simple terms, is information that is not arranged according to a pre-set data model or schema and cannot be readily stored in a traditional relational database. So how do enterprises tackle the issue of processing unstructured data in document-centric processes? While optical character recognition (OCR) helps with the digitization of paper-based information assets, its inherent quality issues are hard to ignore (i.e., accuracy with legacy OCR is below 60%). This is where intelligent document processing (IDP) solutions come in: they process semi-structured and unstructured data and convert it to a structured format that RPA or other downstream systems can process further.
Organizations across all vertical industries continue to use documents as a major source of data input. These documents contain unstructured data and require knowledge workers for manual data entry, exception management, and quality checks, making document processing a cumbersome, time-consuming, and costly proposition. Everest Group estimates that enterprises of all sizes (large enterprises as well as small- and medium-sized enterprises) spent about $400 million on IDP software in 2018 and that the figure increased to about $550 million in 2019. It is easy to see that the unorganized market for machine learning (ML)-enabled document processing is large enough for packaged IDP solutions to make inroads.
Know Your Customer (KYC), invoice processing, insurance claims, patient onboarding, patient records, proof of delivery, and order forms are some of the key use cases for IDP solutions. IDP software is also useful in industry-specific processes, such as customer onboarding, mortgage processing, trade finance, and legal documents. Within the finance and accounting domain, accounts payable and accounts receivable are common use cases for IDP, which is understandable given the high volume and error-prone nature of such processes.
In general, the expectation with IDP software is that users should need to do only minimal training for minor template changes. However, enterprises that deal with hundreds to thousands of vendors on a monthly basis find that creating and maintaining invoice templates is a cumbersome process. The consulting hours devoted to getting up and running with templates for disparate document types can quickly add to overall costs. In such cases, a template-free approach to IDP can significantly reduce the total cost of ownership (TCO) and enable faster time-to-automation. There is no need to spend months creating templates before any real document automation takes place.
Intelligent automation, in simpler terms, combines artificial intelligence (e.g., natural language processing, machine learning, and computer vision) with RPA and document capture and processing capabilities. For end-to-end automation, IDP solutions are used to ingest unstructured data into workflows, and AI/ML capabilities are used to achieve a greater degree of straight-through processing (STP) without sacrificing accuracy.
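To make the STP idea concrete, here is a minimal sketch of how an STP rate might be tracked, assuming it is defined as the share of documents completed with no human intervention. The `ProcessedDocument` class, its field names, and the sample batch are hypothetical and do not come from any particular IDP product.

```python
from dataclasses import dataclass


@dataclass
class ProcessedDocument:
    """Hypothetical record of one document that went through the workflow."""
    doc_id: str
    needed_human_review: bool


def stp_rate(documents):
    """Return the fraction of documents completed with no human intervention."""
    if not documents:
        return 0.0
    untouched = sum(1 for d in documents if not d.needed_human_review)
    return untouched / len(documents)


# Illustrative batch: three of four documents needed no manual handling.
batch = [
    ProcessedDocument("inv-001", False),
    ProcessedDocument("inv-002", True),
    ProcessedDocument("inv-003", False),
    ProcessedDocument("inv-004", False),
]
print(f"STP rate: {stp_rate(batch):.0%}")  # prints "STP rate: 75%"
```

Tracking this figure over time is one way to see whether added AI/ML capabilities are actually reducing manual work.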
Pre-built AI/ML capabilities and business rules enable automated verification and validation of data, along with continuous learning and improvement based on AI/ML algorithms and user inputs. IDP combines OCR, data capture, and AI/ML to automate the retrieval, understanding, and integration of documents required for executing a business process. Automation tools such as RPA, IDP, and APIs can be used together to achieve end-to-end process automation. While RPA offers a process-centric approach to automation, IDP enables data-led automation of documents containing unstructured and semi-structured data.
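The verification and validation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how any specific IDP product works: the field names, the 0.90 confidence threshold, and the single business rule (subtotal plus tax must equal total) are all hypothetical, and the extraction step that would produce these fields is assumed to have already run.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class ExtractedField:
    """One value captured from a document (hypothetical structure)."""
    value: str
    confidence: float  # 0.0 to 1.0, as reported by an assumed extraction model


CONFIDENCE_THRESHOLD = 0.90  # assumed cut-off, tuned per deployment
REQUIRED_FIELDS = ("invoice_number", "subtotal", "tax", "total")
AMOUNT_FIELDS = ("subtotal", "tax", "total")


def validate_invoice(fields):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    # Check that every required field is present and confidently extracted.
    for name in REQUIRED_FIELDS:
        field = fields.get(name)
        if field is None:
            problems.append(f"missing field: {name}")
        elif field.confidence < CONFIDENCE_THRESHOLD:
            problems.append(f"low confidence: {name}")
    # Business rule: subtotal + tax must equal total.
    try:
        amounts = {k: Decimal(fields[k].value) for k in AMOUNT_FIELDS}
    except (KeyError, InvalidOperation):
        problems.append("amounts are missing or not numeric")
    else:
        if amounts["subtotal"] + amounts["tax"] != amounts["total"]:
            problems.append("subtotal + tax does not equal total")
    return problems


def route(fields):
    """Send clean documents downstream; queue the rest for a person."""
    return "human_review" if validate_invoice(fields) else "straight_through"


invoice = {
    "invoice_number": ExtractedField("INV-1001", 0.98),
    "subtotal": ExtractedField("100.00", 0.97),
    "tax": ExtractedField("8.25", 0.95),
    "total": ExtractedField("108.25", 0.96),
}
print(route(invoice))  # prints "straight_through"
```

In a setup like this, corrections made by reviewers on the `human_review` queue are the kind of user input that could feed the continuous learning mentioned above, while documents routed `straight_through` would be handed to RPA or another downstream system.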