Top 38 pre-processing must-haves for Intelligent Data Capture

by Rajesh Agarwal, on Jul 19, 2019 7:37:36 PM

Estimated reading time: 5 mins

Paper-based processing still exists, and it is going to stay for quite some time. It is not confined to small pockets of business either: it accounts for a good 25-30% of business operation scenarios. In monetary terms, this share of business processing amounts to double-digit millions in revenue annually. The theme recurs mainly in BFSI and in the Finance & Accounts and Procurement functions of almost all Manufacturing, Telecom, Supply Chain, and Research & Analytics companies. Yet when it comes to business, you cannot compromise on speed, efficiency, and quality. 

In fact, automation cannot take place without digitization. Simply put, digitization is the stepping stone of your Digital Transformation journey.

Document Processing, popularly known as Optical Character Recognition (OCR), helps digitize paper-based enterprise assets. This in turn makes it possible to fulfill complex business use cases. 

However, OCR has inherent quality issues. When the digitized asset is of poor quality, even advanced technologies such as Robotic Process Automation (RPA) and Intelligent Automation (IA) are rendered ineffective. This is where Intelligent Document Processing, more popularly known as Intelligent Data Capture, shows the way ahead. It reads and ingests text from an image, which reduces use cases such as Tab Banking, On-mobile Onboarding, and claims processing to a matter of minutes instead of the hours or days they once required.  

Intelligent Data Capture is an integrated solution that combines Optical Character Recognition (OCR), Optical Mark Recognition (OMR), and Intelligent Character Recognition (ICR). In health claims processing, for example, it reads printed characters, tick marks, and handwritten characters. 

What is Intelligent Data Capture?

Intelligent Data Capture is the process of capturing data from all types of documents, including unstructured ones such as emails, text, PDFs, and scanned documents, classifying it into categories, and extracting the relevant information for further processing. Intelligent Data Capture software uses Artificial Intelligence algorithms to extract the data in a template-free mode, process it, and then feed it into different applications, databases, and downstream systems. 

However, the image itself is sometimes unclear, smudged with carbon, skewed, or improperly oriented. It could also be a dot-matrix print, or suffer from high noise and contrast issues. All of this leads to poor data capture output, in line with the popular principle "Garbage In, Garbage Out" (GIGO).   

The reliability and authenticity of the captured data depend on the clarity and quality of the captured image. This calls for pre-processing the image before data capture, to enhance image quality and improve the capture process. Certain post-processing is also required to improve the quality of the captured data.  
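The flow described above (clean up the image, capture the text, then clean up the text) can be pictured as a simple pipeline. The following is a minimal pure-Python sketch, assuming grayscale images stored as nested lists of 0-255 values. The function names are hypothetical, the OCR engine is a stand-in parameter rather than any real product API, and the 128 thresholds are illustrative only.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale rows: 0 = black ink, 255 = white paper
ImageStage = Callable[[Image], Image]
TextStage = Callable[[str], str]


def inverse_text_correction(img: Image) -> Image:
    """Pre-processing: if the page is mostly dark, assume white-on-black text and invert it."""
    pixels = [p for row in img for p in row]
    if sum(pixels) / len(pixels) < 128:  # hypothetical "mostly dark" threshold
        return [[255 - p for p in row] for row in img]
    return img


def binarize(img: Image, threshold: int = 128) -> Image:
    """Pre-processing: force every pixel to pure black (0) or pure white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in img]


def collapse_whitespace(text: str) -> str:
    """Post-processing: tidy up stray spaces and line breaks in the OCR output."""
    return " ".join(text.split())


def capture(img: Image,
            pre: List[ImageStage],
            ocr: Callable[[Image], str],
            post: List[TextStage]) -> str:
    """Run the image clean-up stages, then the OCR engine, then the text clean-up stages."""
    for stage in pre:
        img = stage(img)
    text = ocr(img)
    for stage in post:
        text = stage(text)
    return text


# Usage with a stub in place of a real OCR engine (hypothetical):
page = [[10, 250], [20, 30]]  # mostly dark: light text on a dark background
result = capture(page, [inverse_text_correction, binarize],
                 lambda im: "  Claim   No\n 42 ", [collapse_whitespace])
print(result)  # prints: Claim No 42
```

Keeping each stage a plain function makes it easy to add, drop, or reorder pre-processing steps per document type without touching the capture step itself.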

Top 38 pre-processing features for accurate and efficient OCR:  

  • De-skew
  • Sub-image
  • Noise removal
  • Lines
  • Vertical registration
  • Resize
  • Smoothing & completion
  • Inverse text correction
  • Horizontal registration
  • Auto-rotate
  • Intelligent crop
  • Manual rotate
  • Manual crop and pad
  • Contrast
  • Brightness
  • Hue
  • RGB separation
  • Dotted line
  • Text registration
  • Inpainting
  • Stamp removal
  • Edge smoothing
  • Character smoothing
  • Character thinning
  • Character separation
  • Background cleaning
  • Perimeter recognition
  • Contouring
  • Remove handwritten noise
  • Page recognition
  • Form bursting
  • Color drop-out
  • Remove grey
  • Carbon cleaning
  • Grow
  • Filter
  • Gamma
  • Mirror

OCR issues negate the benefits of automation. The 38 functionalities listed above work in tandem to help Intelligent Data Capture deliver 99.0% accurate output.

  1. De-skew: Straightens skewed images
  2. Sub-image: Extracts an area of the original document image prior to processing
  3. Noise removal: Removes isolated specks and machine dot shading
  4. Lines: Offers settings for removing and reporting horizontal and vertical lines
  5. Vertical registration: Registers the image to a particular point using vertical lines
  6. Resize: Stretches or shrinks an image to a new size
  7. Smoothing & completion: Smooths characters for better OCR reading
  8. Inverse text correction: Converts white text on a black background to normal black-on-white text, making OCR reading of such text possible
  9. Horizontal registration: Registers the image to a particular point using horizontal lines
  10. Auto-rotate: Rotates the image automatically
  11. Intelligent crop: Automatically removes thick black or white borders from an image
  12. Manual rotate: Rotates the image manually to the correct orientation
  13. Manual crop and pad: Manually crops or pads the image, removing or adding pixels to change its size
  14. Contrast: Increases or decreases contrast
  15. Brightness: Increases or decreases brightness
  16. Hue: Adjusts the hue to improve color rendition
  17. RGB separation: Separates the red, green, and blue channels so that each color can be removed individually
  18. Dotted line: Removes dotted lines for better OCR
  19. Text registration: Aligns all images to a particular piece of text
  20. Inpainting: Removes watermarks incorporated as a separate layer
  21. Stamp removal: Removes stamp marks of a specific, predefined color
  22. Edge smoothing: Smooths the edges of lines
  23. Character smoothing: Smooths the outlines of characters
  24. Character thinning: Thins character strokes
  25. Character separation: Separates machine-printed words for better readability
  26. Background cleaning: Removes the background
  27. Perimeter recognition: Recognizes the boundaries of box-type shapes
  28. Contouring: Recognizes the boundaries of non-standard shapes
  29. Remove handwritten noise: Removes handwritten characters
  30. Page recognition: Recognizes the page
  31. Form bursting: Splits a page into multiple subsections
  32. Color drop-out: Removes redundant colors (RGB, CMYK, etc.)
  33. Remove grey: Removes grey shaded backgrounds
  34. Carbon cleaning: Removes carbon marks and smudges to the maximum extent possible
  35. Grow: Darkens light text
  36. Filter: Offers blur, dilate, and median filters
  37. Gamma: Sets the relationship between black and white pixel intensities
  38. Mirror: Flips the image so that mirrored text becomes readable
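A few of the operations above (brightness/contrast, gamma, noise removal, intelligent crop, and the angle-finding half of de-skew) can be illustrated with simple textbook image-processing techniques. The sketch below is not how any particular Intelligent Data Capture product implements them: it assumes grayscale images stored as nested lists (0 = black, 255 = white), and every function name, threshold, and angle range in it is hypothetical. De-skew is shown only as skew-angle estimation by projection profile; the rotation itself would normally be left to an imaging library.

```python
import math
from typing import Dict, List

Image = List[List[int]]  # grayscale rows: 0 = black ink, 255 = white paper


def clamp(v: float) -> int:
    """Round and clip a value into the valid 0-255 pixel range."""
    return max(0, min(255, int(round(v))))


def brightness_contrast(img: Image, brightness: float = 0.0,
                        contrast: float = 1.0) -> Image:
    """Brightness/Contrast: stretch pixels around mid-grey (128), then shift."""
    return [[clamp((p - 128) * contrast + 128 + brightness) for p in row]
            for row in img]


def gamma(img: Image, g: float) -> Image:
    """Gamma: bend the tone curve between black and white.
    g > 1 darkens mid-tones (helps faint text); g < 1 lightens them."""
    return [[clamp(255 * (p / 255) ** g) for p in row] for row in img]


def median_filter(img: Image) -> Image:
    """Noise removal: replace each pixel with the median of its 3x3 neighbourhood.
    Isolated specks vanish, but so do lines only one pixel wide.
    Edge pixels use the neighbours that exist (upper median for even counts)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(img[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def auto_crop(img: Image, dark: int = 64, light: int = 192) -> Image:
    """Intelligent crop: keep stripping an outer row or column that is
    uniformly dark (black border) or uniformly light (white margin)."""
    def is_border(line: List[int]) -> bool:
        return all(p <= dark for p in line) or all(p >= light for p in line)

    while img and img[0]:
        if is_border(img[0]):
            img = img[1:]
        elif is_border(img[-1]):
            img = img[:-1]
        elif is_border([r[0] for r in img]):
            img = [r[1:] for r in img]
        elif is_border([r[-1] for r in img]):
            img = [r[:-1] for r in img]
        else:
            break
    return img


def estimate_skew(img: Image, max_deg: float = 5.0, step: float = 0.5,
                  ink: int = 128) -> float:
    """De-skew (estimation only): return the candidate angle, in degrees, at
    which ink pixels pile into the fewest, fullest rows (sharpest projection
    profile). Positive means text runs downhill to the right, since image y
    points down. Ties and blank pages resolve to the smallest-magnitude angle."""
    points = [(x, y) for y, row in enumerate(img)
              for x, p in enumerate(row) if p < ink]
    n = int(round(max_deg / step))
    best_angle, best_score = 0.0, -1
    for k in sorted(range(-n, n + 1), key=abs):
        a = math.radians(k * step)
        counts: Dict[int, int] = {}
        for x, y in points:
            # Row the pixel lands in after rotating the page back by angle a.
            r = round(y * math.cos(a) - x * math.sin(a))
            counts[r] = counts.get(r, 0) + 1
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = k * step, score
    return best_angle
```

For example, `estimate_skew` returns 3.0 for a synthetic one-pixel line of ink drawn at a 3° slope, and the page would then be rotated back by that angle. Production engines typically combine several such steps and tune their thresholds per document type; the values here are placeholders.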

These 38 pre-processing functionalities of Intelligent Data Capture are the deciding factor between poor and good OCR output after image enhancement, and therefore determine whether the overall automation effort succeeds. They are instrumental not only in enhancing image quality but also in making total automation and a paperless office a business reality. 

In summary:

Intelligent Data Capture, together with RPA and IA, has proved highly successful in many use cases that were simply impossible until a few years ago. Reading and capturing information from unstructured sources such as a PDF, a printout, or even an image to update databases and downstream systems was once hard to believe. Today, Intelligent Data Capture is a strong business enabler. It makes 3-minute onboarding a digital reality, saving millions in revenue and letting you do more with the same number of resources. That said, Intelligent Data Capture is just one milestone in the RPA and IA journey, with scope for further high-tech advances in the near future.
