Top 38 pre-processing must-haves for Intelligent Data Capture
by Rajesh Agarwal, on Jul 19, 2019 7:37:36 PM
Estimated reading time: 5 mins
Paper-based processing still exists, and it is going to stay for quite some time, not just in small business pockets but in a good 25-30% of business operation scenarios. In monetary terms, this part of business processing amounts to double-digit millions annually in revenue. The theme recurs mainly in the Finance & Accounts and Procurement functions of almost all BFSI, Manufacturing, Telecom, Supply Chain, and Research & Analytics companies. Yet when it comes to business, you cannot compromise on speed, efficiency, and quality.
As a matter of fact, automation cannot take place without digitization. Simply put, digitization is the stepping stone of your Digital Transformation journey.
Optical Character Recognition (OCR) helps digitize paper-based enterprise assets, which in turn makes complex business use cases achievable.
However, OCR has inherent quality issues. Poor quality in the digitized asset renders even advanced technologies, such as Robotic Process Automation (RPA) and Intelligent Automation (IA), ineffective. Here, Intelligent Document Processing, more popularly known as Intelligent Data Capture, is the way ahead. It enables you to read and ingest text from an image, turning use cases such as Tab Banking, On-mobile Onboarding, and faster claim processing into a matter of a few minutes, as against the hours and days required in years gone by.
Intelligent Data Capture is the process of capturing data from all types of documents, including “unstructured ones” such as email, text, PDF, and scanned documents, classifying the documents into categories, and extracting the relevant information for further processing. Software solutions for Intelligent Data Capture use Artificial Intelligence algorithms to extract the data in a template-free mode, process it, and then feed it into different applications, databases, and downstream systems.
However, at times the image itself is not clear: it may have carbon smudges, be skewed, or not be properly oriented. At other times, it could be a dot matrix print or have high noise and contrast. All of this results in poor data capture output, in line with the popular concept of “Garbage in, Garbage out” or “GIGO”.
The reliability and authenticity of the captured data depend on the clarity of the captured image. This calls for pre-processing of the image prior to data capture in order to enhance the image quality and improve the capturing process. It also calls for some post-processing to improve the quality of the captured data.
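The flow described above (pre-process the image, capture the data, then post-process the result) can be sketched in Python as follows. This is only a minimal illustration, not the workflow of any particular product: the `capture` function, the `binarize` and `fix_digit_confusions` steps, and the `stub_ocr` stand-in are all hypothetical names introduced here, and the image is assumed to be a plain grid of grayscale values. A real deployment would call an actual OCR engine in place of the stub.

```python
from typing import Callable, Sequence

# Assumption: a grayscale image is a grid of ints, 0 = black, 255 = white.
Image = list[list[int]]


def capture(
    image: Image,
    pre_steps: Sequence[Callable[[Image], Image]],
    ocr: Callable[[Image], str],
    post_steps: Sequence[Callable[[str], str]],
) -> str:
    """Run pre-processing on the image, then OCR, then post-processing on the text."""
    for step in pre_steps:
        image = step(image)
    text = ocr(image)
    for step in post_steps:
        text = step(text)
    return text


def binarize(image: Image, threshold: int = 128) -> Image:
    """Pre-processing step: force every pixel to pure black or pure white."""
    return [[0 if p < threshold else 255 for p in row] for row in image]


def fix_digit_confusions(text: str) -> str:
    """Post-processing step for a numeric field: map look-alike letters to digits."""
    return text.translate(str.maketrans({"O": "0", "l": "1", "S": "5"}))


def stub_ocr(image: Image) -> str:
    """Hypothetical stand-in for a real OCR engine; returns a fixed noisy reading."""
    return " 1O5l "


result = capture(
    [[10, 200], [240, 30]],
    pre_steps=[binarize],
    ocr=stub_ocr,
    post_steps=[str.strip, fix_digit_confusions],
)
print(result)
```

Keeping each step as a small function makes it straightforward to switch individual pre-processing or post-processing steps on and off per document type.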
Top 38 pre-processing features for accurate and efficient OCR:
OCR issues negate the benefits reaped through automation. The following 38 functionalities work in tandem and enable you to achieve Intelligent Data Capture output that is 99.0% accurate.
- De-Skew: Straightens skewed images
- Sub-Image: Separates out an area from the original document image prior to processing
- Noise Removal: Removes isolated specks and machine dot shading
- Lines: Offers settings for horizontal and vertical line removal and reporting
- Vertical Registration: Registers to a particular point using vertical lines
- Resize: Stretches or shrinks an image to a new size
- Smoothing & completion: Smoothens characters for better OCR reading
- Inverse Text Correction: Converts white text on black background to normal black-on-white text and makes OCR reading of such text possible
- Horizontal registration: Registers to a particular point using horizontal lines
- Auto-rotate: Performs automatic image rotation
- Intelligent crop: Automatically removes thick black or white borders from an image
- Manual rotate: Offers manual rotation to get correct orientation
- Manual crop and pad: Performs a manual crop or pad to remove or add pixels and change the image size
- Contrast: Increases or decreases contrast
- Brightness: Increases or decreases brightness
- Hue: Improves color depth
- RGB separation: Removes the red, green, and blue color channels one at a time
- Dotted line: Removes dotted lines for better OCRing
- Text registration: Aligns all images at a particular piece of text
- Inpainting: Removes watermarks incorporated as a separate layer
- Stamp removal: Removes stamp marks that are in a specific pre-defined color
- Edge smoothening: Makes lines perfect
- Character smoothening: Makes characters perfect
- Character thinning: Makes characters thin
- Character separation: Separates machine print words for better readability
- Background cleaning: Removes the background
- Perimeter recognition: Allows boundary recognition for box type shapes
- Contouring: Allows boundary recognition for non-standard shapes
- Remove handwritten noise: Removes handwritten characters
- Page recognition: Recognizes the page
- Form bursting: Explodes a page into multiple sub-sections
- Color drop-out: Removes redundant color (RGB, CMYK, etc.)
- Remove grey: Removes grey shaded background
- Carbon cleaning: Removes carbon marks and smudges to the maximum extent possible
- Grow: Makes the lighter text dark
- Filter: Offers Blur, Dilate, and Median filters
- Gamma: Sets the relation between the black and white pixels
- Mirror: Flips the image so that mirrored text becomes readable
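To make a few of these features concrete, here is a minimal Python sketch of four of them: Inverse Text Correction, Noise Removal (using a median filter, one of the filters named in the list), Gamma, and Mirror. It is an illustration under simplifying assumptions rather than how any specific product implements them: the image is assumed to be a small grid of grayscale values, the function names are hypothetical, and the "mostly dark means inverted" rule and the 3x3 window are choices made here for brevity.

```python
from statistics import median

# Assumption: a grayscale image is a grid of ints, 0 = black, 255 = white.
Image = list[list[int]]


def invert_if_dark(img: Image) -> Image:
    """Inverse text correction: a mostly dark page is assumed to be
    white-on-black text and is flipped to black-on-white."""
    pixels = [p for row in img for p in row]
    if sum(pixels) / len(pixels) < 128:
        return [[255 - p for p in row] for row in img]
    return [row[:] for row in img]


def median_filter(img: Image) -> Image:
    """Noise removal: replace each pixel by the median of its 3x3
    neighbourhood (coordinates are clamped at the image edges)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def adjust_gamma(img: Image, gamma: float) -> Image:
    """Gamma: out = 255 * (in / 255) ** gamma. With this convention,
    gamma > 1 darkens mid-tones and gamma < 1 lightens them."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]


def mirror(img: Image) -> Image:
    """Mirror: flip the image left to right."""
    return [row[::-1] for row in img]


# A white 5x5 page with one isolated black speck in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = median_filter(page)
```

In this example the single speck is outvoted by its white neighbours, so `cleaned` is an all-white page. Real scans would normally be handled with an imaging library, but the per-pixel logic is the same idea.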
These 38 pre-processing functionalities are the deciding factor between bad and good OCR output, and thereby determine whether the overall automation effort succeeds. They are instrumental not only in enhancing image quality but also in making total automation and a paperless office a business reality.
Intelligent Data Capture, along with RPA and IA, has delivered phenomenal success in many use cases that were considered simply impossible until a few years ago. The very idea that information from unstructured data sources such as a PDF, a printout, or even an image could be read and captured to update databases and downstream systems was hard to believe. Today, Intelligent Data Capture is a strong business enabler. It makes 3-minute on-boarding a digital reality, not only saving millions in revenue but also allowing you to do more with the same number of resources. This is just one milestone in the RPA and IA journey, and it leaves scope for further high-tech advancement in the near future.