Top 38 pre-processing must-haves for Intelligent Data Capture

by Rajesh Agarwal, on Jul 19, 2019 7:37:36 PM

Estimated reading time: 5 mins

Paper-based processing still exists, and it is going to stay for quite some time. It is not confined to small business pockets; it accounts for a good 25-30% of business operation scenarios. In monetary terms, this aspect of business processing amounts to double-digit millions in revenue annually. The theme recurs mainly in the Finance & Accounts and Procurement functions of almost all BFSI, Manufacturing, Telecom, Supply Chain, and Research & Analytics companies. Yet when it comes to business, you cannot compromise on speed, efficiency, and quality.

Automation cannot take place without digitization. Simply put, digitization is the stepping stone to your Digital Transformation journey.


Optical Character Recognition (OCR) helps digitize paper-based enterprise assets, which in turn makes complex business use cases achievable.

However, OCR has inherent quality issues. Poor quality in the digitized asset renders even advanced technologies, such as Robotic Process Automation (RPA) and Intelligent Automation (IA), ineffective. Here, Intelligent Document Processing, more popularly known as Intelligent Data Capture, is the way ahead. It enables you to read and ingest text from an image, so that use cases such as Tab Banking, On-mobile Onboarding, and faster claim processing take a few minutes instead of the hours and days they once required.

What is Intelligent Data Capture?

Intelligent Data Capture is the process of capturing data from all types of documents, including “unstructured” ones such as emails, text files, PDFs, and scanned documents, classifying it into categories, and extracting relevant information for further processing. Software solutions for Intelligent Data Capture use Artificial Intelligence algorithms to extract the data in a template-free mode, process it, and then feed it into different applications, databases, and downstream systems.

However, at times the image itself is not clear: it has carbon smudges, is skewed, or is not properly oriented. At times, it could be a dot matrix print or have high noise and contrast. All this results in poor data capture output, in line with the popular concept of “Garbage In, Garbage Out” (GIGO).

The reliability and authenticity of the captured data depend on the clarity of the captured image. This calls for pre-processing the image before data capture to enhance image quality and improve the capture process. It also requires some post-processing to improve the quality of the captured data.

Top 38 pre-processing features for accurate and efficient OCR:

  • De-skew
  • Sub-image
  • Noise removal
  • Lines
  • Vertical registration
  • Resize
  • Smoothing & completion
  • Inverse text correction
  • Horizontal registration
  • Auto-rotate
  • Intelligent crop
  • Manual rotate
  • Manual crop and pad
  • Contrast
  • Brightness
  • Hue
  • RGB separation
  • Dotted line
  • Text registration
  • Inpainting
  • Stamp removal
  • Edge smoothening
  • Character smoothening
  • Character thinning
  • Character separation
  • Background cleaning
  • Perimeter recognition
  • Contouring
  • Remove handwritten noise
  • Page recognition
  • Form bursting
  • Color drop-out
  • Remove grey
  • Carbon cleaning
  • Grow
  • Filter
  • Gamma
  • Mirror

OCR issues negate the benefits reaped through automation. The aforementioned 38 functionalities work in tandem and enable Intelligent Data Capture that is 99.0% accurate.
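
How such functions can work in tandem is easier to see in a small sketch. The Python below is a minimal illustration, not the implementation of any particular product. It assumes a grayscale image held as a list of rows of 0-255 integers, and the names `run_pipeline`, `stretch_contrast`, and `binarize` are hypothetical. `binarize` is a common last step before OCR and is an assumption here, not one of the 38 named features.

```python
from functools import reduce

# Assumption: an image is a list of rows, each a list of ints in 0..255
# (0 = black, 255 = white). All names below are illustrative only.

def run_pipeline(image, steps):
    """Apply each pre-processing step in order and return the final image."""
    return reduce(lambda img, step: step(img), steps, image)

def stretch_contrast(image):
    """Contrast: rescale so the darkest pixel maps to 0 and the brightest to 255."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]

def binarize(image, threshold=128):
    """Hypothetical final step: map every pixel to pure black or pure white."""
    return [[0 if p < threshold else 255 for p in row] for row in image]

# A faded scan: the "ink" (150) is only slightly darker than the "paper" (200).
faded = [[150, 200], [200, 150]]
clean = run_pipeline(faded, [stretch_contrast, binarize])
```

On this faded sample, thresholding alone turns every pixel white and the text is lost, while stretching the contrast first preserves it. The order in which the steps run matters as much as the steps themselves.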

  1. De-Skew: Straightens skewed images
  2. Sub-Image: Separates out an area from the original document image prior to processing
  3. Noise Removal: Removes isolated specks and machine dot shading
  4. Lines: Offers settings for horizontal and vertical line removal and reporting
  5. Vertical Registration: Registers to a particular point using vertical lines
  6. Resize: Stretches or shrinks an image to a new size
  7. Smoothing & completion: Smoothens characters for better OCR reading
  8. Inverse Text Correction: Converts white text on black background to normal black-on-white text and makes OCR reading of such text possible
  9. Horizontal registration: Registers to a particular point using horizontal lines
  10. Auto-rotate: Performs automatic image rotation
  11. Intelligent crop: Automatically removes thick black or white borders from an image
  12. Manual rotate: Offers manual rotation to get correct orientation
  13. Manual crop and pad: Performs a manual crop or pad to remove or add pixels and change the image size
  14. Contrast: Increases or decreases contrast
  15. Brightness: Increases or decreases brightness
  16. Hue: Improves color depth
  17. RGB separation: Removes the red, green, and blue colors one at a time
  18. Dotted line: Removes dotted lines for better OCR
  19. Text registration: Aligns all images at a particular piece of text
  20. Inpainting: Removes watermarks incorporated as a separate layer
  21. Stamp removal: Removes stamp marks that are in a specific pre-defined color
  22. Edge smoothening: Smooths the edges of lines
  23. Character smoothening: Smooths the edges of characters
  24. Character thinning: Makes characters thin
  25. Character separation: Separates machine print words for better readability
  26. Background cleaning: Removes background
  27. Perimeter recognition: Allows boundary recognition for box type shapes
  28. Contouring: Allows boundary recognition for non-standard shapes
  29. Remove handwritten noise: Removes handwritten characters
  30. Page recognition: Recognizes the page
  31. Form bursting: Explodes a page into multiple sub-sections
  32. Color drop-out: Removes redundant color (RGB, CMYK, etc.)
  33. Remove grey: Removes grey shaded background
  34. Carbon cleaning: Removes carbon marks and smudges to the maximum extent possible
  35. Grow: Darkens lighter text
  36. Filter: Offers Blur, Dilate, and Median filters
  37. Gamma: Sets the relation between the black and white pixels
  38. Mirror: Flips the image so that mirrored text becomes readable
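
A few of these features are simple enough to sketch in plain Python. The block below is a minimal illustration under stated assumptions, not how any particular product implements them. It takes a grayscale image as a list of rows of 0-255 integers, every function name is hypothetical, the "mostly dark" test for inverse text is a simple heuristic, and the gamma formula is the common power-law convention rather than something the feature list specifies.

```python
from statistics import median

# Assumption: an image is a list of rows, each a list of ints in 0..255
# (0 = black, 255 = white). All names below are illustrative only.

def invert_if_dark(image):
    """Inverse text correction: flip a mostly dark page to black-on-white."""
    pixels = [p for row in image for p in row]
    if sum(pixels) / len(pixels) >= 128:
        return [row[:] for row in image]  # already light: unchanged copy
    return [[255 - p for p in row] for row in image]

def adjust_gamma(image, gamma):
    """Gamma: remap each intensity v to 255 * (v / 255) ** (1 / gamma)."""
    table = [round(255 * (v / 255) ** (1 / gamma)) for v in range(256)]
    return [[table[p] for p in row] for row in image]

def mirror(image):
    """Mirror: flip every row left to right."""
    return [row[::-1] for row in image]

def median_filter(image):
    """Filter (median): replace each pixel with the median of its
    3x3 neighbourhood, using a smaller window at the image borders."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        new_row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            new_row.append(int(median(window)))
        result.append(new_row)
    return result

# A white patch with one isolated black speck in the middle.
speckled = [[255, 255, 255], [255, 0, 255], [255, 255, 255]]
despeckled = median_filter(speckled)
```

The median filter is one simple way to remove isolated specks, as in the Noise Removal feature: the single dark pixel is outvoted by its white neighbours. With this convention, a `gamma` above 1 lightens mid-tones and a value below 1 darkens them, while pure black and pure white stay fixed.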

These 38 pre-processing functionalities are often the deciding factor between bad and good OCR output, and thereby between the failure and success of the overall automation effort. They are instrumental not only in enhancing image quality but also in making total automation and a paperless office a business reality.

In summary:

Intelligent Data Capture, along with RPA and IA, delivers success in many use cases that were simply impossible until a few years ago. The idea that information from unstructured data sources such as a PDF, a printout, or even an image could be read and captured to update databases and downstream systems was once hard to believe. Today, Intelligent Data Capture is a strong business enabler. It makes 3-minute on-boarding a digital reality, not only saving millions in revenue but also allowing you to do more with the same number of resources. This is just a milestone in the RPA and IA journey, leaving scope for more high-tech advancement in the near future.
