What Is Intelligent Capture?

Intelligent capture is software that converts complex, variable document-based information into structured data. It uses several techniques to locate and extract data, including pattern matching with regular expressions, definitions of keyword/value pairs, and detection of tabular data through column headers and rows.
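As a rough illustration of two of these techniques, the sketch below applies a regular expression and a keyword/value lookup to a few lines of text. The sample text, the field names and the `extract_fields` helper are all hypothetical and do not come from any particular product.

```python
import re

# Hypothetical OCR text from an invoice; not taken from any real document.
TEXT = """Invoice Number: INV-20417
Invoice Date: 2023-04-17
Total Due: $1,284.50"""

# Pattern matching: find a value by its shape alone (an ISO-style date here).
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Keyword/value pairs: find a label, then take what follows it on the same line.
# The field names on the left are assumptions made for this example.
KEYWORDS = {"invoice_number": "Invoice Number", "total_due": "Total Due"}


def extract_fields(text):
    """Return a dict of field name -> extracted string for the fields found."""
    fields = {}
    date_match = DATE_PATTERN.search(text)
    if date_match:
        fields["invoice_date"] = date_match.group(0)
    for field, keyword in KEYWORDS.items():
        # "." does not match a newline, so the capture stops at the end of the line.
        match = re.search(re.escape(keyword) + r"\s*:\s*(.+)", text)
        if match:
            fields[field] = match.group(1).strip()
    return fields


RESULT = extract_fields(TEXT)
```

The date is found purely by its pattern, while the invoice number and total are found by their labels, which is why the two techniques are usually combined.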

Intelligent Capture Key Capabilities

When your needs go beyond transcribing images into text or converting TIFF files into editable Word documents, consider the capabilities commonly associated with intelligent capture solutions. The key capabilities are image processing, document classification, data extraction and data quality.

Image Processing

This capability is needed if you deal with scanned documents or pictures of documents taken with a mobile device. The quality of OCR output depends heavily on the quality of the image. Corrections for distortion, contrast, lighting, scaling and geometry, along with background and watermark removal, are typically applied so that an incoming document is optimized before OCR runs. Intelligent capture applies these advanced image enhancement functions.
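To make the idea of pre-OCR cleanup concrete, here is a minimal sketch of two common steps, contrast stretching and binarization, on a tiny grayscale image held as nested lists. Real products use far more sophisticated processing; the function names, the sample pixels and the threshold of 128 are assumptions chosen for illustration.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so the darkest maps to 0 and the lightest to 255."""
    flat = [p for row in pixels for p in row]
    low, high = min(flat), max(flat)
    if high == low:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in pixels]
    scale = 255 / (high - low)
    return [[round((p - low) * scale) for p in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in pixels]


# Hypothetical faded scan: dark ink (~100-110) on a gray background (~150).
FADED = [
    [100, 110, 150],
    [105, 148, 152],
]

STRETCHED = stretch_contrast(FADED)
CLEAN = binarize(STRETCHED)
```

Without the stretch, a fixed threshold of 128 would split this image at the wrong place (only the values near 150 would turn white by a narrow margin); after stretching, ink and background are clearly separated.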

Document Classification

Capture tasks typically involve more than one type of document, so intelligent capture must tell document types apart. This is the province of classification. The ability to easily train a system to produce reliable document class assignments is an important capability of intelligent capture.
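One simple way to picture a trainable classifier is a small naive Bayes model over the words of each document. This is only a sketch of the general idea, not the method any specific intelligent capture product uses; the class name, the training snippets and the labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocabulary = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior, then add a smoothed log likelihood per word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training samples; a real system would use many more.
classifier = NaiveBayesClassifier()
classifier.train("invoice number total amount due payment terms", "invoice")
classifier.train("invoice date bill to total due", "invoice")
classifier.train("purchase order ship to quantity ordered delivery date", "purchase_order")
classifier.train("purchase order number item quantity unit price", "purchase_order")
```

Training here is just a matter of supplying labeled examples, which mirrors the point above: the easier it is to add examples, the easier it is to keep class assignments reliable as new document types appear.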

Data Extraction

The goal here is not to convert an image of a document into text, but to satisfy the business need to turn documents into structured tag-value pairs that other systems can use. Intelligent capture offers a broad set of location techniques so that the right data is accurately extracted, ranging from the simplest, supplying field-level X-Y coordinates, to more sophisticated ones such as the presence of keywords, the relative proximity of one data element to another, or pattern matching.
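The difference between coordinate-based and keyword-based location can be sketched with a handful of OCR words and their positions. The `Word` structure, the sample coordinates and both helper functions are hypothetical; actual OCR engines report positions in their own formats.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


# Hypothetical OCR output for the top of an invoice.
WORDS = [
    Word("ACME", 40, 20), Word("Corp", 95, 20),
    Word("Invoice", 40, 80), Word("No:", 110, 80), Word("INV-20417", 160, 80),
    Word("Total:", 40, 120), Word("1,284.50", 160, 120),
]


def words_in_zone(words, left, top, right, bottom):
    """Fixed X-Y location: return the words whose origin falls inside a rectangle."""
    return [w.text for w in words if left <= w.x <= right and top <= w.y <= bottom]


def value_right_of(words, keyword, max_gap=150, line_tolerance=5):
    """Keyword proximity: return the nearest word to the right of a keyword on the same line."""
    anchors = [w for w in words if w.text == keyword]
    if not anchors:
        return None
    anchor = anchors[0]
    candidates = [
        w for w in words
        if abs(w.y - anchor.y) <= line_tolerance and 0 < w.x - anchor.x <= max_gap
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda w: w.x - anchor.x).text


VENDOR = words_in_zone(WORDS, 0, 0, 200, 40)
TOTAL = value_right_of(WORDS, "Total:")
```

The zone lookup only works while the layout stays fixed, whereas the keyword lookup keeps working if the total moves elsewhere on the page, which is why the more sophisticated techniques matter for variable documents.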

Data Quality

This last capability is probably the most difficult to measure and, at the same time, the most important. OCR returns text along with character- and word-level coordinates as well as character-level confidence scores. These scores convey the software's estimated probability that each recognized character is correct. For instance, for a word like “source”, the output might include “s (45), o (35), u (99), c (85), e (95)”, where the numbers in parentheses represent confidence scores. However, character-level confidence scores provide little value on their own if the real objective is to locate and extract data elements that might consist of several words.

Intelligent capture focuses instead on confidence scoring at the level of the data element, with the aim of reliably extracting as many complete data elements as possible rather than individual characters. It also provides a number of data validation capabilities to improve the results.
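A minimal sketch of how element-level scoring and validation might work together is shown below. The aggregation rule (a field is only as reliable as its weakest character), the threshold of 80, the field names and the validators are all assumptions made for illustration; products differ in how they compute and use these scores.

```python
import re
from datetime import datetime


def field_confidence(char_scores):
    """Assumed rule: a field is only as reliable as its weakest character."""
    return min(char_scores) if char_scores else 0


def is_valid_date(value):
    """Validation rule: the value must be a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_valid_amount(value):
    """Validation rule: digits with optional thousands separators and two decimals."""
    return re.fullmatch(r"\d{1,3}(,\d{3})*\.\d{2}", value) is not None


# Hypothetical mapping of field names to their validation rules.
VALIDATORS = {"invoice_date": is_valid_date, "total_due": is_valid_amount}


def review_status(name, value, char_scores, threshold=80):
    """Decide whether an extracted field can be accepted or needs human review."""
    if not VALIDATORS[name](value):
        return "review: failed validation"
    if field_confidence(char_scores) < threshold:
        return "review: low confidence"
    return "accepted"
```

For example, `review_status("invoice_date", "2023-13-17", [99] * 10)` is sent to review because month 13 fails validation even though every character was read with high confidence, while a well-formed amount containing one character scored at 45 is flagged for low confidence.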