Part 4 of our Advanced Capture Stack Series
Delving into the key technologies involved in cognitive capture, and the areas they support, requires some context. We will use the standard document capture workflow: input, preprocessing, document classification and separation, data location and extraction, validation, verification, and output. That’s a lot, so we’ll cover these areas at a high level and limit the discussion to those that are specific to cognitive capture, leaving out the input and output stages.
Part 3 of the Advanced Capture Stack series covered the role that AI, and machine learning in particular, plays in cognitive capture. The most practical places to apply machine learning are document classification and data extraction. Machine learning techniques are not all the same, and each task drives the selection of the appropriate one.
Document Preprocessing
The document preprocessing stage is traditionally limited to handling documents that are captured as images by devices such as scanners or mobile phone cameras. The reality is that any time a document moves from digital (presumably created by Microsoft Word or another application), to analog (printed out), and then back to digital again (via a camera), there is a loss of fidelity. This loss of fidelity reduces the amount of automation that can be achieved. If a system must use OCR or another form of recognition, it generally performs best when the data on which it operates is in pristine condition.
To be in pristine condition means that the data is very clear, and there is no “noise” in the form of blotches or even tiny speckles that can be introduced by the camera or by problems with the paper such as bends or folds, dirt or ink smudging, or the occasional spilled beverage. The ideal scenario is that the data to be processed is unadulterated. There are also other image-quality problems, such as low contrast, excessive tilting (often called “skew”) in one direction or another, images that are upside-down, and images that are stretched or not the proper size or resolution.
All of the technologies in this stage are designed to rectify common (or uncommon) problems and bring the document as close as possible to the original, pristine version. Specific technologies or techniques involved include:
- De-skewing. This is the process of taking an image of a document that was scanned at an angle and reorienting it so that all text is horizontal. The algorithms involved can be simple, relying on the image boundaries, or they can analyze the text itself in order to orient the image.
- De-speckling. This is the process of analyzing an image and removing pixels that, through image analysis, do not appear to be part of the original document.
- Scaling. This is the process of changing the size of the image so that the document matches the size that is expected. Some software does a better job than others of correcting the degree of scaling.
- Binarization. This is the process of converting a color or grayscale image to black-and-white. Black-and-white images provide the highest level of contrast for the recognizers.
- Resolution setting. This is the process of changing the resolution of the image so that it can conform to the parameters of a given recognition process.
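To make two of the techniques above more concrete, here is a minimal sketch of binarization followed by de-speckling, written in plain Python on a tiny hand-made image. The fixed threshold of 128, the "no ink among the eight neighbours" rule, and the function names are all assumptions chosen for illustration; production capture software typically uses adaptive thresholds and more sophisticated noise analysis.

```python
def binarize(gray, threshold=128):
    """Convert a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an illustrative assumption, not a recommended value.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def despeckle(binary):
    """Clear ink pixels that have no ink among their eight neighbours."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]  # copy so the input is not modified
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            has_ink_neighbour = any(
                binary[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_ink_neighbour:
                cleaned[y][x] = 0  # isolated pixel: treat it as a speckle
    return cleaned


# A tiny 5x5 grayscale "scan": a dark vertical stroke plus one stray dark pixel.
scan = [
    [250, 250, 250, 250, 250],
    [250,  20, 250, 250, 250],
    [250,  25, 250, 250,  30],
    [250,  15, 250, 250, 250],
    [250, 250, 250, 250, 250],
]
clean = despeckle(binarize(scan))
```

In this toy example the three-pixel stroke survives, while the isolated dark pixel on the right edge is removed.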
Document Classification and Separation
If you have more than one document type that you need to process within a single workflow, some type of document classification is required so that documents can be identified and routed to different processes. One example is an accounts receivable process that involves a remittance document and a check, each of which has data that needs to be extracted, validated and exported into an accounting system. Most document classification in production uses a rules-based approach where a subject matter expert analyzes key attributes of each document type, such as the presence of keywords or specific data, and then constructs rules that dictate document type assignment.
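A rules-based classifier of the kind described here can be as simple as a keyword table. The sketch below is illustrative only: the keyword lists, the `min_hits` cutoff, and the function name are hypothetical, and real rule sets are usually far larger.

```python
# Hypothetical keyword rules that a subject matter expert might write by hand.
RULES = {
    "remittance": ["remittance advice", "invoice number", "amount paid"],
    "check": ["pay to the order of", "memo"],
}


def classify_by_rules(text, rules=RULES, min_hits=2):
    """Return the document type whose keywords match most often, or "unknown"."""
    lowered = text.lower()
    best_type, best_hits = None, 0
    for doc_type, keywords in rules.items():
        hits = sum(1 for keyword in keywords if keyword in lowered)
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type if best_hits >= min_hits else "unknown"


print(classify_by_rules("PAY TO THE ORDER OF Acme Corp  Memo: rent"))
```

Every new document type or wording variation means another hand-written entry in the table, which is where the upkeep cost of this approach comes from.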
Text and Visual Classification
The cognitive capture variant of document classification involves machine learning: instead of someone manually encoding rules, a set of algorithms parses and analyzes documents to identify key “features” that are reliable enough to distinguish one document type from another. There are two basic types of classifiers: text and visual.
For text classification, the algorithms analyze different characteristics beyond just keywords. They analyze frequency of terms, proximity of one term to another, and more. Visual classifiers evaluate the graphical elements of the document, ignoring text altogether (OCR is not required here). Aspects such as pictures, layout of paragraphs or tabular data and even logos can be considered.
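As a rough illustration of a learned text classifier, the following sketch trains a small multinomial Naive Bayes model on term frequencies. Naive Bayes is our choice for the example, not necessarily what any given capture product uses, and it covers only term frequency, not term proximity. The training texts and labels are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.term_counts = defaultdict(Counter)  # label -> term frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.term_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.term_counts[label]
            total_terms = sum(counts.values())
            # Log prior plus the smoothed log likelihood of every term.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_terms + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training samples, two per document type.
texts = [
    "remittance advice invoice number amount paid",
    "payment remittance for invoice total amount",
    "pay to the order of memo dollars",
    "pay to the order of bank dollars signature",
]
labels = ["remittance", "remittance", "check", "check"]

model = NaiveBayesTextClassifier().fit(texts, labels)
print(model.predict("invoice amount remittance"))
print(model.predict("pay to the order of John"))
```

Nobody wrote a rule here: the model derives which terms matter from the labeled samples, and it can be retrained whenever new samples arrive.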
Overall, the benefit of machine learning algorithms, apart from the fact that they relieve us of manual work and upkeep, is that they can analyze far more data and identify features that you or I might easily miss. Also, the algorithms can be continuously updated, resulting in more stable, reliable performance.
Closely related to document classification is document separation. Traditionally, documents (which were scanned) were separated by the presence of blank pages, barcodes or some other identifier that the system could use to discern between one document and another. These identifiers are typically applied manually during what is called batch preparation. Increasingly, more documents are arriving into an organization already digitized, whether already scanned or born digital. For documents that exist as individual files (e.g., a Word or PDF file), there is really no need to separate them.
Many cases exist where multiple documents are stored as a single file. For instance, a patient claim often has the claim form and supplemental documentation, or a mortgage loan file could have from 50 to 500 or more documents stored within a single PDF. In these cases, manually inserting document separators is impractical, so something else must be done. A rules-based method is often the favored approach because it is simple to understand and implement. However, as with any rules-based system, there is an unfortunate tradeoff between comprehensiveness and cost. Most organizations opt to minimize cost, which results in a lot of errors.
Enter machine learning-based separation. Just as with document classification, we hand over the task of analyzing documents to find features to computer algorithms. This time, instead of identifying attributes that indicate a document type, the underlying analysis focuses on features that indicate first, middle and last pages. Page numbers, titles, headers and footers all come into play in addition to other attributes. The result is a higher level of fidelity due to a more comprehensive analysis without the significant attendant costs.
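Once each page has been labeled as a first, middle or last page, turning those labels into document boundaries is straightforward. The sketch below assumes the per-page labels already exist (in practice they would come from a trained model, which is not shown); the function name and the sample labels are hypothetical.

```python
def split_into_documents(page_roles):
    """Group page indices into documents from per-page role predictions.

    page_roles: list of "first", "middle" or "last", one entry per page, in order.
    Returns a list of documents, each a list of zero-based page indices.
    """
    documents = []
    current = []
    for index, role in enumerate(page_roles):
        # A "first" page closes any document that is still open.
        if role == "first" and current:
            documents.append(current)
            current = []
        current.append(index)
        # A "last" page closes the document it belongs to.
        if role == "last":
            documents.append(current)
            current = []
    if current:  # trailing pages with no explicit "last" page
        documents.append(current)
    return documents


# Assumed model output for a seven-page file.
roles = ["first", "middle", "last", "first", "last", "first", "middle"]
documents = split_into_documents(roles)
print(documents)
```

Using both "first" and "last" signals gives some tolerance for mistakes: if the model misses one boundary marker, the other can still close the document.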
Data Location and Extraction
The next technology area involved in cognitive capture is data location and extraction. While many organizations are thrilled to automate document classification and separation, organizations often also require additional metadata that is stored within the documents themselves. For instance, an insurance claim requires claimant data such as name, social security number, address, services rendered and so on. There may also be the need to locate and verify information regarding the claim in supporting documentation such as a provider invoice.
Handling Unstructured Data
The objective in any process is to take the unstructured data in these documents and use it in a more structured manner to shepherd a process from beginning to end. Historically, metadata was entered manually. Other techniques were introduced in the intervening years, such as templates for forms, and regular expressions or keyword/value pairs for the more complex data typically found in invoices, remittances, receipts and explanation of benefits documents. As with document classification, these data location and extraction techniques rely upon the manual creation of rules.
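For readers who have not seen the regular expression and keyword/value approach, here is a minimal sketch using Python's standard `re` module. The patterns, field names and sample text are invented for illustration and would need to be extended by hand for every layout variation, which is exactly the limitation of manual rules.

```python
import re

# Hand-written rules: a keyword followed by the value we want to capture.
INVOICE_NUMBER = re.compile(
    r"invoice\s*(?:number|no\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
)
TOTAL = re.compile(
    r"total\s*(?:due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
)


def extract_fields(text):
    """Return whichever of the two fields the rules manage to locate."""
    fields = {}
    match = INVOICE_NUMBER.search(text)
    if match:
        fields["invoice_number"] = match.group(1)
    match = TOTAL.search(text)
    if match:
        # Strip thousands separators before converting to a number.
        fields["total"] = float(match.group(1).replace(",", ""))
    return fields


sample = "ACME Supplies\nInvoice No: INV-2041\nTotal Due: $1,250.00\n"
fields = extract_fields(sample)
print(fields)
```

A vendor that prints "Inv. Ref" instead of "Invoice No" would slip past these patterns until someone adds another rule.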
More recently, software vendors have introduced “loopback” mechanisms that allow rules to be created gradually during production by having staff handle errors and tell the system exactly where the data is located. The accumulated result is often called a knowledge base. This method, while reducing the amount of upfront effort, has the same limitations as any rules-based system. Here machine learning can improve the process. Instead of creating rules manually, the system comprehensively analyzes documents to create a data model that reliably locates needed information, at a field or data element level, and then extracts and validates it. The result is a model that is more flexible than a brittle rules-based approach and that also allows continuous updates.
OCR and Handwriting Recognition (ICR) Software
You might be wondering about OCR or where other forms of recognition such as handwriting (often referred to as “ICR”) come into play within the bevy of technologies involved in cognitive capture.
The reality is that OCR or ICR is a necessary prerequisite for image-based documents where either content-based classification or data extraction is required. However, the use of either is not mandatory. For example, if digital documents are involved, such as a searchable PDF, there is no need for OCR at all (or image preprocessing for that matter). Remember that OCR/ICR is simply the transcription of image-based document information into something that a computer can read.
This information is produced as plain text without any concept of document types, document boundaries or data fields. So, although OCR/ICR is a necessary prerequisite for image-based documents, the real heavy lifting involves the critical steps of document classification, document separation and data location. Another point is that the majority of OCR and ICR packages come fully baked. This means that even though they use machine learning techniques to perform image-to-text transcription, the run-time software used in intelligent capture is not able to continue to learn. There are some cases of deploying OCR/ICR that is able to learn in production environments, but these deployments are not the norm.
Cognitive Capture: Its Essence
By now you’re probably realizing there is a theme to all of this “cognitive stuff”: the use of machine learning applied to specific tasks. The reality is that machine learning in itself is a tool just like any other type of tool. Without the proper application, it is meaningless.
A hammer in a drawer is just a bunch of heavy atoms. When held correctly and used with force against the head of a nail, it becomes extraordinarily useful. It’s the same with machine learning. No vendor can offer machine learning without specifically applying it to a particular problem.