March 1, 2019

Google, Amazon, Microsoft & Advanced Capture: OCR Software or Something Else?

OCR software and advanced capture differ in capability: advanced capture actually interprets a document to find the relevant data, and does so accurately. For instance, if you apply OCR to an invoice, you get a block of text with no indication of which parts are the data you need. Accurately extracting specific data such as the invoice date, invoice number, line items and total requires significant effort that is beyond the scope of typical OCR software.

Getting the Most Out of Your OCR or Capture Deployment

Unfortunately, vendors and the information sources covering this market constantly use “OCR” or Optical Character Recognition to mean many different things. This results in a lot of confusion and frustration. We see this all the time. New customers come to us because they are unhappy with their existing deployments of leading OCR software, which are decidedly NOT advanced capture solutions. These deployments fail to meet the expectations of the business. Simply put, it takes a lot of effort to go from full-page OCR results to accurately finding specific data.

Similarly, we have already encountered Google and Microsoft in situations where the prospective client mistakenly believes that they can simply plug in the Google Vision API or Microsoft’s Azure Computer Vision API and get something similar to a traditional advanced capture deployment.

Whether organizations and vendors use traditional OCR software or the new offerings from Google, Amazon or Microsoft depends on their needs. In general, there is still a lot of work in pure page transcription and document conversion that these offerings can adequately cover and that businesses operating on AWS, Azure, or Google Cloud will naturally consider.

Advanced Capture: Key Capabilities

When it comes to needs that go beyond transcription of images into text or conversion of TIFF files into editable Word documents, capabilities often associated with advanced capture solutions should be considered. Let’s review those key capabilities: image processing, document classification, data extraction and data quality.

Image Processing

This capability is needed if you deal with scanned documents or pictures of documents taken with a mobile device. The quality of OCR output depends heavily on the quality of the image. Corrections for distortion, contrast and lighting, along with background/watermark removal, scaling correction and geometry correction, are typically applied to ensure that an incoming document is optimized before OCR is applied. Traditional packaged OCR software does contain a good number of image enhancement functions, but applying them intelligently requires a lot of work.
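As a rough illustration of the kind of clean-up step described above, here is a minimal sketch of a contrast stretch followed by a fixed-threshold binarization. It assumes a grayscale image held as a plain list of rows with values from 0 to 255; the function names and the threshold of 128 are illustrative assumptions, not part of any particular OCR product.

```python
def stretch_contrast(image):
    """Linearly rescale pixel values so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A completely flat image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def binarize(image, threshold=128):
    """Map each pixel to black (0) or white (255) using a fixed threshold (an assumed value)."""
    return [[255 if p >= threshold else 0 for p in row] for row in image]


# A low-contrast 2x2 "scan" (hypothetical values).
scan = [[100, 120], [140, 160]]
cleaned = binarize(stretch_contrast(scan))
```

A fixed threshold is the simplest possible choice. Deciding which corrections to apply, and with what settings, for each incoming image is the part that takes real work.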

Document Classification

OCR cannot inherently differentiate between two types of documents, which is a typical need in advanced capture tasks. This is the province of classification. While OCR is definitely a prerequisite when it comes to text-based classifiers, the ability to easily train a system to output reliable document class assignments is not the function of OCR. There are many ways to tackle classification, from rules-based approaches that depend upon locating specific keywords all the way to non-OCR approaches that “look” at documents in much the same way that humans do. The upshot is that OCR supplies only part of the input to a classifier.
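To make the rules-based end of that spectrum concrete, here is a minimal sketch of a keyword classifier that takes OCR text as its input. The class names, keyword lists and the `min_hits` cut-off are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword rules: each document class lists words that suggest it.
RULES = {
    "invoice": {"invoice", "total", "due"},
    "purchase_order": {"purchase", "order", "ship"},
}


def classify(ocr_text, rules=RULES, min_hits=2):
    """Return the class with the most distinct keyword matches, or None if too few match."""
    words = set(re.findall(r"[a-z]+", ocr_text.lower()))
    scores = {label: len(words & keywords) for label, keywords in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


label = classify("INVOICE No. 1001\nTotal due: 250.00")
```

Note that the OCR text is only the input here. The rules, the scoring and the decision to reject a weak match all sit outside the OCR engine.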

Data Extraction

Here we are not talking about converting an image of a document into text, but about satisfying the business need to turn documents into structured tag-value pairs that can be used by various systems. This requires a broader set of capabilities, from the simplest ability to locate data by supplying field-level X-Y coordinates to more sophisticated location techniques such as the presence of keywords, the relative proximity of one data element to another, or pattern matching. Organizations often believe they can build these capabilities and apply them to OCR output after the fact.

The reality is that constructing a custom solution most often results in a brittle, complex system that requires development skills to modify. And by working with OCR results, organizations either lose a lot of visual-related information or must piece it back together. The upshot is that while OCR is a necessary input, it stops far short of delivering the necessary data.
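The keyword and pattern-matching techniques mentioned above can be sketched in a few lines. This is a simplified illustration, assuming OCR has already produced plain text; the field names and regular expressions are hypothetical and would need tuning for real invoices.

```python
import re

# Hypothetical field definitions: a keyword anchor followed by a value pattern.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"\binvoice\s*(?:no\.?|number|#)\s*:?\s*([A-Z0-9-]+)", re.I),
    "invoice_date": re.compile(r"\bdate\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
    # The leading \b keeps "Subtotal" from being mistaken for "Total".
    "total": re.compile(r"\btotal\s*:?\s*\$?\s*(\d+(?:\.\d{2})?)", re.I),
}


def extract_fields(ocr_text):
    """Return a tag-value dict; a field is None when its pattern is not found."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1) if match else None
    return result


text = "ACME Corp\nInvoice No: INV-2041\nDate: 2019-03-01\nSubtotal: 90.00\nTotal: 99.00"
fields = extract_fields(text)
```

Even this small sketch shows the brittleness: every new label wording, date format or layout means another pattern to write and maintain, and the plain text has already lost the positional information that would help tell similar values apart.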

Data Quality

This last one is probably the most difficult concept to measure and, at the same time, the most important. When it comes to OCR, you receive text along with the character and word-level coordinates as well as character-level confidence scores. These confidence scores convey the probability that the software has delivered a correct character-level answer. For instance, for a word like “source”, the output might look like “s (45), o (35), u (99), r (90), c (85), e (95)” where the numbers in parentheses represent confidence scores.

However, character-level confidence scores don’t provide much value if the real objective is to locate and extract data elements that might consist of several words. Advanced capture solutions focus on data element-level confidence scoring in order to reliably extract the most data elements, not data characters. Additionally, advanced capture solutions provide a number of data validation capabilities to improve answers beyond what OCR software provides. On its own, OCR software does only part of the job, so the organization must review 100% of the output to ensure adequate data quality. At that point, any objective of automation goes out the window.
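To illustrate the difference between character-level and data element-level scoring, here is a minimal sketch. It assumes confidence scores on a 0 to 100 scale, scores a field by its weakest character (one of several possible aggregations, chosen here only for simplicity), and adds one simple validation rule. All names, values and the threshold of 80 are hypothetical.

```python
from decimal import Decimal


def field_confidence(char_confidences):
    """Score a field by its weakest character; an empty field scores 0."""
    return min(char_confidences) if char_confidences else 0


def needs_review(fields, threshold=80):
    """Return the names of fields whose field-level confidence is below the threshold."""
    return [
        name
        for name, (_value, confidences) in fields.items()
        if field_confidence(confidences) < threshold
    ]


def totals_agree(line_amounts, total):
    """Validation rule: the line items must sum to the stated total."""
    return sum(Decimal(amount) for amount in line_amounts) == Decimal(total)


# Each field maps to (extracted value, per-character confidence scores).
extracted = {
    "invoice_number": ("INV-2041", [99, 95, 97]),
    "total": ("99.00", [45, 35, 99]),
}
to_review = needs_review(extracted)
```

With field-level scores and a validation rule like `totals_agree`, only the doubtful fields go to a person, rather than every character of every page.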

A Place for Both OCR & Advanced Capture

Ultimately, OCR software has value and is a necessary tool. However, it stops well short of providing the ultimate value of straight-through processing of document-based information. The big companies like Microsoft, Google, and Amazon certainly help democratize OCR itself, which leads to increased use of document-based information. Also, advanced capture solutions can benefit by having additional toolsets to use, potentially improving results and reducing solution costs. There is ample room for both OCR software and advanced capture powered by machine learning, but it’s important to know the differences to ensure your technology deployment meets your business needs.
