What does it take to build a cognitive capture solution from scratch?

Building Cognitive Capture from Scratch

What it takes to build a cognitive capture solution from scratch is worth discussing because, even after reviewing all of the moving parts, some IT groups still prefer to develop their own solution. They want something tailored to their own needs rather than acquiring and implementing an off-the-shelf product.

With cloud-based services now offering OCR, classification and some level of data extraction as discrete capabilities, many organizations are seduced into believing that designing their own solution is simple. Many aspects of the build-versus-buy decision are too generic to cover here. Let’s focus instead on the hidden elements of creating your own solution.

Hidden Elements to Custom Solutions

There are two primary hidden costs associated with developing a custom capture solution: staff skills and OCR performance. The staff skills issue might look like a traditional problem of acquiring development skills. But when the primary objective of the software is a high level of comprehensive data accuracy, knowledge of software development is a necessary prerequisite and only a small part of what is needed. It may be easy to develop software that uses third-party capabilities such as Google Document Understanding or Amazon’s Textract to perform certain operations.

Data Science and Machine Learning Expertise

However, most decisions to go with a customized solution are based on the need to handle specific problems for which no ready-made solution exists. In this case, most offerings on the market force a trade-off between out-of-the-box capabilities and a custom project to deliver the specific capabilities needed. There is no in-between. As a result, development projects that start out seemingly small turn into large custom projects that often cost more than commercial software alternatives.

Bringing these complex projects to fruition requires expertise in data science and a deep, core-level understanding of machine learning algorithms, including when to choose one technique over another. Commercial alternatives offer the flexibility to configure the system for very specific needs without the same significant investment in data science and machine learning skills.

OCR Performance

Most people, even those with solid technical backgrounds, are unaware of the peculiarities of OCR toolkits and their cloud-based brethren. OCR is largely designed and used to convert image-based text into machine-readable form. To perform that function, OCR software has been tuned at the character and word level to achieve high levels of reliability. The problem arises when an organization needs to find specific data within documents and output it in a structured format.

There is a lot to consider. Start with the ability to reliably locate data at all. Many programmers mistakenly believe that it is simply a matter of applying regular expressions to the text. If you need a date, just look for the format XX/XX/XXXX. But what if the documents use many different date formats, or contain several dates of which only one is wanted? Taking these obvious routes neglects key contextual information that significantly aids the task, such as the spatial proximity of the targeted data to other data, the fonts used for the needed data and many other typically visual aspects.
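To make the regular-expression problem concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three date formats, the sample text and the function names find_dates and date_nearest_label are hypothetical, and character offset from a label is only a crude stand-in for the spatial proximity a capture product would measure on the page image.

```python
import re
from datetime import datetime

# Hypothetical list of formats; real document sets usually need many more.
# Note that 03/04/2021 is ambiguous (March 4 or April 3); this sketch
# assumes month-first, which is itself a guess a regex cannot verify.
DATE_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b"), "%d %B %Y"),
]


def find_dates(text):
    """Return (offset, date) pairs for every string that parses as a date."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(), fmt).date()
            except ValueError:
                continue  # looked like a date but is not one, e.g. 13/45/2021
            found.append((match.start(), parsed))
    return sorted(found)


def date_nearest_label(text, label):
    """Pick the date closest to a label, using text offset as a proxy."""
    label_pos = text.find(label)
    candidates = find_dates(text)
    if label_pos < 0 or not candidates:
        return None
    return min(candidates, key=lambda c: abs(c[0] - label_pos))[1]


sample = (
    "Invoice Date: 03/04/2021\n"
    "Shipped: 2021-03-10 by courier\n"
    "Due Date: 2 April 2021"
)
print(find_dates(sample))
print(date_nearest_label(sample, "Due Date"))
```

Even this small example needs three patterns, a guess about day and month order, and a rule for choosing among several matches, and it still knows nothing about layout, fonts or where the text sits on the page.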

Then there are the issues with data output, especially with the values known as “confidence scores.” Confidence scores for OCR are different from those in cognitive capture solutions. OCR provides confidence scores at the character and word level, while cognitive capture solutions provide confidence scores at the data field level. Analyzing scores at the field level is essential to successful cognitive capture projects. Even some cognitive capture solutions cannot overcome the OCR confidence score problem when it comes to field-level outputs. The result is that every single data output must be verified manually.
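The gap between word-level and field-level scores can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product computes its scores: the Word class, the 0.0 to 1.0 scale, the two roll-up methods and the 0.9 review threshold are all hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Word:
    text: str
    confidence: float  # assumed OCR word-level score on a 0.0-1.0 scale


def field_confidence(words, method="min"):
    """Roll word-level scores up into one score for a whole field."""
    if not words:
        return 0.0
    scores = [w.confidence for w in words]
    if method == "min":
        return min(scores)   # the field is only as good as its worst word
    if method == "product":
        return prod(scores)  # treats word errors as independent
    raise ValueError(f"unknown method: {method}")


def needs_review(words, threshold=0.9):
    """Route the field to manual verification when the score is too low."""
    return field_confidence(words) < threshold


total_due = [Word("1,250.00", 0.98), Word("USD", 0.72)]
print(field_confidence(total_due))
print(needs_review(total_due))
```

Note what the roll-up leaves out: it says how well the characters were read, not whether the right value was located for the field. A perfectly recognized date taken from the wrong place on the page would still score high, which is why word-level scores alone tend to push teams toward verifying everything by hand.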

Where It Makes Sense to Build a Solution or Purchase One

There are many use cases where it makes sense to build a solution rather than purchase one. While many different toolkits, SDKs and web services focused on OCR, classification and handwriting recognition are available to developers, the reality is that a cognitive capture solution is more than the sum of its parts. A lot goes into creating a solution that converts document-based information into structured data in a reliable, accurate manner.

Most of these services perform better when they can be applied as out-of-the-box capabilities that do not require significant data science skills. That means that where an organization needs a solution to problems specific to its own documents and processes, off-the-shelf cognitive capture software is almost always the best option.