When using advanced capture, are templates such a bad thing? These days “template” is a taboo word to both software vendors and those looking to use advanced capture. “Template-less” describes solutions that no longer require a user to create what are often called “zones,” which establish very specific rules for the location of data.
This specificity largely limits templates to “structured forms,” that is, documents where the data can nearly always be found in the same place. So the answer to whether templates are useful is that it depends.
Dealing with High Variance Documents
Most industry professionals would agree that templates are a bad choice when document layouts vary widely. If you need to automate data extraction from invoices and you have 500 different vendors (or worse, an unknown number), creating a template for each vendor layout is exceedingly time-consuming, and avoiding templates is justified. The desire for a template-less approach is also driven by situations in which an organization has literally hundreds of different structured forms.
However, when an enterprise is dealing with only 20 or so vendors, the best approach can be to create templates. This is mostly due to the inherent weaknesses of the most widely adopted alternative: regular expressions and keyword/value pairs, both of which locate data without explicit X/Y coordinates. Simply put, the generalized approach almost always produces more data location errors than a template approach, because the software cannot be told specifically where to look. So templates are more accurate in the long run for small numbers of layouts or low-variance documents.
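To make the contrast concrete, here is a minimal Python sketch of the two ways of locating a field. Everything in it is an assumption made for illustration: the `Word` record, the coordinates, the sample page, and the patterns are hypothetical and do not come from any particular capture product. It shows how a loosely written keyword rule can pick up the wrong value, while a zone returns whatever sits at the stated location.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with the position of its top-left corner (hypothetical units)."""
    text: str
    x: float
    y: float


def page_text(words):
    """Join words in reading order: top to bottom, then left to right."""
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in ordered)


def extract_by_zone(words, zone):
    """Template approach: return the words whose corner lies inside a fixed rectangle."""
    left, top, right, bottom = zone
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    return page_text(inside)


def extract_by_keyword(words, pattern):
    """Generalized approach: find a keyword followed by a value anywhere on the page."""
    match = pattern.search(page_text(words))
    return match.group(1) if match else None


# A loose rule: case-insensitive, so "total" inside "Subtotal" also matches.
LOOSE_TOTAL = re.compile(r"Total\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE)
# A tighter rule: the word boundary stops it from matching inside "Subtotal".
STRICT_TOTAL = re.compile(r"\bTotal\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE)

# Hypothetical invoice page.
page = [
    Word("Invoice", 50, 40), Word("#1042", 120, 40),
    Word("Subtotal:", 350, 600), Word("$90.00", 450, 600),
    Word("Total:", 350, 620), Word("$99.00", 450, 620),
]

zone_total = extract_by_zone(page, (440, 610, 560, 630))   # "$99.00"
loose_total = extract_by_keyword(page, LOOSE_TOTAL)         # "90.00", the subtotal
strict_total = extract_by_keyword(page, STRICT_TOTAL)       # "99.00"
```

The zone is only correct for as long as the total stays in that rectangle, which is why it suits a small, stable set of layouts. The keyword rule travels across layouts, but it has to be analyzed and tightened against real samples before it can be trusted.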
Why Go Template-less
The rationale for using a template-less approach is to generalize the rules to meet the demands of high-volume, high-variance production environments. For example, when you have a large number of invoice layouts, instead of creating 500 different templates, the majority of invoice layouts can be accommodated in one or a few configurations. However, what you gain from generalization, you can lose in performance.
Creating template-less configurations also requires more technical sophistication. You are no longer just having staff “draw” boxes around data as you would with a template. A generalized approach, if done well, requires significant analysis of different invoice layouts in order to optimize performance. In practice, few organizations perform this analysis due to time and staff constraints, yet accurate results depend on it. Even so, the analysis takes less overall effort than constructing and maintaining hundreds of templates.
Kicking the Can Down the Road
Advanced capture vendors often enable use of templates even for invoices, but do so by gradually building a library of these specific rules for each invoice type. They do this by routing new layouts to staff who then instruct the software where each data field is located. Once this information is collected, it is added to the library so that the next time the specific invoice is encountered, it can be processed using the template.
This is a nice way of “kicking the can down the road,” so to speak, in that it reduces the upfront cost of creating hundreds of templates. The organization gets to enjoy the higher accuracy of templates, but the approach also produces a very sizable knowledge base that can become unwieldy and slow.
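The library-building workflow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: the `TemplateLibrary` class, the `layout_id` key (here simply a vendor name), and the `ask_operator` callback that stands in for routing a document to staff are all hypothetical.

```python
from typing import Callable, Dict, Tuple

Zone = Tuple[float, float, float, float]   # left, top, right, bottom
Template = Dict[str, Zone]                 # field name -> zone on the page


class TemplateLibrary:
    """Grows one template per layout, asking a person only for unseen layouts."""

    def __init__(self, ask_operator: Callable[[str], Template]):
        self._templates: Dict[str, Template] = {}
        self._ask_operator = ask_operator   # hypothetical stand-in for staff review

    def template_for(self, layout_id: str) -> Template:
        # Unknown layout: route it to staff once and remember the answer.
        if layout_id not in self._templates:
            self._templates[layout_id] = self._ask_operator(layout_id)
        return self._templates[layout_id]

    def __len__(self) -> int:
        return len(self._templates)


# Usage: record which layouts were routed to the (simulated) operator.
routed_to_staff = []


def simulated_operator(layout_id: str) -> Template:
    routed_to_staff.append(layout_id)
    return {"total": (440.0, 610.0, 560.0, 630.0)}   # made-up zone


library = TemplateLibrary(simulated_operator)
for incoming in ["Acme Corp", "Globex", "Acme Corp", "Acme Corp"]:
    library.template_for(incoming)
```

In this run only the first invoice from each vendor reaches staff, and the rest are served from the library. The library also gains an entry for every new layout it ever sees, which is the growth that can make such a knowledge base unwieldy over time.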
Identifying templates as a “bad thing” is the wrong focus. Instead, the focus should be on the nature of the project, the level of data variance and the accuracy requirements. Together these factors should drive the selection of one approach over the other. Or maybe even a new approach altogether…
At Parascript, we make use of template and template-free approaches. We also focus on improving the performance of each while reducing the typically associated costs. For templates, we offer the unique ability to automatically convert any searchable PDF into a ready-to-use configuration. With this capability, even if you have hundreds of structured forms, the time required to create a template for each reduces from hours to literally minutes.
For semi-structured data, such as cases where hundreds or thousands of layouts are involved, we offer our Smart Learning capability, which automatically configures and optimizes performance to a level that is even better than that of the venerable template.
In a way, we’re working on letting you have your cake and eat it, too.