Interest in artificial intelligence (AI), especially the branch known as “machine learning,” continues to accelerate. When we meet with existing and prospective clients, they often ask about solutions that can be trained or that learn on their own. Behind the question is the belief that machine learning will be more adaptive and easier to configure than traditional rules-based forms of AI.
Deciding When to Use It
Organizations everywhere are deciding whether to use rules or machine learning for their automation projects. For instance, the 2017 Amazon Alexa Prize brought together teams from several universities, all competing to build the best conversational chatbot. While many of the teams started with a machine learning approach, they found rules very useful, and some settled on a blend of machine learning and rules-based expert systems.
Advantages Based on Project
There are cases where machine learning AI has advantages over expert system AI, and vice versa. Which approach wins depends on the nature of the documents to be processed: how much about them is known versus unknown, and whether their variance is low or high.
The real benefit of machine learning is its ability to derive abstract rules from a large amount of input and then apply those rules in a more general, less strict manner. That generality is not always needed, though. With document classification, for example, if you have only five document types to classify and you know which attributes distinguish each one from the others, it is easier and more precise to encode the classification rules by hand.
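As a minimal sketch of the hand-encoded approach, the Python below classifies text against five document types using explicit rules. The type names and distinguishing phrases in the hypothetical RULES table are illustrative assumptions, not a real rule set; a production system would typically match on richer attributes than plain phrases.

```python
from typing import Optional

# Hypothetical rule set: five document types, each defined by phrases
# assumed to distinguish it from the other four.
RULES = {
    "invoice": ["invoice number", "amount due"],
    "purchase_order": ["purchase order", "ship to"],
    "receipt": ["receipt", "paid"],
    "credit_memo": ["credit memo", "credit amount"],
    "contract": ["agreement", "signature"],
}


def classify(text: str) -> Optional[str]:
    """Return the first document type whose phrases all appear in the text."""
    lowered = text.lower()
    for doc_type, phrases in RULES.items():
        # The rule either matches completely or not at all.
        if all(phrase in lowered for phrase in phrases):
            return doc_type
    return None  # no rule matched, e.g. route the document to manual review


print(classify("INVOICE NUMBER 1042 / Amount Due: $310.00"))  # invoice
```

Because every rule is written by someone who knows the documents, each decision is easy to explain and audit, which is exactly what you give up as the number of types grows.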
However, if you have a large number of document types and/or the variance within each type is unknown, it is better to use machine learning to identify the key attributes automatically and generate the rules from them. While the auto-generated rules are not as specific as the hand-built rules for the five document types above, they can accommodate the large amount of variance that will be encountered in production.
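To illustrate rules being learned rather than written, here is a minimal multinomial naive Bayes classifier in pure Python. The article does not name a specific algorithm; naive Bayes is swapped in here only as one simple, well-known example, and the tiny training set is made up. The model learns from labeled examples which words are most indicative of each document type.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes: learns word statistics per document type."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of examples
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # words never seen in training carry no evidence
                # Add-one (Laplace) smoothing keeps unseen label/word pairs nonzero.
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data: a few labeled examples per document type.
texts = [
    "Invoice number 881 amount due net 30",
    "Bill to Acme invoice total due",
    "Purchase order PO 12 ship to warehouse",
    "Order quantity ship via ground purchase",
]
labels = ["invoice", "invoice", "purchase_order", "purchase_order"]
model = NaiveBayesClassifier().fit(texts, labels)
print(model.predict("Total amount due on this invoice"))  # invoice
```

With four training examples this is only a toy, but note that nobody wrote a rule for “purchase order”: the word weights came from the data, and they are softer than the hand-built rules, which is both their strength with high variance and their weakness with low variance.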
For data extraction, the approach is similar. Machine learning analyzes sample documents together with the data that should be extracted from them to learn where that data is located and how best to extract it. Again, machine learning for extraction is best suited to projects where the target documents have high and/or unknown variance, because a general approach will cover a larger percentage of production documents.
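A deliberately tiny stand-in for learned extraction, assuming plain text with whitespace-separated tokens: given documents paired with the value that should be extracted, it counts which label word most often precedes the value and then uses those learned “anchors” on new documents. The function names, the `min_support` parameter, and the sample data are hypothetical; real extraction models use layout, value types, and many more features.

```python
from collections import Counter


def normalize(token):
    """Strip surrounding label punctuation and lowercase, e.g. 'Total:' -> 'total'."""
    return token.strip(":#.").lower()


def learn_anchors(examples, min_support=2):
    """Learn which label words appear immediately before a field's value.

    examples: (document_text, known_value) pairs, i.e. documents plus the
    data that should be extracted from them. A label is kept if it precedes
    the value in at least `min_support` documents.
    """
    counts = Counter()
    for text, value in examples:
        tokens = text.split()
        for i in range(1, len(tokens)):
            if tokens[i] == value:
                counts[normalize(tokens[i - 1])] += 1
                break  # use only the first occurrence in each document
    return {label for label, n in counts.items() if n >= min_support}


def extract(text, anchors):
    """Return the token that follows the first learned anchor label, else None."""
    tokens = text.split()
    for label, value in zip(tokens, tokens[1:]):
        if normalize(label) in anchors:
            return value
    return None


# Made-up training documents, each paired with the total it contains.
examples = [
    ("Invoice 1001 Total: 250.00 Thank you", "250.00"),
    ("Acme Corp Amount Due 99.10", "99.10"),
    ("Balance Total 75.00 due now", "75.00"),
    ("Subtotal 40.00 Total: 44.00", "44.00"),
    ("Amount Due: 12.50", "12.50"),
]
anchors = learn_anchors(examples)  # learns "total" and "due"
print(extract("Widget Co Invoice 77 Total: 310.00", anchors))  # 310.00
```

The generality cuts both ways: this extractor naively takes the next token, so a line like “Total due 10.00” would return “due”, the kind of error that hand-specified rules avoid on well-known layouts.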
On the flip side, where unknowns and variance are low, an expert system based on user-specified rules is likely to yield the best results. For classification, the decision is essentially binary: if the rule is met, classify; otherwise, don’t. For data location, if the field has a value, extract it; otherwise, leave it blank.
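The extract-or-leave-blank behavior can be sketched with one regular expression per field. The field names and patterns in the hypothetical FIELD_RULES table are assumptions for illustration, standing in for whatever rules a user would actually specify.

```python
import re

# Hypothetical user-specified rules: one regular expression per field,
# with the value captured in group 1.
FIELD_RULES = {
    "invoice_number": re.compile(r"Invoice No\.?\s*:?\s*(\w+)"),
    "po_number": re.compile(r"PO\s*#\s*(\d+)"),
}


def extract_fields(text):
    """Apply each rule: if it matches, extract the value; otherwise leave it blank."""
    result = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        result[field] = match.group(1) if match else ""
    return result


print(extract_fields("Invoice No.: A1042\nDate: 2017-06-01"))
# {'invoice_number': 'A1042', 'po_number': ''}
```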
Results will be better than with machine learning because you can specify, with precision, the rules applied to documents that are static and well defined. That specificity removes the errors that come with machine learning’s more “abstract” approach. For example, if you need to process a number of structured forms, it is easier and more precise to define those forms directly. For classification, such a project will fare better by matching incoming documents against specific examples of each form.
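One way to picture matching incoming documents against known forms is a template check: each form is defined by the fixed labels printed on it, and a page matches only if every one of those labels is present. The two form definitions below are hypothetical, and the sketch assumes the page’s labels have already been read (for example, by OCR).

```python
# Hypothetical form definitions: the fixed labels printed on each known form.
FORM_TEMPLATES = {
    "enrollment_form": {"member id", "plan name", "effective date"},
    "claim_form": {"claimant name", "policy number", "date of loss"},
}


def match_form(page_labels):
    """Return the first form whose fixed labels all appear on the page, else None."""
    found = {label.strip().lower() for label in page_labels}
    for form, labels in FORM_TEMPLATES.items():
        if labels <= found:  # subset test: every template label is present
            return form
    return None


print(match_form(["Claimant Name", "Policy Number", "Date of Loss", "Signature"]))  # claim_form
```

Requiring an exact match is only reasonable because the forms are static; the same strictness would reject most documents in a high-variance set.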
If your project has semi-structured documents such as invoices, but you need to process invoices from only a few vendors, classification can rely on keywords. For that small set of vendors, you can even use coordinate-based fields instead of more complex, abstracted machine learning.
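A minimal sketch of keyword classification plus coordinate-based fields, assuming an OCR step has already produced each word with its position on the page. The vendors, keywords, and zone coordinates in the hypothetical VENDORS table are made up; in practice each zone would be measured from that vendor’s actual invoice layout.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the word, in page units from the left
    y: float  # top edge of the word, in page units from the top


# Hypothetical per-vendor configuration: a keyword that identifies the vendor
# and fixed zones (x0, y0, x1, y1) where each field sits on its invoices.
VENDORS = {
    "Acme Supply": {
        "keyword": "ACME SUPPLY",
        "zones": {"invoice_number": (400, 50, 560, 70), "total": (450, 700, 560, 720)},
    },
    "Globex": {
        "keyword": "GLOBEX",
        "zones": {"invoice_number": (60, 120, 200, 140), "total": (400, 650, 560, 670)},
    },
}


def identify_vendor(words):
    """Keyword classification: return the first vendor whose keyword is on the page."""
    page_text = " ".join(w.text for w in words).upper()
    for vendor, config in VENDORS.items():
        if config["keyword"] in page_text:
            return vendor
    return None


def extract_by_coordinates(words):
    """Return (vendor, fields), reading each field from its fixed zone."""
    vendor = identify_vendor(words)
    if vendor is None:
        return None, {}  # unknown layout: no zones to apply
    fields = {}
    for field, (x0, y0, x1, y1) in VENDORS[vendor]["zones"].items():
        # Keep the words whose top-left corner falls inside the zone.
        inside = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        fields[field] = " ".join(inside)
    return vendor, fields


# Words with positions, e.g. as produced by an OCR engine (values made up).
words = [
    Word("Acme", 40, 40), Word("Supply", 80, 40), Word("INV-2291", 420, 60),
    Word("Total", 380, 710), Word("$1,204.00", 470, 710),
]
print(extract_by_coordinates(words))
# ('Acme Supply', {'invoice_number': 'INV-2291', 'total': '$1,204.00'})
```

Each new vendor means another keyword and another set of measured zones, which is why this approach stays practical only while the vendor list is short.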
However, rules-based approaches fall short when there are many different document types, or so much variance within document types, that writing the rules would require an extensive amount of analysis and development.
The Right Approach
Selecting the right approach is important to achieving the best possible results from your data. It is not about picking the coolest technology but about understanding the strengths and weaknesses of each type of AI.
If your project has a large number of document types or significant variance within document types (e.g., invoices from many different vendors along with other incoming documents), a machine learning approach is the easiest method and provides the broadest coverage.
If your project deals with structured data or a small set of known document types with low variance, go with a rules-based approach to avoid the errors that come with machine learning’s abstracted approach.
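The guidance above can be condensed into a small decision helper. This is only a sketch of the rule of thumb, not a formal method: the `few_types` cutoff is a hypothetical parameter (the article mentions five document types purely as an example), and real projects will weigh other factors as well.

```python
def choose_approach(num_document_types, variance, structured=False, few_types=5):
    """Suggest "rules" or "machine learning" following this article's guidance.

    variance: "low", "high", or "unknown" (variance within document types).
    structured: True if the documents are structured forms.
    few_types: hypothetical cutoff for a "small set" of document types; the
        article uses five types only as an example, so tune it per project.
    """
    if variance not in {"low", "high", "unknown"}:
        raise ValueError("variance must be 'low', 'high', or 'unknown'")
    if structured or (num_document_types <= few_types and variance == "low"):
        return "rules"
    return "machine learning"


print(choose_approach(5, "low"))    # rules
print(choose_approach(40, "high"))  # machine learning
```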
If you found this article interesting, you might find our Data Interpretation eBook helpful. It focuses on advanced data interpretation systems powered by machine learning that offer document classification, data extraction, and interpretation, and on what precisely that means for the business.