May 21, 2015

Auto-classification Meets the Information Management Challenge Head-on


To address the information management problem, look at the quick wins, or as Joe Shepley of Doculabs calls them, the “first 100 pounds.” (Check out Joe’s recent article in CMSWire on cleaning up shared file drives.) Just as with maintaining a healthy lifestyle, most businesses understand that it’s important to improve their control over information. And just as with sticking to a nutritious diet, most businesses lack a practical starting point where they can achieve an early success. Demonstrating success with anything, whether personal or business, is critical for many reasons. The key takeaway: look for the simplest, quickest win to get going.

Classification is a great place to start. Unfortunately, as Joe points out, “auto classification software was going to be the silver bullet answer to our information management woes — and they’ve been far from it.” I agree with him. A lot of focus has been placed on the whiz-bang technology and not enough on the process that is required alongside it. Classification is complex. Results do not always meet expectations. Also, as Joe says, it has often proved expensive, frequently running well into the six figures.

As a vendor of document classification capabilities, I believe the onus is on the software industry to correct the misconceptions about classification. Sure, auto-classification can be overkill for many projects – it is not a cure-all. However, when companies need to automate the classification of documents based on characteristics other than file attributes, the only alternative to auto-classification is brute force: people reading and sorting documents by hand. The dieting analog is a water diet; businesses will undertake that only as a last resort.

Finding the Solution

So what is the answer? Start with simpler, more practical projects, as Joe suggests. Select the most appropriate technology. When it comes to auto-classification, consider several things:

The information requirements for relevant classification. Do you need to classify text? Images? Is document layout important?
The scope of your real needs. Often organizations do not need every document classified. The range of documents and departments involved will have a significant impact on both cost and success.
The sensitivity to error. There is always some error in automated classification. How sensitive is your project to documents that are incorrectly classified? Do you require 100% accuracy? Is it okay for a small percentage of documents to carry incorrect classifications? This sensitivity directly affects cost: manual review costs more per document, while straight-through processing is cheaper but lets some incorrect document classes pass unchecked.
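To make the error-sensitivity trade-off concrete, here is a minimal sketch in Python. It is a simplified model, not a description of any particular product: the function name, its parameters, and all of the dollar figures are hypothetical, and it assumes that manually reviewed documents always end up correctly classified.

```python
def expected_cost_per_document(auto_rate, auto_error_rate, review_cost, error_cost):
    """Rough expected cost per document for a mix of straight-through
    processing and manual review (illustrative model only).

    auto_rate       -- fraction of documents processed straight-through (0..1)
    auto_error_rate -- fraction of straight-through documents misclassified (0..1)
    review_cost     -- cost of manually reviewing one document
    error_cost      -- downstream cost of one undetected misclassification
    """
    if not 0 <= auto_rate <= 1 or not 0 <= auto_error_rate <= 1:
        raise ValueError("rates must be between 0 and 1")
    # Documents not handled straight-through go to a person for review.
    manual = (1 - auto_rate) * review_cost
    # Only straight-through documents can carry an undetected error
    # (assumption: manual review catches everything).
    errors = auto_rate * auto_error_rate * error_cost
    return manual + errors


# Hypothetical figures: $0.50 to review a document, $5.00 per undetected error.
lenient = expected_cost_per_document(0.90, 0.02, 0.50, 5.00)
strict = expected_cost_per_document(0.60, 0.002, 0.50, 5.00)
print(f"lenient: {lenient:.3f}  strict: {strict:.3f}")
```

With these made-up numbers, the lenient setup costs roughly 0.14 per document and the strict one roughly 0.21. Raise the cost of an undetected error far enough and the ranking flips, which is why it pays to settle the sensitivity question before choosing how much to automate.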

Investigating Beyond the Usual Suspects

Lastly, automated classification can be very expensive, but does it have to be? Sure, a project that involves hundreds of millions of documents will be more expensive than a smaller department-level project, but the software itself doesn’t have to be costly to perform well. Select a classification capability carefully, not just on features but on its ability to demonstrate solid results on your own documents. And expand your investigation beyond the usual large solution providers, as there are many innovative, affordable technologies available.
