In a recent AIIM survey, more than 60% of respondents reported having more than 10 document types to support within their document automation systems. Additionally, 43% of respondents measured performance only on a small production run, and 22% measured on just a few samples. A separate survey of BPOs found a similar pattern, with 61% of respondents answering that they measure performance through spot checks of samples.
What does it really take to verify your OCR accuracy? And how can you measure a vendor’s claim of a particular accuracy rate?
Take, for instance, the popular 99 percent accuracy claim made by many vendors. If you regularly process 100,000 documents, testing that claim with any reliable measure would require evaluating the results on at least 1,000 documents, and doing so regularly. Even then, the margin of error would be too large to tell you whether you are actually getting 99 percent accuracy. To reduce the margin of error below 0.9 percent, you would need to measure over 90,000 documents from that pool of 100,000. That level of measurement is a major effort, far beyond the small production runs and small samples typically used.
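To see how sample size and margin of error interact, here is a minimal sketch using the textbook formula for estimating a proportion (a normal-approximation interval with finite population correction). This is a simplified illustration, not the survey methodology used by any vendor; the 95% confidence level, the `z = 1.96` value, and the function name are assumptions for the example.

```python
import math

def required_sample_size(p, margin_of_error, population, z=1.96):
    """Documents to sample to estimate accuracy rate p within
    +/- margin_of_error, at roughly 95% confidence (z = 1.96),
    using the normal approximation with finite population correction.
    This is a textbook simplification, not a vendor's methodology."""
    # Infinite-population sample size: n0 = z^2 * p * (1 - p) / e^2
    n0 = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    # Finite population correction shrinks n0 toward the pool size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Classic worst-case example: p = 0.5, +/- 5 points, huge population
print(required_sample_size(0.5, 0.05, 10**9))  # ~385 documents
```

Note that the required sample size grows sharply as the margin of error tightens, which is why verifying a high accuracy claim precisely demands far more measurement than a spot check of a few samples.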
Not There Yet…
The reality we face is that expectations for automation systems far outstrip confidence in their ability to deliver accurate data without manual intervention. In the case of BPOs, 85% verify all data. Even though document automation has been around for more than two decades, we still have a long way to go to realize the promise of true straight-through processing. It’s one reason why Parascript offers a data quality guarantee along with a pricing model where you only pay for accurate data.