Abstract
A method for word recognition based on hidden Markov models (HMMs) is described. Its performance is evaluated on a test set of real printed documents that have been subjected to severe photocopy and fax transmission distortions. A comparison with a commercial OCR package highlights the inherent advantages of a segmentation-free recognition strategy when the word images are severely distorted, as well as the importance of using contextual knowledge. When tested on word images taken from faxed pages, the HMM method makes only one quarter as many word errors as the commercial package. © 1998 Springer-Verlag Berlin Heidelberg.