Abstract
A method is described for representing character images by extracting features from the columns and rows of pixels that make up each bitmap image. Discrete hidden Markov models (HMMs) are used to represent sequences of quantised features from the training images. The motivation is to recognise word images without first segmenting them into characters. Segmentation is difficult when characters are joined or split because of handwriting style or image noise. Such noise can arise from a number of physical processes, e.g. faxing, photocopying, handling and aging. Word recognition is achieved by matching the HMMs that represent the columns of each character class to the word's feature sequence, maximising the joint likelihood of the segmentation and the classification of each character segment.

The method has been evaluated on several types of text image. On artificial images of noisy machine-printed characters in 13 fonts and 5 point sizes, an HMM classifier using one model for each character class across all fonts achieved an average recognition rate of 94% for the top-choice classification result, and the correct result was among the top 3 choices in 99% of cases. On scanned images of real documents, the performance of the HMM classifier was compared with that of OmniPage, a commercial OCR package. On clean documents of isolated characters, the HMM classifier performed slightly worse than OmniPage, but on noisy faxed documents it performed significantly better. On faxed word images that are difficult to segment by traditional means, the performance of the HMM classifier was similar to that of OmniPage; with dictionary contextual knowledge it would be considerably better. On a database of hand-printed numerals taken from U.S. mail pieces, the HMM classifier compared well with existing methods, achieving a recognition rate of 96%. On a database of ZIP code images, the HMM word classifier recognised 50% of the images correctly.
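
The idea of jointly choosing the segmentation and the character labels can be illustrated with a small sketch. It is not the implementation used in this work: it assumes that each word image has already been reduced to a sequence of quantised column symbols, that each character class is a left-to-right discrete HMM which must finish in its last state, and that segments are scored with the forward algorithm. The names `segment_loglik`, `recognise_word` and `TOY_MODELS`, and all the probabilities, are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def segment_loglik(model, obs):
    """Forward-algorithm log-likelihood of obs under one discrete HMM.

    model = (start, trans, emit). Assumption: the segment must end in the
    last state, so a character cannot be left half-traversed.
    """
    start, trans, emit = model
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return alpha[-1]

def recognise_word(models, obs):
    """Return (log-likelihood, [(label, start, end), ...]) for the best
    joint segmentation and labelling of obs; end indices are exclusive."""
    T = len(obs)
    best = [NEG_INF] * (T + 1)   # best[t]: best score for obs[:t]
    back = [None] * (T + 1)      # back[t]: (segment start, label)
    best[0] = 0.0
    for t in range(1, T + 1):
        for s in range(t):
            if best[s] == NEG_INF:
                continue
            for label, model in models.items():
                score = best[s] + segment_loglik(model, obs[s:t])
                if score > best[t]:
                    best[t], back[t] = score, (s, label)
    if best[T] == NEG_INF:
        return NEG_INF, []       # no valid segmentation exists
    segments, t = [], T
    while t > 0:
        s, label = back[t]
        segments.append((label, s, t))
        t = s
    segments.reverse()
    return best[T], segments

# Toy two-state left-to-right models over a 4-symbol codebook (illustrative).
_START = [1.0, 0.0]
_TRANS = [[0.5, 0.5], [0.0, 1.0]]
TOY_MODELS = {
    "a": (_START, _TRANS, [[0.85, 0.05, 0.05, 0.05], [0.05, 0.85, 0.05, 0.05]]),
    "b": (_START, _TRANS, [[0.05, 0.05, 0.85, 0.05], [0.05, 0.05, 0.05, 0.85]]),
}

if __name__ == "__main__":
    score, segments = recognise_word(TOY_MODELS, [0, 0, 1, 1, 2, 3, 3])
    print(score, segments)
```

This exhaustive search over segment boundaries is quadratic in the number of columns and is meant only to make the objective concrete. A practical system would more likely use a single Viterbi pass over the concatenated character models.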