Intelligent character recognition using hidden Markov models
Thesis posted on 09.12.2013, 13:46 by Kamran Kordi
Recognition of printed and hand-printed characters has received much attention over the past decade as the need for automated 'document entry' systems assumes a commanding role in office automation. Although present Optical Character Recognition (OCR) systems have reached a high degree of sophistication compared with early systems, designing a robust system that can accurately separate text from images and cope reliably with noisy input and frequent font changes remains a formidable task. In this thesis, a novel method of character recognition based on Hidden Markov Modelling (HMM) is first described. The scheme describes a training set of characters by their outer contours using Freeman codes; the HMM method is then applied to capture the topological variation of the characters automatically from typical samples of each character. Fonts of similar topology can also be incorporated into one hidden Markov model. Once the model of a character in its upright position has been derived, the character can be recognized even when it has been rotated by a multiple of 90 degrees. This technique is further extended to combine structural analysis and description of characters with hidden Markov modelling. In this scheme, a character is first skeletonized and then split into primitives; each primitive is described by hidden Markov models, and its position relative to the nodes (junctions) where the primitives meet is recorded. This scheme is virtually font- and size-independent. A new document classification algorithm based on fuzzy theory is also proposed, which indicates a document's contents in terms of 'text' and 'non-text' portions.
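The contour-plus-HMM idea above can be illustrated with a small sketch. This is not the thesis's implementation: the abstract does not give the model topology, training procedure, or how rotation is handled. The sketch assumes 8-direction Freeman codes, uses hand-set left-to-right discrete HMMs instead of trained ones (no Baum-Welch step), scores sequences with the scaled forward algorithm, and handles 90-degree rotations by adding 2 (mod 8) to every code. It also ignores start-point normalisation of the contour. All names (`chain_code`, `rotate_codes`, `log_forward`, `recognise`, `left_to_right_model`, `MODELS`) are hypothetical.

```python
import math

# Illustrative sketch under stated assumptions; not the thesis's implementation.
# Freeman 8-direction chain code: 0 = east, then counter-clockwise in
# 45-degree steps (y axis pointing up).
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(contour):
    """Freeman codes for a closed, 8-connected contour given as (x, y) points."""
    n = len(contour)
    codes = []
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        codes.append(FREEMAN[(x1 - x0, y1 - y0)])
    return codes


def rotate_codes(codes, quarter_turns):
    """A 90-degree counter-clockwise rotation adds 2 (mod 8) to every code."""
    return [(c + 2 * quarter_turns) % 8 for c in codes]


def log_forward(obs, start, trans, emit):
    """Log-likelihood of a code sequence under a discrete HMM (scaled forward pass)."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][obs[t]]
                     for s in states]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def recognise(codes, models):
    """Best (label, quarter_turns, score) over all models and the four rotations."""
    best = (None, 0, float("-inf"))
    for label, (start, trans, emit) in models.items():
        for k in range(4):
            score = log_forward(rotate_codes(codes, k), start, trans, emit)
            if score > best[2]:
                best = (label, k, score)
    return best


def left_to_right_model(directions, p_hit=0.9, p_stay=0.5):
    """Hypothetical hand-set model: state i mostly emits directions[i]."""
    n = len(directions)
    start = [1.0] + [0.0] * (n - 1)
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            trans[i][i], trans[i][i + 1] = p_stay, 1.0 - p_stay
        else:
            trans[i][i] = 1.0
    emit = [[p_hit if c == d else (1.0 - p_hit) / 7 for c in range(8)]
            for d in directions]
    return start, trans, emit


# Hypothetical upright models for two toy shapes.
MODELS = {
    "square": left_to_right_model([0, 2, 4, 6]),
    "diamond": left_to_right_model([1, 3, 5, 7]),
}

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    turned = [(-y, x) for x, y in square]  # the same square rotated 90 degrees
    label, k, _ = recognise(chain_code(turned), MODELS)
    print(label, k)  # prints: square 3
```

The rotation trick works only because Freeman directions are spaced at 45 degrees, so a quarter turn is a constant shift of 2 in code space. A model trained on upright samples can therefore score a rotated character without retraining. In practice the emission and transition probabilities would be estimated from sample characters rather than set by hand.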
- Mechanical, Electrical and Manufacturing Engineering