Posted on 2013-12-09, 13:46. Authored by Kamran Kordi.
Recognition of printed and hand-printed characters has received much
attention over the past decade as the need for automated 'document entry'
systems assumes a commanding role in office automation. Although present
Optical Character Recognition (OCR) systems have reached a high degree of
sophistication as compared to early systems, the design of a robust system
which can separate text from images accurately and cope reliably with noisy
input and frequent change of font is a formidable task. In this thesis, a novel
method of character recognition based on Hidden Markov Modelling (HMM) is
initially described. The scheme first describes a training set of characters by
their outer contours using Freeman codes; next, the HMM method is applied
to capture topological variation of the characters automatically, by looking at
typical samples of the different characters. Fonts of similar topology can also
be incorporated in one hidden Markov model. Once the model of a character
in the upright position is derived, the character can be recognized even when it
has been rotated by multiples of 90 degrees. This technique is further extended
to combine structural analysis/description of characters with hidden Markov
modelling. In this scheme, a character is first skeletonized and then split into primitives;
each primitive is described by a hidden Markov model, and its position
relative to the nodes (junctions) where the primitives meet
is recorded. This scheme is virtually font and size independent. A new document
classification algorithm based on fuzzy theory is also proposed, which
provides an indication of a document's contents in terms of 'text' and 'nontext'
portions.
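
As a rough illustration of the contour description mentioned above, the following sketch encodes a closed contour as Freeman codes and shows why a rotation by a multiple of 90 degrees is easy to handle at the code level. It is not taken from the thesis: the 8-direction convention (0 = east, counter-clockwise numbering, y pointing up), the function names, and the sample contour are all assumptions made for the example, and the hidden Markov modelling step is not shown.

```python
# Assumed convention: Freeman code 0 points east, codes increase
# counter-clockwise in 45-degree steps, with x to the right and y upward.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
STEP_TO_CODE = {step: code for code, step in enumerate(DIRECTIONS)}


def freeman_chain(contour):
    """Encode a closed contour (8-connected list of (x, y) points) as Freeman codes."""
    codes = []
    for i, (x0, y0) in enumerate(contour):
        # Wrap around so the last point connects back to the first.
        x1, y1 = contour[(i + 1) % len(contour)]
        codes.append(STEP_TO_CODE[(x1 - x0, y1 - y0)])
    return codes


def rotate_chain(codes, quarter_turns):
    """Rotate a chain counter-clockwise by quarter turns: each turn adds 2 (mod 8)."""
    return [(c + 2 * quarter_turns) % 8 for c in codes]


def rotate_points(contour, quarter_turns):
    """Rotate the contour points themselves counter-clockwise by quarter turns."""
    points = list(contour)
    for _ in range(quarter_turns % 4):
        points = [(-y, x) for x, y in points]
    return points


# Hypothetical sample contour, chosen only to have no rotational symmetry.
sample = [(0, 0), (1, 0), (2, 1), (1, 2), (0, 1)]
chain = freeman_chain(sample)
```

Under these assumptions, rotating the shape and then encoding it gives the same sequence as encoding it first and shifting every code, so a model trained on upright samples could in principle be matched against the four shifted versions of an input chain. Whether the thesis handles rotation this way is not stated in the abstract.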
School: Mechanical, Electrical and Manufacturing Engineering