Recognition of off-line handwritten cursive text
thesisposted on 24.11.2010, 11:53 by Ibrahim S.I. Abuhaiba
The author presents novel algorithms to design unconstrained handwriting recognition systems organized in three parts: In Part One, novel algorithms are presented for processing of Arabic text prior to recognition. Algorithms are described to convert a thinned image of a stroke to a straight line approximation. Novel heuristic algorithms and novel theorems are presented to determine start and end vertices of an off-line image of a stroke. A straight line approximation of an off-line stroke is converted to a one-dimensional representation by a novel algorithm which aims to recover the original sequence of writing. The resulting ordering of the stroke segments is a suitable preprocessed representation for subsequent handwriting recognition algorithms as it helps to segment the stroke. The algorithm was tested against one data set of isolated handwritten characters and another data set of cursive handwriting, each provided by 20 subjects, and has been 91.9% and 91.8% successful for these two data sets, respectively. In Part Two, an entirely novel fuzzy set-sequential machine character recognition system is presented. Fuzzy sequential machines are defined to work as recognizers of handwritten strokes. An algorithm to obtain a deterministic fuzzy sequential machine from a stroke representation, that is capable of recognizing that stroke and its variants, is presented. An algorithm is developed to merge two fuzzy machines into one machine. The learning algorithm is a combination of many described algorithms. The system was tested against isolated handwritten characters provided by 20 subjects resulting in 95.8% recognition rate which is encouraging and shows that the system is highly flexible in dealing with shape and size variations. In Part Three, also an entirely novel text recognition system, capable of recognizing off-line handwritten Arabic cursive text having a high variability is presented. This system is an extension of the above recognition system. Tokens are extracted from a onedimensional representation of a stroke. Fuzzy sequential machines are defined to work as recognizers of tokens. It is shown how to obtain a deterministic fuzzy sequential machine from a token representation that is capable'of recognizing that token and its variants. An algorithm for token learning is presented. The tokens of a stroke are re-combined to meaningful strings of tokens. Algorithms to recognize and learn token strings are described. The. recognition stage uses algorithms of the learning stage. The process of extracting the best set of basic shapes which represent the best set of token strings that constitute an unknown stroke is described. A method is developed to extract lines from pages of handwritten text, arrange main strokes of extracted lines in the same order as they were written, and present secondary strokes to main strokes. Presented secondary strokes are combined with basic shapes to obtain the final characters by formulating and solving assignment problems for this purpose. Some secondary strokes which remain unassigned are individually manipulated. The system was tested against the handwritings of 20 subjects yielding overall subword and character recognition rates of 55.4% and 51.1%, respectively.
- Mechanical, Electrical and Manufacturing Engineering