Fast learning of restricted regular expressions and DTDs
Journal contribution posted on 19.09.2017 by Dominik Freydenberger, Timo Kotzing
© 2014, Springer Science+Business Media New York.
We study the problem of generalizing from a finite sample to a language taken from a predefined language class. The two language classes we consider are subsets of the regular languages and have significance in the specification of XML documents (the classes corresponding to so-called chain regular expressions, Chares, and to single-occurrence regular expressions, Sores). The previous literature gives a number of algorithms for generalizing to Sores providing a trade-off between quality of the solution and speed. Furthermore, a fast but non-optimal algorithm for generalizing to Chares is known. For each of the two language classes we give an efficient algorithm returning a minimal generalization from the given finite sample to an element of the fixed language class; such generalizations are called descriptive. In this sense of descriptivity, both our algorithms are optimal.
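To make the two expression classes named in the abstract concrete, the following Python sketch checks whether a given expression is syntactically a Sore or a Chare. It is an illustration of the class definitions only, not the learning algorithms of the paper. It assumes the usual definitions from the XML schema inference literature: a Sore is a regular expression in which every alphabet symbol occurs at most once, and a Chare is a Sore that is a concatenation of factors of the form `(a1|...|ak)`, each optionally followed by `?`, `*`, or `+`. The function names, the restriction to single lowercase letters as symbols, and the use of `|` for union are assumptions made for this example.

```python
import re

# Assumption: alphabet symbols are single lowercase letters; every other
# character (parentheses, |, ?, *, +) is treated as an operator.
SYMBOL = r"[a-z]"

# One chain factor: a symbol or a parenthesized union of symbols,
# optionally followed by one of the quantifiers ?, *, +.
FACTOR = r"(?:[a-z]|\([a-z](?:\|[a-z])*\))[?*+]?"


def is_single_occurrence(expr: str) -> bool:
    """Return True if no alphabet symbol occurs more than once in expr."""
    symbols = re.findall(SYMBOL, expr)
    return len(symbols) == len(set(symbols))


def is_chain(expr: str) -> bool:
    """Return True if expr is a concatenation of chain factors
    and is also single-occurrence."""
    is_factor_sequence = re.fullmatch(f"(?:{FACTOR})+", expr) is not None
    return is_factor_sequence and is_single_occurrence(expr)


if __name__ == "__main__":
    for candidate in ["a(b|c)*d?", "(ab|c)*", "a(b|a)*"]:
        print(candidate, is_single_occurrence(candidate), is_chain(candidate))
```

Under these assumptions, `a(b|c)*d?` belongs to both classes, `(ab|c)*` is a Sore but not a Chare because a concatenation appears inside a factor, and `a(b|a)*` belongs to neither class because `a` occurs twice.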
- Computer Science