Loughborough University

Fast learning of restricted regular expressions and DTDs

journal contribution
posted on 2017-09-19, 15:43, authored by Dominik Freydenberger, Timo Kotzing
© 2014, Springer Science+Business Media New York.

We study the problem of generalizing from a finite sample to a language taken from a predefined language class. The two language classes we consider are subsets of the regular languages and have significance in the specification of XML documents (the classes corresponding to so-called chain regular expressions, Chares, and to single-occurrence regular expressions, Sores). The previous literature gives a number of algorithms for generalizing to Sores providing a trade-off between quality of the solution and speed. Furthermore, a fast but non-optimal algorithm for generalizing to Chares is known. For each of the two language classes we give an efficient algorithm returning a minimal generalization from the given finite sample to an element of the fixed language class; such generalizations are called descriptive. In this sense of descriptivity, both our algorithms are optimal.
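
As a rough illustration of the "single-occurrence" restriction named in the abstract, the sketch below checks whether every element name appears at most once in an expression. It is not the learning algorithm from the paper. The function name `is_single_occurrence`, the identifier-style element names, and the operator syntax (`|`, `*`, `+`, `?`, parentheses) are assumptions made for this example.

```python
import re

# Assumption: element names look like identifiers; everything else
# (| * + ? and parentheses) is treated as operator syntax and ignored.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_single_occurrence(expr: str) -> bool:
    """Return True if no element name occurs more than once in expr."""
    names = _NAME.findall(expr)
    return len(names) == len(set(names))


print(is_single_occurrence("(title|author)*year?"))  # True
print(is_single_occurrence("a(b|a)*"))               # False: 'a' occurs twice
```

The check is purely syntactic: it looks only at how the expression is written, not at the language it denotes.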

School

  • Science

Department

  • Computer Science

Published in

Theory of Computing Systems

Volume

57

Issue

4

Pages

1114 - 1158

Citation

FREYDENBERGER, D.D. and KOTZING, T., 2015. Fast Learning of Restricted Regular Expressions and DTDs. Theory of Computing Systems, 57 (4), pp. 1114-1158.

Publisher

© Springer Science+Business Media New York

Version

  • AM (Accepted Manuscript)

Publisher statement

This work is made available according to the conditions of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence. Full details of this licence are available at: https://creativecommons.org/licenses/by-nc-nd/4.0/

Publication date

2014-08-14

Notes

The final publication is available at Springer via http://dx.doi.org/10.1007/s00224-014-9559-3

ISSN

1432-4350

eISSN

1433-0490

Language

  • en