Document spanners: from expressive power to decision problems

conference contribution
posted on 10.03.2017, 11:22 authored by Dominik Freydenberger, Mario Holldack
© 2016 Dominik D. Freydenberger and Mario Holldack.

We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models - namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
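
To make the notions of span and string equality selection more concrete, the following is a minimal Python sketch, not the authors' construction. It assumes the span convention of the Fagin et al. framework (1-based positions, start inclusive, end exclusive) and uses Python's re module as a stand-in for a regex formula with a single variable. The names Span, substring, matching_spans and equal_content_pairs are hypothetical and introduced only for illustration.

```python
import re
from typing import NamedTuple, Set, Tuple


class Span(NamedTuple):
    """An interval of positions in a document (assumed 1-based, end exclusive)."""
    start: int
    end: int


def substring(doc: str, span: Span) -> str:
    """Return the part of doc that the span covers."""
    return doc[span.start - 1:span.end - 1]


def matching_spans(pattern: str, doc: str) -> Set[Span]:
    """Unary spanner: every span of doc whose content fully matches pattern.

    Brute force over all O(n^2) spans; meant to illustrate the semantics,
    not to be efficient.
    """
    compiled = re.compile(pattern)
    n = len(doc)
    return {
        Span(i, j)
        for i in range(1, n + 2)
        for j in range(i, n + 2)
        if compiled.fullmatch(substring(doc, Span(i, j)))
    }


def equal_content_pairs(spans: Set[Span], doc: str) -> Set[Tuple[Span, Span]]:
    """Simplified string equality selection: pairs of distinct spans
    that cover equal substrings (trivial pairs (x, x) are left out)."""
    return {
        (x, y)
        for x in spans
        for y in spans
        if x != y and substring(doc, x) == substring(doc, y)
    }


if __name__ == "__main__":
    doc = "abab"
    spans = matching_spans("ab", doc)
    print(sorted(spans))
    print(sorted(equal_content_pairs(spans, doc)))
```

The sketch returns a set of spans rather than a single leftmost match, which reflects the abstract's description of a spanner as a function from a string to a relation over spans. Python's own matching functions report only one match per starting position, so the enumeration over all spans is done explicitly here.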

School

  • Science

Department

  • Computer Science

Published in

Leibniz International Proceedings in Informatics, LIPIcs

Volume

48

Citation

FREYDENBERGER, D.D. and HOLLDACK, M., 2016. Document spanners: from expressive power to decision problems. Presented at the 19th International Conference on Database Theory (ICDT 2016), Bordeaux, France, Mar 15-18th.

Publisher

Schloss Dagstuhl – Leibniz Center for Informatics

Version

VoR (Version of Record)

Publisher statement

This work is made available according to the conditions of the Creative Commons Attribution 4.0 International (CC BY 4.0) licence. Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/

Publication date

2016

Notes

This is an Open Access Article. It is published by Schloss Dagstuhl under the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/

ISBN

9783959770026

ISSN

1868-8969

Book series

Leibniz International Proceedings in Informatics, (LIPIcs);48

Language

en