Freydenberger2019_Article_ALogicForDocumentSpanners.pdf (2.08 MB)
Download file

A logic for document spanners

Download (2.08 MB)
journal contribution
posted on 26.06.2018, 09:04 by Dominik FreydenbergerDominik Freydenberger
Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which formalize the query language AQL that is used in IBM’s SystemT. As shown by Freydenberger and Holldack (ICDT 2016, ToCS 2018), there is a connection between core spanners and ECreg, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of ECreg that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between SpLog and core spanners. Consequences and applications include an alternative way of defining relations for spanners, a pumping lemma for core spanners, and insights into the relative succinctness of various classes of spanner representations and their connection to graph querying languages. We also briefly discuss the connection between SpLog with negation and core spanners with a difference operator.

Funding

Partly supported by Deutsche Forschungsgemeinschaft (DFG) under grant FR 3551/1-1.

History

School

  • Science

Department

  • Computer Science

Published in

Theory of Computing Systems

Volume

63

Issue

7

Pages

1679 - 1754

Citation

FREYDENBERGER, D.D., 2018. A logic for document spanners. Theory of Computing Systems, 63(7), pp. 1679–1754.

Publisher

Springer Verlag (© The Authors)

Version

VoR (Version of Record)

Publisher statement

This work is made available according to the conditions of the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/

Acceptance date

12/06/2018

Publication date

2018-09-11

Copyright date

2018

Notes

This is an Open Access Article. It is published by Springer under the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/

ISSN

1432-4350

eISSN

1433-0490

Language

en