Freydenberger2019_Article_ALogicForDocumentSpanners.pdf (2.08 MB)
Download fileA logic for document spanners
journal contribution
posted on 2018-06-26, 09:04 authored by Dominik FreydenbergerDominik FreydenbergerDocument spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are
core spanners, which formalize the query language AQL that is used in IBM’s
SystemT. As shown by Freydenberger and Holldack (ICDT 2016, ToCS 2018), there is a connection between core spanners and ECreg, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of ECreg that has the same expressive
power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial time conversions between SpLog and core spanners. Consequences and applications include an alternative way of defining relations for spanners, a pumping lemma for core spanners, and insights into the relative succinctness of various classes of spanner representations and their connection to graph querying languages. We also briefly discuss the connection between SpLog with negation and core spanners with a difference operator.
Funding
Partly supported by Deutsche Forschungsgemeinschaft (DFG) under grant FR 3551/1-1.
History
School
- Science
Department
- Computer Science
Published in
Theory of Computing SystemsVolume
63Issue
7Pages
1679 - 1754Citation
FREYDENBERGER, D.D., 2018. A logic for document spanners. Theory of Computing Systems, 63(7), pp. 1679–1754.Publisher
Springer Verlag (© The Authors)Version
- VoR (Version of Record)
Publisher statement
This work is made available according to the conditions of the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/Acceptance date
2018-06-12Publication date
2018-09-11Copyright date
2018Notes
This is an Open Access Article. It is published by Springer under the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/ISSN
1432-4350eISSN
1433-0490Publisher version
Language
- en