Conjunctive queries for logic-based information extraction
This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. Our first results consider the comparative expressive power of FC[REG]-CQ in relation to document spanners (a formal framework for the query language AQL), and various fragments of FC[REG]-CQ – some of which coincide with well-known language generators, such as patterns and regular expressions. Then, we look at decision problems. We show that many decision problems for FC-CQ and FC[REG]-CQ (such as equivalence and regularity) are undecidable. The model checking problem for FC-CQ and FC[REG]-CQ is NP-complete even if the FC-CQ is acyclic – under the definition of acyclicity where each word equation in an FC-CQ is an atom. This leads us to look at the “decomposition” of an FC word equation into a conjunction of binary word equations (i.e., of the form x =˙ y · z). If a query consists of only binary word equations and the query is acyclic, then model checking is tractable and we can enumerate results efficiently. We give an algorithm that decomposes an FC-CQ into an acyclic FC-CQ consisting of binary word equations in polynomial time, or determines that this is not possible. The second approach is to consider the dynamic complexity of FC. This uses the common way of encoding words in a relational structure using a universe with a linear order along with symbol predicates. Then, each element of the universe can carry a symbol if the predicate for said symbol holds for that element. Instead of the “usual way” (looking at first-order logic over these structures), we study the dynamic complexity, where symbols can be modified. As each of these modifications only changes one position, the result of a query before and after the modification is likely to be related. This gives rise to dynamic descriptive complexity classes based on the logic needed to incrementally maintain a query. For such an approach, we show that conjunctive queries are sufficient to maintain the so-called core spanners. In fact, we show that dynamic conjunctive queries (DynCQ) are actually more expressive than core spanners, and dynamic first-order logic (DynFO) is more expressive than generalized core spanners.
History
School
- Science
Department
- Computer Science
Publisher
Loughborough UniversityRights holder
© Sam M. ThompsonPublication date
2022Notes
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.Language
- en
Supervisor(s)
Dominik D. Freydenberger ; Daniel ReidenbachQualification name
- PhD
Qualification level
- Doctoral
This submission includes a signed certificate in addition to the thesis file(s)
- I have submitted a signed certificate