Loughborough University
Browse

Conjunctive queries for logic-based information extraction

Download (1.6 MB)
thesis
posted on 2022-11-30, 13:42 authored by Sam M Thompson

 This thesis offers two logic-based approaches to conjunctive queries in the context of information extraction. The first and main approach is the introduction of conjunctive query fragments of the logics FC and FC[REG], denoted as FC-CQ and FC[REG]-CQ respectively. FC is a first-order logic based on word equations, where the semantics are defined by limiting the universe to the factors of some finite input word. FC[REG] is FC extended with regular constraints. Our first results consider the comparative expressive power of FC[REG]-CQ in relation to document spanners (a formal framework for the query language AQL), and various fragments of FC[REG]-CQ – some of which coincide with well-known language generators, such as patterns and regular expressions. Then, we look at decision problems. We show that many decision problems for FC-CQ and FC[REG]-CQ (such as equivalence and regularity) are undecidable. The model checking problem for FC-CQ and FC[REG]-CQ is NP-complete even if the FC-CQ is acyclic – under the definition of acyclicity where each word equation in an FC-CQ is an atom. This leads us to look at the “decomposition” of an FC word equation into a conjunction of binary word equations (i.e., of the form x =˙ y · z). If a query consists of only binary word equations and the query is acyclic, then model checking is tractable and we can enumerate results efficiently. We give an algorithm that decomposes an FC-CQ into an acyclic FC-CQ consisting of binary word equations in polynomial time, or determines that this is not possible. The second approach is to consider the dynamic complexity of FC. This uses the common way of encoding words in a relational structure using a universe with a linear order along with symbol predicates. Then, each element of the universe can carry a symbol if the predicate for said symbol holds for that element. Instead of the “usual way” (looking at first-order logic over these structures), we study the dynamic complexity, where symbols can be modified. As each of these modifications only changes one position, the result of a query before and after the modification is likely to be related. This gives rise to dynamic descriptive complexity classes based on the logic needed to incrementally maintain a query. For such an approach, we show that conjunctive queries are sufficient to maintain the so-called core spanners. In fact, we show that dynamic conjunctive queries (DynCQ) are actually more expressive than core spanners, and dynamic first-order logic (DynFO) is more expressive than generalized core spanners. 

History

School

  • Science

Department

  • Computer Science

Publisher

Loughborough University

Rights holder

© Sam M. Thompson

Publication date

2022

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

Dominik D. Freydenberger ; Daniel Reidenbach

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate

Usage metrics

    Computer Science Theses

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC