The edit distance to k-subsequence universality
conference contributionposted on 21.05.2021, 10:25 authored by Joel DayJoel Day, Pamela Fleischmann, Maria Kosche, Tore Koss, Florin Manea, Stefan Siemer
A word u is a subsequence of another word w if u can be obtained from w by deleting some of its letters. In the early 1970s, Imre Simon defined the relation ∼k (called now Simon-Congruence) as follows: two words having exactly the same set of subsequences of length at most k are ∼k-congruent. This relation was central in defining and analysing piecewise testable languages, but has found many applications in areas such as algorithmic learning theory, databases theory, or computational linguistics. Recently, it was shown that testing whether two words are ∼k-congruent can be done in optimal linear time. Thus, it is a natural next step to ask, for two words w and u which are not ∼k-equivalent, what is the minimal number of edit operations that we need to perform on w in order to obtain a word which is ∼k-equivalent to u. In this paper, we consider this problem in a setting which seems interesting: when u is a k-subsequence universal word. A word u with alph(u) = Σ is called k-subsequence universal if the set of subsequences of length k of u contains all possible words of length k over Σ. As such, our results are a series of efficient algorithms computing the edit distance from w to the language of k-subsequence universal words.
- Mathematical Sciences