Loughborough University
Advancing continual lifelong learning in neural information retrieval: Definition, dataset, framework, and empirical evaluation

journal contribution
posted on 2025-03-26, 14:38 authored by Jingrui Hu, Georgina Cosma, Axel Finke
Continual learning refers to the capability of a machine learning model to learn and adapt to new information, without compromising its performance on previously learned tasks. Although several studies have investigated continual learning methods for neural information retrieval (NIR) tasks, a well-defined task definition is still lacking, and it is unclear how typical learning strategies perform in this context. To address this challenge, a systematic task definition of continual NIR is presented, along with a multiple-topic dataset that simulates continuous information retrieval. A comprehensive continual neural information retrieval framework consisting of typical retrieval models and continual learning strategies is then proposed. Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval and enhance performance on previously learned tasks. The results also indicate that embedding-based retrieval models experience a decline in their continual learning performance as the topic shift distance and dataset volume of new tasks increase. In contrast, pretraining-based models do not show any such correlation. Adopting suitable learning strategies can mitigate the effects of topic shift and data augmentation in continual neural information retrieval.

Funding

CSC (China Scholarship Council, No. 202208060371)

Loughborough University

History

School

  • Science

Published in

Information Sciences

Volume

687

Publisher

Elsevier Inc

Version

  • VoR (Version of Record)

Rights holder

© The Author(s)

Publisher statement

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Acceptance date

2024-08-15

Publication date

2024-08-22

Copyright date

2024

ISSN

0020-0255

eISSN

1872-6291

Language

  • en

Depositor

Prof Georgina Cosma. Deposit date: 24 October 2024

Article number

121368
