Loughborough University
Browse
- No file added yet -

Improving visual-semantic embeddings by learning semantically-enhanced hard negatives for cross-modal information retrieval

Download (2.95 MB)
journal contribution
posted on 2023-07-19, 15:47 authored by Yan Gong, Georgina CosmaGeorgina Cosma
Visual Semantic Embedding (VSE) networks aim to extract the semantics of images and their descriptions and embed them into the same latent space for cross-modal information retrieval. Most existing VSE networks are trained by adopting a hard negatives loss function which learns an objective margin between the similarity of relevant and irrelevant image–description embedding pairs. However, the objective margin in the hard negatives loss function is set as a fixed hyperparameter that ignores the semantic differences of the irrelevant image–description pairs. To address the challenge of measuring the optimal similarities between image–description pairs before obtaining the trained VSE networks, this paper presents a novel approach that comprises two main parts: (1) finds the underlying semantics of image descriptions; and (2) proposes a novel semantically-enhanced hard negatives loss function, where the learning objective is dynamically determined based on the optimal similarity scores between irrelevant image–description pairs. Extensive experiments were carried out by integrating the proposed methods into five state-of-the-art VSE networks that were applied to three benchmark datasets for cross-modal information retrieval tasks. The results revealed that the proposed methods achieved the best performance and can also be adopted by existing and future VSE networks.

History

School

  • Science

Department

  • Computer Science

Published in

Pattern Recognition

Volume

137

Publisher

Elsevier

Version

  • VoR (Version of Record)

Rights holder

© The Authors

Publisher statement

This is an Open Access Article. It is published by Elsevier under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (CC BY-NC-ND). Full details of this licence are available at: https://creativecommons.org/licenses/by-nc-nd/4.0/

Acceptance date

2022-12-17

Publication date

2022-12-18

Copyright date

2022

ISSN

0031-3203

eISSN

1873-5142

Language

  • en

Depositor

Deposit date: 12 July 2023

Article number

109272

Usage metrics

    Loughborough Publications

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC