Loughborough University
Browse

A genetically-optimised artificial life algorithm for complexity-based synthetic dataset generation

Download (2.79 MB)
journal contribution
posted on 2023-01-03, 15:11 authored by Andrew Houston, Georgina CosmaGeorgina Cosma

Algorithmic evaluation is a vital step in developing new approaches to machine learning and relies on the availability of existing datasets. However, real-world datasets often do not cover the necessary complexity space required to understand an algorithm’s domains of competence. As such, the generation of synthetic datasets to fill gaps in the complexity space has gained attention, offering a means of evaluating algorithms when data is unavailable. Existing approaches to complexity-focused data generation are limited in their ability to generate solutions that invoke similar classification behaviour to real data. The present work proposes a novel method (Sy:Boid) for complexity-based synthetic data generation, adapting and extending the Boid algorithm that was originally intended for computer graphics simulations. Sy:Boid embeds the modified Boid algorithm within an evolutionary multi-objective optimisation algorithm to generate synthetic datasets which satisfy predefined magnitudes of complexity measures. Sy:Boid is evaluated and compared to labelling-based and sampling-based approaches to data generation to understand its ability to generate a wide variety of realistic datasets. Results demonstrate Sy:Boid is capable of generating datasets across a greater portion of the complexity space than existing approaches. Furthermore, the produced datasets were observed to invoke very similar classification behaviours to that of real data.

Funding

DMRC

Loughborough University

History

School

  • Science

Department

  • Computer Science

Published in

Information Sciences

Volume

619

Pages

540 - 561

Publisher

Elsevier

Version

  • VoR (Version of Record)

Rights holder

© The Authors

Publisher statement

This is an Open Access Article. It is published by Elsevier under the Creative Commons Attribution 4.0 International Licence (CC BY). Full details of this licence are available at: https://creativecommons.org/licenses/by/4.0/

Acceptance date

2022-11-05

Publication date

2022-11-11

Copyright date

2022

ISSN

0020-0255

Language

  • en

Depositor

Dr Georgina Cosma. Deposit date: 29 December 2022

Usage metrics

    Loughborough Publications

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC