Loughborough University
1-s2.0-S0020025522012865-mmc1.pdf (86.24 kB)

Supplementary information files for A genetically-optimised artificial life algorithm for complexity-based synthetic dataset generation

dataset
posted on 2023-03-29, 14:40 authored by Andrew Houston, Georgina Cosma

Supplementary files for article A genetically-optimised artificial life algorithm for complexity-based synthetic dataset generation


Algorithmic evaluation is a vital step in developing new approaches to machine learning and relies on the availability of existing datasets. However, real-world datasets often do not cover the complexity space required to understand an algorithm’s domains of competence. As such, the generation of synthetic datasets to fill gaps in the complexity space has gained attention, offering a means of evaluating algorithms when data are unavailable. Existing approaches to complexity-focused data generation are limited in their ability to generate solutions that invoke classification behaviour similar to that of real data. The present work proposes a novel method (Sy:Boid) for complexity-based synthetic data generation, adapting and extending the Boid algorithm, which was originally intended for computer graphics simulations. Sy:Boid embeds the modified Boid algorithm within an evolutionary multi-objective optimisation algorithm to generate synthetic datasets that satisfy predefined magnitudes of complexity measures. Sy:Boid is evaluated and compared to labelling-based and sampling-based approaches to data generation to understand its ability to generate a wide variety of realistic datasets. Results demonstrate that Sy:Boid is capable of generating datasets across a greater portion of the complexity space than existing approaches. Furthermore, the produced datasets were observed to invoke classification behaviour very similar to that of real data.
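
For readers unfamiliar with the Boid algorithm mentioned above, the following is a minimal sketch of one update step of the classic flocking rules (cohesion, separation, alignment) in Python with NumPy. It is not the modified algorithm used in Sy:Boid, and it omits the evolutionary multi-objective optimisation entirely. The function name `boids_step`, all parameter names, and all default weights are assumptions chosen for illustration only.

```python
import numpy as np

def boids_step(pos, vel, radius=1.0, min_dist=0.2, w_cohesion=0.01,
               w_separation=0.05, w_alignment=0.05, dt=1.0):
    """One step of the classic Boids rules (illustrative; NOT Sy:Boid).

    pos, vel: float arrays of shape (n_boids, n_dims).
    All weights and radii are hypothetical defaults.
    """
    new_vel = vel.copy()
    for i in range(pos.shape[0]):
        offsets = pos - pos[i]                    # vectors from boid i to all boids
        dist = np.linalg.norm(offsets, axis=1)
        neighbours = (dist < radius) & (dist > 0)  # excludes boid i itself
        if not neighbours.any():
            continue                               # isolated boid keeps its velocity
        # Cohesion: steer towards the neighbours' centre of mass.
        cohesion = offsets[neighbours].mean(axis=0)
        # Separation: steer away from neighbours closer than min_dist.
        too_close = neighbours & (dist < min_dist)
        separation = -offsets[too_close].sum(axis=0)
        # Alignment: move towards the neighbours' mean velocity.
        alignment = vel[neighbours].mean(axis=0) - vel[i]
        new_vel[i] = vel[i] + (w_cohesion * cohesion
                               + w_separation * separation
                               + w_alignment * alignment)
    return pos + dt * new_vel, new_vel

# Example: two stationary boids within each other's radius drift together.
positions = np.array([[0.0, 0.0], [0.5, 0.0]])
velocities = np.zeros_like(positions)
positions, velocities = boids_step(positions, velocities)
```

In a data-generation setting one could treat each boid's position as a feature vector, but how Sy:Boid maps boids to labelled data points is described in the article itself, not in this sketch.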



Funding

DMRC

Loughborough University

School

  • Science

Department

  • Computer Science
