Supplementary information files for A genetically-optimised artificial life algorithm for complexity-based synthetic dataset generation
Supplementary files for the article "A genetically-optimised artificial life algorithm for complexity-based synthetic dataset generation".
Algorithmic evaluation is a vital step in developing new approaches to machine learning and relies on the availability of existing datasets. However, real-world datasets often do not cover the complexity space required to understand an algorithm’s domains of competence. As such, the generation of synthetic datasets to fill gaps in the complexity space has gained attention, offering a means of evaluating algorithms when real data is unavailable. Existing approaches to complexity-focused data generation are limited in their ability to generate datasets that elicit classification behaviour similar to that of real data. The present work proposes a novel method (Sy:Boid) for complexity-based synthetic data generation, adapting and extending the Boid algorithm, which was originally intended for computer graphics simulations. Sy:Boid embeds the modified Boid algorithm within an evolutionary multi-objective optimisation algorithm to generate synthetic datasets that satisfy predefined magnitudes of complexity measures. Sy:Boid is evaluated against labelling-based and sampling-based approaches to data generation to understand its ability to generate a wide variety of realistic datasets. The results demonstrate that Sy:Boid can generate datasets across a greater portion of the complexity space than existing approaches. Furthermore, the produced datasets were observed to elicit classification behaviours very similar to those of real data.
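For readers unfamiliar with the Boid algorithm mentioned above, the following is a minimal sketch of one update step of the classic three-rule flocking model (cohesion, alignment, separation) in two dimensions. It is not the Sy:Boid implementation: the function name `boid_step`, the rule weights, the neighbourhood radius and the minimum distance are all illustrative assumptions. How Sy:Boid modifies these rules, and how boids relate to data points and class labels, is not described in this abstract and is therefore not represented here.

```python
import math


def boid_step(positions, velocities, radius=1.0, min_dist=0.2,
              w_cohesion=0.01, w_alignment=0.05, w_separation=0.05, dt=1.0):
    """One synchronous update of a classic 2D Boids flock.

    All parameter names and default values are illustrative assumptions,
    not values taken from the Sy:Boid article.
    """
    new_velocities = []
    for i, (p, v) in enumerate(zip(positions, velocities)):
        # Neighbours are all other boids within the perception radius.
        neighbours = [j for j, q in enumerate(positions)
                      if j != i and math.dist(p, q) < radius]
        vx, vy = v
        if neighbours:
            n = len(neighbours)
            # Cohesion: steer towards the neighbours' centre of mass.
            cx = sum(positions[j][0] for j in neighbours) / n
            cy = sum(positions[j][1] for j in neighbours) / n
            vx += w_cohesion * (cx - p[0])
            vy += w_cohesion * (cy - p[1])
            # Alignment: nudge velocity towards the neighbours' mean velocity.
            ax = sum(velocities[j][0] for j in neighbours) / n
            ay = sum(velocities[j][1] for j in neighbours) / n
            vx += w_alignment * (ax - v[0])
            vy += w_alignment * (ay - v[1])
            # Separation: move away from neighbours that are too close.
            for j in neighbours:
                if math.dist(p, positions[j]) < min_dist:
                    vx += w_separation * (p[0] - positions[j][0])
                    vy += w_separation * (p[1] - positions[j][1])
        new_velocities.append((vx, vy))
    # Move every boid using its updated velocity.
    new_positions = [(p[0] + dt * nv[0], p[1] + dt * nv[1])
                     for p, nv in zip(positions, new_velocities)]
    return new_positions, new_velocities
```

In an evolutionary setting such as the one the abstract describes, parameters like the rule weights would be natural candidates for optimisation, although the actual encoding used by Sy:Boid is an open question here.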
- Computer Science