Loughborough University

Systematic design, generation, and application of synthetic datasets for flow cytometry

journal contribution
posted on 2022-01-20, 09:11 authored by Melissa Cheung, Jonathan J Campbell, Robert J. Thomas, Julian Braybrook, Jon Petzing
The application of synthetic datasets in training and validation of analysis tools has led to improvements in many decision-making tasks in a range of domains, from computer vision to digital pathology. Synthetic datasets overcome the constraints of real-world datasets, namely difficulties in collection and labelling, expense, time and privacy concerns. In flow cytometry, real cell-based datasets are limited by properties such as size, number of parameters, distance between cell populations and distributions, and are often focused on a narrow range of disease or cell types. Researchers have in some cases designed these desired properties into synthetic datasets; however, operators have implemented them inconsistently, and publicly available, high-quality synthetic datasets are scarce. In this research, we propose a method to systematically design and generate flow cytometry synthetic datasets with highly controlled characteristics. We demonstrate the generation of two-cluster synthetic datasets with specific degrees of separation between cell populations, and of non-normal distributions with increasing levels of skewness and orientations of skew pairs. We apply our synthetic datasets to test the performance of a popular automated cell population identification software, SPADE3, and define the region where the software performance decreases as the clusters get closer together. Application of the synthetic skewed dataset suggests the software is capable of processing non-normal data. We calculate the classification accuracy of SPADE3 with a robustness not achievable with real-world datasets. Our approach aims to advance research towards the generation of high-quality synthetic flow cytometry datasets, and to raise awareness of them in the community. The synthetic datasets can be utilised in benchmarking studies that critically evaluate cell population identification tools and help illustrate potential digital platform inconsistencies. These datasets have the potential to improve cell characterisation workflows that integrate automated analysis in clinical diagnostics and cell therapy manufacturing.
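
The kind of controlled dataset the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not specify. It assumes two-parameter events, Gaussian clusters whose centres are a chosen distance apart, and skewed values drawn from a standard skew-normal distribution. All function names, parameters and the threshold classifier are hypothetical, chosen only for illustration.

```python
import numpy as np


def two_cluster_dataset(n_per_cluster, separation, sigma=1.0, seed=0):
    """Two 2-D Gaussian clusters with centres `separation` apart along axis 0.

    Hypothetical helper: the cluster shape and the axis of separation are
    assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    first = rng.normal(loc=[0.0, 0.0], scale=sigma, size=(n_per_cluster, 2))
    second = rng.normal(loc=[separation, 0.0], scale=sigma, size=(n_per_cluster, 2))
    events = np.vstack([first, second])
    labels = np.repeat([0, 1], n_per_cluster)  # ground-truth population labels
    return events, labels


def skew_normal(n, alpha, seed=0):
    """Draw n values from a standard skew-normal with shape parameter alpha.

    Uses the representation X = delta*|Z0| + sqrt(1 - delta^2)*Z1 with
    delta = alpha / sqrt(1 + alpha^2); alpha = 0 gives a standard normal,
    and larger alpha gives stronger right skew.
    """
    rng = np.random.default_rng(seed)
    delta = alpha / np.sqrt(1.0 + alpha**2)
    z0 = np.abs(rng.standard_normal(n))
    z1 = rng.standard_normal(n)
    return delta * z0 + np.sqrt(1.0 - delta**2) * z1


def classification_accuracy(true_labels, predicted_labels):
    """Fraction of events assigned to their true population."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))


# Usage: score a stand-in classifier (a midpoint threshold on axis 0)
# against the known labels as the clusters are moved closer together.
for separation in (6.0, 4.0, 2.0, 1.0):
    events, labels = two_cluster_dataset(1000, separation, seed=1)
    predicted = (events[:, 0] > separation / 2.0).astype(int)
    print(separation, classification_accuracy(labels, predicted))
```

Because every event carries a known label, the accuracy of any clustering tool can be computed directly; the threshold rule here merely stands in for a tool such as SPADE3, whose interface is not described in this record.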

History

School

  • Mechanical, Electrical and Manufacturing Engineering

Published in

PDA Journal of Pharmaceutical Science and Technology

Volume

76

Issue

3

Pages

200 - 215

Publisher

Parenteral Drug Association, Inc.

Version

  • AM (Accepted Manuscript)

Rights holder

© 2022, Parenteral Drug Association

Publisher statement

Reproduced with kind permission of the publisher

Acceptance date

2021-12-07

Publication date

2022-01-14

Copyright date

2022

ISSN

1079-7440

eISSN

1948-2124

Language

  • en

Depositor

Dr Jon Petzing. Deposit date: 14 January 2022
