Defining confidence in flow cytometry automated data analysis software platforms
The development of computational tools for flow cytometry data analysis in recent years has the potential to reduce the variation arising from manual gating and improve the quality of cell characterisations performed within academic, clinical and biomanufacturing settings. However, there is a need to understand the uncertainty of measurements from automated tools, alongside a need for benchmarking datasets with ground truth that enable systematic comparisons to be made between these tools. This thesis investigates the cell identification outputs of software tools that utilise different classes of clustering algorithms, with a focus on implementing highly controlled synthetic datasets for performance evaluation.
A literature survey was conducted to identify the most cited tools, enabling the selection of the most relevant ones representative of different unsupervised clustering techniques: Flock2, flowMeans, FlowSOM, PhenoGraph, SPADE1, SPADE3 and SWIFT. Synthetic flow cytometry datasets were designed and generated with specific data characteristics of separation, normal/skew distributions, rarity and noise elements, and demonstrated to be credible substitutes for real cell data. These synthetic datasets were applied to the different software tools to determine the accuracy and repeatability of absolute cell counts. The results demonstrated that outputs from software analysing the same reference synthetic dataset vary considerably, with accuracy deteriorating as the clusters overlapped and the separation index fell below zero. Moreover, SWIFT was found to be more negatively affected than other software in the presence of skewed cell populations. Assessment of rare cell detection revealed that most software failed to consistently achieve a limit of detection of 100 cells in 10^6 events (0.01%). The addition of noise events resulted in a decrease in performance from all software, most significantly for FlowSOM. Furthermore, an automated versus manual comparison study carried out using a CD34+ stem cell dataset revealed higher variability from automated outputs compared to manual ones (mean coefficients of variation of 96% and 54%, respectively), and a weak correlation between the two methods (r=0.33) when analysing less well-separated cell populations.
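The two summary quantities quoted above, the coefficient of variation and the rare-cell fraction, can be illustrated with a minimal Python sketch. The function names and the repeat counts are hypothetical and chosen only for illustration; they are not taken from the thesis, and the thesis may have computed these statistics differently (for example, with a population rather than a sample standard deviation).

```python
import statistics


def coefficient_of_variation(counts):
    """Return the CV (%) of repeated cell counts: sample SD / mean * 100."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; CV is undefined")
    return statistics.stdev(counts) / mean * 100.0


def rarity_percent(target_cells, total_events):
    """Return the target population as a percentage of all events."""
    return target_cells / total_events * 100.0


# Hypothetical repeat counts of one population from five runs of one tool.
repeat_counts = [98, 102, 100, 95, 105]
cv = coefficient_of_variation(repeat_counts)

# The rarity level quoted in the abstract: 100 cells in 10^6 events.
lod = rarity_percent(100, 10**6)

print(f"CV = {cv:.2f}%")
print(f"rare population = {lod:.2f}% of events")
```

A lower CV across repeated runs indicates better repeatability, which is the sense in which the automated and manual outputs are compared above.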
This work has illustrated how the generation of novel synthetic flow cytometry datasets, and their application in comparison studies, have allowed the performance limitations of different automated software tools to be uncovered. The synthetic datasets benefit from having a known ground truth that is not obtainable from real-world datasets, and therefore have potential utility as digital reference materials, possibly leading to enhanced measurement confidence in automated cell characterisations and enumerations in fields such as diagnostics and cell therapy production.
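The value of a known ground truth can be sketched in a few lines of Python. This is a deliberately simplified, one-parameter illustration under stated assumptions: two normally distributed populations, a fixed threshold standing in for an automated tool, and hypothetical names and parameter values throughout. It is not the dataset design or the evaluation method used in the thesis.

```python
import random


def make_synthetic_events(n_a, n_b, mean_a, mean_b, sd, seed=0):
    """Generate (value, true_label) pairs for two normal populations."""
    rng = random.Random(seed)
    events = [(rng.gauss(mean_a, sd), "A") for _ in range(n_a)]
    events += [(rng.gauss(mean_b, sd), "B") for _ in range(n_b)]
    rng.shuffle(events)
    return events


def threshold_gate(value, cut):
    """Stand-in classifier: assign an event to A below the cut, else B."""
    return "A" if value < cut else "B"


def accuracy(events, cut):
    """Fraction of events whose assigned label matches the ground truth."""
    correct = sum(
        1 for value, label in events if threshold_gate(value, cut) == label
    )
    return correct / len(events)


# Hypothetical settings: a well-separated pair and an overlapping pair.
events = make_synthetic_events(900, 100, 0.0, 10.0, 1.0)
well_separated = accuracy(events, cut=5.0)
overlapping = accuracy(
    make_synthetic_events(900, 100, 0.0, 1.0, 1.0), cut=0.5
)

print(f"well separated: {well_separated:.3f}")
print(f"overlapping:    {overlapping:.3f}")
```

Because every synthetic event carries its true label, accuracy can be scored directly, and the drop in accuracy as the populations overlap mirrors, in miniature, the kind of behaviour the comparison studies examine.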
EPSRC/MRC Doctoral Training Centre for Regenerative Medicine at Loughborough University. Grant Number: EP/L105072/1
- Mechanical, Electrical and Manufacturing Engineering
Rights holder: © Melissa Cheung
Notes: A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.
Supervisor(s): Jon Petzing; Robert J Thomas; Julian Braybrook; Jonathan J Campbell