Loughborough University

Pushing the boundary: Specialising deep configuration performance learning

thesis
posted on 2024-05-20, 12:36 authored by Jingzhi Gong

Software systems often come with a multitude of configuration options that can be adjusted to adapt their performance (e.g., latency, execution time, and energy consumption) to various requirements. However, their combined influence on performance is often unknown, resulting in potential issues for software maintenance.

Worse, the rapid growth in the scale and complexity of modern software systems has made performance measurement increasingly resource-intensive, leaving only limited datasets in most real-world scenarios. Consequently, building accurate performance prediction models from these limited measurements has become a major challenge. To address this, deep learning approaches have gained popularity in recent years, owing to their ability to capture intricate representations and interactions even with only a few samples.

To ground this line of research, this thesis starts with a systematic literature review of the latest deep learning techniques employed for configuration performance modeling, covering 948 searched papers spanning six indexing services, from which 85 primary papers were extracted and analyzed. The results disclose both positive and negative trends in the literature and reveal potential future directions to explore. Subsequently, three key knowledge gaps are identified and formalized as three objectives for this thesis to pursue.

The first knowledge gap is that, despite the availability of different encoding schemes, there is still little understanding of which is better and under what circumstances, a lack of guidance that could be harmful to the community. To bridge this gap, this thesis performs an empirical study on three of the most popular encoding schemes for configuration performance learning, namely label, scaled label, and one-hot encoding. The results demonstrate that choosing the encoding scheme is non-trivial, and a list of actionable suggestions is therefore provided to enable more reliable decisions.
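The three encoding schemes named above can be illustrated with a minimal sketch. It assumes that label encoding maps each option value to an integer index, that scaled label encoding normalises that index to [0, 1], and that one-hot encoding uses one binary column per value; the option `levels` and the `samples` are hypothetical and not taken from the thesis.

```python
def label_encode(values, levels):
    """Map each option value to its integer index in `levels`."""
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


def scaled_label_encode(values, levels):
    """Label encoding normalised to [0, 1] (assumed meaning of 'scaled label')."""
    index = {level: i for i, level in enumerate(levels)}
    top = max(len(levels) - 1, 1)  # avoid division by zero for a single level
    return [index[v] / top for v in values]


def one_hot_encode(values, levels):
    """One binary column per possible value of the option."""
    return [[1 if v == level else 0 for level in levels] for v in values]


# Hypothetical configuration option with three possible values.
levels = ["none", "gzip", "lz4"]
samples = ["gzip", "none", "lz4"]

print(label_encode(samples, levels))         # [1, 0, 2]
print(scaled_label_encode(samples, levels))  # [0.5, 0.0, 1.0]
print(one_hot_encode(samples, levels))       # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Label and scaled label encoding keep one input dimension per option but impose an order on its values, whereas one-hot encoding imposes no order at the cost of more dimensions.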

Meanwhile, the survey also reveals a crucial yet unaddressed knowledge gap, namely the sparsity inherited from the configuration landscape. To handle this, this thesis presents a model-agnostic and sparsity-robust framework based on “divide-and-learn”, dubbed DaL. To mitigate the sample sparsity, the samples from the configuration landscape are divided into distant divisions; for each division, a deep learning model (e.g., Hierarchical Interaction Neural Network) is built to deal with the feature sparsity. Experimental results from 12 real-world systems and five sets of training data reveal that DaL performs better than the state-of-the-art approaches in 44 out of 60 cases, with up to 1.61 times improvement in accuracy.
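The general "divide-and-learn" idea can be sketched as follows. This is not the DaL implementation: it assumes a single threshold split chosen to minimise the squared error of the two divisions, and it uses the mean performance of each division as a stand-in for the per-division deep learning model. All function names and the toy data are hypothetical, and at least one option is assumed to vary across the samples.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(X, y):
    """Find the (feature, threshold) pair that minimises the total SSE."""
    best = None
    for f in range(len(X[0])):
        # Skip the largest value so that both divisions are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]


def fit_divide_and_learn(X, y):
    """Divide the samples into two divisions, then fit one local model each."""
    f, t = best_split(X, y)
    left = [yi for row, yi in zip(X, y) if row[f] <= t]
    right = [yi for row, yi in zip(X, y) if row[f] > t]
    # Stand-in local models: the mean performance of each division.
    return {"feature": f, "threshold": t, "left": mean(left), "right": mean(right)}


def predict(model, row):
    """Route a configuration to its division and use that division's model."""
    side = "left" if row[model["feature"]] <= model["threshold"] else "right"
    return model[side]


# Hypothetical data: two binary options and a measured latency.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [100.0, 110.0, 10.0, 12.0]

model = fit_divide_and_learn(X, y)
print(model["feature"], predict(model, [1, 1]))  # 0 11.0
```

Because each local model only sees samples from its own division, it never has to fit the large performance jump between the divisions, which is the intuition behind dividing before learning.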

Nonetheless, like the majority of the studies reviewed, DaL is limited to predicting under static environments (e.g., hardware, version, and workload), which contradicts the dynamic nature of software. To address this concern, this thesis proposes a sequential meta-learning framework, named SeMPL, which significantly enhances the prediction accuracy of state-of-the-art models in multi-environment scenarios. What makes it unique is that, unlike common meta-learning frameworks (e.g., MAML) that train on the meta environments in parallel, SeMPL trains on them sequentially, in an order specialised for deep neural networks. A comparison with 15 state-of-the-art models across nine systems demonstrates that SeMPL performs considerably better on 89% of the systems, with up to 99% accuracy improvement.

Through the extensive studies conducted in this thesis, critical knowledge gaps in the existing literature on deep configuration performance learning have been identified and effectively addressed. As a result, the accuracy of performance learning has been significantly advanced.

School

  • Science

Department

  • Computer Science

Publisher

Loughborough University

Rights holder

© Jingzhi Gong

Publication date

2024

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

John Woodward; Baihua Li

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate