Modelling atmospheric ozone concentration using machine learning algorithms
2017-05-22T15:07:13Z (GMT) by
Air quality monitoring is one of several important tasks carried out in environmental science and engineering. Air quality predictive models are therefore very useful, as they can provide early warning when pollution is rising to unsatisfactory levels. The literature review conducted for this thesis revealed that only a limited number of widely used machine learning algorithms have been employed to model the concentrations of atmospheric gases such as ozone and nitrogen oxides. Meanwhile, machine learning has advanced significantly in recent years with the introduction of ensemble learning techniques, convolutional and deep neural networks, and related methods. Given these observations, the research presented in this thesis investigates the effective use of ensemble learning algorithms, with optimised algorithmic settings and an appropriate choice of base layer algorithms, to create effective and efficient models for the prediction and forecasting of ground level ozone (O3) specifically. This thesis makes three main research contributions to the modelling of O3 concentrations. As the first contribution, the performance of several ensemble learning algorithms (homogeneous and heterogeneous) was investigated and compared with that of popular and widely used single base learning algorithms. The results show an impressive improvement in prediction performance obtainable by using meta learning algorithms (Bagging, Stacking, and Voting). The three meta learning algorithms investigated performed similarly, achieving an average correlation coefficient of 0.91 in prediction.
As the second contribution, feature selection and parameter based optimisation were applied in conjunction with Multilayer Perceptron, Support Vector Machine, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution is the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The reported results exceed the accuracy levels reported for forecasting ozone concentration variations with widely used single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to standard learning algorithms, such as Artificial Neural Networks and Support Vector Machines, that are often available within popular commercial software packages.
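
To make two of the ideas named in the abstract concrete, the following is a minimal sketch, not the thesis implementation. It shows how a Voting ensemble for regression can combine base model outputs by simple averaging, and how a correlation coefficient can be computed to score predictions. The abstract does not say which correlation coefficient was used; Pearson's is assumed here. The function names `pearson_r` and `vote`, the three "models" and all numeric values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_o * var_p)


def vote(predictions_per_model):
    """Voting for regression: average the base model predictions per sample."""
    return [sum(column) / len(column) for column in zip(*predictions_per_model)]


# Hypothetical observed O3 values and outputs of three hypothetical base models.
observed = [30.0, 42.0, 55.0, 61.0, 48.0, 35.0]
model_a = [33.0, 40.0, 50.0, 66.0, 45.0, 30.0]
model_b = [28.0, 47.0, 58.0, 57.0, 52.0, 39.0]
model_c = [35.0, 39.0, 57.0, 59.0, 44.0, 37.0]

ensemble = vote([model_a, model_b, model_c])

for name, preds in [("A", model_a), ("B", model_b),
                    ("C", model_c), ("voting", ensemble)]:
    print(f"model {name}: r = {pearson_r(observed, preds):.3f}")
```

Averaging is only one way to combine base learners. Bagging and Stacking, also named in the abstract, instead resample the training data or train a further model on the base learners' outputs, and are not shown here.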