View transformation and novel view synthesis based on deep learning
Novel view synthesis (NVS) generates target images of different views from a single input image or multiple images, with view transformation serving as a key intermediate step. Recent years have seen increasing interest in NVS due to its wide range of applications, and with the development of deep learning, deep learning based NVS methods have become mainstream.
This thesis contributes to NVS in four main aspects. First, related work on view transformation and NVS is extensively reviewed, covering both traditional and deep learning approaches. Second, instead of relying on vanishing points, this thesis introduces a novel deep learning based approach that automatically corrects camera rotation in images without prior knowledge of the camera pose. Specifically, a referential 3D ground plane is first derived from the RGB image through two convolutional neural networks, and a rotation matrix is obtained by transforming the normal vector of the ground plane to (0, 1, 0). A novel projection mapping algorithm based on this rotation matrix is then developed to achieve automatic view transformation. Third, a novel view synthesis approach is proposed that incorporates a Neural Image Refinement Network (NIRN) and generates both depth and colour images for the target view in an end-to-end manner. Since direct application of geometric projection mapping results in empty regions and/or distortions, the proposed approach embeds a novel refinement network into the view synthesis pipeline for improved performance. The appearance of the colour image benefits greatly from the generated depth image, as the depth provides an intermediate projection relationship for objects in the 3D world. Last but not least, it is found that relying solely on features from the input image in the source view may not be sufficient to generate a good target image, especially when only a single input image is available. To address this problem, features from both the input and a warped image are fused to collaboratively generate pixels in the new view, where the warped image is an intermediate output generated by projecting pixels of the input image onto the target view via an estimated depth. In addition, the deployment of channel attention blocks and a multi-scale depth estimation network further improves the synthesised image quality.
All experiments are implemented and thoroughly evaluated on our own and public datasets, demonstrating robust and superior view transformation and synthesis results.
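As a concrete illustration of the rotation-correction step described above, the sketch below computes a rotation matrix that aligns an estimated ground-plane normal with (0, 1, 0). This is a minimal numpy sketch, not the thesis's implementation: the function name `rotation_to_up` and the use of Rodrigues' rotation formula are assumptions for illustration.

```python
import numpy as np

def rotation_to_up(normal, up=(0.0, 1.0, 0.0)):
    """Rotation matrix aligning the estimated ground-plane normal with `up`.

    Built with Rodrigues' rotation formula: rotate about the axis
    normal x up by the angle between the two vectors.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    u = np.asarray(up, dtype=float)

    axis = np.cross(n, u)               # rotation axis (unnormalised)
    s = np.linalg.norm(axis)            # sin(theta)
    c = float(np.dot(n, u))             # cos(theta)
    if s < 1e-8:                        # normal already (anti-)parallel to up
        return np.eye(3) if c > 0 else np.diag([-1.0, -1.0, 1.0])

    k = axis / s                        # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])  # skew-symmetric cross-product matrix
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)

# R = rotation_to_up(n) satisfies R @ n = (0, 1, 0); applying R to
# camera-space points before projection yields a rotation-corrected view.
```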
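Similarly, the warped image used in the fourth contribution can be sketched as a depth-based forward warp: each source pixel is back-projected with its estimated depth, transformed by the relative pose, and reprojected into the target view. The sketch below assumes an (H, W, 3) colour image and uses nearest-neighbour splatting without z-buffering, purely for illustration; the resulting unfilled pixels correspond to the empty regions that the refinement stage is designed to handle.

```python
import numpy as np

def warp_to_target(image, depth, K, R, t):
    """Forward-warp a source image into the target view via estimated depth.

    Back-projects every pixel using its depth, applies the relative
    pose (R, t), reprojects with the intrinsics K, and splats colours
    into the target frame with nearest-neighbour rounding.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)

    # Back-project pixels to 3D points in source-camera coordinates.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)

    # Transform into the target camera and project to its image plane.
    proj = K @ (R @ pts + t.reshape(3, 1))
    z = proj[2]
    u = np.round(proj[0] / z).astype(int)
    v = np.round(proj[1] / z).astype(int)

    # Splat colours; pixels that receive nothing remain empty (zeros).
    warped = np.zeros_like(image)
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    warped[v[ok], u[ok]] = image.reshape(-1, image.shape[-1])[ok]
    return warped
```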
Funding
EPSRC Centre for Doctoral Training in Embedded Intelligence
Engineering and Physical Sciences Research Council
SukeIntel Co., Ltd.
School
- Science
Department
- Computer Science
Publisher
Loughborough University
Rights holder
© Lei Jiang
Publication date
2023
Notes
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.
Language
- en
Supervisor(s)
Qinggang Meng; Gerald Schaefer
Qualification name
- PhD
Qualification level
- Doctoral
This submission includes a signed certificate in addition to the thesis file(s)
- I have submitted a signed certificate