Loughborough University
Framework for rapid hardware prototyping using custom floating-point arithmetic

thesis
posted on 2022-12-02, 12:32 authored by Nelson De-Sousa-Campos

In many image processing algorithms the convolution operator is linear, but this linearity can sacrifice essential features and details that non-linear operations preserve. Non-linear spatial filters, however, are slow to compute and form a significant bottleneck in many software applications, and their complexity makes hardware acceleration non-trivial. Typical strategies include implementing the algorithms in fixed-point arithmetic or using high-level synthesis (HLS) to translate C or C++ into synthesizable hardware. Fixed-point arithmetic offers simplified architectures and low hardware resource usage, but limited accuracy and precision. High-level synthesis usually yields fast, functionally correct implementations, but at the cost of abstract hardware architectures. Floating-point implementations tend to be complex and to consume large silicon area, or a large share of the resources on Field Programmable Gate Arrays (FPGAs); however, the gains in precision and dynamic range often justify a floating-point hardware implementation. Customizing the mantissa and exponent widths enables hardware architectures with good precision and dynamic range that remain compact. This work explores hardware implementations of image and video processing applications in custom floating-point arithmetic. Multiple operations (addition, multiplication, division, logarithm, square root, and conversion between floating-point and fixed-point) are implemented and automatically generated in custom floating-point with parameterizable bit-widths for the exponent and mantissa. These operations are the building blocks for algorithms ranging from pixel-wise to line-wise operations, including non-linear filter operations with generic functions. The implementations are tested on an FPGA processing real-time, high-resolution video.
Hardware acceleration results show a speed-up of up to 810 times compared with software implementations. Finally, the hardware autogeneration is accomplished with a domain-specific language that translates untimed code, written in a Python-like syntax, into pipelined hardware described in SystemVerilog. This autogeneration enables non-experts to quickly develop complex and efficient real-time image and video processing algorithms.
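To illustrate the trade-off between exponent and mantissa widths described in the abstract, the following is a minimal software sketch of rounding a value to a custom floating-point format. It is not taken from the thesis: the function name `quantize_custom_float`, the IEEE 754-style exponent bias, round-half-to-even, the flush-to-zero treatment of values below the smallest normal number, and saturation on overflow are all assumptions made for illustration.

```python
import math


def quantize_custom_float(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value representable with the given widths.

    Assumptions (not from the thesis): IEEE 754-style bias, no subnormals
    (small values flush to zero), overflow saturates to the largest finite
    value, ties round to even.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x

    bias = (1 << (exp_bits - 1)) - 1
    e_min = 1 - bias  # smallest normal exponent
    e_max = bias      # largest exponent (all-ones code assumed reserved)

    sign = -1.0 if x < 0 else 1.0
    frac, exp = math.frexp(abs(x))  # abs(x) == frac * 2**exp, 0.5 <= frac < 1
    e = exp - 1                     # abs(x) == (2 * frac) * 2**e, 1 <= 2*frac < 2

    if e < e_min:
        return sign * 0.0  # flush to zero instead of modelling subnormals

    scale = 1 << man_bits
    # Integer significand including the hidden leading one; Python's round()
    # rounds ties to even.
    sig = round(2.0 * frac * scale)
    if sig == 2 * scale:  # rounding carried into the next power of two
        sig = scale
        e += 1

    if e > e_max:  # saturate to the largest finite value
        sig = 2 * scale - 1
        e = e_max

    return sign * math.ldexp(sig, e - man_bits)
```

For example, `quantize_custom_float(3.14159, 5, 2)` returns `3.0`, and `quantize_custom_float(1.0 / 3.0, 4, 3)` returns `0.34375`. Widening the mantissa reduces the rounding error, while widening the exponent extends the range before saturation or flush-to-zero occurs.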

Funding

EPSRC Centre for Doctoral Training in Embedded Intelligence

Engineering and Physical Sciences Research Council

School

  • Science

Department

  • Computer Science

Publisher

Loughborough University

Rights holder

© Nelson Carlos de Sousa Campos

Publication date

2022

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

Syeda Fatima; Qinggang Meng

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate
