posted on 2022-12-02, 12:32 authored by Nelson De-Sousa-Campos
<p dir="ltr">In many image processing algorithms the convolution operator is linear, but this linearity compromises essential features and details that depend on the non-linearity present in many applications. Non-linear spatial filters, however, are slow to compute and are a significant bottleneck in many software applications, and their complexity makes hardware acceleration non-trivial. Typical strategies include implementing the image algorithms in fixed-point arithmetic or using high-level synthesis (HLS) to translate C or C++ into synthesizable hardware. Fixed-point arithmetic has many advantages, including simplified architectures and low hardware resource usage, but fixed-point architectures have limited accuracy and precision. High-level synthesis usually leads to fast, functionally correct implementations, but at the cost of abstract hardware architectures. Floating-point implementations tend to be complex and to occupy a large silicon area, or consume many resources on Field Programmable Gate Arrays (FPGAs). However, the gains in precision and dynamic range often justify a hardware implementation in floating-point arithmetic. Customizing the mantissa and exponent widths enables hardware architectures with good precision and dynamic range that remain compact. This work explores hardware implementations of image and video processing applications in custom floating-point arithmetic. Multiple operations (including addition, multiplication, division, logarithm, square root and conversion between floating-point and fixed-point) are implemented and automatically generated in custom floating-point with parameterizable bit-widths for the exponent and mantissa. These operations are the building blocks for algorithms ranging from pixel-wise to line-wise operations, including non-linear filter operations with generic functions. 
The implementations are tested on an FPGA processing high-resolution video in real time. Hardware acceleration results show a speed-up factor of up to 810 times compared to software implementations. Finally, the hardware autogeneration is accomplished with a domain-specific language that translates untimed code, written in a syntax similar to Python, into pipelined hardware described in SystemVerilog. This autogeneration enables non-experts to quickly develop complex and efficient real-time image and video processing algorithms.</p>
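The trade-off between mantissa width, exponent width and precision described above can be illustrated with a small software model. The following is a minimal sketch, not the thesis implementation: the function name `quantize_custom_float` is hypothetical, and it assumes an IEEE 754-style exponent bias, round-to-nearest-even, saturation on overflow and flush-to-zero on underflow (no subnormals).

```python
import math


def quantize_custom_float(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of a custom floating-point format.

    Assumptions (illustrative only): IEEE 754-style bias, top exponent code
    reserved, round-to-nearest-even, saturate on overflow, flush to zero on
    underflow.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x

    bias = (1 << (exp_bits - 1)) - 1
    e_min = 1 - bias  # smallest normal exponent
    e_max = bias      # largest normal exponent

    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1; convert to 1.f form.
    m, e = math.frexp(abs(x))
    sig, exp = m * 2.0, e - 1

    # Keep man_bits fractional bits; Python's round() is round-half-even.
    scale = 1 << man_bits
    sig_q = round(sig * scale) / scale
    if sig_q >= 2.0:  # rounding carried into the next binade
        sig_q, exp = 1.0, exp + 1

    if exp > e_max:   # overflow: clamp to the largest finite value
        sig_q, exp = 2.0 - 1.0 / scale, e_max
    elif exp < e_min:  # underflow: flush to (signed) zero
        return math.copysign(0.0, x)

    return math.copysign(math.ldexp(sig_q, exp), x)


# A 5-bit exponent with a 10-bit mantissa matches the half-precision layout.
print(quantize_custom_float(0.1, 5, 10))  # 0.0999755859375
```

Sweeping `exp_bits` and `man_bits` in a model like this is one way to estimate how narrow a format can be before the rounding error becomes visible in a given filter.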
Funding
EPSRC Centre for Doctoral Training in Embedded Intelligence
Engineering and Physical Sciences Research Council