Loughborough University
SOFIA: An automated framework for early soft error assessment, identification, and mitigation

journal contribution
posted on 2022-08-31, 14:49 authored by Jonas Gava, Vitor Bandeira, Felipe Rosa, Rafael Garibotti, Ricardo Reis, Luciano Ost

Radiation-induced soft errors in electronic computing systems can affect non-essential system functionalities or violate safety-critical conditions, which might lead to life-threatening situations. To reach high safety standards, reliability engineers must be able to explore and identify efficient mitigation solutions that reduce the occurrence of soft errors early in the design cycle. This paper presents SOFIA, a framework that integrates: (i) a set of fault injection techniques that enable bespoke inspections, (ii) machine learning methods to correlate soft error results and system architecture parameters, and (iii) mitigation techniques, including full and partial triple modular redundancy (TMR) as well as a register allocation technique (RAT), which allocates critical code (e.g., an application’s function or a machine learning layer) to a pool of specific processor registers. The proposed framework and novel variations of the RAT are validated through more than 1739k fault injections considering a real Linux kernel, benchmarks from different domains, and a multi-core Arm processor.
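To make two of the terms in the abstract concrete, the following is a minimal sketch of a single-bit-flip fault injection and a bitwise TMR majority voter. It is not SOFIA's implementation: the names `flip_bit`, `majority_vote`, and `run_campaign`, the 32-bit word width, and the assumption that each injection corrupts exactly one of three replicas are all hypothetical choices made for illustration.

```python
import random


def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Emulate a soft error by flipping one bit of a width-bit word."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority voter, the core of triple modular redundancy."""
    return (a & b) | (a & c) | (b & c)


def run_campaign(golden: int, injections: int, seed: int = 0, width: int = 32):
    """Inject one random bit flip per run and count corrupted results.

    Returns (errors without protection, errors after TMR voting).
    Assumption: only one of the three replicas is hit per injection.
    """
    rng = random.Random(seed)
    unprotected_errors = 0
    tmr_errors = 0
    for _ in range(injections):
        faulty = flip_bit(golden, rng.randrange(width), width)
        if faulty != golden:
            unprotected_errors += 1
        # Two healthy replicas outvote the corrupted one.
        if majority_vote(faulty, golden, golden) != golden:
            tmr_errors += 1
    return unprotected_errors, tmr_errors


if __name__ == "__main__":
    print(run_campaign(0xDEADBEEF, injections=1000))
```

Under the single-replica assumption the voter masks every injected flip, which is why full TMR is effective but costly; partial TMR and register-level techniques such as the RAT trade some of that coverage for lower overhead.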

History

School

  • Mechanical, Electrical and Manufacturing Engineering

Published in

Journal of Systems Architecture

Volume

131

Issue

2022

Publisher

Elsevier

Version

  • VoR (Version of Record)

Rights holder

© The Author(s)

Publisher statement

This is an Open Access Article. It is published by Elsevier under the Creative Commons Attribution 4.0 International Licence (CC BY). Full details of this licence are available at: https://creativecommons.org/licenses/by/4.0/

Acceptance date

2022-08-17

Publication date

2022-08-23

Copyright date

2022

ISSN

1383-7621

Language

  • en

Depositor

Dr Luciano Ost. Deposit date: 23 August 2022

Article number

102710
