Loughborough University
Using machine learning techniques to evaluate multicore soft error reliability

journal contribution
posted on 2019-06-06, 10:38 authored by Felipe Rocha da Rosa, Rafael Garibotti, Luciano Ost, Ricardo Reis
Virtual platform frameworks have been extended to allow earlier soft error analysis of more realistic multicore systems (i.e., real software stacks, state-of-the-art ISAs). The high observability and simulation performance of the underlying frameworks make it possible to generate and collect more error/failure-related data, considering complex software stack configurations, in a reasonable time. When dealing with sizeable failure-related data sets obtained from multiple fault campaigns, it is essential to filter out parameters (i.e., features) without a direct relationship with the system soft error analysis. In this regard, this paper proposes the use of supervised and unsupervised machine learning techniques, aiming to eliminate non-relevant information as well as identify the correlation between fault injection results and application and platform characteristics. This novel approach provides engineers with appropriate means to investigate new and more efficient fault mitigation techniques. The underlying approach is validated with an extensive data set gathered from more than 1.2 million fault injections, comprising several benchmarks, a Linux OS and parallelization libraries (e.g., MPI, OpenMP), as well as through a realistic automotive case study.
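
As a rough illustration of the feature-filtering idea described in the abstract, the minimal Python sketch below drops features that never vary across fault campaigns and ranks the remaining ones by the absolute Pearson correlation with an observed failure rate. It is not the authors' method, which the abstract does not detail. The feature names (`mem_instr`, `branch_instr`, `core_count`), the `failure_rate` target, the helper names and all values are hypothetical toy data chosen only for illustration.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def rank_features(rows, target, min_std=1e-9):
    """Drop (near-)constant features, then rank the rest by |correlation| with target."""
    names = [k for k in rows[0] if k != target]
    ys = [r[target] for r in rows]
    scores = {}
    for name in names:
        xs = [r[name] for r in rows]
        if pstdev(xs) <= min_std:
            continue  # a feature that never changes cannot explain the failures
        scores[name] = pearson(xs, ys)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: one row per fault campaign (values are made up).
rows = [
    {"mem_instr": 10, "branch_instr": 5, "core_count": 4, "failure_rate": 0.10},
    {"mem_instr": 20, "branch_instr": 3, "core_count": 4, "failure_rate": 0.20},
    {"mem_instr": 30, "branch_instr": 6, "core_count": 4, "failure_rate": 0.30},
    {"mem_instr": 40, "branch_instr": 4, "core_count": 4, "failure_rate": 0.40},
]

ranked = rank_features(rows, target="failure_rate")
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

In this toy set, `core_count` is discarded because it is identical in every campaign, and the remaining features are ordered by how strongly they track the failure rate. A real study would likely use richer supervised and unsupervised techniques than a single linear correlation.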

History

School

  • Mechanical, Electrical and Manufacturing Engineering

Published in

IEEE Transactions on Circuits and Systems I: Regular Papers

Volume

66

Issue

6

Pages

2151 - 2164

Citation

DA ROSA, F.R. .... et al., 2019. Using machine learning techniques to evaluate multicore soft error reliability. IEEE Transactions on Circuits and Systems I: Regular Papers, 66(6), pp. 2151 - 2164.

Publisher

© Institute of Electrical and Electronics Engineers (IEEE)

Version

  • AM (Accepted Manuscript)

Acceptance date

2019-03-11

Publication date

2019-04-17

Notes

Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

ISSN

1549-8328

eISSN

1558-0806

Language

  • en
