Posted on 2019-06-06, 10:38. Authored by Felipe Rocha da Rosa, Rafael Garibotti, Luciano Ost, Ricardo Reis
Virtual platform frameworks have been extended
to allow earlier soft error analysis of more realistic multicore
systems (i.e., real software stacks, state-of-the-art ISAs). The
high observability and simulation performance of the underlying
frameworks make it possible to generate and collect more error/failure-related data, considering complex software stack configurations,
in a reasonable time. When dealing with sizeable failure-related
data sets obtained from multiple fault campaigns, it is essential to
filter out parameters (i.e., features) without a direct relationship
with the system soft error analysis. In this regard, this paper proposes the use of supervised and unsupervised machine learning
techniques, aiming to eliminate non-relevant information as well
as identify the correlation between fault injection results and
application and platform characteristics. This novel approach
provides engineers with appropriate means to
investigate new and more efficient fault mitigation techniques.
The underlying approach is validated with an extensive data set
gathered from more than 1.2 million fault injections, comprising
several benchmarks, a Linux OS and parallelization libraries
(e.g., MPI, OpenMP), as well as through a realistic automotive
case study.
School
Mechanical, Electrical and Manufacturing Engineering
Published in
IEEE Transactions on Circuits and Systems I: Regular Papers
Volume
66
Issue
6
Pages
2151 - 2164
Citation
DA ROSA, F.R. .... et al., 2019. Using machine learning techniques to evaluate multicore soft error reliability. IEEE Transactions on Circuits and Systems I: Regular Papers, 66(6), pp. 2151 - 2164.
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.