
FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference

Abstract

The emergence of Deep Neural Networks (DNNs) in mission- and safety-critical applications brings their reliability to the forefront. The high performance demands of DNNs require specialized hardware accelerators, and the systolic array architecture is widely used in them because of its parallelism and regular structure. This work presents a run-time reconfigurable systolic array architecture with three execution modes and four implementation options. All four implementations are evaluated in terms of resource utilization, throughput, and fault tolerance improvement. The proposed architecture enhances the reliability of DNN inference on a systolic array through heterogeneous mapping of different network layers to different execution modes. The approach is supported by a novel reliability assessment method based on fault propagation analysis, which is used to explore the appropriate mapping of layers to execution modes. The proposed architecture efficiently protects the registers and MAC units of the systolic array's processing elements (PEs) from transient and permanent faults. The reconfigurability feature enables a speedup of up to $3\times$, depending on layer vulnerability. Furthermore, it requires $6\times$ fewer resources than static redundancy and $2.5\times$ fewer resources than the previously proposed solution for transient faults.
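To make the idea of protecting a PE's MAC unit concrete, here is a minimal software sketch of redundant execution. The abstract does not name the three execution modes, so the duplicated (detect-only) and triplicated (vote-and-correct) schemes below are assumptions chosen for illustration, and the helper names `mac`, `flip_bit`, `dmr_mac`, and `tmr_mac` are hypothetical, not taken from the paper.

```python
def mac(acc: int, a: int, b: int) -> int:
    """One multiply-accumulate step, as performed by a single PE."""
    return acc + a * b


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit upset in a result register."""
    return value ^ (1 << bit)


def majority_vote(x: int, y: int, z: int) -> int:
    """Bitwise 2-of-3 majority over three replica results."""
    return (x & y) | (x & z) | (y & z)


def tmr_mac(acc, a, b, faulty_replica=None, bit=0):
    """Assumed triplicated mode: three replicas compute, a voter masks one fault."""
    results = [mac(acc, a, b) for _ in range(3)]
    if faulty_replica is not None:
        results[faulty_replica] = flip_bit(results[faulty_replica], bit)
    return majority_vote(*results)


def dmr_mac(acc, a, b, faulty_replica=None, bit=0):
    """Assumed duplicated mode: two replicas compute, a mismatch flags a fault.

    Returns (result_of_replica_0, fault_detected). Detection only: with two
    replicas there is no way to tell which one is wrong.
    """
    results = [mac(acc, a, b) for _ in range(2)]
    if faulty_replica is not None:
        results[faulty_replica] = flip_bit(results[faulty_replica], bit)
    return results[0], results[0] != results[1]
```

In this sketch a single corrupted replica is masked in the triplicated mode and only flagged in the duplicated mode, which is the usual trade-off between the two schemes. How the actual architecture implements its modes in hardware is not described in the abstract.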

Key Contributions

FORTALESA presents a run-time reconfigurable systolic array architecture designed for fault-tolerant DNN inference. It offers three execution modes and four implementation options, so that different network layers can be mapped to different execution modes to balance reliability against performance. The mapping is guided by a new reliability assessment method based on fault propagation analysis.
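The heterogeneous layer-to-mode mapping can be sketched as a simple threshold rule. Everything specific in this sketch is an assumption made for illustration: the mode names, the cost model (a k-way redundant mode takes k times as long for the same layer), the vulnerability scores, and the thresholds. The paper derives vulnerability from its fault propagation analysis, which is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical modes and relative cost per unit of work (assumption:
# k-way redundancy occupies k PEs per result, so throughput drops by k).
MODE_COST = {"performance": 1, "dmr": 2, "tmr": 3}


@dataclass
class Layer:
    name: str
    work: float           # relative computation in the non-redundant mode
    vulnerability: float  # hypothetical score in [0, 1]; higher = more sensitive


def choose_mode(vulnerability, dmr_threshold=0.3, tmr_threshold=0.7):
    """Pick the cheapest mode whose protection matches the layer's score."""
    if vulnerability >= tmr_threshold:
        return "tmr"
    if vulnerability >= dmr_threshold:
        return "dmr"
    return "performance"


def map_layers(layers):
    """Assign an execution mode to every layer independently."""
    return {layer.name: choose_mode(layer.vulnerability) for layer in layers}


def total_cost(layers, mapping):
    """Relative inference time under the assumed cost model."""
    return sum(layer.work * MODE_COST[mapping[layer.name]] for layer in layers)


def speedup_vs_full_tmr(layers, mapping):
    """Speedup of a mixed mapping over running every layer triplicated."""
    all_tmr = {layer.name: "tmr" for layer in layers}
    return total_cost(layers, all_tmr) / total_cost(layers, mapping)


layers = [
    Layer("conv1", work=1.0, vulnerability=0.9),
    Layer("conv2", work=2.0, vulnerability=0.5),
    Layer("fc", work=1.0, vulnerability=0.1),
]
mapping = map_layers(layers)
print(mapping, speedup_vs_full_tmr(layers, mapping))
```

Under this assumed cost model the speedup over a fully triplicated run ranges from 1 (every layer highly vulnerable) to 3 (no layer needs redundancy). The source does not state the baseline for its reported speedup, so this correspondence is illustrative only.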

Business Value

Supports the deployment of more reliable AI systems in safety-critical domains (e.g., autonomous vehicles, medical devices) by protecting DNN inference hardware against transient and permanent faults at a lower resource cost than static redundancy.