📄 Abstract
As AI systems enter high-stakes domains, evaluation must extend beyond
predictive accuracy to include explainability, fairness, robustness, and
sustainability. We introduce RAISE (Responsible AI Scoring and Evaluation), a
unified framework that quantifies model performance across these four
dimensions and aggregates them into a single, holistic Responsibility Score. We
evaluated three deep learning models: a Multilayer Perceptron (MLP), a Tabular
ResNet, and a Feature Tokenizer Transformer, on structured datasets from
finance, healthcare, and socioeconomics. Our findings reveal critical
trade-offs: the MLP demonstrated strong sustainability and robustness, the
Transformer excelled in explainability and fairness at a very high
environmental cost, and the Tabular ResNet offered a balanced profile. These
results underscore that no single model dominates across all responsibility
criteria, highlighting the necessity of multi-dimensional evaluation for
responsible model selection. Our implementation is available at:
https://github.com/raise-framework/raise.
Authors (2)
Loc Phuc Truong Nguyen
Hung Thanh Do
Submitted
October 21, 2025
Key Contributions
Introduces RAISE, a unified framework that quantifies AI model performance across explainability, fairness, robustness, and sustainability, and aggregates the four into a single Responsibility Score. The framework supports model selection in high-stakes domains by giving a holistic view that goes beyond predictive accuracy.
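To make the idea of aggregating four dimensions into one score concrete, here is a minimal sketch. The abstract does not state how RAISE combines the dimensions, so the weighted mean below, the assumption that each dimension is already normalized to [0, 1], and the names `DimensionScores` and `responsibility_score` are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DimensionScores:
    """Hypothetical container; each score is assumed to be normalized to [0, 1]."""
    explainability: float
    fairness: float
    robustness: float
    sustainability: float


def responsibility_score(scores, weights=None):
    """Weighted mean of the four dimension scores.

    This aggregation rule is an assumption made for illustration; the paper
    may define its Responsibility Score differently.
    """
    values = asdict(scores)
    if weights is None:
        weights = {name: 1.0 for name in values}  # equal weights by default
    if set(weights) != set(values):
        raise ValueError("weights must cover exactly the four dimensions")
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must lie in [0, 1], got {value}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[name] * values[name] for name in values) / total_weight


# Made-up numbers for illustration only; these are not results from the paper.
example = DimensionScores(
    explainability=0.5, fairness=0.75, robustness=1.0, sustainability=0.25
)
print(responsibility_score(example))
```

Passing a `weights` dictionary lets a practitioner emphasize the dimension that matters most for a given deployment, which is one plausible way a single score could stay sensitive to the trade-offs the abstract describes.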
Business Value
Enables organizations to make more informed and ethical decisions when deploying AI in critical sectors like finance and healthcare, reducing risks associated with biased or unreliable AI systems.