
RAISE: A Unified Framework for Responsible AI Scoring and Evaluation


📄 Abstract

As AI systems enter high-stakes domains, evaluation must extend beyond predictive accuracy to include explainability, fairness, robustness, and sustainability. We introduce RAISE (Responsible AI Scoring and Evaluation), a unified framework that quantifies model performance across these four dimensions and aggregates them into a single, holistic Responsibility Score. We evaluated three deep learning models: a Multilayer Perceptron (MLP), a Tabular ResNet, and a Feature Tokenizer Transformer, on structured datasets from finance, healthcare, and socioeconomics. Our findings reveal critical trade-offs: the MLP demonstrated strong sustainability and robustness, the Transformer excelled in explainability and fairness at a very high environmental cost, and the Tabular ResNet offered a balanced profile. These results underscore that no single model dominates across all responsibility criteria, highlighting the necessity of multi-dimensional evaluation for responsible model selection. Our implementation is available at: https://github.com/raise-framework/raise.

Authors (2)

Loc Phuc Truong Nguyen

Hung Thanh Do

Submitted

October 21, 2025

arXiv Category

cs.LG


Key Contributions

Introduces RAISE, a unified framework that quantifies AI model performance across explainability, fairness, robustness, and sustainability, and aggregates these four dimensions into a single Responsibility Score. The framework supports model selection in high-stakes domains by giving a holistic view that goes beyond predictive accuracy.
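The aggregation step can be illustrated with a minimal sketch. The summary does not state how RAISE combines the four dimensions, so the weighted arithmetic mean below, the [0, 1] score range, and the names `DimensionScores` and `responsibility_score` are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import asdict, dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension scores, assumed here to be normalized to [0, 1]."""
    explainability: float
    fairness: float
    robustness: float
    sustainability: float


def responsibility_score(
    scores: DimensionScores,
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Combine the four scores with a weighted arithmetic mean.

    Assumption: the paper's actual aggregation rule may differ.
    Equal weights are used when none are given.
    """
    values = asdict(scores)
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    if weights is None:
        weights = {name: 1.0 for name in values}
    if set(weights) != set(values):
        raise ValueError("weights must cover exactly the four dimensions")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(values[name] * weights[name] for name in values) / total


# Made-up scores for a hypothetical model, for illustration only.
example = DimensionScores(
    explainability=0.8, fairness=0.6, robustness=0.9, sustainability=0.7
)
print(responsibility_score(example))
```

A single number is convenient for ranking, but it hides the trade-offs the abstract describes, so the per-dimension scores are worth reporting alongside it.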

Business Value

Enables organizations to make more informed and ethical decisions when deploying AI in critical sectors like finance and healthcare, reducing risks associated with biased or unreliable AI systems.

Paper Metadata

Innovation Type

Framework Development

Deployment Feasibility

High: RAISE is an evaluation framework rather than a deployable model, and it can be integrated into existing MLOps pipelines.

Limitations Addressed

The lack of a unified evaluation of AI responsibility beyond accuracy, and the difficulty of comparing models across multiple ethical and performance dimensions.

Technical Tags

responsible AI, model evaluation, explainability, fairness, robustness, sustainability, multi-dimensional scoring, deep learning, structured data

Research Topics

AI Ethics, Model Evaluation, Responsible AI, AI Governance, Machine Learning

Methods & Architectures

Responsible AI Scoring and Evaluation (RAISE) framework, Multilayer Perceptron (MLP), Tabular ResNet, Feature Tokenizer Transformer

Applications & Tasks

Finance, Healthcare, Socioeconomics, AI Model Evaluation, Responsible AI Deployment, Trade-off Analysis, Scoring AI models, Evaluating AI responsibility, Model selection

Datasets & Benchmarks

Datasets

structured datasets from finance, healthcare, socioeconomics

explainability, fairness, robustness, sustainability, Responsibility Score

Related Fields

AI Ethics, Machine Learning, Data Science, Software Engineering

Keywords

Responsible AI, AI evaluation, explainability, fairness, robustness, sustainability, AI scoring, deep learning, finance AI, healthcare AI, socioeconomics AI, model selection, AI governance

Academic Context

#AI Ethics, #Model Evaluation, #Responsible AI, #AI Governance, #Machine Learning

Commercial Potential

Potential Products

AI risk assessment tools, Responsible AI auditing platforms

Target Industries

Finance, Healthcare, Technology, Government

Use Case Examples

Evaluating loan application AI for fairness and robustness; assessing diagnostic AI for explainability and sustainability
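As a rough illustration of the loan-application use case, the sketch below computes a demographic parity difference: the gap between the highest and lowest approval rates across groups. This is one common fairness measure, chosen here as an assumption; the summary does not say which fairness metric RAISE uses. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def demographic_parity_difference(
    predictions: Sequence[int],
    groups: Sequence[Hashable],
) -> float:
    """Return max minus min positive-prediction rate across groups.

    predictions: 1 for an approved loan, 0 for a rejected one.
    groups: the protected-attribute value of each applicant.
    A value of 0.0 means every group is approved at the same rate.
    """
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    approved = defaultdict(int)  # approvals per group
    seen = defaultdict(int)      # applicants per group
    for prediction, group in zip(predictions, groups):
        approved[group] += prediction
        seen[group] += 1

    rates = [approved[group] / seen[group] for group in seen]
    return max(rates) - min(rates)


# Made-up decisions for two hypothetical applicant groups.
decisions = [1, 1, 0, 1, 0, 0, 0, 1]
membership = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(decisions, membership))
```

To feed a Responsibility Score where higher is better, a gap like this would first need to be mapped onto a common scale, for example as one minus the gap.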

Competitive Edge

Offers a unified, multi-dimensional approach to AI evaluation, addressing the limitations of single-metric evaluations.

Market Opportunity

Growing market for AI governance and responsible AI tools.

Revenue Models

Consulting services, licensing of the framework/platform.

Resource Requirements

Compute Needs

Standard compute for training and evaluating deep learning models.

Data Requirements

Structured datasets from finance, healthcare, and socioeconomics.

Deployment Constraints

Requires careful definition and measurement of each responsibility dimension.
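One part of that definition work is putting raw measurements on a common scale before they are combined. The sketch below assumes min-max normalization across the candidate models and flips the direction for metrics where lower is better, such as energy use. The scaling choice, the `normalize` helper, and the numbers are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, Mapping


def normalize(
    raw: Mapping[str, float],
    higher_is_better: bool = True,
) -> Dict[str, float]:
    """Min-max scale one metric across models into [0, 1].

    raw maps a model name to its raw measurement. When lower raw
    values are preferable (e.g. energy consumed), set
    higher_is_better=False so the best model still scores 1.0.
    """
    if not raw:
        raise ValueError("at least one measurement is required")
    low, high = min(raw.values()), max(raw.values())
    if high == low:
        # All models tie on this metric, so none is penalized.
        return {name: 1.0 for name in raw}
    scaled = {name: (value - low) / (high - low) for name, value in raw.items()}
    if not higher_is_better:
        scaled = {name: 1.0 - value for name, value in scaled.items()}
    return scaled


# Made-up energy readings (kWh) for three hypothetical models.
energy_kwh = {"model_a": 1.0, "model_b": 3.0, "model_c": 2.0}
print(normalize(energy_kwh, higher_is_better=False))
```

Min-max scaling makes scores relative to the set of models being compared, so adding or removing a candidate changes every score. A fixed reference range avoids that at the cost of having to choose the range.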

Scalability

The framework itself is scalable; applying it to models depends on model complexity and data size.

Regulatory Considerations

Provides evaluation metrics relevant to concerns raised by the AI Act and similar regulations.

Production Readiness

Maturity Level

Research

Time to Market

N/A (framework)

Patent Potential

Low, as it's a framework and methodology.
