Research Paper (arxiv_ai). Audience: Software engineers, Security analysts, Researchers in software engineering and ML, Tool developers

MAGNET: A Multi-Graph Attentional Network for Code Clone Detection

Abstract

Code clone detection is a fundamental task in software engineering that underpins refactoring, debugging, plagiarism detection, and vulnerability analysis. Existing methods often rely on singular representations such as abstract syntax trees (ASTs), control flow graphs (CFGs), and data flow graphs (DFGs), which capture only partial aspects of code semantics. Hybrid approaches have emerged, but their fusion strategies are typically handcrafted and ineffective. In this study, we propose MAGNET, a multi-graph attentional framework that jointly leverages AST, CFG, and DFG representations to capture syntactic and semantic features of source code. MAGNET integrates residual graph neural networks with node-level self-attention to learn both local and long-range dependencies, introduces a gated cross-attention mechanism for fine-grained inter-graph interactions, and employs Set2Set pooling to fuse multi-graph embeddings into unified program-level representations. Extensive experiments on BigCloneBench and Google Code Jam demonstrate that MAGNET achieves state-of-the-art performance, with overall F1 scores of 96.5% and 99.2% on the two datasets, respectively. Ablation studies confirm the critical contributions of multi-graph fusion and each attentional component. Our code is available at https://github.com/ZixianReid/Multigraph_match
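MAGNET's inputs are graph views of each code fragment. As a minimal sketch of the AST view only, the snippet below turns source code into a list of node labels and a list of parent-to-child edges using Python's standard `ast` module. This is a stand-in under stated assumptions: BigCloneBench is a Java corpus, the paper's parser and node vocabulary are not described in this summary, CFG and DFG construction is not shown, and `ast_to_graph` is a hypothetical helper name.

```python
import ast
from typing import Optional


def ast_to_graph(source: str) -> tuple[list[str], list[tuple[int, int]]]:
    """Hypothetical helper: turn source code into an AST graph.

    Returns node labels (AST class names) and directed parent->child edges.
    Every occurrence of a node gets its own id, so the result is a tree.
    """
    labels: list[str] = []
    edges: list[tuple[int, int]] = []

    def visit(node: ast.AST, parent: Optional[int]) -> None:
        node_id = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, node_id))
        for child in ast.iter_child_nodes(node):
            # Skip Load/Store/Del context markers: they add noise, not structure.
            if isinstance(child, ast.expr_context):
                continue
            visit(child, node_id)

    visit(ast.parse(source), None)
    return labels, edges


labels, edges = ast_to_graph("x = a + 1")
# labels -> ['Module', 'Assign', 'Name', 'BinOp', 'Name', 'Add', 'Constant']
# edges  -> [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5), (3, 6)]
```

Numbering each visit separately (rather than keying on object identity) matters because CPython may reuse a single instance for context and operator nodes such as `Load` or `Add`.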

Authors (2)

Zixian Zhang

Takfarinas Saber

Submitted

October 28, 2025

arXiv Category

cs.SE

Key Contributions

MAGNET is a multi-graph attentional framework for code clone detection that jointly leverages AST, CFG, and DFG representations. It combines residual GNNs with node-level self-attention to capture both local and long-range dependencies within each graph, uses a gated cross-attention mechanism to model fine-grained interactions between graphs, and fuses the resulting embeddings into a program-level representation with Set2Set pooling. On BigCloneBench and Google Code Jam it outperforms existing hybrid approaches that rely on handcrafted fusion strategies.
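The pairing of residual GNN layers (local, neighborhood-level information) with node-level self-attention (long-range dependencies between any two nodes of a graph) can be sketched in plain Python as follows. Mean aggregation, ReLU, identity query/key/value projections, and the weight matrix `W` are assumptions chosen for brevity; MAGNET's actual layer types, dimensions, and normalization are not specified in this summary.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def residual_gnn_layer(h, neighbors, W):
    """h_i' = h_i + ReLU(W @ mean of neighbor embeddings) (mean aggregation assumed)."""
    out = []
    for i, hi in enumerate(h):
        nbrs = neighbors[i] or [i]  # isolated node: aggregate over itself
        mean = [sum(h[j][k] for j in nbrs) / len(nbrs) for k in range(len(hi))]
        msg = [max(0.0, v) for v in matvec(W, mean)]
        out.append([a + b for a, b in zip(hi, msg)])  # residual connection
    return out


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def node_self_attention(h):
    """Scaled dot-product attention over all node pairs, plus a residual.
    Q, K, V projections are identity here for brevity (an assumption)."""
    d = len(h[0])
    out = []
    for hi in h:
        scores = [sum(a * b for a, b in zip(hi, hj)) / math.sqrt(d) for hj in h]
        weights = softmax(scores)
        ctx = [sum(w * hj[k] for w, hj in zip(weights, h)) for k in range(d)]
        out.append([a + c for a, c in zip(hi, ctx)])
    return out


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]  # path graph 0 - 1 - 2
W = [[1.0, 0.0], [0.0, 1.0]]
h_local = residual_gnn_layer(h, neighbors, W)  # -> [[1.0, 1.0], [1.0, 1.5], [1.0, 2.0]]
h_global = node_self_attention(h_local)
```

The GNN step only mixes information along edges, so reaching a node k hops away needs k layers; the self-attention step lets every node see every other node in one step, at a cost quadratic in the number of nodes.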

Business Value

Improves software development efficiency and security by automating the detection of duplicated code, which supports refactoring and debugging and helps trace copies of vulnerable code during vulnerability analysis.

Paper Metadata

Innovation Type

Multi-graph framework

Deployment Feasibility

Moderate. Requires integration into code analysis tools and may demand significant computational resources for large codebases.

Limitations Addressed

Partial semantic capture by singular code representations (AST, CFG, DFG); ineffective fusion strategies in existing hybrid approaches.
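The summary contrasts handcrafted fusion with MAGNET's learned, gated cross-attention. One plausible reading, sketched below, is that nodes of one graph view (say the AST) attend over the nodes of another (say the DFG), and a learned sigmoid gate decides per node how much of the attended message to mix in. The scalar per-node gate and the parameter names `gate_w` and `gate_b` are hypothetical; the paper's exact formulation may differ.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(h_a, h_b, gate_w, gate_b):
    """Each node of view A attends over all nodes of view B; a sigmoid gate
    (hypothetical parameters: gate_w of length 2d, scalar gate_b) blends
    the node's own embedding with the attended cross-view message."""
    d = len(h_a[0])
    out = []
    for x in h_a:
        scores = [sum(p * q for p, q in zip(x, y)) / math.sqrt(d) for y in h_b]
        weights = softmax(scores)
        msg = [sum(w * y[k] for w, y in zip(weights, h_b)) for k in range(d)]
        z = sum(g * v for g, v in zip(gate_w, x + msg)) + gate_b  # x + msg: list concat
        gate = 1.0 / (1.0 + math.exp(-z))  # in (0, 1)
        out.append([(1.0 - gate) * xi + gate * mi for xi, mi in zip(x, msg)])
    return out


ast_nodes = [[0.0, 0.0], [1.0, 0.0]]
dfg_nodes = [[2.0, 2.0]]
fused = gated_cross_attention(ast_nodes, dfg_nodes, [0.0] * 4, 0.0)
# gate = 0.5 for every node here -> [[1.0, 1.0], [1.5, 1.0]]
```

A gate near 0 keeps a node's own view and a gate near 1 adopts the cross-view message, so the mix is learned per node instead of being fixed by a handcrafted rule such as plain concatenation.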

Performance Gains

Achieves state-of-the-art performance on code clone detection, with overall F1 scores of 96.5% on BigCloneBench and 99.2% on Google Code Jam, by fusing multiple graph representations.
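For reference, F1 is the harmonic mean of precision and recall over predicted clone and non-clone pairs. The sketch below uses the standard binary definition; how the paper aggregates its "overall" F1 (for example across BigCloneBench clone types) is not detailed in this summary.

```python
def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1 with label 1 = clone pair, 0 = non-clone pair."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


score = f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])  # 2 TP, 1 FP, 1 FN -> about 0.667
```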

Technical Tags

code clone detection, graph neural networks, abstract syntax trees (AST), control flow graphs (CFG), data flow graphs (DFG), multi-graph representation, self-attention, cross-attention

Research Topics

Software Engineering, Machine Learning, Graph Neural Networks, Program Analysis, Code Understanding

Methods & Architectures

multi-graph attentional framework, residual graph neural networks, node-level self-attention, gated cross-attention mechanism, Set2Set pooling, Multi-graph Attentional Network (MAGNET), Graph Neural Networks (GNNs), Residual GNNs
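Set2Set pooling turns a variable-size set of node embeddings into a fixed-size vector: at each step an LSTM updates a query, the query attends over all nodes, and the query is concatenated with the attention readout to form the next LSTM input. A dependency-free sketch follows. The random weights stand in for learned ones, `steps=3` is an arbitrary choice, and how MAGNET applies Set2Set across its three graphs is not shown.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell(x, h, c, params):
    """One standard LSTM step; params[gate] = (W, U, b) for gates i, f, g, o."""
    def pre(gate):
        W, U, b = params[gate]
        return [
            sum(w * v for w, v in zip(w_row, x)) + sum(u * v for u, v in zip(u_row, h)) + bias
            for w_row, u_row, bias in zip(W, U, b)
        ]

    i = [sigmoid(z) for z in pre("i")]    # input gate
    f = [sigmoid(z) for z in pre("f")]    # forget gate
    g = [math.tanh(z) for z in pre("g")]  # candidate cell state
    o = [sigmoid(z) for z in pre("o")]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def init_params(d, seed=0):
    """Random stand-in weights: LSTM input size 2d (the previous q*), hidden size d."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(d, 2 * d), mat(d, d), [0.0] * d) for gate in "ifgo"}


def set2set(x, params, steps=3):
    """Set2Set readout over node embeddings x (n rows of length d); returns 2d values."""
    d = len(x[0])
    h, c = [0.0] * d, [0.0] * d
    q_star = [0.0] * (2 * d)
    for _ in range(steps):
        h, c = lstm_cell(q_star, h, c, params)  # query q_t = LSTM hidden state
        scores = [sum(a * b for a, b in zip(xi, h)) for xi in x]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        attn = [e / total for e in exps]  # softmax over nodes
        r = [sum(a * xi[k] for a, xi in zip(attn, x)) for k in range(d)]
        q_star = h + r  # list concatenation: [q_t, r_t]
    return q_star


nodes = [[0.1, 0.2, 0.3], [0.4, -0.1, 0.0], [0.2, 0.2, 0.2]]
graph_vector = set2set(nodes, init_params(3))  # length 6, independent of node order
```

Because the readout is an attention-weighted sum, the result does not depend on node ordering, which is what a graph-level pooling operator needs.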

Applications & Tasks

Software Engineering, Code Analysis, Cybersecurity, Code clone detection, Semantic code understanding, Information fusion, Identifying similar code snippets, Detecting code plagiarism, Finding vulnerabilities

Related Fields

Program Comprehension, Static Analysis, Machine Learning for Software Engineering, Graph Theory

Keywords

Code clone detection, Graph Neural Networks, AST, CFG, DFG, Multi-graph, Self-attention, Cross-attention, Software engineering, Program analysis, Source code, Vulnerability detection

Commercial Potential

Potential Products

Automated code review tools, Plagiarism detection software for code, Vulnerability scanning tools

Target Industries

Software Development, Technology, Cybersecurity, IT Services

Use Case Examples

Identifying duplicated code sections during refactoring; detecting potential security vulnerabilities introduced by copy-pasted code; ensuring code originality in academic projects.

Competitive Edge

Outperforms existing methods by effectively fusing multiple graph representations using advanced attention mechanisms, offering a more comprehensive understanding of code semantics.

Market Opportunity

Significant market for software development tools and security solutions.

Revenue Models

Licensing of the MAGNET technology; integration into IDEs and code analysis platforms.

Resource Requirements

Compute Needs

High, especially for large codebases and complex graph structures.

Data Requirements

Large code repositories with labeled clone pairs, such as BigCloneBench.

Deployment Constraints

Computational resources; integration with existing development workflows.

Scalability

Scalability can be a challenge for very large codebases due to the complexity of graph representations and attention mechanisms.
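One concrete source of cost, assuming clone search is done by scoring every pair of code fragments with a pairwise model (a deployment assumption, not something the summary states), is that the number of candidate pairs grows quadratically with the number of fragments. The helper name `candidate_pairs` is illustrative.

```python
def candidate_pairs(n_fragments: int) -> int:
    """Unordered fragment pairs an exhaustive pairwise detector must score: n(n-1)/2."""
    return n_fragments * (n_fragments - 1) // 2


small = candidate_pairs(1_000)     # -> 499500
medium = candidate_pairs(10_000)   # -> 49995000
large = candidate_pairs(100_000)   # -> 4999950000
```

At 10,000 fragments this is already close to 50 million model calls, each of which would involve building and encoding three graphs per fragment, so a cheaper candidate-filtering stage in front of the pairwise model would likely be needed at that scale.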

Production Readiness

Maturity Level

Research

Time to Market

2-3 years for integration into commercial code analysis tools.

Patent Potential

Moderate, for the novel MAGNET architecture and fusion techniques.