FARMER: Flow AutoRegressive Transformer over Pixels

Abstract

Directly modeling the explicit likelihood of the raw data distribution is a key topic in machine learning, and autoregressive modeling of this kind underlies the scaling successes of Large Language Models. However, continuous AR modeling over visual pixel data suffers from extremely long sequences and high-dimensional spaces. In this paper, we present FARMER, a novel end-to-end generative framework that unifies Normalizing Flows (NF) and Autoregressive (AR) models for tractable likelihood estimation and high-quality image synthesis directly from raw pixels. FARMER employs an invertible autoregressive flow to transform images into latent sequences, whose distribution is modeled implicitly by an autoregressive model. To address the redundancy and complexity in pixel-level modeling, we propose a self-supervised dimension reduction scheme that partitions NF latent channels into informative and redundant groups, enabling more effective and efficient AR modeling. Furthermore, we design a one-step distillation scheme to significantly accelerate inference and introduce a resampling-based classifier-free guidance algorithm to boost image generation quality. Extensive experiments demonstrate that FARMER achieves competitive performance compared to existing pixel-based generative models while providing exact likelihoods and scalable training.
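For readers who want the likelihood claim in symbols, the standard change-of-variables identity for a normalizing flow, combined with an autoregressive factorization of the latent distribution, can be written as follows. The notation (an invertible map f, a latent sequence z of length T, densities p_X and p_Z) is chosen here for illustration and is not taken from the abstract; the paper's exact formulation may differ.

```latex
\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|,
\qquad
p_Z(z) = \prod_{t=1}^{T} p\big(z_t \mid z_{<t}\big), \quad z = f(x).
```

The first term scores the latent sequence under the autoregressive model, and the second accounts for how the flow stretches or compresses volume, which is what keeps the likelihood exact rather than a bound.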

Authors (9)

Guangting Zheng

Qinyu Zhao

Tao Yang

Fei Xiao

Zhijie Lin

Jie Wu

+3 more

Submitted

October 27, 2025

arXiv Category

cs.CV

Key Contributions

FARMER unifies Normalizing Flows (NF) and Autoregressive (AR) models for tractable likelihood estimation and high-quality image synthesis directly from raw pixels. It uses an invertible autoregressive flow and a self-supervised dimension reduction scheme to handle pixel redundancy and complexity, enabling efficient AR modeling.
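As a rough illustration of how an invertible autoregressive flow yields an exact likelihood, here is a minimal pure-Python sketch on a short 1-D sequence. Everything in it is an assumption made for illustration: the hand-written `conditioner`, the affine form of the transform, and the toy Gaussian AR prior with coefficient `rho` stand in for learned networks and are not FARMER's architecture.

```python
import math

def conditioner(prefix):
    """Toy stand-in (assumption) for a learned network: prefix -> (shift, log_scale)."""
    if not prefix:
        return 0.0, 0.0
    mean = sum(prefix) / len(prefix)
    return 0.5 * mean, 0.1 * math.tanh(mean)

def forward(x):
    """Map data x to latents z and return (z, log|det dz/dx|)."""
    z, log_det = [], 0.0
    for t, x_t in enumerate(x):
        shift, log_scale = conditioner(x[:t])
        z.append((x_t - shift) * math.exp(-log_scale))
        # The Jacobian is lower triangular, so its log-determinant is the
        # sum of the log diagonal entries, each equal to -log_scale.
        log_det -= log_scale
    return z, log_det

def inverse(z):
    """Recover x from z one element at a time (sequential, like AR sampling)."""
    x = []
    for z_t in z:
        shift, log_scale = conditioner(x)
        x.append(z_t * math.exp(log_scale) + shift)
    return x

def gaussian_log_prob(v, mean=0.0):
    """Log-density of a unit-variance Gaussian."""
    return -0.5 * (v - mean) ** 2 - 0.5 * math.log(2 * math.pi)

def ar_prior_log_prob(z, rho=0.5):
    """Toy AR prior (assumption): z_t ~ N(rho * z_{t-1}, 1), with z_{-1} = 0."""
    total, prev = 0.0, 0.0
    for z_t in z:
        total += gaussian_log_prob(z_t, mean=rho * prev)
        prev = z_t
    return total

def log_likelihood(x):
    """Exact log-density of x: AR prior on the latents plus the flow's log-det."""
    z, log_det = forward(x)
    return ar_prior_log_prob(z) + log_det

x = [0.2, -1.0, 0.7, 1.5]
z, log_det = forward(x)
x_rec = inverse(z)
print("max reconstruction error:", max(abs(a - b) for a, b in zip(x, x_rec)))
print("log-likelihood:", log_likelihood(x))
```

The forward pass can process all positions from the known data, while the inverse must run sequentially because each step depends on previously generated values. That asymmetry is the usual reason sampling from autoregressive flows is slow.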

Business Value

Enables more accurate and efficient generative models for image creation and understanding, potentially leading to better image compression, anomaly detection, and data augmentation techniques.

Paper Metadata

Innovation Type

Methodological

Deployment Feasibility

Moderate. Requires careful implementation of invertible flows and autoregressive components, and computational resources for training.

Limitations Addressed

The difficulty of direct explicit likelihood modeling for raw visual pixel data due to long sequences and high-dimensional spaces, and redundancy/complexity in pixel-level modeling.
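To make the sequence-length problem concrete, the short calculation below counts tokens and per-token dimensions when an RGB image is cut into square patches. The 256x256 resolution and the patch sizes are assumed values chosen for illustration, and `pixel_sequence_stats` is a hypothetical helper; none of these come from the paper.

```python
def pixel_sequence_stats(height, width, channels=3, patch=1):
    """Return (sequence_length, token_dim) for patch x patch tokens."""
    if height % patch or width % patch:
        raise ValueError("patch size must divide height and width")
    seq_len = (height // patch) * (width // patch)
    token_dim = patch * patch * channels
    return seq_len, token_dim

# Assumed example: a 256x256 RGB image at three patch sizes.
for patch in (1, 8, 16):
    seq_len, token_dim = pixel_sequence_stats(256, 256, patch=patch)
    print(f"patch={patch:2d}  tokens={seq_len:6d}  dims per token={token_dim}")
```

Under these assumed sizes the image always holds 196,608 scalar values. Patching only trades sequence length for per-token dimension, which is why reducing redundancy within each token matters for AR modeling.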

Technical Tags

likelihood estimation, image synthesis, normalizing flows, autoregressive models, raw pixels, dimension reduction, self-supervised learning, invertible models

Research Topics

Generative Models, Deep Learning, Computer Vision, Likelihood-based Models

Methods & Architectures

FARMER framework, Invertible autoregressive flow, Self-supervised dimension reduction, Normalizing Flows, Autoregressive Models

Applications & Tasks

Computer Vision, Image Generation, Machine Learning Research, Tractable likelihood estimation, High-dimensional data modeling, Long sequence modeling, Redundancy in pixel data, Image Synthesis, Likelihood Estimation

Related Fields

Machine Learning, Artificial Intelligence, Statistical Modeling

Keywords

normalizing flows, autoregressive models, image synthesis, likelihood estimation, raw pixels, dimension reduction, self-supervised learning, invertible models, generative AI, computer vision, deep learning, flow models

Academic Context

#Generative Models, #Deep Learning, #Computer Vision, #Likelihood-based Models

Commercial Potential

Potential Products

High-fidelity image generation tools, Advanced image compression algorithms, Anomaly detection systems

Target Industries

Media and Entertainment, E-commerce, Medical Imaging, Technology

Use Case Examples

Generating photorealistic images from learned distributions
Accurately modeling the probability of observing specific image data

Competitive Edge

Offers a principled way to combine the strengths of normalizing flows (tractable likelihoods) and autoregressive models (modeling complex distributions) for direct pixel modeling. This is a potential advantage in likelihood-based tasks over GANs, which define no explicit likelihood, and VAEs, which optimize a lower bound rather than the exact likelihood.

Market Opportunity

Large market for generative AI and image processing technologies.

Revenue Models

Licensing of models, API services for image generation.

Resource Requirements

Compute Needs

High (for training deep generative models)

Data Requirements

Large datasets of images.

Deployment Constraints

Computational cost, memory usage for large models.

Scalability

Scalability depends on the efficiency of the flow and autoregressive components, and the effectiveness of dimension reduction.

Production Readiness

Maturity Level

Research

Time to Market

2-4 years

Patent Potential

Moderate (novel combination of NF and AR, dimension reduction technique)
