IFCSR: Inference-Free Fidelity-Realism Control for One-Step Diffusion-based Real-World Image Super-Resolution

BLUEDOT Inc.
CVPR 2026 (Poster)

*Equal Contribution

Visual results of IFCSR for real-world image super-resolution. Once the fidelity- and realism-specific images (i.e., the two outermost images) have been inferred, users can flexibly explore the fidelity-realism trade-off via a single parameter with no extra inference (e.g., the four middle images).


Abstract

Diffusion models have recently achieved remarkable success in real-world image super-resolution (ISR), typically balancing a trade-off between fidelity (i.e., similarity to HR images) and realism (i.e., perceptual naturalness). To better account for subjective preferences in image quality, controllable diffusion-based methods have been explored, allowing personalized adjustment of this trade-off via tunable parameters. While existing controllable methods have shown effective control, they typically operate in the latent space and require repeated network inference during adjustment, eventually limiting their practicality. In this paper, we propose IFCSR, a simple yet practical approach for one-step diffusion-based real-world ISR that enables inference-free control between fidelity and realism. The key idea is to design a controllable model that adjusts the fidelity-realism trade-off in the image space, rather than in the latent space. Such an image-space control allows users to seamlessly adjust the trade-off without extra inference after an initial inference of fidelity- and realism-specific images. We further introduce a two-stage training scheme and specialized losses that encourage the controllable space to span a broad spectrum of fidelity and realism. Our method achieves quality competitive with state-of-the-art models while providing a practical advantage through inference-free control.

Method


Overview of the training and inference processes of our approach.


Formulation

Given the fidelity- and realism-specific networks, $f_{\theta}$ and $f_{\phi}$, we define our controllable model as a linear combination of their outputs, i.e., $\hat{x}^{\text{fid}}_{H}$ and $\hat{x}^{\text{real}}_{H}$, using a parameter $\gamma$.

$\hat{x}_{H} = (1-\gamma)f_{\theta}(x_{L}) + {\gamma}f_{\phi}(x_{L}) = (1-\gamma)\hat{x}^{\text{fid}}_{H} + \gamma\hat{x}^{\text{real}}_{H}$

It is worth noting that linear combination in the image space enables inference-free fidelity-realism control via the parameter $\gamma$, once $\hat{x}^{\text{fid}}_{H}$ and $\hat{x}^{\text{real}}_{H}$ are obtained.
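As a minimal sketch of this image-space control (not the released implementation), the snippet below blends two cached outputs with NumPy. The function name `blend`, the random stand-in arrays, and the choice of $\gamma$ values in $[0, 1]$ are assumptions made for illustration; in practice `x_fid` and `x_real` would be the outputs of $f_{\theta}$ and $f_{\phi}$ from the single initial inference.

```python
import numpy as np


def blend(x_fid: np.ndarray, x_real: np.ndarray, gamma: float) -> np.ndarray:
    """Image-space blend: (1 - gamma) * x_fid + gamma * x_real."""
    if x_fid.shape != x_real.shape:
        raise ValueError("both outputs must have the same shape")
    return (1.0 - gamma) * x_fid + gamma * x_real


# Stand-ins for the two network outputs (random data, not real SR results).
rng = np.random.default_rng(0)
x_fid = rng.random((4, 4, 3))
x_real = rng.random((4, 4, 3))

# Sweeping gamma only reuses the cached outputs; no network is called here.
sweep = {g: blend(x_fid, x_real, g) for g in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Each entry of `sweep` costs one element-wise multiply-add over the image, which is why the adjustment can follow a slider interactively: $\gamma = 0$ returns the fidelity-specific image and $\gamma = 1$ the realism-specific one.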

Two-Stage Training Scheme

We propose a two-stage training scheme: the first stage trains the fidelity-specific network $f_{\theta}$, and the second stage optimizes the realism-specific network $f_{\phi}$ while keeping $f_{\theta}$ frozen.

Fidelity- and Realism-Specific Losses

We design depth-dependent feature-space losses that emphasize shallow features for fidelity-specific training and deep features for realism-specific training:

$\mathcal{L}_{s}(\hat{x}_{H},x_{H};w)=\frac{1}{\sum_{j=0}^{L} w_j}\sum_{l=0}^{L}w_l\mathcal{D}(\hat{x}_{H},x_{H},l)$

$\mathcal{L}_{\text{1st}} = \mathcal{L}_{s}(\hat{x}^{\text{fid}}_{H},x_H;w^{\text{dec}})$
$\mathcal{L}_{\text{2nd}} = \mathcal{L}_{s}(\hat{x}_{H},x_H;w^{\text{inc}})$
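The normalized weighted sum in $\mathcal{L}_{s}$ can be sketched in plain Python as follows. This is an illustration only: the per-layer distances stand in for $\mathcal{D}(\hat{x}_{H},x_{H},l)$, whose exact form is not given on this page, and the linear weight schedules `w_dec` and `w_inc` are hypothetical choices that merely decrease or increase with depth.

```python
def depth_weighted_loss(distances, weights):
    """L_s = (sum_l w_l * D_l) / (sum_j w_j) for per-layer distances D_l."""
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per layer")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * d for w, d in zip(weights, distances)) / total


# Hypothetical distances D(x_hat, x, l) for layers l = 0..3 (shallow to deep).
distances = [0.2, 0.4, 0.6, 0.8]

# Hypothetical schedules: w_dec favors shallow layers, w_inc favors deep ones.
w_dec = [4.0, 3.0, 2.0, 1.0]
w_inc = [1.0, 2.0, 3.0, 4.0]

loss_dec = depth_weighted_loss(distances, w_dec)  # first-stage style weighting
loss_inc = depth_weighted_loss(distances, w_inc)  # second-stage style weighting
```

With these toy numbers the deeper layers carry the larger distances, so the increasing schedule yields the larger loss (about 0.6 versus about 0.4). Dividing by the weight sum keeps the two losses on a comparable scale regardless of the schedule.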

Quantitative Result


Qualitative Result


Video Demo



BibTeX

@InProceedings{Back_2026_CVPR,
    author    = {Back, Jonghee and Kim, Jongju and Kim, Jeong-Uk and Kim, Eunjin and Jeon, Minyong},
    title     = {IFCSR: Inference-Free Fidelity-Realism Control for One-Step Diffusion-based Real-World Image Super-Resolution},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {38187-38197}
}