ResMaster

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

¹The University of Tokyo, ²The Chinese University of Hong Kong, ³Ant Group
‡Corresponding author

Abstract

Diffusion models excel at producing high-quality images; however, scaling them to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns. To address these issues, we introduce ResMaster, a novel, training-free method that enables resolution-limited diffusion models to generate high-quality images beyond their native resolution. Specifically, ResMaster uses a low-resolution reference image created by a pre-trained diffusion model to provide structural and fine-grained guidance for generating high-resolution images patch by patch. To ensure a coherent global structure, ResMaster aligns the low-frequency components of the high-resolution patches with the low-resolution reference at each denoising step. For fine-grained guidance, it incorporates tailored image prompts derived from the low-resolution reference and enriched textual prompts produced by a vision-language model. This approach can significantly mitigate local pattern distortions and improve detail refinement. Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation and demonstrates promising efficiency.
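The structural guidance described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is one possible reading of "aligning low-frequency components", assuming a single-channel 2D patch, a hard FFT low-pass mask, and nearest-neighbour upsampling of the reference. The function names (`upsample_nearest`, `low_pass`, `align_low_freq`) and the `cutoff` value are hypothetical choices made for illustration only.

```python
import numpy as np

def upsample_nearest(ref, scale):
    """Nearest-neighbour upsampling of a 2D array by an integer factor (assumed choice)."""
    return np.repeat(np.repeat(ref, scale, axis=0), scale, axis=1)

def low_pass(x, cutoff):
    """Keep only spatial frequencies whose magnitude is <= cutoff (cycles/pixel) on both axes."""
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    mask = (np.abs(fy) <= cutoff) & (np.abs(fx) <= cutoff)
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def align_low_freq(patch, ref_patch, cutoff=0.1):
    """Swap the low-frequency band of `patch` for that of `ref_patch`.

    The high-frequency detail of `patch` is kept unchanged; only the coarse
    structure is taken from the (upsampled) low-resolution reference.
    """
    return patch - low_pass(patch, cutoff) + low_pass(ref_patch, cutoff)

# Toy usage: a 16x16 "reference" guides a 32x32 patch of a 64x64 canvas.
rng = np.random.default_rng(0)
reference = rng.standard_normal((16, 16))
reference_up = upsample_nearest(reference, 4)   # 64x64
ref_patch = reference_up[:32, :32]              # region matching the patch
patch = rng.standard_normal((32, 32))           # stand-in for a denoised patch
aligned = align_low_freq(patch, ref_patch)
```

In a patch-wise sampler, a step like `align_low_freq` would presumably be applied to each patch at every denoising step, so that the global layout follows the reference while fine detail is left to the diffusion model.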

More Samples

Resolution: 3072x4096; Prompt: a girl astronaut exploring the cosmos, floating among planets and stars, high quality detail, , anime screencap, studio ghibli style, illustration, high contrast, masterpiece, best quality.

Resolution: 2048x4096; Prompt: The mesmerizing northern lights dancing in the night sky over a frozen lake. The ice reflects the vibrant colors of the aurora borealis, adding to the surreal beauty of the scene.

Resolution: 4096x4096; Prompt: a lion, colorful, low-poly, cyan and orange eyes, poly-hd, 3d, low-poly game art, polygon mesh, jagged, blocky, wireframe edges, centered composition.

Resolution: 4096x4096; Prompt: Nike sneaker concept art, (((made out of cotton candy clouds))) , luxury, futurist, stunning unreal engine render, product photography, 8k, hyper-realistic. Surrealism.

Resolution: 3072x3072; Prompt: Happy dreamy owl monster sitting on a tree branch, colorful glittering particles, forest background, detailed feathers.

Resolution: 3072x4096; Prompt: beautiful silhouette shot of a ballerina dancer.

Resolution: 4096x4096; Prompt: An sterotypical alien with glowing eyes in the style of translucent liquid metal filled with swirling galaxies, ray tracing, raw character, 32k uhd, schlieren photography, conceptual portraiture, wet - on - wet blending

Resolution: 4096x4096; Prompt: A cozy winter scene, a snow-covered cabin with smoke rising from the chimney against a backdrop of pine trees, best quality, 4K.

BibTeX

@misc{shi2024resmaster,
  title={ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance},
  author={Shuwei Shi and Wenbo Li and Yuechen Zhang and Jingwen He and Biao Gong and Yinqiang Zheng},
  year={2024},
  eprint={2406.16476},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}