Next-Gen AI Solves Aspect Ratio and Resolution Issues in Generated Imagery

Generative artificial intelligence (AI) has long faced challenges in producing consistent, high-quality images. Issues such as incorrect details, inconsistent facial symmetry, and problems with image size and resolution have plagued existing models. However, a breakthrough from Rice University computer scientists promises to address these issues and enhance the fidelity of AI-generated images.

Traditional generative AI models such as Stable Diffusion, Midjourney, and DALL-E are trained by adding random noise to training images and learning to remove it; new images are then created by running that denoising process on pure noise. While these models are impressive at generating lifelike, photorealistic images, they struggle when the output is not square: asked to produce the varied aspect ratios that different screens require, they often fill the extra space with repetitive or distorted elements.
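
To make the denoising idea concrete, here is a minimal runnable sketch of the reverse loop at the heart of such models. The `predict_noise` function is a hypothetical stand-in for a trained denoising network (such as the U-Net inside Stable Diffusion), mocked here so the loop executes end to end:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Hypothetical placeholder: a trained network would estimate
    # the noise present in x at timestep t.
    return 0.1 * x

# Generation starts from pure random noise...
x = rng.standard_normal((64, 64, 3))

# ...and repeatedly subtracts the model's noise estimate,
# gradually revealing an image-shaped sample.
num_steps = 50
for t in reversed(range(num_steps)):
    x -= predict_noise(x, t) / num_steps

print(x.shape)  # (64, 64, 3)
```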

Rice University doctoral student Moayed Haji Ali has introduced a new approach called ElasticDiffusion, presented in a peer-reviewed paper at the 2024 Institute of Electrical and Electronics Engineers (IEEE) Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle. ElasticDiffusion aims to resolve the issues associated with non-square images and improve the overall accuracy of generated visuals.

Haji Ali explains that while diffusion models excel at creating square images, they encounter difficulties when asked to generate other aspect ratios. The problem arises because these models combine local and global signals, meaning detailed pixel-level information and the overall image outline, into a single data stream. When faced with non-square dimensions, the models often produce visual imperfections because those signals get repeated or misaligned.
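
The "single data stream" corresponds to how guided diffusion models such as Stable Diffusion typically merge two network passes per denoising step, a technique known as classifier-free guidance. A minimal sketch of that merge, using hypothetical placeholder arrays in place of real network outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the two passes of one denoising step:
eps_uncond = rng.standard_normal((64, 64, 3))  # prompt ignored
eps_cond = rng.standard_normal((64, 64, 3))    # prompt applied

# Classifier-free guidance fuses both passes into a single estimate,
# so pixel-level detail and overall layout travel in one data stream.
guidance_scale = 7.5
eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```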

ElasticDiffusion separates local and global signals into distinct conditional and unconditional generation paths. This separation lets the model handle the additional space required by non-square images more effectively. By taking the difference between the conditional and unconditional predictions, ElasticDiffusion obtains a score that captures the image's global information, such as its aspect ratio and overall content. It then fills in detailed pixel-level data quadrant by quadrant, so the global signal stays consistent across the whole canvas instead of being repeated or garbled.
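
A conceptual sketch of that decomposition, following the description above; `model`, `resize`, the prompt, and all shapes are illustrative placeholders rather than the paper's released implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
H, W = 512, 768  # non-square target resolution
S = 512          # square resolution the model was trained on

def model(x, prompt=None):
    # Hypothetical stand-in for the diffusion network's noise estimate.
    return rng.standard_normal(x.shape)

def resize(a, h, w):
    # Nearest-neighbour resize, sufficient for a sketch.
    ys = np.arange(h) * a.shape[0] // h
    xs = np.arange(w) * a.shape[1] // w
    return a[ys][:, xs]

x = rng.standard_normal((H, W, 3))

# Global path: the difference between conditional and unconditional
# estimates at the square training resolution, stretched to the target
# aspect ratio. This carries layout and content information.
square = resize(x, S, S)
global_signal = resize(model(square, prompt="a cat") - model(square), H, W)

# Local path: unconditional, pixel-level estimates computed quadrant by
# quadrant at full resolution, filling in fine detail locally.
local_signal = np.zeros_like(x)
for i in (0, H // 2):
    for j in (0, W // 2):
        local_signal[i:i + H // 2, j:j + W // 2] = model(
            x[i:i + H // 2, j:j + W // 2])

# Combining the two keeps the global signal consistent across the
# whole canvas while detail is generated piece by piece.
score = local_signal + 7.5 * global_signal
```

Because the global signal is computed once and merely resized, it cannot repeat itself across the wider canvas; only the local detail is generated piecewise.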

Despite its promising results, ElasticDiffusion currently requires 6 to 9 times more processing time than other diffusion models. The research team aims to close that gap, matching the inference speed of models like Stable Diffusion and DALL-E while preserving the improved image quality.

The introduction of ElasticDiffusion marks a significant step forward in generative AI, offering the potential for more accurate and versatile image generation. This advancement could transform how AI models are used in various applications, from digital art to real-world object recognition.