Next-Gen AI Solves Aspect Ratio and Resolution Issues in Generated Imagery

Generative artificial intelligence (AI) has long faced challenges in producing consistent, high-quality images. Issues such as incorrect details, inconsistent facial symmetry, and problems with image size and resolution have plagued existing models. However, a breakthrough from Rice University computer scientists promises to address these issues and enhance the fidelity of AI-generated images.

Traditional generative AI models such as Stable Diffusion, Midjourney, and DALL-E are trained by adding random noise to training images and learning to remove it; new images are then created by running that denoising process on pure noise. While these models are impressive at generating lifelike, photorealistic images, they struggle when the output is not square: asked to produce the varied aspect ratios that different screens require, they often fill the extra space with repetitive or distorted elements.
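
To make the denoising idea concrete, here is a minimal runnable sketch of the reverse loop at the heart of such models. The `predict_noise` function is a hypothetical stand-in for a trained denoising network (such as the U-Net inside Stable Diffusion), mocked here so the loop executes end to end:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, t):
    # Hypothetical placeholder: a trained network would estimate
    # the noise present in x at timestep t.
    return 0.1 * x

# Generation starts from pure random noise...
x = rng.standard_normal((64, 64, 3))

# ...and repeatedly subtracts the model's noise estimate,
# gradually revealing an image-shaped sample.
num_steps = 50
for t in reversed(range(num_steps)):
    x -= predict_noise(x, t) / num_steps

print(x.shape)  # (64, 64, 3)
```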

Rice University doctoral student Moayed Haji Ali has introduced a new approach called ElasticDiffusion, presented in a peer-reviewed paper at the 2024 Institute of Electrical and Electronics Engineers (IEEE) Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle. ElasticDiffusion aims to resolve the issues associated with non-square images and improve the overall accuracy of generated visuals.

Haji Ali explains that while diffusion models excel at creating square images, they encounter difficulties when asked to generate other aspect ratios. The problem arises because these models combine local and global signals, meaning detailed pixel-level information and the overall image outline, into a single data stream. When faced with non-square dimensions, the models often produce visual imperfections because those signals get repeated or misaligned.
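
The "single data stream" corresponds to how guided diffusion models such as Stable Diffusion typically merge two network passes per denoising step, a technique known as classifier-free guidance. A minimal sketch of that merge, using hypothetical placeholder arrays in place of real network outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the two passes of one denoising step:
eps_uncond = rng.standard_normal((64, 64, 3))  # prompt ignored
eps_cond = rng.standard_normal((64, 64, 3))    # prompt applied

# Classifier-free guidance fuses both passes into a single estimate,
# so pixel-level detail and overall layout travel in one data stream.
guidance_scale = 7.5
eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```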

ElasticDiffusion separates local and global signals into distinct conditional and unconditional generation paths. This separation lets the model handle the additional space required by non-square images more effectively. By taking the difference between the conditional and unconditional predictions, ElasticDiffusion obtains a score that captures the image's global information, such as its aspect ratio and overall content. It then fills in detailed pixel-level data quadrant by quadrant, so the global signal stays consistent across the whole canvas instead of being repeated or garbled.
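
A conceptual sketch of that decomposition, following the description above; `model`, `resize`, the prompt, and all shapes are illustrative placeholders rather than the paper's released implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
H, W = 512, 768  # non-square target resolution
S = 512          # square resolution the model was trained on

def model(x, prompt=None):
    # Hypothetical stand-in for the diffusion network's noise estimate.
    return rng.standard_normal(x.shape)

def resize(a, h, w):
    # Nearest-neighbour resize, sufficient for a sketch.
    ys = np.arange(h) * a.shape[0] // h
    xs = np.arange(w) * a.shape[1] // w
    return a[ys][:, xs]

x = rng.standard_normal((H, W, 3))

# Global path: the difference between conditional and unconditional
# estimates at the square training resolution, stretched to the target
# aspect ratio. This carries layout and content information.
square = resize(x, S, S)
global_signal = resize(model(square, prompt="a cat") - model(square), H, W)

# Local path: unconditional, pixel-level estimates computed quadrant by
# quadrant at full resolution, filling in fine detail locally.
local_signal = np.zeros_like(x)
for i in (0, H // 2):
    for j in (0, W // 2):
        local_signal[i:i + H // 2, j:j + W // 2] = model(
            x[i:i + H // 2, j:j + W // 2])

# Combining the two keeps the global signal consistent across the
# whole canvas while detail is generated piece by piece.
score = local_signal + 7.5 * global_signal
```

Because the global signal is computed once and merely resized, it cannot repeat itself across the wider canvas; only the local detail is generated piecewise.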

Despite its promising results, ElasticDiffusion currently requires 6 to 9 times more processing time than other diffusion models. The research team aims to close that gap, matching the inference speed of models like Stable Diffusion and DALL-E while preserving the improved image quality.

The introduction of ElasticDiffusion marks a significant step forward in generative AI, offering the potential for more accurate and versatile image generation. This advancement could transform how AI models are used in various applications, from digital art to real-world object recognition.