Apple Unveils New Multimodal Large Language Models



Apple researchers have introduced a new family of multimodal large language models (MLLMs) known as MM1.5, ranging in size from 1 billion to 30 billion parameters. The release highlights Apple's commitment to advancing generative AI and its aim to position itself as a key player in the field.

While smartphones can effectively run models with a few billion parameters, anything above roughly 10 billion typically requires a computer for optimal performance. The tech industry has seen models surpassing 1 trillion parameters; however, Apple claims that careful data curation combined with innovative training techniques enables its smaller models—specifically the 1 billion and 3 billion parameter versions—to deliver strong performance.

The MM1.5 models are specifically designed to handle complex tasks involving text-rich images, visual referencing and grounding, as well as multi-image reasoning. Two distinct variants of MM1.5 have been developed: MM1.5-Video, which focuses on video understanding, and MM1.5-UI, tailored for mobile user interface comprehension.

Building on the MM1 architecture introduced in March 2024, MM1.5 shows significant performance improvements. The dense models, available in 1B and 3B sizes, are described as compact enough for easy deployment on mobile devices, yet powerful enough to outperform larger open-source models in many scenarios.

In comparative testing, the 3 billion parameter MM1.5 model was pitted against Microsoft’s Phi-3-Vision, which contains 4 billion parameters. While neither model emerged as a clear winner, Apple’s model excelled in text-rich understanding, whereas Phi-3-Vision performed better in specific knowledge-based tasks.

Apple asserts that MM1.5 is a "state-of-the-art" model, outperforming a curated set of competing models of similar size. Even so, the top models from tech giants like Google and OpenAI remain superior, demonstrating that scale still plays a crucial role in AI performance.

While it remains unclear if these new models will be integrated into Apple’s devices, their introduction signifies a strong step forward in enhancing the company’s generative AI capabilities.