Apple Unveils New Multimodal Large Language Models



Apple researchers have introduced a new family of multimodal large language models (MLLMs) known as MM1.5, ranging in size from 1 billion to 30 billion parameters. The release highlights Apple's commitment to advancing generative AI and its aim to position itself as a key player in the field.

While smartphones can effectively run models with a few billion parameters, anything above roughly 10 billion typically requires a computer for optimal performance. The tech industry has seen models surpassing 1 trillion parameters; however, Apple claims that careful data curation combined with innovative training techniques enables its smaller models—specifically the 1 billion and 3 billion parameter versions—to deliver strong performance.

The MM1.5 models are specifically designed to handle complex tasks involving text-rich images, visual referencing and grounding, as well as multi-image reasoning. Two distinct variants of MM1.5 have been developed: MM1.5-Video, which focuses on video understanding, and MM1.5-UI, tailored for mobile user interface comprehension.

Building on the MM1 architecture introduced in March 2024, MM1.5 shows significant performance improvements. The dense models, available in 1B and 3B sizes, are described as compact enough for easy deployment on mobile devices, yet powerful enough to outperform larger open-source models in many scenarios.

In comparative testing, the 3 billion parameter MM1.5 model was pitted against Microsoft’s Phi-3-Vision, which contains 4 billion parameters. While neither model emerged as a clear winner, Apple’s model excelled in text-rich understanding, whereas Phi-3-Vision performed better in specific knowledge-based tasks.

Apple asserts that MM1.5 is a "state-of-the-art" model, outperforming a curated set of competing models of similar size. Even so, the top models from tech giants like Google and OpenAI remain superior, demonstrating that scale still plays a crucial role in AI performance.

While it remains unclear if these new models will be integrated into Apple’s devices, their introduction signifies a strong step forward in enhancing the company’s generative AI capabilities.