Meta’s Maverick AI Model Caught Gaming Benchmarks

In fine print, Meta acknowledges that the version of Maverick tested on LMArena isn’t the same as what’s available to the public.


Alade-Ọrọ̀ Crow


Over the weekend, Meta unveiled two new Llama 4 models: the smaller Scout and the mid-sized Maverick, which Meta claims outperforms GPT-4o and Gemini 2.0 Flash on a range of widely reported benchmarks.

Maverick quickly climbed to the number-two spot on LMArena, the AI benchmarking site where users compare outputs from different systems and vote on the best one. In its press release, Meta highlighted Maverick's Elo score of 1417, which places it above OpenAI's GPT-4o and just below Gemini 2.5 Pro. A higher Elo score means the model wins more often in head-to-head matchups against its competitors.
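For context, here is a minimal sketch of how a standard Elo rating translates into a head-to-head win probability; LMArena's exact scoring method may differ, and the rival rating below is purely hypothetical for illustration.

```python
# Illustrative Elo math: how a rating gap maps to an expected win rate.
# This uses the textbook Elo formulas, not LMArena's actual implementation.

def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Nudge a rating toward the observed result; k controls the step size."""
    return rating + k * (actual - expected)

if __name__ == "__main__":
    maverick, rival = 1417.0, 1380.0  # hypothetical ratings for illustration only
    p = expected_win_probability(maverick, rival)
    print(f"Expected win rate for the higher-rated model: {p:.1%}")  # roughly 55%
```

A 37-point gap, for example, only implies winning a little over half of direct comparisons, which is why small Elo differences near the top of the leaderboard are closely scrutinized.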

This accomplishment positioned Meta’s open-weight Llama 4 as a formidable competitor to the leading closed models from OpenAI, Anthropic, and Google. However, AI researchers examining Meta’s documentation discovered a notable discrepancy.

In the fine print, Meta acknowledges that the version of Maverick tested on LMArena isn't the same one that's available to the public. According to Meta's own documentation, it used an "experimental chat version" of Maverick for LMArena that was specifically optimized for conversational ability.

Read the full story at The Verge.
