Grok 3 Beta Catapults Up the Artificial Analysis Intelligence Index

By Joseph Provence, a news contributor who writes about technology, small business, SEO, and e-commerce.

Feb 23, 2025 2:30 PM MST

Artificial Analysis Intelligence Index (Higher is Better)

Top-scoring models:
o3: 70
Grok 3 Reasoning Beta: 66
DeepSeek R1: 60
Claude 3.5 Sonnet (Oct): 44
GPT-4o (Nov '24): 41
Llama 3.3 70B: 41

Speed (Output Tokens per Second, Higher is Better)

Fastest models:
o1-mini: 189 tokens/sec
Gemini 2.0 Flash: 152 tokens/sec
o3-mini: 147 tokens/sec
Llama 3.3 70B: 113 tokens/sec
GPT-4o mini: 66 tokens/sec

Price per 1M Tokens (Lower is Better)

Most cost-effective models:
Gemini 2.0 Flash: $0.20
GPT-4o mini: $0.30
Llama 3.3 70B: $0.60
DeepSeek R1: $1.10
GPT-4o (Nov '24): $4.40
Claude 3.5 Sonnet (Oct): $26.30

Multilingual Performance Index (Higher is Better)

Claude 3.5 Sonnet (Oct): 88%
DeepSeek V3: 86%
GPT-4o (Nov '24): 84%
Nova Pro: 83%
Claude 3.5 Haiku: 79%

Math Index (AIME 2024 & MATH-500)

Highest performers:
o3-mini: 97%
DeepSeek R1: 96%
Gemini 2.0 Flash: 93%
Llama 3.3 70B: 77%
GPT-4o (Nov '24): 76%

Coding Index (LiveCodeBench & SciCode)

o3-mini: 56
DeepSeek R1: 49
Claude 3.5 Sonnet (Oct): 37
GPT-4o mini: 23
Llama 3.1 8B: 12

Key Takeaways:
Best intelligence scores: o3 and Grok 3 Reasoning Beta.
Fastest models: o1-mini and Gemini 2.0 Flash.
Most cost-effective: Gemini 2.0 Flash and GPT-4o mini.
Top for multilingual tasks: Claude 3.5 Sonnet and DeepSeek V3.
Best for math: o3-mini and DeepSeek R1.
Best for coding: o3-mini and DeepSeek R1.
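
For readers who want to check the takeaways against the figures listed above, here is a minimal sketch that picks the top two models per category. The numbers are copied from the lists in this article; the dictionary layout, the LOWER_IS_BETTER set, and the helper name top_two are illustrative assumptions, not anything published by Artificial Analysis.

```python
# Figures copied from the lists in this article. The structure and the
# names SCORES, LOWER_IS_BETTER, and top_two are hypothetical, for illustration only.
SCORES = {
    "intelligence": {
        "o3": 70, "Grok 3 Reasoning Beta": 66, "DeepSeek R1": 60,
        "Claude 3.5 Sonnet (Oct)": 44, "GPT-4o (Nov '24)": 41, "Llama 3.3 70B": 41,
    },
    "speed_tokens_per_sec": {
        "o1-mini": 189, "Gemini 2.0 Flash": 152, "o3-mini": 147,
        "Llama 3.3 70B": 113, "GPT-4o mini": 66,
    },
    "price_usd_per_1m_tokens": {
        "Gemini 2.0 Flash": 0.2, "GPT-4o mini": 0.3, "Llama 3.3 70B": 0.6,
        "DeepSeek R1": 1.1, "GPT-4o (Nov '24)": 4.4, "Claude 3.5 Sonnet (Oct)": 26.3,
    },
    "multilingual_pct": {
        "Claude 3.5 Sonnet (Oct)": 88, "DeepSeek V3": 86, "GPT-4o (Nov '24)": 84,
        "Nova Pro": 83, "Claude 3.5 Haiku": 79,
    },
    "math_pct": {
        "o3-mini": 97, "DeepSeek R1": 96, "Gemini 2.0 Flash": 93,
        "Llama 3.3 70B": 77, "GPT-4o (Nov '24)": 76,
    },
    "coding": {
        "o3-mini": 56, "DeepSeek R1": 49, "Claude 3.5 Sonnet (Oct)": 37,
        "GPT-4o mini": 23, "Llama 3.1 8B": 12,
    },
}

# Price is the only category here where a lower number ranks first.
LOWER_IS_BETTER = {"price_usd_per_1m_tokens"}


def top_two(metric):
    """Return the two best-ranked model names for one category."""
    values = SCORES[metric]
    best_first = sorted(
        values, key=values.get, reverse=metric not in LOWER_IS_BETTER
    )
    return best_first[:2]


if __name__ == "__main__":
    for metric in SCORES:
        print(f"{metric}: {', '.join(top_two(metric))}")
```

Each list above covers only the models shown for that category, so this ranking reflects those selections rather than the full index.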
