Grok 3 Beta Catapults Up the Artificial Analysis Intelligence Index

By Joseph Provence, a news contributor who writes about technology, small business, SEO, and e-commerce.

Feb 23, 2025 2:30 PM MST

Artificial Analysis Intelligence Index (Higher is Better)

Top-scoring models:
o3: 70
Grok 3 Reasoning Beta: 66
DeepSeek R1: 60
Claude 3.5 Sonnet (Oct): 44
GPT-4o (Nov '24): 41
Llama 3.3 70B: 41

Speed (Output Tokens per Second, Higher is Better)

Fastest models:
o1-mini: 189 tokens/sec
Gemini 2.0 Flash: 152 tokens/sec
o3-mini: 147 tokens/sec
Llama 3.3 70B: 113 tokens/sec
GPT-4o mini: 66 tokens/sec

Price per 1M Tokens (Lower is Better)

Most cost-effective models:
Gemini 2.0 Flash: $0.2
GPT-4o mini: $0.3
Llama 3.3 70B: $0.6
DeepSeek R1: $1.1
GPT-4o (Nov '24): $4.4
Claude 3.5 Sonnet (Oct): $26.3

Multilingual Performance Index (Higher is Better)

Claude 3.5 Sonnet (Oct): 88%
DeepSeek V3: 86%
GPT-4o (Nov '24): 84%
Nova Pro: 83%
Claude 3.5 Haiku: 79%

Math Index (AIME 2024 & MATH-500)

Highest performers:
o3-mini: 97%
DeepSeek R1: 96%
Gemini 2.0 Flash: 93%
Llama 3.3 70B: 77%
GPT-4o (Nov '24): 76%

Coding Index (LiveCodeBench & SciCode)

o3-mini: 56
DeepSeek R1: 49
Claude 3.5 Sonnet (Oct): 37
GPT-4o mini: 23
Llama 3.1 8B: 12

Key Takeaways:
Best intelligence scores: o3 and Grok 3 Reasoning Beta.
Fastest models: o1-mini and Gemini 2.0 Flash.
Most cost-effective: Gemini 2.0 Flash and GPT-4o mini.
Top for multilingual tasks: Claude 3.5 Sonnet and DeepSeek V3.
Best for math and reasoning: o3-mini and DeepSeek R1.
Best for coding: o3-mini and DeepSeek R1.
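
For readers who want to check the takeaways against the figures, here is a minimal Python sketch that ranks three of the lists above and picks the top two in each. The numbers are copied from this article; the `top_two` helper and the variable names are made up for illustration and are not part of any Artificial Analysis tool.

```python
# Figures copied from the lists in this article.
intelligence = {
    "o3": 70,
    "Grok 3 Reasoning Beta": 66,
    "DeepSeek R1": 60,
    "Claude 3.5 Sonnet (Oct)": 44,
    "GPT-4o (Nov '24)": 41,
    "Llama 3.3 70B": 41,
}
speed_tokens_per_sec = {
    "o1-mini": 189,
    "Gemini 2.0 Flash": 152,
    "o3-mini": 147,
    "Llama 3.3 70B": 113,
    "GPT-4o mini": 66,
}
price_per_1m_tokens = {
    "Gemini 2.0 Flash": 0.2,
    "GPT-4o mini": 0.3,
    "Llama 3.3 70B": 0.6,
    "DeepSeek R1": 1.1,
    "GPT-4o (Nov '24)": 4.4,
    "Claude 3.5 Sonnet (Oct)": 26.3,
}


def top_two(scores, higher_is_better=True):
    """Return the two best model names (hypothetical helper for this example)."""
    ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
    return ranked[:2]


print("Best intelligence:", top_two(intelligence))
print("Fastest:", top_two(speed_tokens_per_sec))
# Price is the one list where a lower number wins.
print("Most cost-effective:", top_two(price_per_1m_tokens, higher_is_better=False))
```

The only design choice is the `higher_is_better` flag, which flips the sort direction for the price list.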
