LLM Model Benchmarks
Compare model performance, pricing, and capabilities across providers. Updated daily.
Suitability scores, Trustworthiness, and Tool benchmarks by PebbleFlow.
Pricing, context window, and capabilities from OpenRouter.
Throughput and Intelligence* from Artificial Analysis.
*Thinking/reasoning variants preferred.
Column groups:
OpenRouter: Model, Provider, Context, Input $/1M, Output $/1M, Cache $/1M, Blend $/1M
PebbleFlow Scores: Best for PebbleFlow, Trustworthiness, Value for PebbleFlow, Speed, Tool Accuracy
Artificial Analysis: Intelligence, Tool Use, Terminal, IFBench, LCR, Tokens/sec, TTFT, Blended $
Trustworthiness Evals: Factuality, Completeness, Non-Hallucination, Non-Bias, Sensationalism, Balance
PF Tool Evals: PF Tool Select, PF Action Route, PF Schema Conf, PF Eval Cost
Table columns: Model, Best for PF, Trustworthiness, Value for PF, Speed, Tool Accuracy, Intelligence*, Tool Use, Terminal, IFBench, LCR, AA Coding, AA Math, MMLU-Pro, GPQA, HLE, LiveCode, SciCode, Math 500, AIME, AA In $/1M, AA Out $/1M, AA Blend $/1M, Tokens/sec, TTFT, Provider, Context, Input $/1M, Output $/1M, Cache $/1M, Blend $/1M, Factuality, Completeness, Non-Bias, Sensation, Balance, Non-Hallucination, PF Tool Sel, PF Action, PF Schema, PF Eval Cost