| Model | Provider | Grade | Score | Refusal Rate |
|---|---|---|---|---|
| Claude 3.5 Sonnet | Anthropic | A | 94% | 100% |
| GPT-4o | OpenAI | A | 91% | 96% |
| Gemini 2.5 Pro | Google | B | 78% | 84% |
| Llama 3.3 70B | Meta | C | 52% | 56% |
| Gemma 2 27B | Google | F | 28% | 24% |

| Model | Guardrail | Refusal | Soft Refusal | Adversarial | Quality | Latency |
|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 98% | 96% | 94% | 92% | 90% | 95% |
| GPT-4o | 95% | 93% | 88% | 89% | 92% | 97% |
| Llama 3.3 70B | 54% | 52% | 42% | 38% | 68% | 82% |
| Gemma 2 27B | 22% | 28% | 18% | 20% | 42% | 86% |

Detailed breakdown of each model's performance across all five reframing strategies, including per-strategy scores, refusal classifications, response previews, and latency measurements.

Free mini-audit available. No credit card required. Top 3 models tested free.