Benchmark results
Full 10 warm – main operational benchmark
Run overview
| Run | Total | OK | Invalid | Skip | Coverage |
|---|---|---|---|---|---|
| Core 5 FAST warm | 30 | 27 | 1 | 2 | 96.4% |
| Core 5 THINK warm | 30 | 27 | 1 | 2 | 96.4% |
| Full 10 FAST warm | 60 | 53 | 3 | 4 | 94.6% |
| Full 10 THINK warm | 60 | 54 | 2 | 4 | 96.4% |
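
The Coverage column is not defined explicitly in this report. The figures are consistent with coverage being the share of OK results among attempted (non-skipped) cases, so the sketch below assumes that definition. The function name `coverage_pct` is hypothetical; the counts are copied from the table above.

```python
# Run totals copied from the run overview table: (total, ok, invalid, skip).
RUNS = {
    "Core 5 FAST warm": (30, 27, 1, 2),
    "Core 5 THINK warm": (30, 27, 1, 2),
    "Full 10 FAST warm": (60, 53, 3, 4),
    "Full 10 THINK warm": (60, 54, 2, 4),
}


def coverage_pct(total, ok, invalid, skip):
    """Assumed definition: OK share of attempted (non-skipped) cases, in percent."""
    if ok + invalid + skip != total:
        raise ValueError("status counts do not add up to the total")
    attempted = total - skip
    return round(100.0 * ok / attempted, 1) if attempted else 0.0


for name, counts in RUNS.items():
    print(f"{name}: {coverage_pct(*counts)}%")
```

Under this assumption a skipped case shrinks the denominator instead of counting as a failure, which would explain why a run with 27 OK out of 30 reports more than 90% coverage.
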
Model comparison
| Model | Coverage % | Wall clock (s) | TTFT (s) | tok/s E2E | decode tok/s | Salvage |
|---|---|---|---|---|---|---|
| Gemma 4 26B | 80% | 18.8 | 3.5 | 43.8 | 54.8 | 3 |
| Gemma 4 31B | 100% | 87.7 | 5.3 | 8.1 | 8.8 | 1 |
| Gemma 4 E4B | 90% | 17.6 | 3.1 | 41.4 | 52 | 1 |
| Nemotron 3 Nano 30B | 100% | 16.3 | 3.4 | 52.2 | 66.3 | 1 |
| Nemotron 3 Super 120B | 100% | 61 | 16.1 | 13.4 | 18.4 | 0 |
| Qwen 3.5 122B | 100% | 47.7 | 12.3 | 15.8 | 21.6 | 2 |
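
The report does not say how the two throughput columns are computed. One plausible reading, used here purely as an assumption, is that tok/s E2E divides the generated tokens by the full wall clock time, while decode tok/s divides them by the time remaining after the first token (wall clock minus TTFT). The sketch below estimates decode throughput from the other three columns and compares it with the reported value. The function names are hypothetical.

```python
# Rows copied from the model comparison table:
# (wall clock s, TTFT s, tok/s E2E, reported decode tok/s)
MODELS = {
    "Gemma 4 26B": (18.8, 3.5, 43.8, 54.8),
    "Gemma 4 31B": (87.7, 5.3, 8.1, 8.8),
    "Gemma 4 E4B": (17.6, 3.1, 41.4, 52.0),
    "Nemotron 3 Nano 30B": (16.3, 3.4, 52.2, 66.3),
    "Nemotron 3 Super 120B": (61.0, 16.1, 13.4, 18.4),
    "Qwen 3.5 122B": (47.7, 12.3, 15.8, 21.6),
}


def estimated_decode_tps(wall_s, ttft_s, e2e_tps):
    """Assumption: E2E = tokens / wall clock, decode = tokens / (wall clock - TTFT)."""
    decode_s = wall_s - ttft_s
    if decode_s <= 0:
        raise ValueError("TTFT must be shorter than the wall clock time")
    return e2e_tps * wall_s / decode_s


def relative_gap(estimate, reported):
    """Absolute difference as a fraction of the reported value."""
    return abs(estimate - reported) / reported


for name, (wall_s, ttft_s, e2e_tps, reported) in MODELS.items():
    est = estimated_decode_tps(wall_s, ttft_s, e2e_tps)
    gap = relative_gap(est, reported)
    print(f"{name}: estimated {est:.1f}, reported {reported}, gap {gap:.1%}")
```

The estimates land within a few percent of the reported column for every model. That is compatible with the assumed definitions if the published figures are averages over individual runs, but it does not confirm them.
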
Status matrix – Full 10 FAST warm
OK = usable output, INV = formally unusable output, SKIP = outside the model's capabilities
| Use case | Gemma 4 26B | Gemma 4 31B | Gemma 4 E4B | Nemotron 3 Nano 30B | Nemotron 3 Super 120B | Qwen 3.5 122B |
|---|---|---|---|---|---|---|
| CEO email | OK | OK | OK | OK | OK | OK |
| Meeting minutes | OK | OK | OK | OK | OK | OK |
| Contract analysis | OK | OK | OK | OK | OK | OK |
| Vendor selection | OK | OK | OK | OK | OK | OK |
| Knowledge base | OK | OK | OK | OK | OK | OK |
| Dashboard briefing | OK | OK | OK | SKIP | SKIP | OK |
| Invoice → JSON | INV | OK | OK | SKIP | SKIP | OK |
| Board memo | OK | OK | OK | OK | OK | OK |
| Workflow engine | INV | OK | INV | OK | OK | OK |
| Internal app | OK | OK | OK | OK | OK | OK |
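
As a cross-check, the matrix can be tallied per model and in total. The sketch below assumes coverage is OK divided by attempted (OK plus INV) cases, a definition inferred from the numbers and not stated in the report. The rows are listed in the same order as the matrix, and the helper names are hypothetical.

```python
from collections import Counter

MODELS = [
    "Gemma 4 26B", "Gemma 4 31B", "Gemma 4 E4B",
    "Nemotron 3 Nano 30B", "Nemotron 3 Super 120B", "Qwen 3.5 122B",
]

ALL_OK = ["OK"] * 6

# One row per use case, top to bottom as in the matrix; columns follow MODELS.
MATRIX = [
    ALL_OK, ALL_OK, ALL_OK, ALL_OK, ALL_OK,
    ["OK", "OK", "OK", "SKIP", "SKIP", "OK"],   # sixth row of the matrix
    ["INV", "OK", "OK", "SKIP", "SKIP", "OK"],  # seventh row
    ALL_OK,
    ["INV", "OK", "INV", "OK", "OK", "OK"],     # ninth row
    ALL_OK,
]


def tally(matrix, models):
    """Count statuses per model column."""
    per_model = {m: Counter() for m in models}
    for row in matrix:
        for model, status in zip(models, row):
            per_model[model][status] += 1
    return per_model


def coverage_pct(counts):
    """Assumed definition: OK / (OK + INV); SKIP is excluded from the denominator."""
    attempted = counts["OK"] + counts["INV"]
    return round(100.0 * counts["OK"] / attempted, 1) if attempted else 0.0


per_model = tally(MATRIX, MODELS)
totals = sum(per_model.values(), Counter())

for model in MODELS:
    print(f"{model}: {dict(per_model[model])} -> {coverage_pct(per_model[model])}%")
print(f"Totals: {dict(totals)}")
```

With this definition the per-model percentages reproduce the Coverage % column of the model comparison, and the totals reproduce the Full 10 FAST warm row of the run overview.
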
Methodology note
The main benchmark is Full 10 FAST warm. THINK warm serves as a supplementary sensitivity test, not as a pure academic reasoning benchmark. Text-only models are marked SKIP, not FAIL, on vision tasks.