Giotto moves beyond monolithic "black box" models to a distributed AI architecture in which intelligence is orchestrated across a coordinated system of small models.
Giotto is a portable, configurable model and AI operating system with advanced reasoning capabilities. It combines open and proprietary weights, datasets, and tools to deliver high performance, adaptability, robustness, and multi-agent support.
Test-time training extends a model's capabilities by adapting the model on the fly to the context of each query.
We retrain the model in real time, improving accuracy and reducing hallucinations in the final response.
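To make the idea of test-time training concrete, here is a minimal sketch under heavy simplifying assumptions. A toy one-feature linear model stands in for the language model, and the "query context" is a handful of (x, y) demonstrations. This is not Giotto's method; `LinearModel` and `adapt_at_test_time` are hypothetical names. The property it illustrates is that a per-query copy of the parameters is adapted with a few gradient steps and can be discarded afterwards, leaving the base model unchanged.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class LinearModel:
    """Toy stand-in for a pre-trained model: predicts y = w * x + b."""
    w: float
    b: float

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def adapt_at_test_time(model: LinearModel, context: List[Tuple[float, float]],
                       lr: float = 0.05, steps: int = 500) -> LinearModel:
    """Return a copy of `model` fitted to the (x, y) pairs in `context`.

    Runs plain gradient descent on the mean squared error over the context
    examples. The base model is never modified, so every query gets its own
    temporary adaptation that can be thrown away after answering.
    """
    w, b = model.w, model.b
    n = len(context)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in context) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in context) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return replace(model, w=w, b=b)


# The "pre-trained" base model learned y = x; this query's context follows y = 2x + 1.
base = LinearModel(w=1.0, b=0.0)
context = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
adapted = adapt_at_test_time(base, context)
print(base.predict(3.0))               # 3.0 (base model, unchanged)
print(round(adapted.predict(3.0), 3))  # 7.0 (adapted to the query context)
```

A real system would adapt a small subset of a network's weights (or adapter weights) on signals derived from the query, but the shape of the loop is the same: copy, adapt briefly, answer, discard.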
Decoding is the process by which a language model generates a response token by token; the decoding strategy shapes both the quality and the diversity of the outputs.
At Giotto, we move away from the standard paradigm of always choosing the most probable next token. Instead, we treat token paths as a branching structure and dynamically expand the most informative directions based on the model's uncertainty.
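One way to picture uncertainty-guided branching is the minimal sketch below. It assumes next-token entropy as the uncertainty signal, which is our assumption; Giotto's actual criterion is not described here. The decoder follows the single top token when the model is confident, branches into the top `branch_k` tokens when entropy exceeds a threshold, and caps the number of live paths. `branching_decode`, `toy_model`, and all parameter values are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Path = Tuple[Tuple[str, ...], float]  # (tokens, cumulative log-probability)
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def branching_decode(model: NextTokenFn, max_len: int,
                     entropy_threshold: float = 0.5, branch_k: int = 2,
                     max_paths: int = 8) -> List[Path]:
    """Follow the top token where the model is confident; branch where it is not.

    A path whose next-token entropy exceeds `entropy_threshold` is split into
    its `branch_k` most probable continuations; otherwise it is extended with
    the single most probable token. At most `max_paths` partial paths survive
    each step. Returns finished paths, most probable first.
    """
    frontier: List[Path] = [((), 0.0)]
    finished: List[Path] = []
    while frontier:
        candidates: List[Path] = []
        for prefix, logp in frontier:
            dist = model(prefix)
            ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
            k = branch_k if entropy(dist) > entropy_threshold else 1
            for token, p in ranked[:k]:
                tokens = prefix + (token,)
                path = (tokens, logp + math.log(p))
                if token == "<eos>" or len(tokens) >= max_len:
                    finished.append(path)
                else:
                    candidates.append(path)
        # Bound the search by keeping only the most probable partial paths.
        candidates.sort(key=lambda item: item[1], reverse=True)
        frontier = candidates[:max_paths]
    finished.sort(key=lambda item: item[1], reverse=True)
    return finished


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token distributions, for illustration only."""
    if not prefix:
        return {"The": 0.95, "A": 0.05}                 # confident: no branch
    if prefix[-1] in ("The", "A"):
        return {"cat": 0.5, "dog": 0.45, "eel": 0.05}  # uncertain: branch
    return {"<eos>": 1.0}


for tokens, logp in branching_decode(toy_model, max_len=5):
    print(" ".join(tokens), round(logp, 3))
```

Only the uncertain second step branches, so the search returns two completions ("The cat" and "The dog") instead of the single greedy path, while the confident first step costs no extra compute.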
Once decoding has produced multiple candidates, scoring determines which output is the most reliable, without relying on external supervision.
Our proprietary scoring system relies on sophisticated ranking methods based on intrinsic markers that characterise output quality.
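Because Giotto's scoring system is proprietary, the sketch below uses a generic, well-known stand-in: confidence-weighted self-consistency. It combines two intrinsic signals that need no external labels, agreement between candidates and token-level confidence (the geometric-mean token probability). `select_answer` and `mean_token_prob` are hypothetical names, and the log-probabilities in the example are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mean_token_prob(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability: exp of the average log-probability."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def select_answer(candidates: List[Tuple[str, Sequence[float]]]) -> Tuple[str, Dict[str, float]]:
    """Choose a final answer from decoded candidates without external labels.

    Each candidate is (final_answer, token_logprobs). It votes for its answer
    with weight equal to its mean token probability, so answers that are both
    frequent (agreement) and confidently generated (confidence) score highest.
    Returns the best answer and the normalised score of every answer.
    """
    scores: Dict[str, float] = defaultdict(float)
    for answer, logprobs in candidates:
        scores[answer] += mean_token_prob(logprobs)
    total = sum(scores.values())
    normalised = {answer: score / total for answer, score in scores.items()}
    best = max(normalised, key=normalised.get)
    return best, normalised


candidates = [
    ("42", [-0.1, -0.2, -0.1]),
    ("42", [-0.3, -0.2, -0.4]),
    ("17", [-0.05, -0.05, -0.05]),  # the single most confident candidate
]
best, scores = select_answer(candidates)
print(best)  # 42: agreement between two candidates outweighs one confident outlier
```

Weighting votes by confidence, rather than counting them equally, lets a strongly confident minority answer win when the majority is hesitant; the balance between the two signals is a design choice a production ranker would tune.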
The Abstraction and Reasoning Corpus (ARC) is a benchmark designed to measure progress toward Artificial General Intelligence (AGI). Created by François Chollet in 2019, ARC evaluates a system's ability to acquire new skills, along with other key traits of general intelligence. Unlike typical AI benchmarks that test specific skills, ARC challenges AI systems to reason and abstract in ways that come naturally to humans but are exceptionally difficult for machines.
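ARC tasks are distributed as JSON: a few `train` input/output pairs of integer grids (each cell a colour index from 0 to 9) and one or more `test` pairs whose outputs the solver must predict. The sketch below shows that layout with a made-up task and a deliberately naive solver that searches a tiny, hypothetical set of grid transformations for one consistent with every training pair. It is a baseline to make the benchmark concrete, not Giotto's approach; real ARC tasks demand far richer abstractions than four fixed rules.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # each cell holds a colour index from 0 to 9

# Hypothetical, deliberately tiny library of candidate grid transformations.
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def solve(task: dict) -> Optional[Tuple[str, List[Grid]]]:
    """Return the first rule consistent with every training pair, plus its
    predictions for the test inputs, or None if no rule fits."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(pair["input"]) == pair["output"] for pair in task["train"]):
            return name, [rule(pair["input"]) for pair in task["test"]]
    return None


# A made-up task in the ARC JSON layout; the hidden rule is a horizontal mirror.
task = json.loads("""
{
  "train": [
    {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
    {"input": [[5, 5, 0]], "output": [[0, 5, 5]]}
  ],
  "test": [{"input": [[7, 0, 4], [0, 9, 0]]}]
}
""")
print(solve(task))  # ('flip_horizontal', [[[4, 0, 7], [0, 9, 0]]])
```

The point of the format is that the rule is never stated: a solver has to infer it from two or three examples and generalise it to an unseen grid, which is exactly the skill-acquisition ability ARC is built to measure.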
In 2025, we achieved unprecedented results on the ARC benchmark using our proprietary approach and technology.
We evaluate Giotto on a set of widely used reasoning and knowledge benchmarks. For a fair comparison, we report its results alongside those of major models that can run on a single GPU.
Across these benchmarks, Giotto achieves leading performance among single-GPU models, combining strong mathematical reasoning, scientific understanding, and broad academic knowledge. These results position Giotto as the strongest model available for single-GPU deployment, delivering frontier-level capabilities without requiring multi-GPU infrastructure.
AIME24
A benchmark based on the 2024 American Invitational Mathematics Examination, measuring advanced mathematical reasoning on olympiad-style problems across algebra, geometry, combinatorics, and number theory.
AIME25
A benchmark based on the 2025 American Invitational Mathematics Examination, evaluating a model’s ability to solve challenging multi-step math problems with exact integer answers.
AIME26
A benchmark based on the 2026 American Invitational Mathematics Examination, testing frontier models on difficult high-school competition math problems requiring structured symbolic reasoning.
GPQA Diamond
A graduate-level, “Google-proof” multiple-choice benchmark of expert-written questions in biology, physics, and chemistry, designed to test deep scientific reasoning rather than simple retrieval.
MATH-500
A 500-problem subset of the MATH dataset, covering competition-level mathematics across domains such as algebra, geometry, number theory, probability, and precalculus.
Humanity's Last Exam (HLE)
A frontier-level multimodal academic benchmark with expert-vetted questions across mathematics, science, humanities, and other disciplines, designed to assess broad expert-level reasoning.
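The AIME benchmarks above are graded by exact match on an integer answer (AIME answers are integers from 0 to 999). A minimal sketch of such a grader follows, assuming each response states its answer in a LaTeX `\boxed{...}` or, failing that, that the last integer in the text is the answer. This extraction convention and the names `extract_final_integer` and `exact_match_accuracy` are our assumptions, not necessarily what was used to produce the reported scores.

```python
import re
from typing import List, Optional


def extract_final_integer(response: str) -> Optional[int]:
    """Return the integer in the last \\boxed{...}, else the last integer in the text."""
    boxed = re.findall(r"\\boxed\{\s*(-?\d+)\s*\}", response)
    if boxed:
        return int(boxed[-1])
    numbers = re.findall(r"-?\d+", response)
    return int(numbers[-1]) if numbers else None


def exact_match_accuracy(responses: List[str], answers: List[int]) -> float:
    """Fraction of responses whose extracted integer equals the reference answer."""
    correct = sum(extract_final_integer(r) == a for r, a in zip(responses, answers))
    return correct / len(answers)


responses = [
    r"Summing the cases gives \boxed{204}.",
    "So the total is 33.",
    "I could not finish this problem.",
]
print(exact_match_accuracy(responses, [204, 32, 7]))  # 0.3333333333333333
```

Parsing to `int` means formatting differences such as a zero-padded "025" still match the reference 25, while any numeric difference counts as wrong; there is no partial credit.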
Baseline models (short name: version compared):
Gemma 4: Gemma 4 31B
NVIDIA Nemotron 3: NVIDIA Nemotron 3 Nano 30B-A3B
Ministral 3: Ministral 3 14B
GPT-OSS-120B: GPT-OSS-120B, medium reasoning effort
DeepSeek R1 32B: DeepSeek-R1-Distill-Qwen-32B
Sources: official model cards published by the providers, or other data from https://huggingface.co or artificialanalysis.ai.