
Researchers at Stanford University and the University of Washington have developed an open-source artificial intelligence (AI) model that performs comparably to OpenAI's o1 model. The researchers' main goal was not to build a powerful reasoning-focused model, but to understand how the San Francisco-based AI firm steers its o1 series of models to scale compute at test time. Notably, the researchers were able to demonstrate the method, and replicate the model's behaviour, at an extremely low cost and with modest computing resources.
Researchers develop s1-32B AI model
The researchers detailed their methods and processes in a study published on the online pre-print server arXiv. The process involved creating a synthetic dataset from a different AI model and using techniques such as ablation and supervised fine-tuning (SFT). The model is available in a GitHub listing.
It should be noted that the AI model was not built from scratch. The developers used Qwen2.5-32B and fine-tuned it to create the s1-32B large language model (LLM). The Qwen model was released in September 2024, but given its size and lack of reasoning capabilities, it does not match the performance of OpenAI's o1.
In the process, the researchers used the Gemini Flash Thinking application programming interface (API) to generate reasoning traces and responses. A total of 59,000 triplets of questions, reasoning traces (chain of thought, or CoT), and answers were extracted from the API. A dataset called s1K was then created by selecting 1,000 high-quality, diverse, and difficult questions, along with their reasoning traces and responses.
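The curation step can be sketched as a simple multi-stage filter. The field names, the trace-length difficulty proxy, and the per-topic cap below are illustrative assumptions for the sketch, not the paper's exact criteria.

```python
# Sketch of filtering ~59,000 candidate triples (question, reasoning
# trace, answer) down to a small high-quality, difficult, diverse subset.
# Field names and thresholds here are illustrative assumptions.

def curate_s1k(samples, target_size=1000):
    # Stage 1: quality -- drop malformed samples (e.g. empty traces or answers).
    quality = [s for s in samples if s["trace"] and s["answer"]]

    # Stage 2: difficulty -- prefer questions with long reasoning traces,
    # using trace length as a crude proxy for difficulty.
    difficult = sorted(quality, key=lambda s: len(s["trace"]), reverse=True)

    # Stage 3: diversity -- take at most a few samples per topic
    # until the target size is reached.
    per_topic, selected = {}, []
    for s in difficult:
        topic = s["topic"]
        if per_topic.get(topic, 0) < 3:
            per_topic[topic] = per_topic.get(topic, 0) + 1
            selected.append(s)
        if len(selected) == target_size:
            break
    return selected

# Tiny demonstration with mock data.
mock = [
    {"trace": "step " * n, "answer": "42", "topic": f"t{n % 4}"}
    for n in range(1, 20)
]
subset = curate_s1k(mock, target_size=5)
print(len(subset))  # 5
```

The three stages mirror the selection goals named above (quality, difficulty, diversity); a real pipeline would score these with model-based checks rather than string lengths.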
After creating the s1K dataset, the researchers performed supervised fine-tuning on the Qwen2.5-32B-Instruct model, using basic fine-tuning hyperparameters. The distillation process took 26 minutes of training on 16 Nvidia H100 GPUs.
At this point, it was not known how OpenAI trains its models to "think", or how it makes them stop the thinking process. Without a stopping mechanism, a model risks second-guessing its output indefinitely, wasting valuable processing power.
While fine-tuning the model, the researchers discovered something interesting: they could manipulate the inference time by adding special tags. With the s1-32B model, appending a "wait" command forced it to think beyond its usual reasoning period. Once the command was added, the model began second-guessing and verifying its output. The tag could then be used to either shorten or extend this test-time scaling phase.
The researchers also experimented with several other phrases, such as "alternatively" and "hmm", but found that the best performance metrics were achieved with the "wait" tag. Since this brought the model close to o1's performance, the researchers claim that this may be the method OpenAI used to fine-tune its reasoning models.
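The intervention described above can be sketched as a decoding loop that intercepts the model's attempt to end its thinking phase. The `next_chunk` callable, the `</think>` delimiter, and the extension budget are hypothetical stand-ins for a real decoding setup, not details from the paper.

```python
# Sketch of the "wait" intervention: when the model tries to end its
# thinking phase, the end-of-thinking delimiter is suppressed and
# replaced with "Wait", prompting the model to keep reasoning.
# `next_chunk` stands in for a real LLM decoding call.

END_THINK = "</think>"  # assumed end-of-thinking delimiter

def think_with_wait(next_chunk, max_extensions=2):
    transcript = []
    extensions = 0
    while True:
        chunk = next_chunk(transcript)
        if chunk == END_THINK:
            if extensions < max_extensions:
                # Suppress the delimiter and force further reasoning.
                transcript.append("Wait")
                extensions += 1
                continue
            break  # budget exhausted: let the model stop
        transcript.append(chunk)
    return transcript

# Stub model that tries to stop after every reasoning step.
steps = iter(["step1", END_THINK, "step2", END_THINK, "step3", END_THINK])
print(think_with_wait(lambda transcript: next(steps)))
# ['step1', 'Wait', 'step2', 'Wait', 'step3']
```

Raising `max_extensions` lengthens the reasoning period (more test-time compute); setting it to zero lets the model stop at its first attempt, which is how the same mechanism can shorten the phase.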
A TechCrunch report claims that the researchers were able to create the s1-32B AI model for under $50 (roughly Rs. 4,380), highlighting that the post-training structure for reasoning models can be created at an extremely low cost.
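The sub-$50 figure is consistent with a back-of-the-envelope calculation from the 26-minute, 16-GPU run described earlier. The hourly GPU rate below is an assumed on-demand cloud price, not a figure from the report.

```python
# Rough cost estimate for a 26-minute training run on 16 H100 GPUs.
gpus = 16
minutes = 26
gpu_hours = gpus * minutes / 60   # total GPU-hours consumed
assumed_rate = 7.0                # assumed USD per H100-hour (cloud on-demand)
cost = gpu_hours * assumed_rate
print(round(gpu_hours, 2), round(cost, 2))  # 6.93 48.53
```

At roughly 7 GPU-hours, the run lands under $50 for any hourly rate below about $7.20, which matches the report's claim.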