
Inside OpenAI’s Quest to Make AI Do Anything for You
Shortly after joining OpenAI in 2022, researcher Hunter Lightman witnessed the launch of ChatGPT, one of the fastest-growing products ever. Simultaneously, Lightman’s team, known as MathGen, was focused on a more specialized, yet foundational, endeavor: teaching OpenAI’s models to excel at high school math competitions. This work is now seen as crucial to OpenAI’s leading efforts in developing AI reasoning models, the technology powering AI agents designed to perform complex tasks on computers much like a human would.
Lightman described the team’s early objective: “We were trying to make the models better at mathematical reasoning, which at the time they weren’t very good at.” While OpenAI’s current AI systems still face challenges like hallucination and struggle with highly complex tasks, their progress in mathematical reasoning has been significant. OpenAI’s models have demonstrated impressive capabilities, even securing a gold medal at the International Math Olympiad, a testament to their advanced reasoning skills.
The company believes these reasoning advancements will extend to various other subjects, paving the way for the creation of the general-purpose AI agents that OpenAI has long envisioned. Unlike the viral success of ChatGPT, which began as a low-key research preview, OpenAI’s pursuit of AI agents is the result of a deliberate, years-long strategic effort.
OpenAI CEO Sam Altman articulated this vision at the company’s first developer conference in 2023, stating, “Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you. These capabilities are often talked about in the AI field as agents. The upsides of this are going to be tremendous.”
The introduction of OpenAI’s first AI reasoning model, o1, in late 2024, marked a significant milestone. The 21 foundational researchers behind this breakthrough have become highly sought-after talent, with Mark Zuckerberg recruiting several of them to Meta’s new superintelligence-focused unit, reportedly offering compensation packages exceeding $100 million. One such researcher, Shengjia Zhao, has been appointed chief scientist of Meta Superintelligence Labs.
The advancement of OpenAI’s reasoning models and agents is closely tied to reinforcement learning (RL), a machine learning technique that provides AI models with feedback on their decision-making in simulated environments. While RL has been utilized for decades, notably by Google DeepMind’s AlphaGo in 2016, OpenAI has refined its application.
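The RL loop described above, in which an agent acts, receives scalar reward feedback, and updates its behavior, can be illustrated in miniature. The sketch below is a hypothetical three-armed bandit in plain Python; the reward probabilities and the epsilon-greedy strategy are illustrative choices, not anything specific to OpenAI's training setup:

```python
import random

def run_bandit(steps=2000, eps=0.1, seed=0):
    """Minimal RL loop: pick actions, observe rewards, refine value estimates."""
    rng = random.Random(seed)
    true_means = [0.2, 0.5, 0.8]   # hidden reward probability of each action
    values = [0.0, 0.0, 0.0]       # agent's running estimates
    counts = [0, 0, 0]
    for _ in range(steps):
        if rng.random() < eps:     # occasionally explore a random action
            a = rng.randrange(3)
        else:                      # otherwise exploit the current best estimate
            a = max(range(3), key=lambda i: values[i])
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        values[a] += (reward - values[a]) / counts[a]  # incremental mean update
    return values
```

After enough steps, the agent's estimate for the best action approaches its true reward rate, which is the sense in which feedback in a simulated environment shapes decision-making.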
Early OpenAI employee Andrej Karpathy had conceptualized using RL to build AI agents capable of operating a computer years earlier, but the models and training techniques needed to realize the idea took time to mature. In 2018, OpenAI pioneered its GPT series of large language models, which excelled at processing text but faltered at basic mathematics. The key breakthrough came in 2023 with a project internally codenamed “Q*” and later “Strawberry,” which combined LLMs, RL, and test-time computation. The latter allowed models to allocate extra time and processing power to work through problems and verify their steps before answering, giving rise to the “chain-of-thought” (CoT) approach that significantly improved performance on novel math problems.
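The interplay of test-time computation and step verification can be sketched with a toy example. Everything below is invented for illustration (real reasoning models operate over learned token sequences, not hard-coded arithmetic): an unreliable "proposer" emits candidate reasoning chains for one multiplication, a verifier checks every intermediate step, and a larger compute budget means more attempts before giving up:

```python
import random

def propose_solution(rng):
    """Toy 'model': proposes a reasoning chain for 12 * 34, sometimes erring."""
    a = 12
    s1 = a * 30 + rng.choice([0, 0, 7])    # step 1: 12 * 30, occasionally wrong
    s2 = a * 4 + rng.choice([0, 0, -3])    # step 2: 12 * 4, occasionally wrong
    return [("12*30", s1), ("12*4", s2), ("sum", s1 + s2)]

def verify(chain):
    """Check every intermediate step, not just the final answer."""
    checks = {"12*30": 360, "12*4": 48}
    for claim, value in chain[:-1]:
        if checks[claim] != value:
            return False
    return chain[-1][1] == sum(v for _, v in chain[:-1])

def solve_with_budget(budget, seed=0):
    """Test-time compute: sample chains until one verifies or budget runs out."""
    rng = random.Random(seed)
    for _ in range(budget):
        chain = propose_solution(rng)
        if verify(chain):
            return chain[-1][1]
    return None
```

The point of the sketch is the scaling knob: with a budget of zero the solver returns nothing, while a generous budget almost always surfaces a chain whose every step checks out.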
“I could see the model starting to reason,” said Ahmed El Kishky, an OpenAI research lead. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.” The combination of these techniques in Strawberry directly led to the development of o1, enabling OpenAI to power AI agents with planning and fact-checking abilities.
“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.” This breakthrough allowed OpenAI to explore new avenues for AI improvement by leveraging more computational power during post-training and providing models with increased resources during query processing.
“OpenAI, as a company, thinks a lot about not just the way things are, but the way things are going to scale,” Lightman added. Following the Strawberry breakthrough, an “Agents” team was formed, led by Daniel Selsam, to advance this new paradigm. This work eventually became integrated into the broader project for the o1 reasoning model, spearheaded by key figures like Ilya Sutskever, Mark Chen, and Jakub Pachocki.
The development of o1 required significant resource allocation, particularly talent and GPUs. OpenAI’s research culture, characterized by its bottom-up approach, allowed teams to secure resources by demonstrating tangible breakthroughs. “One of the core components of OpenAI is that everything in research is bottom up,” Lightman explained. “When we showed the evidence [for o1], the company was like, ‘This makes sense, let’s push on it.’” The company’s mission to develop AGI, prioritizing the creation of the smartest AI models over immediate product launches, was instrumental in driving progress on AI reasoning models, a level of investment not always feasible at competing labs.
The strategic decision to explore new training methods proved timely, as many leading AI labs began experiencing diminishing returns from traditional pretraining scaling by late 2024. Consequently, advances in reasoning models have become a primary driver of momentum in the AI field.
What does it mean for an AI to “reason”?
The overarching goal of AI research is often to replicate human intelligence. Following the launch of o1, ChatGPT has incorporated more human-like functionalities, such as “thinking” and “reasoning.” However, El Kishky approaches the concept from a computer science perspective, stating, “We’re teaching the model how to efficiently expend compute to get an answer. So if you define it that way, yes, it is reasoning.”
Lightman focuses on the outcomes, suggesting, “If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that. We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.” While acknowledging potential disagreements on nomenclature, researchers emphasize the practical capabilities of their models, drawing parallels to man-made systems like airplanes that, while mechanistically different from nature, achieve similar functional outcomes.
A joint paper by researchers from OpenAI, Anthropic, and Google DeepMind highlights that AI reasoning models are not yet fully understood, necessitating further research. Current AI agents, like OpenAI’s Codex and Perplexity’s Comet, excel in well-defined domains such as coding. However, general-purpose agents, including OpenAI’s ChatGPT Agent, often falter with complex, subjective tasks like online shopping or finding parking, frequently making errors or taking excessive time.
“Like many problems in machine learning, it’s a data problem,” Lightman stated regarding the limitations of agents on subjective tasks. “Some of the research I’m really excited about right now is figuring out how to train on less verifiable tasks. We have some leads on how to do these things.” Noam Brown, an OpenAI researcher involved in developing the IMO model and o1, noted that OpenAI is developing new general-purpose RL techniques to teach AI models skills that are not easily verifiable, a method used for their IMO gold medal-winning model.
This approach, where multiple AI agents explore various ideas simultaneously before selecting the best answer, is gaining traction. Google and xAI have recently released state-of-the-art models employing this technique. “I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” said Brown. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”
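The select-the-best-answer pattern described above is often called best-of-n sampling. Here is a minimal sketch under stated assumptions: a toy square-root "problem," a noisy proposer standing in for independent agent attempts, and a verifier that scores each attempt; none of this reflects OpenAI's, Google's, or xAI's actual systems:

```python
import random

def agent_attempt(problem, rng):
    """Stand-in for one agent's attempt: returns (answer, verifier score).
    A real system would run a full model; here, a noisy guess at sqrt(x)."""
    guess = problem ** 0.5 * (1 + rng.gauss(0, 0.1))  # noisy proposal
    score = -abs(guess * guess - problem)             # how close is guess^2 to x?
    return guess, score

def best_of_n(problem, n, seed=0):
    """Run n independent attempts (conceptually in parallel) and keep the
    attempt the verifier scores highest."""
    rng = random.Random(seed)
    attempts = [agent_attempt(problem, rng) for _ in range(n)]
    return max(attempts, key=lambda t: t[1])[0]
```

The design choice doing the work is that scoring a candidate answer (does guess squared match the input?) is much easier than producing one, so spending compute on many attempts plus a cheap check beats a single attempt.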
These advancements are anticipated to enhance OpenAI’s upcoming GPT-5 model, aiming to solidify its market dominance by offering superior AI capabilities for both developers and consumers. Furthermore, OpenAI is focused on simplifying user experience, developing agents that intuitively grasp user needs and autonomously determine when to utilize specific tools or how long to engage in reasoning.
This vision portrays an evolved ChatGPT: an agent capable of performing any task on the internet based on user intent. While distinct from its current form, this direction is central to OpenAI’s research efforts. The company now faces increasing competition from entities like Google, Anthropic, xAI, and Meta, making the race to realize this agentic future a critical challenge.



