Status of Artificial General Intelligence (Dec 2025): ‘ability to teach itself’

By Jim Shimabukuro (assisted by Claude, ChatGPT, Gemini, Copilot)
Editor

[See earlier reports: Nov 2025, Oct 2025]

While most experts believe the arrival of AGI is decades away, some predict it might occur as soon as the next five years. “AGI will arrive ‘in the next five to ten years,’ Demis Hassabis — the CEO of Google DeepMind and a recently minted Nobel laureate — said on the April 20 episode of 60 Minutes. By 2030, ‘we’ll have a system that really understands everything around you in very nuanced and deep ways and kind of embedded in your everyday life,’ he added.”1 Month by month, the AGI tide advances, and the pace seems exponential. From Nov. 16 to Dec. 24, 2025, here are six developments worth noting. -js

Image created by Copilot

Advance #1: Integral AI’s Foundation World Model – A Paradigm Shift in AGI Architecture (Claude)

Integral AI, a Tokyo-based startup founded by former Google AI engineers Dr. Jad Tarifi and Nima Asgharbeygi, announced on December 8, 2025, what it claims is the world’s first AGI-capable model. This announcement, disseminated through Business Wire and subsequently covered by multiple technology outlets, represents not merely an incremental improvement in existing large language model architectures but rather a fundamental departure from the data-hungry, brute-force scaling approach that has dominated AI development over the past several years.

Tarifi, who spent nearly a decade at Google AI where he founded and led its first Generative AI team, deliberately established Integral AI in Tokyo rather than Silicon Valley, citing Japan’s position as the global center of robotics as strategically essential for developing embodied intelligence. The company’s approach eschews the Large Language Model paradigm entirely in favor of what it calls “Foundation World Models”—systems inspired by the human neocortex that build internal, predictive simulations of their environment, allowing them to imagine the consequences of actions before taking them.

What distinguishes Integral AI’s claim from the parade of premature AGI announcements that have characterized recent years is the company’s rigorous, engineering-led definition of AGI. Rather than relying on philosophical generalities, Integral has established three strict, measurable criteria that any system claiming AGI capability must satisfy. First, the model must demonstrate autonomous skill learning—the ability to teach itself entirely new skills in unknown environments without pre-existing datasets or human intervention.

This criterion directly challenges systems like ChatGPT and Claude, which remain fundamentally limited by their training data. Second, the learning process must exhibit safe and reliable mastery. As Tarifi elegantly explained, a robot learning to cook shouldn’t burn down the kitchen through trial and error. Safety must be engineered into the system from the ground up, not patched in after catastrophic failures. Third, and perhaps most remarkably, the model cannot use more energy to learn a new skill than a human would—a direct repudiation of the unsustainable energy consumption that characterizes contemporary AI training.

According to reporting from multiple sources including RoboHorizon and Revolution in AI, Integral AI conducted closed testing demonstrating that its system successfully met all three criteria. The company showcased its technology through two distinct demonstrations. The first involved mastering Sokoban, a warehouse puzzle game notoriously difficult for AI because it requires long-term planning where a single wrong move can render a puzzle unsolvable much later—a challenge that exposes the fundamental weakness of current generative AI in state-tracking and logical consequence prediction.

Tarifi claims their model learned the rules and professional-level strategy from a blank slate simply by interacting with the simulation, achieving what amounts to tabula rasa learning. The second demonstration involved a project for Honda R&D, coordinating complex real-world logistics and planning systems—essentially playing Sokoban with actual supply chains and APIs. The planning capabilities were compared to Google DeepMind’s legendary AlphaGo, but applied to the messy, dynamic physical world rather than a constrained game board.
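The Sokoban weakness described above is easy to make concrete: a single wrong push can create an unrecoverable state long before the failure becomes visible. The sketch below (my own illustration, not Integral AI's code) shows the classic "corner deadlock" check — a box pushed into a corner that is not a goal square can never be moved again, so the puzzle is already lost:

```python
# Minimal illustration of why one wrong Sokoban move can doom a puzzle:
# a box pushed into a corner that is not a goal square can never be
# pushed again, so the state is unsolvable ("deadlocked").
WALL = "#"

def is_corner_deadlock(grid, box_row, box_col, goals):
    """Return True if a box at (box_row, box_col) is stuck in a corner off-goal."""
    if (box_row, box_col) in goals:
        return False  # a box resting on a goal square is fine
    blocked_up    = grid[box_row - 1][box_col] == WALL
    blocked_down  = grid[box_row + 1][box_col] == WALL
    blocked_left  = grid[box_row][box_col - 1] == WALL
    blocked_right = grid[box_row][box_col + 1] == WALL
    # A corner: one vertical neighbour AND one horizontal neighbour are walls.
    return (blocked_up or blocked_down) and (blocked_left or blocked_right)

grid = [
    "#####",
    "#   #",
    "#   #",
    "#####",
]
# Box pushed into the interior corner at (1, 1), with no goal there: deadlock.
print(is_corner_deadlock(grid, 1, 1, goals=set()))     # True
# Same square, but if (1, 1) were a goal, the position is still winnable.
print(is_corner_deadlock(grid, 1, 1, goals={(1, 1)}))  # False
```

A planner that only predicts the next token has no mechanism for detecting this kind of future-tense failure; a world model that simulates consequences before acting does.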

The technical architecture underlying Integral AI’s system represents a convergence of several AI research threads that have existed for decades but never before been successfully unified at scale. The concept of world models—pioneered by researchers like Jürgen Schmidhuber and Yann LeCun—posits that robust AI requires systems that build internal models of their environment rather than merely predicting the next token in a sequence. Integral’s architecture uses what it calls “universal operators” that function like the scientific method: form a hypothesis, design an experiment (such as moving a robot arm), and learn from the results. This enables what Tarifi describes as an “Interactive Learning loop”—planning, taking action, generating its own training data, “dreaming” to consolidate memories, and continually updating its own weights without catastrophic forgetting.
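The loop Tarifi describes can be sketched schematically. The class and method names below are illustrative assumptions of mine, not Integral AI's API; the point is only the shape of the cycle — plan, act, store self-generated data, consolidate, update:

```python
# Schematic sketch of the "Interactive Learning loop": plan -> act ->
# collect self-generated data -> "dream" (consolidate) -> update weights.
# All names here are illustrative, not Integral AI's actual interfaces.
import random

class WorldModelAgent:
    """Toy stand-in showing the loop's structure, not the real system."""

    def __init__(self):
        self.replay_buffer = []   # self-generated experience
        self.skill_level = 0.0    # crude proxy for learned weights

    def plan(self, observation):
        # Imagine candidate actions inside an internal model; pick one.
        return random.choice(("left", "right", "push"))

    def act(self, action):
        # Execute in the (simulated) environment and observe the outcome.
        return {"action": action, "reward": random.random()}

    def dream(self):
        # Consolidate recent experience into lasting "memories" (here, a
        # running average) -- the step meant to avoid catastrophic forgetting.
        avg = sum(e["reward"] for e in self.replay_buffer) / len(self.replay_buffer)
        self.skill_level = 0.9 * self.skill_level + 0.1 * avg

    def interactive_learning_loop(self, steps):
        for _ in range(steps):
            action = self.plan(observation=None)   # 1. plan via internal simulation
            experience = self.act(action)          # 2. act, generating own data
            self.replay_buffer.append(experience)  # 3. store self-generated data
            self.dream()                           # 4. consolidate and update
        return self.skill_level

agent = WorldModelAgent()
print(agent.interactive_learning_loop(steps=20) > 0.0)
```

The distinguishing claim is step 3: the agent's training data is a byproduct of its own interaction, not a human-curated corpus.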

Perhaps most intriguingly, Tarifi has introduced a concrete approach to AGI alignment grounded not in rules or filters but in maximizing what he calls “collective agency”—essentially, freedom—for individuals and humanity as a whole. The AGI, according to this framework, evaluates simulated futures and chooses actions that increase humanity’s ability to know, choose, act, and renew itself. This becomes the moral foundation for what Tarifi terms the “Alignment Economy,” where value is tied to how much an action increases or decreases absolute human freedom.

If Integral AI’s claims withstand independent verification—and it’s crucial to note that as of December 2025, the company has not yet submitted its system for independent expert review as required by the October 2025 Microsoft-OpenAI agreement on AGI verification—it would represent the first demonstration that AGI-level capabilities can emerge from fundamentally different computational principles than those underlying current transformer-based models. The energy efficiency criterion alone, if validated, would demolish the assumption that AGI necessarily requires planet-scale computing resources, potentially democratizing access to AGI-level capabilities.

The autonomous skill learning demonstrated in novel environments without human-generated training data addresses what has been perhaps the most persistent criticism of modern AI: that it is merely a sophisticated pattern-matching system rather than a genuinely intelligent agent capable of true learning and generalization. Most significantly, the world-modeling approach suggests a path toward AI systems that can reason about causality, plan over long horizons, and operate safely in open-ended real-world environments—capabilities that remain largely out of reach for even the most advanced language models.


Advance #2: Google’s Gemini 3 Flash – Frontier Intelligence at Unprecedented Speed (Claude)

On December 17, 2025, Google released Gemini 3 Flash and immediately made it the default model across the Gemini app, Google Search’s AI mode, and Google AI Studio. This seemingly routine model update actually represents an inflection point in AI development: the first time frontier-class intelligence—reasoning capabilities that match or exceed the largest, most sophisticated models—has been delivered at speeds and costs optimized for high-frequency, consumer-scale deployment. The announcement, covered extensively by TechCrunch and other technology outlets, marks Google’s most aggressive move yet in its intensifying competition with OpenAI, coming just weeks after Sam Altman allegedly sent an internal “Code Red” memo as ChatGPT’s traffic declined while Google’s market share surged.

The technical achievement underlying Gemini 3 Flash is remarkable. On Google’s benchmark suite, the model demonstrates performance that approaches or matches Gemini 3 Pro—its much larger sibling—while operating at three times the speed and using 30% fewer tokens on average for thinking tasks. On the multimodality and reasoning benchmark MMMU-Pro, Gemini 3 Flash outscored all competitors with an 81.2% score, surpassing even Gemini 3 Pro and OpenAI’s newly released GPT-5.2. On Humanity’s Last Exam, a benchmark designed to test expertise across diverse domains, Gemini 3 Flash scored 33.7% without tool use, placing it within striking distance of the 37.5% achieved by Gemini 3 Pro and the 34.5% scored by GPT-5.2, while dramatically outperforming its predecessor Gemini 2.5 Flash’s 11% score.

What makes these numbers meaningful is not simply that they represent incremental improvements over previous models, but rather that they break what has long been considered an iron law of AI development: the seemingly unavoidable trade-off between intelligence and efficiency. As Tulsee Doshi, Google’s senior director and head of Product for Gemini Models, explained to TechCrunch, Flash is positioned as “your workhorse model” with input and output prices that allow companies to handle bulk tasks economically. At $0.50 per million input tokens and $3.00 per million output tokens, Gemini 3 Flash costs slightly more than Gemini 2.5 Flash but delivers performance that rivals models costing many times more to operate.
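At the quoted rates, the economics of bulk processing are easy to check with a few lines of arithmetic (the function name is mine; the prices are those stated above):

```python
# Cost sanity check for the quoted Gemini 3 Flash pricing:
# $0.50 per million input tokens, $3.00 per million output tokens.
def flash_cost_usd(input_tokens, output_tokens,
                   in_per_million=0.50, out_per_million=3.00):
    return (input_tokens / 1_000_000) * in_per_million + \
           (output_tokens / 1_000_000) * out_per_million

# Example: a bulk job consuming 10M input tokens and producing 1M output tokens.
print(f"${flash_cost_usd(10_000_000, 1_000_000):.2f}")  # $8.00
```

At roughly eight dollars for an eleven-million-token workload, the "workhorse" framing is literal: tasks that would be cost-prohibitive on a frontier-priced model become routine.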

The architectural innovations enabling this breakthrough remain largely proprietary, but the publicly available information suggests several key advances. First, Gemini 3 Flash supports extraordinarily large context windows—up to 1,048,576 input tokens with approximately 65,536 output tokens—placing it closer to long-context competitors while maintaining its speed-oriented design. This massive context capacity means the model can process the equivalent of hundreds of pages of text while still responding in near real-time, a capability essential for complex enterprise workflows.

Second, the model operates in two distinct modes that users can toggle between: Fast Mode, optimized for conversational fluidity and instant answers, ideal for brainstorming and quick summaries; and Thinking Mode, which engages a dedicated reasoning layer before responding, displaying an explicit “Thinking…” indicator as it maps out a chain of thought to ensure accuracy. This modal architecture allows a single model to adaptively allocate computational resources based on task requirements—using minimal compute for simple queries while bringing substantial reasoning power to bear on complex problems.
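The adaptive-allocation idea behind the two modes can be illustrated with a toy dispatcher. The heuristic and function names below are my own assumptions for illustration, not Google's implementation: route cheap queries to a fast path, and queries that look hard to a deliberate "thinking" path.

```python
# Toy dispatcher illustrating the dual-mode idea: minimal compute for easy
# queries, a deliberate reasoning path for hard ones. The heuristic and all
# names are illustrative assumptions, not Google's implementation.
def looks_hard(query: str) -> bool:
    markers = ("prove", "step by step", "debug", "optimize", "why")
    return len(query) > 200 or any(m in query.lower() for m in markers)

def quick(query: str) -> str:
    return f"instant answer to {query!r}"

def deliberate(query: str) -> str:
    return f"chain-of-thought answer to {query!r}"

def answer(query: str) -> str:
    if looks_hard(query):
        return "thinking-mode: " + deliberate(query)  # extra reasoning compute
    return "fast-mode: " + quick(query)               # minimal compute

print(answer("Summarize this memo"))
print(answer("Prove the bound step by step"))
```

In the real system the routing decision is of course learned rather than keyword-based, but the architectural principle is the same: compute spent should scale with task difficulty, not be fixed per query.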

The real-world implications of these capabilities are already emerging. According to Google’s announcement, companies including JetBrains, Figma, Cursor, Harvey, and Latitude have integrated Gemini 3 Flash into their workflows through Vertex AI and Gemini Enterprise. The model’s speed makes it particularly well-suited for video analysis, data extraction, and visual question-answering applications that require repeated, rapid interactions. For developers, Google is making the model available through its API and through Antigravity, the company’s new coding tool released in November 2025. On the SWE-bench verified coding benchmark, Gemini 3 Pro (which shares architectural foundations with the Flash variant) scores 78%, outperformed only by GPT-5.2.

Google’s decision to promote Gemini 3 Flash to default status across its consumer and enterprise surfaces within days of release—rather than the longer coexistence periods that characterized earlier model transitions—signals the company’s confidence in the model’s reliability at scale. Since releasing Gemini 3, Google has processed over one trillion tokens per day through its API, reflecting both the model’s capabilities and the explosive growth in AI usage across the industry.

The significance of Gemini 3 Flash extends far beyond its impressive benchmark scores. First, it demonstrates that the path to AGI need not run exclusively through ever-larger, ever-more-expensive models. The traditional assumption has been that general intelligence requires massive parameter counts and correspondingly massive computational resources. Gemini 3 Flash proves that with sufficient architectural innovation, frontier-level capabilities can be achieved in models optimized for efficiency rather than raw scale. This has profound implications for the democratization of AI—if AGI-class capabilities can be delivered at costs appropriate for high-frequency consumer interactions rather than requiring dedicated supercomputing infrastructure, it dramatically expands who can access and benefit from advanced AI.

Second, the model’s dual-mode architecture—allowing users to explicitly choose between fast, intuitive responses and slower, more deliberate reasoning—represents a step toward AI systems that can metacognitively allocate their own computational resources. This is a fundamental requirement for artificial general intelligence: the ability to recognize when a problem requires careful thought versus when a quick heuristic response suffices is central to human intelligence and will be equally central to any system claiming general intelligence.

Third, the combination of massive context windows with near-real-time response speeds enables entirely new categories of AI applications that simply weren’t possible with earlier models. These include continuous learning systems that maintain coherent understanding across hours-long interactions, complex multi-agent workflows where AI systems collaborate on extended tasks, and integration into latency-sensitive applications like real-time translation or interactive tutoring that demand both deep understanding and immediate feedback.

Perhaps most significantly, Gemini 3 Flash represents Google’s recognition that AGI will not arrive as a single, monolithic system but rather as a family of specialized models—each optimized for different aspects of intelligence—that together provide general capabilities. The Flash model handles high-frequency reasoning and interaction; the Pro model tackles problems requiring maximum context and nuance; and future additions to the family will likely address other dimensions of intelligence.

This modular approach to AGI may prove more tractable than attempting to build a single system that excels at everything, and Google’s ability to deploy these capabilities at consumer scale means the company is gathering feedback and operational data at a rate that compounds its advantage over competitors who remain focused on research-lab demonstrations.


Advance #3: OpenAGI’s Lux: Advancing Autonomous Computer-Use Agents and Real-World Interaction (ChatGPT)

In early December 2025, OpenAGI, a research startup spun out of MIT, emerged from stealth with the release of its AI agent Lux, claiming unprecedented performance on real-world computer-use tasks — a capability that goes beyond conversational LLMs into embodied action in dynamic environments. This system was introduced publicly through both official project announcements and industry press coverage around December 1, 2025.

Unlike traditional language models, Lux was built specifically to interpret screenshots of real desktop applications and act on them — clicking, typing, navigating, and making decisions akin to a human user. In benchmark evaluations, Lux achieved an 83.6% success rate on the Online-Mind2Web benchmark, a test suite of over 300 realistic computer tasks, significantly outperforming competitor models such as Google’s Gemini CUA (69.0%), OpenAI’s Operator (~61.3%), and Anthropic’s Claude Computer Use (~61.0%).
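The screenshot-in, action-out cycle that defines computer-use agents has a common shape, sketched below. The `propose_action` policy and `UIAction` interface are illustrative stand-ins of my own, not Lux's SDK: a real agent would run a vision-language model over the screenshot and actually execute the actions.

```python
# Generic shape of a computer-use agent loop: capture screen -> model
# proposes a UI action -> execute -> repeat until done. The interfaces
# here are illustrative assumptions, not OpenAGI's Lux SDK.
from dataclasses import dataclass

@dataclass
class UIAction:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def propose_action(screenshot: bytes, goal: str, step: int) -> UIAction:
    # Stand-in policy; a real agent would run a vision-language model here.
    if step == 0:
        return UIAction("click", x=120, y=45)
    if step == 1:
        return UIAction("type", text=goal)
    return UIAction("done")

def run_agent(goal: str, max_steps: int = 10) -> list:
    trace = []
    for step in range(max_steps):
        screenshot = b""                     # placeholder for a real capture
        action = propose_action(screenshot, goal, step)
        trace.append(action)                 # a real agent would execute it
        if action.kind == "done":
            break
    return trace

trace = run_agent("search for quarterly report")
print([a.kind for a in trace])  # ['click', 'type', 'done']
```

Benchmarks like Online-Mind2Web score exactly this loop: whether the sequence of proposed actions actually completes the stated goal in a live application.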

As OpenAGI’s CEO Zengyi Qin stated in their launch materials: “We are happy to share that OpenAGI is releasing Lux, the world’s best foundation computer-use model, achieving a top-tier score on Online-Mind2Web.”

Lux’s emergence is significant for several reasons. First, real-world autonomy — the ability to interact with and manipulate external systems — is a crucial step toward the kind of embodied agency that distinguishes narrow AI from more general systems that can purposefully engage with complex environments. Second, Lux’s combination of high performance, lower cost, and an SDK for developers means that agentic AI tools are entering a more accessible, extensible phase of development rather than remaining constrained to research labs.

Finally, a model that can navigate applications and complete tasks automatically is closer to functional AGI use cases (e.g., software automation, data analysis, autonomous research support) than purely text-based models alone. These capabilities echo some of the core ambitions of AGI — generalizing intelligence to real-world, open-ended tasks.


Advance #4: OpenAI’s GPT-5.2: A Substantial Leap in Abstract Reasoning and Agentic Capability (ChatGPT)

On December 11, 2025, OpenAI unveiled GPT-5.2, the newest frontier large language model (LLM) family designed to extend reasoning, tool use, and multi-step cognition far beyond prior systems. Developed by OpenAI, a leading AI research organization headquartered in San Francisco, GPT-5.2 represents the culmination of iterative improvements over GPT-5, with a particular focus on reasoning and agentic workflows that are crucial to AGI.

According to sources reporting on the release and capabilities, GPT-5.2 includes multiple variants — Instant, Thinking, and Pro — optimized for different performance needs. The “Thinking” and “Pro” tiers are especially differentiated by their enhanced reasoning, planning, and structured problem-solving abilities. These capabilities are not merely incremental; they represent a big step toward abstract cognitive performance in AI systems. In independent benchmark analyses, GPT-5.2 achieved an approximately 53–54% success rate on ARC-AGI-2, a particularly rigorous abstract reasoning benchmark designed to assess general cognitive problem solving beyond pattern recognition. This is far above the prior state of the art for large models and “a major leap in AI’s abstract reasoning abilities,” according to open evaluations.

One researcher noted about this update: “GPT-5.2 is the first model we’ve seen that achieves near-100% accuracy on the 4-needle MRCR variant out to 256k tokens,” underscoring its capacity for long-horizon logical reasoning.

This advancement matters because reasoning across novel, structured, unseen problems — rather than pattern completion from training data — is widely regarded as one of the core prerequisites for AGI. GPT-5.2’s performance on benchmarks like ARC-AGI-2 demonstrates measurable progress toward fluid intelligence — the abilities to generalize, abstract, and plan that are hallmarks of human cognition. By markedly improving reasoning and supporting richer agentic workflows, OpenAI’s GPT-5.2 moves AI systems closer to autonomous problem solving rather than mere reactive completion, narrowing the gap between narrow AI and generalized artificial intelligence.


Advance #5: The November–December 2025 frontier model cluster and the compression of “almost‑AGI” capabilities (Copilot)

Between November 17 and December 11, 2025, the frontier AI landscape shifted in a way that was less about any single model and more about their collective effect. As reported in an extended analysis titled “The AI Model Race Reaches Singularity Speed,” four major U.S. companies—xAI, Google, Anthropic, and OpenAI—released their most powerful models within a 25‑day span: Grok 4.1 on November 17, Gemini 3 on November 18, Claude Opus 4.5 on November 24, and GPT‑5.2 on December 11. The article frames this period as “The Month That Changed Everything,” describing an unprecedented compression of innovation where each flagship release was immediately challenged by the next.

What makes this cluster an advance toward AGI, rather than just a marketing milestone, is the convergence of capabilities across reasoning, tool use, and context management. The December 22, 2025 “Road to AGI” analysis synthesizes benchmark results across these models and others, noting that this year’s releases “demonstrated PhD‑level reasoning on benchmarks like GPQA Diamond, SWE‑Bench Pro, and advanced math suites,” while also expanding to million‑token contexts and more sophisticated tool orchestration. The same article highlights that these models achieved “near‑human deliberative chains,” suggesting that their step‑by‑step reasoning traces increasingly resemble the structure—if not the content—of expert human problem solving.

Grok 4.1, developed by xAI under Elon Musk’s leadership, is portrayed as the most aggressive in personality and real‑time web integration, pushing the boundary of always‑on, context‑rich assistants. Google’s Gemini 3 series emphasizes multimodal “Deep Think” modes that blend language, vision, and code reasoning in a unified system rather than as bolted‑on modules. Anthropic’s Claude Opus 4.5 focuses on safety‑aligned reasoning and constitutional guardrails while still competing at the top of the benchmark charts. OpenAI’s GPT‑5.2 is described as a generalist workhorse that can maintain million‑token contexts, integrate tools, and show strong performance across reasoning, coding, and creative generation.

The “AI Model Race” article underlines how historically unusual this simultaneity is: “Within just 25 days, four major AI companies launched their most powerful models yet… This explosive sequence represents something never before seen in technology history—a compression of innovation so intense that each company’s flagship release was immediately challenged by the next.” This compression amplifies each model’s impact because users, researchers, and regulators are forced to experience them not as isolated marvels but as points on a rapidly steepening curve. The “Road to AGI” piece pushes that point further by arguing that these systems collectively validate “persistent scaling laws augmented by inference‑time compute,” suggesting that the underlying theoretical bet of the last decade—keep scaling, add structure at inference, and you’ll approach generality—has not yet hit a hard wall.

This cluster matters for AGI because, first, it changes the practical meaning of “general” for real users. When models can sustain million‑token contexts, orchestrate tools, and solve PhD‑level benchmarks across math, science, and software engineering, the gap between “narrow” and “general” intelligence feels less like a canyon and more like a narrowing channel. Second, by compressing multiple frontier releases into a single month, the ecosystem exposes emergent properties that no single lab’s announcement could demonstrate alone—such as the way competitive pressure leads to rapid iteration on evaluation, safety techniques, and governance proposals.

Third, this period forces policymakers and the broader public to confront the possibility that quasi‑general capabilities might arrive as a wave of overlapping systems rather than a single “AGI moment.” As the “Road to AGI” article notes, these advances came amid surging investments and early regulatory moves for models that “begin to look ‘general,’ a move you only make when you believe something transformative is actually coming.” In other words, the frontier cluster of late 2025 is not AGI, but it is a visible tightening of the spiral toward systems that make the AGI label less speculative and more operational.


Advance #6: AI Co-Scientist and the Automation of the Scientific Method (Gemini)

Perhaps the most profound advancement since November 16 is the unveiling of the AI Co-Scientist, a collaborative project between Google Research, Cloud AI, and Google DeepMind (headquartered in the United States). This advancement was detailed in the December 2025 publication “Google Research 2025: Bolder breakthroughs, bigger impact,” authored by the Google Research team and available at the Google Research Blog.

The AI Co-Scientist is not a single model but a “multi-agent AI system” designed to perform the core functions of a human researcher: generating novel hypotheses, designing experiments, and writing empirical software to test those hypotheses. This goes beyond the “embodied reasoning” discussed in the ETC Journal by moving from navigating a world to explaining it. The blog post notes that “This multi-agent AI system helps scientists generate novel hypotheses,” and it is paired with an “AI-powered empirical software system” that allows the agents to write expert-level code to evaluate data and iterate on scientific theories.

A key detail of the AI Co-Scientist’s advancement is its integration with physical and digital simulation tools. In neuroscience, for instance, the system has already been used to map neurons and their connections in brain tissue using light microscopy data, a feat that would take human researchers years of manual labor. The system doesn’t just process data; it proposes why certain neural connections might exist and suggests further experiments to verify its theories. It represents the first time a general-purpose agentic system has been successfully applied to the “frontier of the unknown” rather than just the “frontier of the known.”

The ultimate definition of AGI is often cited as the ability to perform any intellectual task a human can do, including the discovery of new knowledge. By automating the scientific method, Google has created a system that can accelerate the rate of human progress exponentially. If an AGI can act as a “co-scientist,” it can help solve the very bottlenecks that currently limit AI development, such as energy efficiency, material science for better chips, and more robust alignment algorithms. This creates a “virtuous cycle” of discovery that is the hallmark of a truly general intelligence.

__________
1. Lakshmi Varanasi, “Here’s how far we are from AGI, according to the people developing it,” Business Insider, 20 April 2025.
