Status of Artificial General Intelligence (Nov 2025): ‘embodied reasoning’

By Jim Shimabukuro (assisted by ChatGPT)
Editor

[Also see Status of Artificial General Intelligence (AGI): October 2025, When Will AI Surpass Humanity and What Happens After That?]

These three advances in AGI were announced after ETC Journal’s Oct. 17, 2025, article was published: (1) DeepMind’s SIMA 2, a Gemini-powered agent that “thinks” in 3D virtual worlds, (2) DeepMind’s new work on aligning visual representations, improving how models “see” the world, and (3) Anthropic’s $50 billion U.S. compute/data-center investment, a large infrastructure bet to sustain frontier-model training.

Video created by Grok via image created by Copilot

1. SIMA 2 — DeepMind’s Gemini-powered agent that plays, reasons and improves in 3D virtual worlds

Who: SIMA Team / Google DeepMind (UK; parent company Alphabet).
Date & source: November 13, 2025 — DeepMind research blog post, “SIMA 2: An Agent that Plays, Reasons and Learns With You in Virtual 3D Worlds.” (Google DeepMind)

DeepMind’s SIMA 2 represents a concrete step toward embodied, interactive agents that combine language, perception, planning and self-improvement inside richly structured virtual environments. Where the earlier SIMA system focused on following instructions across many games and virtual worlds (learning a catalog of specific skills), SIMA 2 integrates a Gemini reasoning model at its core so the agent can do more than map an instruction to a sequence of actions: it can infer high-level goals, plan multi-step strategies, explain its intentions to a human, converse about what it is doing, and refine its own behavior over time.

Technically, that integration involved training on a mixture of human demonstration videos (paired with language annotations) and Gemini-generated labels; the resulting system operates purely from screen pixels and virtual control inputs (keyboard/mouse), yet generalizes to games and environments it hasn’t seen before. DeepMind shows SIMA 2 succeeding at tasks where earlier systems failed, and demonstrates a qualitative shift from brittle instruction following to deliberative, goal-directed behavior in 3D settings.
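DeepMind has not released SIMA 2’s code or architecture details, but the interface it describes (pixels in, keyboard-and-mouse actions out, with a reasoning model deliberating in between) maps onto a familiar agent-loop pattern. The Python sketch below illustrates that pattern only; every name in it (EmbodiedAgent, the plan() call, the environment interface) is a hypothetical stand-in, not DeepMind’s API.

```python
# Illustrative agent loop in the style DeepMind describes for SIMA 2:
# the agent sees only screen pixels and acts only through keyboard/mouse,
# with a reasoning model deciding what to do. Every name below
# (EmbodiedAgent, plan(), the env interface) is hypothetical.

from dataclasses import dataclass

@dataclass
class Action:
    keys: list[str]        # keys to press this step, e.g. ["w"]
    mouse_dx: float = 0.0  # horizontal mouse movement
    mouse_dy: float = 0.0  # vertical mouse movement

class EmbodiedAgent:
    def __init__(self, reasoning_model, goal: str):
        self.model = reasoning_model  # stands in for the Gemini core
        self.goal = goal
        self.history: list = []       # past rationales/actions for context

    def step(self, screen_pixels) -> Action:
        # Perceive raw pixels, then ask the reasoning model for a plan.
        # The plan carries a natural-language rationale, which is what
        # lets the agent explain its intentions and converse about them.
        plan = self.model.plan(observation=screen_pixels,
                               goal=self.goal,
                               history=self.history)
        action = Action(keys=plan.keys,
                        mouse_dx=plan.dx, mouse_dy=plan.dy)
        self.history.append((plan.rationale, action))
        return action

def run_episode(agent: EmbodiedAgent, env, max_steps: int = 1000):
    obs = env.reset()  # env is any game that renders to pixels
    for _ in range(max_steps):
        obs, done = env.step(agent.step(obs))
        if done:
            break
```

The point of the sketch is the division of labor: perception and action stay at the raw pixel and input-device level, while goal inference, planning, and the natural-language rationale all live in the reasoning model.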

The paper and blog stress three capabilities: improved reasoning about multi-step goals, better generalization across environments, and self-improvement and explanation abilities that enable human-agent interaction beyond simple prompts. As DeepMind put it: “By integrating the advanced capabilities of our Gemini models, SIMA is evolving from an instruction-follower into an interactive gaming companion. Not only can SIMA 2 follow human-language instructions in virtual worlds, it can now also think about its goals, converse with users, and improve itself over time.” (Google DeepMind)

SIMA 2 is important because embodied reasoning and generalization in realistic 3D environments are core ingredients of many AGI definitions — the ability to perceive, plan, act, and learn in open-ended settings. Training agents that learn from visual input and can introspect on and explain their plans bridges the gap between language-only models and agents that must interact with—and change—their environment. Progress here directly accelerates research in robotics, multi-modal world models, and agentic systems that learn from experience rather than only from static text. That shift lowers technical barriers to building systems that can perform complex, goal-directed tasks in the real world (or in accurate simulators), making SIMA 2 a notable step on the AGI pathway. (Google DeepMind)

2. “Teaching AI to see the world more like we do” — DeepMind’s reorganization of visual representations

Who: Andrew Lampinen, Klaus Greff and DeepMind research teams (UK / Google).
Date & source: November 11, 2025 — DeepMind research blog post summarizing a new Nature paper on aligning model visual representations with human concepts. (DeepMind blog + linked Nature paper). (Google DeepMind)

DeepMind researchers published a study showing that neural vision systems often represent the visual world in ways that systematically differ from human conceptual organization, and that these discrepancies undermine robustness, interpretability, and generalization. The team used classic cognitive-science tasks (for example, “odd-one-out” judgments, in which a participant picks which of three items doesn’t belong) to reveal where models and humans diverge: models can latch onto superficial cues (color, texture, background) rather than the high-level groupings humans use (function, conceptual category).
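To make the diagnostic concrete, an odd-one-out probe can be run against any image-embedding model: for each triplet, the model’s implied odd one out is the item excluded by the most mutually similar pair, and alignment is simply the rate of agreement with human choices. A minimal sketch, assuming a generic embedding vector per item (the embedding source and human-judgment data are placeholders, not the paper’s actual setup):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def model_odd_one_out(emb: list[np.ndarray]) -> int:
    """For three item embeddings, return the index the model implicitly
    treats as the odd one out: the item excluded by the most mutually
    similar pair."""
    pair_sims = [
        cosine(emb[1], emb[2]),  # pair similarity if item 0 is odd
        cosine(emb[0], emb[2]),  # pair similarity if item 1 is odd
        cosine(emb[0], emb[1]),  # pair similarity if item 2 is odd
    ]
    return int(np.argmax(pair_sims))

def alignment_score(triplet_embeddings, human_choices) -> float:
    """Fraction of triplets where the model's odd one out matches the
    human judgment (1.0 = perfect agreement, ~0.33 = chance)."""
    hits = [model_odd_one_out(t) == h
            for t, h in zip(triplet_embeddings, human_choices)]
    return sum(hits) / len(hits)
```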

The core advance is a method to reorganize a model’s internal visual representation space so that items grouped as similar by humans also become nearby in the model’s representation — effectively aligning machine perceptual geometry with human perceptual geometry. DeepMind shows that doing this not only makes model judgments align better with human expectations, but also improves robustness (fewer surprising errors on out-of-distribution inputs), generalization to new tasks, and downstream helpfulness when those vision models are used as components in larger systems. The blog summarizes: “New research shows that reorganizing a model’s visual representations can make it more helpful, robust and reliable.” (Google DeepMind)
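The published method is more involved than this, but the core move (keep the vision backbone frozen and learn a small transformation of its embedding space so that human odd-one-out judgments become the model’s judgments too) can be sketched as follows. The head size, loss form, and data loader here are assumptions for illustration, not the paper’s recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Small learned map applied on top of frozen backbone embeddings."""
    def __init__(self, dim: int = 768):  # dim is an assumed embedding size
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)

def odd_one_out_loss(z: torch.Tensor, odd_idx: torch.Tensor) -> torch.Tensor:
    """z: (batch, 3, dim) aligned triplet embeddings; odd_idx: (batch,)
    index of the item humans judged odd. Each item is scored by how
    similar the *other two* items are to each other, and the head is
    trained to put the highest score on the human choice."""
    sim01 = (z[:, 0] * z[:, 1]).sum(-1)
    sim02 = (z[:, 0] * z[:, 2]).sum(-1)
    sim12 = (z[:, 1] * z[:, 2]).sum(-1)
    # logits[i] = similarity of the pair that excludes item i
    logits = torch.stack([sim12, sim02, sim01], dim=-1)
    return F.cross_entropy(logits, odd_idx)

head = AlignmentHead(dim=768)
opt = torch.optim.Adam(head.parameters(), lr=1e-4)
# Hypothetical training loop; frozen_embed and triplet_loader are assumed:
# for images, odd_idx in triplet_loader:
#     z = head(frozen_embed(images))   # (batch, 3, dim), backbone frozen
#     loss = odd_one_out_loss(z, odd_idx)
#     opt.zero_grad(); loss.backward(); opt.step()
```

Because only the small head is trained, the backbone’s raw competence is preserved; what changes is the geometry of the embedding space, pulling human-similar items together.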

Perception is the foundation on which cognition and action are built. If visual systems learn brittle or human-misaligned representations, any higher-level agent that depends on vision (for planning, reasoning, or language grounding) inherits those flaws. DeepMind’s method addresses this by aligning representations to human concepts, thereby making downstream reasoning more trustworthy and generalizable — crucial properties for AGI-style systems that must operate safely in human contexts. In short, this work reduces a major gap between raw model competence and human-compatible understanding, increasing the reliability of multimodal agents and lowering a practical safety / alignment hurdle on the road to more general intelligence. (Google DeepMind)

3. Anthropic’s $50 billion U.S. data-center / compute investment — scaling the compute foundation for frontier models

Who: Anthropic PBC (CEO/co-founder Dario Amodei referenced in coverage); partners include Fluidstack (UK infrastructure partner); U.S. locations (Texas, New York).
Date & source: November 11, 2025 — Anthropic blog post and broad press coverage (Anthropic announcement; Reuters, TechCrunch, AP, Guardian, etc.). (Anthropic)

On November 11, 2025, Anthropic publicly announced a plan to invest roughly $50 billion in U.S. computing infrastructure — a multi-year buildout of custom data centers (initially in Texas and New York) executed with Fluidstack and other partners. The announcement is not an algorithmic paper but a strategic, operational move: Anthropic says the investment will supply the long-term compute, co-location and efficiency gains needed to sustain training and experimentation at frontier scale for the Claude model family.

Anthropic framed the investment as essential for continued frontier research: “We’re getting closer to AI that can accelerate scientific discovery and help solve complex problems in ways that weren’t possible before,” the company said in its announcement, a line echoed across press coverage. The coverage also makes clear the pledge is intended to secure 24/7 access to GPU farms, specialized networking, and power arrangements that many firms now see as prerequisites for training multi-trillion-parameter models and supporting continuous online learning and agentic services. (Anthropic)

AGI progress doesn’t come from algorithms alone — it is enabled (and sometimes constrained) by access to sustained, affordable frontier compute. A $50B infrastructure commitment by a major model developer signals two things: first, the industry is moving from episodic, cloud-rental training to vertically integrated, large-scale compute platforms designed for continuous model iteration; second, it changes the economic and geopolitical terrain, because which organizations control massive, dedicated compute farms will affect who can iterate fastest on the hardest AGI problems. In short, this is a non-algorithmic but highly consequential advance: it materially increases the ability to train, fine-tune, and operate the large, agentic, multimodal models whose capabilities define the near-term AGI battleground. (Anthropic)

[End]