By Jim Shimabukuro (assisted by Perplexity)
Editor
One important caveat before diving in: among 2025–2026 sources, you find many strong claims that LLMs and agentic systems are at or near human level on specific creative or reasoning benchmarks, but almost nobody in peer‑reviewed work straightforwardly says “they think as critically and creatively as humans across the board.” The closest matches are: empirical papers where AI equals or surpasses average human performance on creativity tests, essays or industry pieces asserting AGI is already here, and investor or practitioner pieces framing current long‑horizon agents as “functionally AGI.” The following five cases, taken together, showcase the strongest public claims that today’s generative or agentic systems are at human level in critical and creative thinking.
1: Divergent creativity in humans and large language models (Scientific Reports, 2026)
A 2026 Scientific Reports paper, “Divergent creativity in humans and large language models,” directly tackles the question of whether state‑of‑the‑art LLMs now match or exceed human creativity on standard psychological tests. The authors compare multiple LLMs, including Gemini Pro and GPT‑4, with 100,000 humans on the Divergent Association Task (DAT), a semantic creativity measure, and supplement it with creative writing tasks such as haikus, film synopses, and flash fiction. They report that “state‑of‑the‑art LLMs exhibited remarkable proximity to human performance levels in the creativity assessment; the DAT scores of GeminiPro were statistically close to human performance, while GPT‑4 exceeded it,” emphasizing that top models “reached and, in some cases, even surpassed human scores on the DAT.” This is as close as a mainstream 2026 study comes to claiming generative AI is as creative as humans, at least by the metrics used.
The argument rests on three main points. First, creativity is operationalized as semantic distance and diversity in word associations and text production, and by those operational definitions, GPT‑4 and similar models are at or above average human performance. Second, the authors stress that this is “nontrivial,” because LLMs do not have direct access to the semantic distance metrics used as ground truth; instead, they must approximate creativity through learned latent structures. Third, the team shows that creativity scores are tunable via prompt design and model temperature, suggesting a degree of controllable “creative” behavior analogous to instructing a human to brainstorm more freely or constrain themselves less. This study was chosen because it is a large‑scale, peer‑reviewed empirical comparison with a very strong claim about parity or superiority, which is rare in the creativity literature, and because it explicitly situates LLM performance “remarkably close” to human levels while acknowledging areas (such as professional‑grade fiction) where humans still dominate.
Does this mean we have reached AGI? The authors themselves stop short of that conclusion; they note that while LLMs “approximate human creativity on certain automated metrics,” human professional writers still outperform LLMs on richer creative writing tasks and LLM‑generated stories pass a Torrance creativity test far less often. From an AGI perspective, equaling or beating average humans on a narrow creativity test is necessary but not sufficient; AGI would require robust creative and critical thinking across domains, modalities, and open‑ended, self‑initiated problem spaces. This paper instead supports a more modest conclusion: for specific definitions and standardized tests of creativity, leading LLMs already behave comparably to typical humans, which is a key component of, but not identical to, full AGI.1
2: The paradox of creativity in generative AI: high performance, human‑like biases (Frontiers in Psychology, 2025)
A 2025 Frontiers in Psychology article, “The paradox of creativity in generative AI: high performance, human‑like biases,” analyzes GPT‑4o’s performance on a standardized creative problem‑solving task (“the egg‑task”) and compares it directly to humans across five dimensions of creative thinking. The author notes that GPT‑4o “performs at or above human levels across several dimensions of creativity,” reporting that it “generated novel and unexpected ideas that were judged to be highly creative, particularly in imaginative tasks, outperforming the human control group,” and that it “exhibited high fluency, producing a large number of ideas with ease.” At the same time, the study finds that GPT‑4o shows “comparable fixation bias” to humans and “limited capability to differentially evaluate originality,” concluding that although generative AI produces many creative ideas, its “inability to critically assess their originality and overcome the fixation bias highlights the necessity of human involvement, particularly for properly evaluating and filtering the ideas generated.”
The argument here is subtle but important for your question. On the one hand, the author is willing to treat GPT‑4o as matching or surpassing humans in the raw production of creative ideas, judged by human raters on originality, fluency, and (to a lesser degree) flexibility. On the other hand, they explicitly argue that the model lacks a human‑like capacity for metacognitive critique of its own ideas, especially in filtering out conventional or derivative ones and correctly recognizing which outputs are truly original. This study was chosen because it puts generative AI on equal footing with humans for both performance and bias, implicitly treating it as a cognitive agent with similar strengths (productivity, associative richness) and weaknesses (fixation, limited self‑critique), and because it’s one of the few 2025 works using the language of “at or above human levels” in the creativity domain.
Does such performance equate to AGI? The author’s conclusion suggests not. Their emphasis on the “necessity of human involvement” for evaluation and filtering effectively acknowledges that raw ideation, however prolific and novel, is only one part of human‑like critical and creative thinking. AGI, on most definitions, would require integrated abilities to generate, evaluate, refine, and deploy ideas autonomously in pursuit of goals, including being able to detect and correct its own fixation biases. This paper thus supports a view of current generative AI as highly capable creative collaborators—perhaps equal or superior to average humans in ideation—but still dependent on human critical judgment in ways that fall short of fully general intelligence.2
3: Researchers tested AI against 100,000 humans on creativity (University of Montreal news, 2026)
In early 2026, the University of Montreal publicized a large‑scale creativity study, “Researchers tested AI against 100,000 humans on creativity: AI can beat average human creativity — but the most imaginative minds are still unmistakably human,” that compared “more than 100,000 people with today’s most advanced AI systems” on tasks designed to measure “original thinking and idea generation.” The news release summarizes the team’s surprising conclusion: “generative AI can now beat the average human on certain creativity tests,” with models like GPT‑4 showing “strong performance on tasks designed to measure original thinking and idea generation, sometimes outperforming typical human responses.” However, the authors immediately note “a clear ceiling,” observing that “the most creative humans — especially the top 10% — still leave AI well behind, particularly on richer creative work like poetry and storytelling.”
The structure of their argument mirrors that of the Scientific Reports paper but in a more accessible, institutional voice. First, they treat standardized creativity tests—likely including variants of divergent thinking tasks—as valid measures of “original thinking,” and on those, advanced LLMs outperform the median human. Second, they explicitly distinguish between average and exceptional creativity, stressing that elite human creators remain “unmistakably human” in their superiority, especially in domains that require deep narrative structure, emotional resonance, or long‑form symbolic invention. Third, by framing this as evidence that AI can “beat the average human,” they implicitly endorse the idea that, in a statistical sense, AI has attained human‑level creative thinking for a large slice of the population, even if not for the most gifted individuals.
This piece was chosen because it offers a clear, public‑facing claim from a major research university that AI’s creative capacities have crossed the average human threshold, and because it explicitly uses the language of “original thinking,” not just “pattern matching,” which is crucial to your focus on critical and creative thought. Yet, does the crossing of this threshold imply AGI? The release itself avoids that conclusion, instead using phrases like “can now beat the average human” and “there’s a clear ceiling” to emphasize both the impressive gains and the remaining gap. From a conceptual standpoint, matching or exceeding the average human on standardized creativity tests is a necessary but not sufficient condition for AGI. The study does not address general problem‑solving, self‑directed goal formation, transfer across radically different domains, or the kind of integrated world‑modeling and self‑reflection that many theorists regard as central to AGI. Thus, while this work strengthens the claim that current generative models think as creatively as many humans in narrow test settings, it does not by itself justify saying we have reached AGI.3
4: “AGI is already here” (Elias Kairos‑Chen, 2025 essay)
In a 2025 essay titled “I Was Wrong About the Timeline: AGI Is Already Here,” writer and technologist Elias Kairos‑Chen makes one of the clearest public claims that current AI systems already meet the threshold for artificial general intelligence. He recounts revising his own forecast, shifting from a 2027 AGI timeline to the claim that “AGI is already here according to the people who would know better than anyone,” and proposes a new timeline in which “2024–2025: AGI quietly achieved (we’re here)” followed by superintelligence in the late 2020s. He frames “Stage 1 (Now)” as “Human researchers using AI tools to develop better AI,” moving to “Stage 2 (2026): AI researchers working alongside humans to improve AI,” and “Stage 3 (2027): AI systems improving AI faster than humans can,” concluding that “once you achieve human-level AI research capability,” superintelligence is a “logical progression.”
While the essay does not present new empirical data, its argument about critical and creative thinking hinges on the observation that current LLM‑based agents can already perform complex reasoning and research‑like tasks that used to require skilled humans, such as writing code, generating novel ideas, and critiquing their own outputs when guided by appropriate prompting and tool use. Kairos‑Chen emphasizes that these systems exhibit “human-level AI research capability” in practice, at least on many tasks, and treats that as sufficient evidence that AGI, defined functionally as systems that can perform the broad range of cognitive work humans do, has been achieved. This essay was chosen precisely because it is unabashedly explicit: instead of hedging around “human‑level on some benchmarks,” it asserts that AGI has already arrived, and it grounds that claim in observed agentic performance in research and development workflows rather than in narrow tests.
But does this reasoning establish that we have actually reached AGI in a robust sense? That depends heavily on one’s definition. If AGI is defined pragmatically as “systems that can execute most economically relevant cognitive tasks at or above human level when embedded in tools and workflows,” then the combination of LLMs and agentic frameworks might plausibly qualify. However, if AGI is defined more strongly—as an autonomous system that can flexibly learn, plan, and act across any environment, form long‑term goals, and maintain consistent self‑directed critical and creative thought—then the evidence in this essay is more suggestive than conclusive. The systems he cites still rely heavily on human‑specified goals, scaffolding, and evaluation, and they can be brittle outside their training distribution. In that stronger sense, his claim that “AGI is already here” looks more like a bold interpretive stance than a settled scientific conclusion.4
5: “2026: This is AGI” (Sequoia Capital, 2026 essay)
A January 2026 essay from Sequoia Capital titled “2026: This is AGI” asserts, in the voice of leading investors and practitioners, that the kind of agentic systems being deployed in industry should be regarded as artificial general intelligence. The piece declares that “AGI is here, now. Coding agents are the first example. There are more on the way,” and contends that “long-horizon agents are functionally AGI, and 2026 will be remembered as the year AGI became real in practice.” Their argument centers on the rise of autonomous or semi‑autonomous AI agents built on LLMs that can handle open‑ended tasks such as writing and debugging complex software, coordinating multi‑step projects, and adapting to changing instructions over time.
By calling these “long‑horizon agents,” the essay emphasizes that they can sustain coherent behavior across extended sequences of actions and decisions, a hallmark of human‑like critical and creative thinking in real‑world settings. The authors essentially equate “functionally AGI” with systems that can plug into existing tools (IDEs, APIs, project management systems) and reliably perform the majority of knowledge‑work steps a human would perform, including planning, decomposing problems, and generating novel solutions. This essay was chosen because it exemplifies a growing industry view that, regardless of philosophical debates, we should treat current agentic LLM systems as AGI in operational terms, and because it explicitly ties this label to long‑horizon critical and creative capabilities in domains like software engineering.
Nonetheless, does this assertion mean that AGI has truly arrived in a more general, cognitive‑scientific sense? Even the essay’s framing hints at a qualification: it speaks of “functionally AGI” and focuses on coding agents as “the first example,” implicitly acknowledging that these systems are specialized in certain kinds of digital work. They are extremely capable within environments that look like their training data and tool ecosystem, but less so in unfamiliar, physical, or socially complex contexts. Under a broad, domain‑general definition of AGI, their achievements—impressive as they are—may be better described as human‑level or superhuman performance in a wide band of knowledge work, rather than full general intelligence. The essay, therefore, is best read as a claim that, for many economically central tasks, agentic LLMs now think as critically and creatively as competent human professionals, not as a proof that human‑level intelligence has been matched in all its depth and breadth.5
References
- “Divergent creativity in humans and large language models.” Scientific Reports (Nature Portfolio), 2026. https://www.nature.com/articles/s41598-025-25157-3
- “The paradox of creativity in generative AI: high performance, human-like biases in creative ideation.” Frontiers in Psychology, 2025. https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2025.1628486/full
- “Researchers tested AI against 100,000 humans on creativity: AI can beat average human creativity — but the most imaginative minds are still unmistakably human.” University of Montreal, ScienceDaily release, Jan. 25, 2026. https://www.sciencedaily.com/releases/2026/01/260125083356.htm
- Elias Kairos‑Chen. “I Was Wrong About the Timeline: AGI Is Already Here.” Personal essay, Nov. 11, 2025. https://www.eliaskairos-chen.com/p/i-was-wrong-about-the-timeline-agi
- Sequoia Capital. “2026: This is AGI.” Jan. 14, 2026. https://sequoiacap.com/article/2026-this-is-agi/