Meet DeepScientist: The AI Scientist That Beat the Human State of the Art by 183.7% on a Research Benchmark
2025-10-09T11:34:11Z
Imagine a world where artificial intelligence not only assists in scientific research but surpasses the best human-designed methods, in one case by a staggering 183.7% on a benchmark task. That is the claim behind a new release from Westlake University, a milestone that could redefine our understanding of what AI can achieve.
Introducing DeepScientist, the first fully autonomous AI scientist capable of conducting research independently. Developed by Westlake University's NLP Lab, this revolutionary system demonstrates that AI can now engage in iterative, goal-driven scientific discovery, consistently outpacing even the top human talents without requiring any human oversight.
In a remarkable two-week feat, DeepScientist implemented and tested over 1,000 hypotheses, achieving results equivalent to three years of human effort. It didn't stop there: it also raised AUROC on the RAID dataset by 7.9%, surpassing the human state of the art (SOTA).
What sets DeepScientist apart from previous AI systems is its proactive approach. While traditional models often needed explicit instructions and produced minimal results, DeepScientist autonomously identifies research gaps, proposes innovative ideas, writes code, conducts experiments, and even drafts comprehensive research papers. This shift from random exploration to sustained and focused investigation marks a significant leap in the realm of scientific inquiry.
DeepScientist formalizes the discovery process as a hierarchical Bayesian optimization problem: it aims to maximize valuable findings while operating within a defined budget. Its three-tier evaluation loop tests ideas with increasing rigor and resource allocation, promoting only the most promising candidates to the next, more expensive tier so that exploration stays efficient.
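To make the idea of a budgeted, tiered evaluation loop concrete, here is a minimal Python sketch. It is not DeepScientist's actual code, and it is a plain promotion funnel rather than a Bayesian optimizer: the tier names, costs, noise levels, thresholds, and the Idea class are all hypothetical and chosen only for illustration. Each idea moves up a tier only if its score clears that tier's threshold, and the loop stops once the next evaluation would exceed the budget.

```python
import random
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    true_value: float  # hidden quality; used only to simulate noisy evaluations


# Hypothetical tiers: (label, cost per evaluation, noise std, promotion threshold).
# Later tiers cost more but measure an idea more precisely.
TIERS = [
    ("cheap review", 1, 0.30, 0.5),
    ("implementation", 10, 0.10, 0.6),
    ("full validation", 50, 0.02, 0.7),
]


def evaluate(idea, noise, rng):
    """Simulated evaluation: the hidden quality plus Gaussian noise."""
    return idea.true_value + rng.gauss(0.0, noise)


def run_funnel(ideas, budget, seed=0):
    """Push ideas through the tiers until the budget would be exceeded.

    Returns (names of ideas that passed every tier, total budget spent).
    """
    rng = random.Random(seed)
    spent = 0
    findings = []
    for idea in ideas:
        for _label, cost, noise, threshold in TIERS:
            if spent + cost > budget:
                return findings, spent  # cannot afford the next evaluation
            spent += cost
            if evaluate(idea, noise, rng) < threshold:
                break  # idea is dropped; no further budget is spent on it
        else:
            findings.append(idea.name)  # cleared all three tiers
    return findings, spent
```

In this sketch a weak idea costs a single unit before it is discarded, while only ideas that keep clearing thresholds reach the expensive final tier. That is the efficiency argument in miniature.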
The results have been nothing short of astonishing. In tests involving competitive challenges like AI text detection and agent failure attribution, DeepScientist showcased exceptional performance. Notably, it developed a novel A2P method for failure attribution, yielding a 183.7% improvement over human SOTA on the Who&When benchmark.
One of the core strengths of DeepScientist lies in its capacity to thrive in low-success-rate environments. Its structured methodology strikes a balance between exploration and exploitation, paving the way for continuous progress where brute force may falter. Furthermore, the experiments highlighted a fascinating “scaling law” for discovery, revealing that simply increasing GPU resources can linearly enhance the weekly output of high-impact findings.
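The article does not say which rule DeepScientist uses to balance exploration and exploitation. As an illustration only, the sketch below uses the textbook UCB1 bandit rule, a swapped-in standard technique, to choose which research direction to try next. The function name, the exploration constant c, and the success/trial counts are assumptions made for this example.

```python
import math


def ucb_select(successes, trials, c=1.4):
    """Pick the index of the research direction to try next (UCB1 rule).

    successes[i] / trials[i] is the observed success rate of direction i
    (exploitation); the square-root bonus grows for directions that have
    been tried rarely (exploration). c is a hypothetical tuning constant.
    """
    # Try every direction once before comparing scores.
    for i, n in enumerate(trials):
        if n == 0:
            return i
    total = sum(trials)
    scores = [
        successes[i] / trials[i] + c * math.sqrt(math.log(total) / trials[i])
        for i in range(len(trials))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal trial counts the rule favors the direction with the better track record. A direction that has barely been tried gets a large bonus, so it is still revisited even when most attempts fail, which is the behavior a low-success-rate setting calls for.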
DeepScientist is not just rewriting the rules of scientific research; it is heralding a new era of collaboration between humans and AI. In this model, AI acts as an exploration engine, freeing human scientists to tackle profound questions and to make the final critical judgments. To catalyze this shift, the development team has committed to open-sourcing the core system, encouraging rapid advances at this frontier.
Elena Petrova
Source of the news: Pandaily