Advancing Drug Discovery Outcomes with Virtual Cells at Recursion

Introduction

The quest to decode biology is one of humanity’s most enduring and critical challenges. Traditional drug discovery, a painstaking process often spanning over a decade and costing billions, still faces a high rate of failure in clinical trials. We believe a fundamental shift is necessary to alter this paradigm—one that moves towards a more comprehensive, predictive, and mechanistic understanding of biology.

What if we could construct computational models that not only predict cellular responses to interventions but also explain why those responses occur? What if these “virtual cells” could help us navigate the immense complexity of biology to discover and develop new medicines more efficiently and effectively?

At Valence Labs, Recursion’s AI research engine, we are exploring this frontier. We believe the time is ripe to engineer the foundations for such virtual cells. In our recently released perspective paper, “Virtual Cells: Predict, Explain, Discover,” we share our thinking on developing and validating these computational systems.

Predict, Explain, Discover: The Pillars of the Virtual Cell

Our view of the virtual cell rests on three interconnected capabilities:

Predict: Virtual cells must accurately predict how cells respond to a broad range of perturbations, from genetic edits to novel chemical compounds. Getting these predictions right across diverse biological readouts (gene expression, protein activity, cellular morphology) is key to prioritizing hypotheses and de-risking the early stages of drug discovery.
Explain: Moving beyond prediction, virtual cells must offer mechanistic explanations that support reasoning about what would happen under different interventions, not just what did happen. This means enabling counterfactual reasoning and capturing how perturbations ripple through the hierarchical organization of biological systems, from molecular networks to cellular functions to tissue-level behaviours and higher-order systems. Such causal insights are essential for building biological understanding and for designing interventions (e.g. therapeutics) with greater precision and confidence.
Discover: When these predictive and explanatory capabilities are integrated within a dynamic, learning-driven framework, virtual cells can evolve into powerful engines for biological discovery. We envision systems that iteratively generate hypotheses, design experiments for validation, and continuously refine their internal models, actively guiding the search for therapeutic opportunities.

Foundational Pillars at Recursion Enable the Pursuit of the Virtual Cell

Realizing this vision is possible due to unique capabilities within Recursion:

Industrial-Scale Interventional Data: Building and validating virtual cells requires biological data at an unprecedented scale and diversity. Recursion has dedicated over a decade to creating one of the world’s largest and most relatable proprietary biological and chemical datasets, now exceeding 65 petabytes. This dynamic and interventional data spans phenomics, transcriptomics, proteomics, invivomics, patient-level data and more. With up to 2.2 million experiments executed weekly via Recursion’s automated labs, these datasets continue to expand.
Pioneering AI and Machine Learning for Biology: Modern AI/ML techniques power virtual cells. We’re focused on developing and applying ML approaches tailored to the complexity of biology and drug discovery, building on a strong track record that includes foundational research, open-source models, and real-world applications.
Purpose-Built Supercomputing Infrastructure: The scale of biological data and complexity of AI models necessitate significant computational capacity. Recursion’s BioHive-2, the most powerful supercomputer in biopharma, provides us with the essential infrastructure to train industry-leading models and explore the frontiers of biological simulation.
Synergy of Automation and AI: Tight integration between Recursion’s advanced lab automation and Valence Labs’ AI model development creates powerful feedback loops, enabling rapid cycles of hypothesis generation, experimental validation, and model refinement.

Laying the Groundwork: TxPert as an Early Step in Our Blueprint

With Recursion, we are taking concrete, iterative steps towards virtual cells, building on the foundations described above. Today, alongside our perspective paper, we are introducing TxPert, a state-of-the-art model for predicting transcriptional responses to combinatorial genetic perturbations. TxPert is designed to perform robustly in out-of-distribution (OOD) scenarios, such as predicting the effects of unseen gene knockdowns, combinations of perturbations, or responses in new cellular contexts.

Accurately predicting these responses is essential for prioritizing hypotheses in drug discovery, where experimental testing of all possibilities is neither scalable nor economical. TxPert achieves strong performance by leveraging biological knowledge networks and graph-based architectures, enabling it to capture context-specific effects and generalize beyond the training distribution.
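
To make the idea of generalizing to unseen perturbations through a knowledge network more concrete, here is a minimal sketch. It is not TxPert's architecture: it is a simple neighbor-averaging baseline that predicts the response to an unseen gene knockdown as the mean response of that gene's graph neighbors that were observed in training. The gene names, the graph, the effect vectors, and the function `predict_unseen` are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def predict_unseen(gene, graph, seen_effects):
    """Neighbor-averaging baseline (illustrative only, not TxPert).

    gene:         name of a gene whose knockdown was never measured
    graph:        dict mapping a gene to its neighbors in a knowledge network
    seen_effects: dict mapping measured genes to expression-change vectors
    Returns the element-wise mean of the neighbors' vectors, or None when
    no neighbor has a measured effect.
    """
    neighbors = [g for g in graph.get(gene, ()) if g in seen_effects]
    if not neighbors:
        return None
    vectors = [seen_effects[g] for g in neighbors]
    # zip(*vectors) groups the values for each readout across all neighbors.
    return [fmean(column) for column in zip(*vectors)]


# Hypothetical toy data: GENE_C was never perturbed, GENE_D has no measurement.
graph = {"GENE_C": ["GENE_A", "GENE_B", "GENE_D"]}
seen_effects = {
    "GENE_A": [1.0, -2.0, 0.0],
    "GENE_B": [3.0, 0.0, 1.0],
}

prediction = predict_unseen("GENE_C", graph, seen_effects)
print(prediction)  # [2.0, -1.0, 0.5]
```

A learned graph-based model would replace the plain average with trained, context-specific message passing, but the principle is the same: prior knowledge about how genes relate gives the model something to work with when a perturbation is missing from the training data.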

TxPert tackles the “Predict” capability within our framework. It represents an initial step in our journey towards the virtual cell, and the benchmarking framework it introduces exemplifies our commitment to building models informed by biological complexity and guided by real-world drug discovery outcomes.

The Engine of Discovery: Agentic Systems

The virtual cell is a dynamic, continuously learning system. As detailed in our paper, we envision a “lab-in-the-loop” paradigm where virtual cells are iteratively refined. Conceptually, a virtual cell can be considered a theory of human cellular physiology and pathology that generates hypotheses about biological systems, which can then be tested through experiments.

Agents can then use these hypotheses to design and run experiments in the lab, explicitly seeking to falsify the virtual cell’s current “theory”. An experiment that proves a prediction wrong becomes a crucial learning opportunity. This helps refine the model, making it more accurate and robust. By actively seeking experiments that falsify the current virtual cell, we can uncover novel biology and accelerate therapeutic discovery.

This cycle – predict, test (falsify), refine – creates a powerful discovery engine. In the context of drug discovery, it represents a shift from the slower “design-make-test-measure” paradigm to a faster “design-simulate-test-learn” or “design-simulate” approach, paving the way for new therapeutic opportunities.

Exploring how agentic systems can accelerate this cycle is a key focus. We are actively developing systems capable of not only conditionally generating hypotheses using virtual cells but also prioritizing hypotheses for falsification, designing experiments to efficiently test those hypotheses, orchestrating the execution of experiments, analyzing the experimental outcomes, and integrating these into the virtual cell for iterative refinement. Recursion’s work on the LLM Orchestrated Workflow Engine (LOWE) illustrates this potential.
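
The predict, test (falsify), refine cycle described above can be sketched as a small loop. This is a toy illustration under strong assumptions, not a description of LOWE or of any Recursion system: the "virtual cell" is a lookup table of predicted responses, the "lab" is a hidden table of true responses, and hypotheses are prioritized by picking the prediction the model trusts least. All names (`run_lab`, `lab_in_the_loop`, the `KO_*` perturbations) are hypothetical.

```python
def run_lab(perturbation, ground_truth):
    """Stand-in for a wet-lab experiment: returns the measured response."""
    return ground_truth[perturbation]


def lab_in_the_loop(model, confidence, ground_truth, rounds, tolerance=0.1):
    """Test the least-trusted predictions first and refine the model in place.

    Returns the perturbations whose predictions were falsified, i.e. where
    prediction and measurement differ by more than `tolerance`.
    """
    tested = set()
    falsified = []
    for _ in range(rounds):
        candidates = [p for p in model if p not in tested]
        if not candidates:
            break
        # Prioritize: the lowest-confidence prediction is the best candidate
        # for an experiment that could prove the current "theory" wrong.
        target = min(candidates, key=lambda p: confidence[p])
        measured = run_lab(target, ground_truth)
        if abs(model[target] - measured) > tolerance:
            falsified.append(target)
        # Refine: fold the measurement back into the model.
        model[target] = measured
        confidence[target] = 1.0
        tested.add(target)
    return falsified


# Hypothetical toy data: predicted vs. true response to three knockouts.
model = {"KO_A": 0.5, "KO_B": -1.0, "KO_C": 0.0}
confidence = {"KO_A": 0.9, "KO_B": 0.2, "KO_C": 0.4}
truth = {"KO_A": 0.5, "KO_B": 1.0, "KO_C": 0.05}

falsified = lab_in_the_loop(model, confidence, truth, rounds=2)
print(falsified)  # ['KO_B']
```

With a budget of two experiments, the loop spends them on the two least-trusted predictions and finds that only the `KO_B` prediction was wrong. In a real system the refinement step would retrain or update a learned model rather than overwrite a table entry, and the prioritization would weigh expected information gain against experimental cost.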

Bridging Biological Scales: From Molecular Mechanisms to Cellular Function

A virtual cell must also bridge the gap between the molecular world and cellular phenomena. While simulating entire cells from first principles is not yet feasible, we are aiming to integrate across scales. This involves leveraging high-fidelity, bottom-up simulations for key molecular events and connecting these physics-informed models with functional data from Recursion’s large-scale cellular experiments.

Our research includes exploring advanced molecular modeling techniques, including work on quantum-derived chemical properties (e.g. OpenQDC), to infuse our models with a deeper mechanistic understanding.

A Path Forward: Rigorous Benchmarking and Collaborative Progress

Tangible progress in the development of virtual cells hinges on establishing and adopting robust, biologically grounded benchmarks. Standardized evaluation is essential for ensuring new methods advance predictive and explanatory capabilities.

Initiatives that promote open benchmarking platforms, such as Polaris, and the sharing of high-quality datasets, like RxRx3-core, are vital steps in establishing these high standards and fostering transparent progress across the field.
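
As one illustration of what a standardized evaluation can look like, the sketch below scores predicted expression changes against observed ones using the Pearson correlation per perturbation, then averages the scores. This is an assumed, commonly used metric chosen for illustration; it is not presented as the metric used by Polaris or by the TxPert benchmarking framework, and the function names and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def benchmark(predictions, observations):
    """Return (mean score, per-perturbation scores) over held-out perturbations."""
    scores = {p: pearson(predictions[p], observations[p]) for p in observations}
    return sum(scores.values()) / len(scores), scores


# Hypothetical held-out data: expression changes for three genes per knockout.
observed = {"KO_A": [1.0, 2.0, 3.0], "KO_B": [1.0, 0.0, -1.0]}
predicted = {"KO_A": [2.0, 4.0, 6.0], "KO_B": [-1.0, 0.0, 1.0]}

mean_score, per_perturbation = benchmark(predicted, observed)
```

Here the `KO_A` prediction has the right direction for every gene and scores 1.0, while the `KO_B` prediction is exactly inverted and scores -1.0, so the mean is 0.0. Reporting per-perturbation scores alongside the average keeps such failures visible instead of letting them cancel out.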

Our ultimate goal is to contribute to a future where drug discovery is dramatically accelerated and the probability of success is significantly increased for therapies addressing critical patient needs. The path towards virtual cells and eventually virtual tissues, organs, and even patient-specific models is ambitious but charts a course towards a new era of precision medicine.

We are excited to share our perspective and ongoing work. We invite the broader scientific community to engage with these ideas, challenge them, and build upon them as we collectively work towards a more predictable and powerful paradigm for discovering new medicines.

We encourage you to read our full perspective paper, “Virtual Cells: Predict, Explain, Discover,” to delve deeper into the vision guiding our research at Valence Labs.