AI begins to reveal how drugs are found
Editorial / April 07, 2026
Drug discovery has never been a straightforward process. For decades, it has relied on informed guesswork carried out under tight constraints. Testing every possible molecule is not realistic, and each experiment takes time, money and effort. The urgency of the Covid-19 pandemic brought these limits into sharp focus. Now, new research suggests that artificial intelligence could make this process not only faster but also easier to interpret.
A study by Ph.D. scholar Satya Pratik Srivastava and Professor Rajeev Kumar Singh from the Department of Computer Science and Engineering at Shiv Nadar University, Delhi-NCR, in collaboration with Dr. Antonia Mey and Rohan Gorantla from the University of Edinburgh, presents a framework that integrates statistical learning with chemical modelling to predict protein–ligand binding affinity, particularly in settings where experimental data are scarce. Published in the Royal Society of Chemistry’s journal Digital Discovery, the work reflects a growing shift towards AI systems that do more than make predictions: they also explain them.
What the team has built is, at its core, a decision-making tool. It suggests which molecules to test next and explains why those choices make sense. The aim is simple: avoid unnecessary experiments and focus on the compounds most likely to succeed. Binding affinity, often expressed as pKi or pIC50, sits at the heart of early drug discovery. But measuring it across thousands of molecules is costly, so choosing where to look becomes crucial.
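For readers unfamiliar with the notation, pKi and pIC50 are negative base-10 logarithms of a concentration in molar units, so a tenfold gain in potency raises the value by one. The short Python sketch below shows the conversion from nanomolar; the function name is ours, chosen for illustration, and is not taken from the study.

```python
import math


def p_value_from_nanomolar(concentration_nm: float) -> float:
    """Convert a Ki or IC50 given in nanomolar to pKi or pIC50.

    pX = -log10(X in molar). Since 1 nM = 1e-9 M, this equals
    9 - log10(X in nanomolar).
    """
    if concentration_nm <= 0:
        raise ValueError("concentration must be positive")
    return 9.0 - math.log10(concentration_nm)


# A 10 nM binder scores two log units above a 1000 nM (1 micromolar) binder.
print(p_value_from_nanomolar(10.0))
print(p_value_from_nanomolar(1000.0))
```

Higher values mean tighter binding, which is why the search described here tries to find molecules with the largest pKi or pIC50.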
The method relies on active learning. Rather than analysing everything at once, the model starts small, learns from each result, and gradually refines its approach. It uses a Gaussian process model to estimate both how well a molecule might bind and the confidence in that estimate. That sense of uncertainty turns out to be valuable. It highlights the model’s knowledge gaps and guides the next steps.
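To make the idea of a prediction with attached confidence concrete, here is a minimal Gaussian process regression sketch written with NumPy. It is an illustration under our own assumptions, not the study's implementation: the RBF kernel, the length scale, the noise level, and the one-dimensional toy descriptor are all placeholders, and the affinity values are made up.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF (squared-exponential) kernel between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, length_scale=1.0, noise=1e-2):
    """Return the posterior mean and standard deviation at x_query."""
    y_mean = y_train.mean()  # centre the targets; the GP models deviations
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt += noise * np.eye(len(x_train))  # observation noise on the diagonal
    k_qt = rbf_kernel(x_query, x_train, length_scale)

    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_qt @ alpha

    v = np.linalg.solve(chol, k_qt.T)
    var = 1.0 - np.sum(v**2, axis=0)  # the RBF prior variance is 1
    return mean, np.sqrt(np.maximum(var, 0.0))


# Toy data (assumed): one descriptor per molecule, made-up affinities.
x_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([5.0, 6.5, 7.0])
x_query = np.array([[1.0], [10.0]])  # one familiar point, one far away

mean, std = gp_posterior(x_train, y_train, x_query)
print(mean, std)
```

The query that coincides with a measured molecule gets a small standard deviation, while the distant one falls back to the average affinity with an uncertainty close to the prior. That gap is the "knowledge gap" signal the active learning loop can act on.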
An acquisition function ties this together. Each molecule is given a score that reflects both its promise and the uncertainty around it. Those with the highest scores are tested next. In practice, this creates a careful balance between exploring new chemical territory and building on what is already known. Lean too far in one direction, and you either miss better candidates or waste effort chasing unlikely ones.
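One widely used acquisition function is the upper confidence bound (UCB), which adds a multiple of the uncertainty to the predicted value. The sketch below uses it purely as an example; this article does not say which acquisition functions the study used, and the weight `beta`, the batch size, and the numbers are assumptions.

```python
import numpy as np


def ucb_scores(mean, std, beta=1.0):
    """Upper confidence bound: predicted affinity plus beta times uncertainty."""
    return np.asarray(mean) + beta * np.asarray(std)


def select_batch(mean, std, batch_size, beta=1.0):
    """Indices of the batch_size highest-scoring molecules, best first."""
    scores = ucb_scores(mean, std, beta)
    return np.argsort(scores)[::-1][:batch_size]


# Made-up predictions for four untested molecules.
mean = [7.0, 6.5, 5.0, 6.9]  # predicted affinity
std = [0.1, 0.2, 2.5, 0.3]   # model uncertainty

greedy = select_batch(mean, std, batch_size=2, beta=0.0)
exploratory = select_batch(mean, std, batch_size=2, beta=1.0)
print(list(greedy))       # picks by predicted affinity alone
print(list(exploratory))  # also rewards uncertain molecules
```

With `beta=0` the selection is purely greedy and picks the two best predictions. With `beta=1` the poorly understood third molecule jumps to the top of the list, which shows the exploration side of the balance.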
Crucially, the system does not operate as a black box. Using SHAP (SHapley Additive exPlanations), it breaks its predictions down into contributions from specific molecular features, such as functional groups or structural motifs. This lets researchers connect the model’s output to established chemical understanding and identify which features drive stronger binding.
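SHAP is built on Shapley values, which split a prediction fairly among the features that produced it. The sketch below computes them exactly by brute force for a tiny hand-written model. It does not use the SHAP library, and the model, its coefficients, and the three feature flags are invented for illustration, not results from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value. The cost
    grows exponentially with the number of features, so this is only
    practical for toy examples.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    result = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                result[i] += weight * (value(s | {i}) - value(s))
    return result


# Hypothetical affinity model over three binary flags:
# [has_halogen, has_aromatic_nitrogen, has_bulky_group]
def toy_model(z):
    return 5.0 + 1.2 * z[0] + 0.8 * z[1] - 0.5 * z[2] + 0.6 * z[0] * z[1]


molecule = [1, 1, 0]
baseline = [0, 0, 0]
contributions = shapley_values(toy_model, molecule, baseline)
print(contributions)
```

The contributions add up to the difference between the molecule's prediction and the baseline prediction, and the interaction term is shared equally between the two features involved. Libraries such as SHAP approximate the same quantity efficiently for real models.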
When tested on four protein targets, including TYK2 and the SARS-CoV-2 main protease, the approach consistently identified high-affinity molecules more efficiently than random selection.
The researchers also highlight an important caveat: the quality of the dataset matters deeply. A key factor is scaffold diversity, which describes how much the core molecular structures differ within a dataset. When diversity is low, the model sees too many similar compounds and struggles to distinguish strong binders from weak ones. Higher diversity, by contrast, makes patterns stand out more clearly and leads to more dependable learning. In many cases, the study finds that the quality and composition of the dataset can be just as important as the model choice itself.
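One simple way to put a number on scaffold diversity is the fraction of molecules that carry a distinct scaffold. The sketch below assumes the scaffolds have already been extracted as SMILES strings by a cheminformatics toolkit. The metric is a common rough measure and not necessarily the one used in the study, and the two example datasets are made up.

```python
from collections import Counter


def scaffold_diversity(scaffolds):
    """Fraction of unique scaffolds: near 0 is repetitive, 1.0 is all distinct."""
    if not scaffolds:
        raise ValueError("need at least one scaffold")
    return len(Counter(scaffolds)) / len(scaffolds)


def dominant_scaffold_share(scaffolds):
    """Share of the dataset taken up by the single most common scaffold."""
    if not scaffolds:
        raise ValueError("need at least one scaffold")
    _, count = Counter(scaffolds).most_common(1)[0]
    return count / len(scaffolds)


# Made-up datasets: benzene, pyridine and indole cores as scaffold SMILES.
low_diversity = ["c1ccccc1"] * 9 + ["c1ccncc1"]
high_diversity = ["c1ccccc1", "c1ccncc1", "c1ccc2[nH]ccc2c1", "c1ccncc1"]

print(scaffold_diversity(low_diversity), dominant_scaffold_share(low_diversity))
print(scaffold_diversity(high_diversity), dominant_scaffold_share(high_diversity))
```

A dataset dominated by one core gives a low diversity score and a high dominant share. That is the situation in which a model has little structural variation to learn from.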
There are trade-offs to consider. Simpler molecular representations tend to perform more consistently across datasets, while more complex approaches can achieve greater accuracy but are also more sensitive to data structure. A similar balance appears in selection strategies. Focusing only on top-scoring candidates works well when patterns are already well understood, but more exploratory methods are useful when the chemical space is less familiar.
The model’s ability to explain its reasoning strengthens confidence in its predictions. Features such as halogen substitutions and nitrogen-containing aromatic groups, which are already known to influence binding through hydrogen bonding and electrostatic effects, consistently emerge as important. This alignment with established chemistry suggests the model is learning genuine relationships rather than spurious patterns.
To support practical use, the researchers have developed an interactive platform that allows users to monitor performance, examine which features matter most, and understand how decisions evolve.
Overall, the takeaway is measured but significant. Progress in drug discovery may depend less on chasing perfect predictions and more on making informed, transparent decisions at each step. By combining probabilistic modelling with clear, interpretable insights, this approach offers a more thoughtful way to navigate complex chemical space, while keeping human judgement firmly in the loop.