How data and AI are helping unlock the secrets of disease

Written by:

Claus Bendtsen

Executive Director, Data Sciences and Quantitative Biology, Discovery Sciences, R&D

Slavé Petrovski

VP and Head of Genome Analytics & Bioinformatics, Centre for Genomics Research, Discovery Sciences, R&D

Artificial Intelligence (AI) is rapidly turning science fiction into science fact. Self-driving cars are just one example of a previously unthinkable technology that looks to harness data science and AI to revolutionise how we get about. AI also has the potential to transform the way we discover and develop potential new medicines.

At AstraZeneca, we are using data science and AI across R&D to collate, connect and analyse different data and information. This will help us better understand disease, identify drug targets with a higher probability of success, recruit for and design better clinical trials and, we hope, ultimately speed up the way we design, develop and make new medicines.

Our work focuses on better understanding the fundamentals of disease, enabling AstraZeneca to find new ways to treat, prevent, modify and eventually even cure disease. This, combined with a more data-driven culture, has the potential to change how we do our science. Here are some of the ways we use data science and AI in our day-to-day work as we advance science to create potential innovative medicines:

Building disease understanding through knowledge graphs

If you’ve ever asked Google or Alexa a question, you will have used a knowledge graph. Knowledge graphs are vast libraries of information that connect facts from thousands of different sources to find you the answer you need.

Each year, the amount of scientific information and data available to researchers grows. At AstraZeneca, we’re now beginning to harness these vast networks of scientific data and facts to give our scientists the information they need about genes, proteins, diseases and drugs, and about their relationships: how they interact, work together or work against each other.

By using AI and machine learning to combine information from multiple sources, we hope to draw better and faster conclusions than if we analysed all this data by hand. AI also has the potential to find previously unexplored patterns that are not immediately obvious to the human eye, which we hope will lead to a new understanding of diseases and the drugs we design to treat them.

Our knowledge graphs allow researchers to ask key questions about genes, diseases, drugs and safety information to help identify and prioritise drug targets. And, as our data and knowledge continue to evolve, so will our graphs, which means every new experiment will benefit from everything learned before.
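As a minimal sketch of the kind of question such a graph can answer, the Python below stores a handful of made-up (subject, relation, object) facts and asks which drugs act on genes linked to a given disease. The entity names, the relation labels and the `drugs_for_disease` helper are hypothetical illustrations chosen for this example, not AstraZeneca's data or tooling.

```python
from collections import defaultdict

# Hypothetical toy facts stored as (subject, relation, object) triples.
TRIPLES = [
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_B", "associated_with", "DISEASE_X"),
    ("GENE_C", "associated_with", "DISEASE_Y"),
    ("DRUG_1", "inhibits", "GENE_A"),
    ("DRUG_2", "inhibits", "GENE_C"),
    ("DRUG_3", "activates", "GENE_B"),
]


def build_index(triples):
    """Index subjects by (relation, object) so lookups by target are direct."""
    index = defaultdict(set)
    for subject, relation, obj in triples:
        index[(relation, obj)].add(subject)
    return index


def drugs_for_disease(index, disease, drug_relations=("inhibits", "activates")):
    """Two-hop query: disease -> associated genes -> drugs acting on those genes."""
    genes = index.get(("associated_with", disease), set())
    results = {}
    for gene in sorted(genes):
        drugs = set()
        for relation in drug_relations:
            drugs |= index.get((relation, gene), set())
        results[gene] = sorted(drugs)
    return results


INDEX = build_index(TRIPLES)
print(drugs_for_disease(INDEX, "DISEASE_X"))
```

Because every fact is just another triple, adding the result of a new experiment extends what later queries can find without changing the query code.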

Ultimately, we want to develop personalised knowledge graphs that bring the right information to the right scientist at the right time, so that each one can play their part in advancing our understanding.

Advancing genomics research with big data and AI

Our Centre for Genomics Research (CGR) team is working hard to analyse up to two million genome sequences by 2026. With access to this wealth of information, we hope to identify the variants, genes, pathways or other parts of the genome that are likely to cause disease or to predict its progression and response to treatment. All of this, integrated using knowledge graphs, aims to help us better understand diseases and how they work, identify new drug targets and design better clinical trials.

Through access to hundreds of thousands of exome sequences, our team of experts have developed bespoke analytical frameworks to study the genetic underpinnings of human disease. Insights emerging from the CGR currently include identifying candidate drug targets, exploring repositioning opportunities, leveraging natural genetic variation for human safety assessment, understanding market opportunities based on population genomics, and performing real-time human genetic validation/invalidation of target propositions.

This wealth of genomics data, coupled with its expert application, is enabling our team to focus on analysing and interpreting the data to advance science. For example, we are building novel machine learning and deep learning-based methods to more objectively prioritise the genes or other parts of our genome that could potentially cause disease.

Using AI to get the most from every experiment

CRISPR gene-editing technology plays a significant role in drug discovery. We can use the technology in functional genomics screens to delete each gene in the genome in turn and ask what role it plays in biology. And in cancer research, we use CRISPR to identify which genes, when deleted, lead to resistance or sensitisation to our cancer medicines.
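To make the resistance and sensitisation readout concrete, here is a rough sketch of one common way to score a pooled, count-based screen: compare how often each knocked-out gene is seen with and without the drug. It assumes the counts are already normalised for sequencing depth. The gene names, counts, pseudocount and threshold are invented for illustration, and this is not a description of the analysis pipeline used at AstraZeneca.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control counts; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify_genes(counts, threshold=1.0):
    """Label each gene by how its knockout changes abundance under drug treatment.

    counts maps gene -> (control_count, treated_count).
    Enriched knockouts suggest resistance; depleted knockouts suggest sensitisation.
    """
    calls = {}
    for gene, (control, treated) in counts.items():
        lfc = log2_fold_change(treated, control)
        if lfc >= threshold:
            call = "resistance"
        elif lfc <= -threshold:
            call = "sensitisation"
        else:
            call = "no clear effect"
        calls[gene] = (round(lfc, 2), call)
    return calls


# Hypothetical, depth-normalised counts: gene -> (control, treated).
EXAMPLE_COUNTS = {
    "GENE_A": (100, 420),
    "GENE_B": (100, 20),
    "GENE_C": (100, 110),
}

for gene, (lfc, call) in classify_genes(EXAMPLE_COUNTS).items():
    print(gene, lfc, call)
```

A real screen would use several guides per gene, replicates and a statistical test rather than a fixed cut-off; the threshold here only shows the direction of the logic.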

To get the most from every experiment, we are training machine learning and deep learning models to increase our confidence in the data and to analyse the imaging-based outputs of CRISPR screens. This can increase the information available from the screens and help us get answers more quickly.

Beyond disease understanding

The importance of data science and AI to AstraZeneca is not confined to disease understanding. AI is already being embedded across our R&D, enabling our scientists to see more from our imaging data and speeding up the design of clinical trials.

A common reason why a potential new drug fails during its development is that it causes harm to the liver. But it is challenging to predict liver toxicity pre-clinically. To address this, we have created models that take a Bayesian approach to machine learning, that is, they treat inference probabilistically. The models analyse data from many safety experiments to give predictions on whether a potential new medicine is likely to cause liver injury and, crucially, capture the uncertainty of each estimate in a so-called posterior predictive distribution. This improves decision-making, helping ensure only drugs with acceptable side effects are progressed.
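The idea of a posterior predictive distribution can be illustrated with a textbook Beta-Binomial model, which is far simpler than a model that integrates many safety experiments and is not the model described above. The sketch assumes a uniform Beta(1, 1) prior on the unknown rate at which assays flag a liver signal, updates it with hypothetical results (3 flags in 10 assays) and then gives the probability of each possible number of flags in 5 future assays. All numbers and function names are assumptions made for this example.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function, computed with log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_params(prior_a, prior_b, positives, trials):
    """Conjugate update: a Beta prior plus binomial data gives a Beta posterior."""
    return prior_a + positives, prior_b + (trials - positives)


def posterior_predictive_pmf(post_a, post_b, future_trials):
    """Beta-Binomial probabilities of 0..future_trials positives in new experiments."""
    pmf = []
    for j in range(future_trials + 1):
        log_p = (
            math.log(math.comb(future_trials, j))
            + log_beta(post_a + j, post_b + future_trials - j)
            - log_beta(post_a, post_b)
        )
        pmf.append(math.exp(log_p))
    return pmf


# Hypothetical data: 3 of 10 assays flagged a liver signal; uniform Beta(1, 1) prior.
POST_A, POST_B = posterior_params(1.0, 1.0, positives=3, trials=10)
PMF = posterior_predictive_pmf(POST_A, POST_B, future_trials=5)

for j, p in enumerate(PMF):
    print(f"P({j} flags in 5 future assays) = {p:.3f}")
```

The spread of these probabilities is the point: rather than a single yes-or-no call, the prediction carries its own uncertainty, which narrows as more experimental data is added.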

This and many other exciting applications for AI mean we are learning where we can best harness these new technologies and further automate processes, freeing up more time for our people to do what they do best – pushing the boundaries of science to deliver life-changing medicines.