The Role of Causal Inference in Drug Discovery
Amidst the demand for large-scale data processing and analysis, researchers grapple with vast, diverse data, especially text-based data with its inherent complexity, high dimensionality, and lack of standardization. Navigating biopharma’s intricate regulatory demands further complicates this. Causal inference is pivotal in deciphering disease biology in this intricate environment.
Introduction
Research in artificial intelligence, and most notably machine learning, has seen a steadily growing surge of interest in recent years. The trend holds across many application areas, and the life sciences are no exception. The reason is easy to see: the life sciences have long been fertile ground for pioneering AI research, and innovations such as large language models (LLMs) have opened up plenty of new opportunities.
In this blog, we focus on an equally interesting but less prominent trend: the steady rise of interest in causal inference. As Michoel and Zhang point out in a recent study,¹ this trend is easy to uncover through keyword searches on MEDLINE/PubMed, which reveal an unabating rise in mentions of causal inference in biomedical research over the last couple of decades.
One might wonder why interest in this field has grown recently. Amidst the demand for large-scale data processing and analysis, researchers must manage copious quantities of data from multiple sources. Text-based data is especially challenging: its inherent complexity, high dimensionality, and lack of standardization present significant hurdles to extracting meaningful insights. Researchers must also navigate the intricate regulatory demands of the biopharma sector. In this complex landscape, causal inference emerges as a crucial tool for untangling the complexities of disease biology. It has applications throughout the drug discovery lifecycle and is essential in tackling problems ranging from disease understanding to trial data analysis. In this blog, we explore some of the history of causal inference in the life sciences and offer some motivation for future directions.
Causation and Correlation
Causal inference is the process of identifying and interpreting cause-and-effect relationships within a system of variables. Historically, it has been instrumental across scientific disciplines, from the principles of physics to the intricacies of human psychology. In the complex terrain of drug discovery, causal reasoning serves as a compass that guides scientists to potential drug targets. Causal relationships help decode complex biological systems, enabling researchers to pinpoint biomolecules that directly influence disease pathways. For example:
- BCR-ABL gene and Chronic Myeloid Leukaemia (CML): The discovery of a causal link between the BCR-ABL fusion gene and CML paved the way for the development of the targeted drug imatinib.²
- Warfarin and Clarithromycin: Causal investigations revealed that clarithromycin, an antibiotic, interfered with the blood thinner, warfarin.³ This interaction heightened bleeding risks, underscoring the need for increased patient monitoring.
One major challenge that researchers face in understanding and identifying causal relationships is distinguishing them from correlations between two variables of interest. Let’s look at two examples of correlation:
- Thalidomide and Birth Defects: In the 1950s and 1960s, thalidomide, a mixture of two mirror-image molecules, was used to treat nausea in pregnant women.⁴ Over the same period, a correlation was observed between thalidomide use during pregnancy and limb abnormalities in the children born.
- Human sigma-receptor-targeting drugs and SARS-CoV-2: Initial research into the effect of various drugs on SARS-CoV-2 showed a correlation between drugs that targeted human sigma receptors (σ1/σ2) and negative modulation of SARS-CoV-2 infection.⁵
Although scientists found correlations in the data in both situations, a causal relationship was established only in the case of thalidomide.⁶ More recent studies of SARS-CoV-2 infection suggest that other confounding factors may explain the correlations observed in the data.⁷ Beyond confounding, several other situations can make two events (such as the expression levels of two proteins X and Y) correlated without one causing the other, including chance and problems with how the underlying data was collected.
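To make the confounding scenario concrete, here is a minimal Python sketch (assuming NumPy is available) in which a hypothetical shared cause Z drives both X and Y, while X has no effect on Y at all. The two variables are still clearly correlated, but the correlation essentially vanishes once Z's linear influence is regressed out of both (a partial correlation). The coefficients and variable names are illustrative assumptions, not values taken from the studies cited above.

```python
import numpy as np

# Hypothetical data-generating process: Z causes both X and Y; X has NO effect on Y.
rng = np.random.default_rng(seed=0)
n = 100_000
z = rng.normal(size=n)             # confounder, e.g. a shared upstream regulator
x = 0.8 * z + rng.normal(size=n)   # e.g. expression level of protein X
y = 0.8 * z + rng.normal(size=n)   # e.g. expression level of protein Y (no x term)


def residualize(a, b):
    """Return the residuals of a after a least-squares fit on b (with an intercept)."""
    design = np.column_stack([np.ones_like(b), b])
    coef, *_ = np.linalg.lstsq(design, a, rcond=None)
    return a - design @ coef


raw_corr = np.corrcoef(x, y)[0, 1]
# Partial correlation: correlate what is left of X and Y once Z's linear influence is removed.
partial_corr = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]

print(f"corr(X, Y)     = {raw_corr:.3f}")      # roughly 0.39
print(f"corr(X, Y | Z) = {partial_corr:.3f}")  # close to 0
```

Removing Z's contribution only works here because Z is measured; an unmeasured confounder would leave the spurious correlation in place, which is one reason causal claims need more than observational patterns.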
Causal Inference Today
One of the earliest methods for using data to model causal relationships came from the life sciences. The geneticist Sewall Wright (1889-1988) investigated the inheritance of coat colour in guinea pigs while working for the US Department of Agriculture. He developed a system of “path diagrams” to systematically map out and model the various factors, from genetics to chance, that determine coat colour in newborn guinea pigs.
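A core idea behind Wright's path analysis, often called the tracing rule, is that for standardized variables in a path diagram, the correlation between two variables equals the sum, over every connecting path, of the product of the path coefficients along it. The Python sketch below (assuming NumPy) checks this numerically for a hypothetical three-variable diagram, X → M → Y plus a direct X → Y arrow; the coefficients are made up for illustration and do not come from Wright's guinea pig data.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 200_000

# Hypothetical standardized path coefficients for X -> M, M -> Y and X -> Y.
p_xm, p_my, p_xy = 0.6, 0.5, 0.3

# Simulate the diagram, choosing each error variance so every variable has variance 1.
x = rng.normal(size=n)
m = p_xm * x + rng.normal(scale=np.sqrt(1 - p_xm**2), size=n)
var_err_y = 1 - (p_xy**2 + p_my**2 + 2 * p_xy * p_my * p_xm)
y = p_xy * x + p_my * m + rng.normal(scale=np.sqrt(var_err_y), size=n)

# Tracing rule: sum over connecting paths of the product of coefficients along each path.
traced_xy = p_xy + p_xm * p_my   # X -> Y directly, plus X -> M -> Y
traced_my = p_my + p_xm * p_xy   # M -> Y directly, plus M <- X -> Y

observed_xy = np.corrcoef(x, y)[0, 1]
observed_my = np.corrcoef(m, y)[0, 1]
print(f"corr(X, Y): traced {traced_xy:.3f}, simulated {observed_xy:.3f}")
print(f"corr(M, Y): traced {traced_my:.3f}, simulated {observed_my:.3f}")
```

The point of the exercise is that, once the diagram is fixed, observed correlations constrain the path coefficients, which is how Wright could apportion coat colour variation between heredity and chance.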
Wright's quantitative approach eventually inspired more advanced techniques for modelling causal relationships from data, such as Judea Pearl’s do-calculus and graphical causal models. These approaches combine data with an explicit causal model, which specifies the variables of interest and how they are connected, to quantify how strongly certain variables causally affect one another. Because these methods rely on domain expertise to decide which variables to include, and therefore require complex assumptions about the world, they have largely flown under the radar compared with modern statistical techniques that epitomize purely data-driven conclusions. In short, it is far easier to reason about statistical correlations than to go the extra mile to establish a line of causality.
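A central result derived with do-calculus is the backdoor adjustment: if a set of variables Z blocks every non-causal ("backdoor") path from X to Y, then P(Y | do(X = x)) = Σ_z P(Y | X = x, Z = z) P(Z = z). The toy Python example below uses made-up probabilities in which a hypothetical prognostic factor Z influences both who receives a treatment X and whether they recover (Y). Comparing the naive conditional probabilities with the adjusted ones shows how much of the observed difference is due to confounding.

```python
# Hypothetical discrete model: Z (a prognostic factor) confounds X (treatment) and Y (recovery).
p_z = {0: 0.5, 1: 0.5}                        # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}               # P(X = 1 | Z = z)
p_y1_given_xz = {(0, 0): 0.3, (1, 0): 0.4,    # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.6, (1, 1): 0.7}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary x."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def observational(x):
    """P(Y = 1 | X = x): strata are weighted by P(z | x), so confounding leaks in."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    return sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z) / p_x


def interventional(x):
    """P(Y = 1 | do(X = x)) via the backdoor adjustment formula."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive_effect = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
print(f"naive difference:  {naive_effect:.2f}")   # 0.28
print(f"causal difference: {causal_effect:.2f}")  # 0.10
```

In this made-up model the treatment raises the recovery probability by exactly 0.1 within each stratum of Z, which is what the adjusted estimate recovers; the naive comparison overstates the benefit because patients with Z = 1 are both more likely to be treated and more likely to recover.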
A parallel development is logic-based causal inference: the use of mathematical logic to infer new causal relationships from an existing knowledge base of known relationships. This, of course, requires systematically collecting and organizing known relationships in a database such as a knowledge graph.
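As a toy illustration of logic-based inference, the Python sketch below stores hypothetical signed causal facts (+1 for "increases", -1 for "decreases") and repeatedly applies a single chaining rule: if X affects Y and Y affects W, then X affects W with the product of the two signs. The entity names are placeholders, and the rule is a deliberate simplification; real biological effects do not always compose this cleanly, and this is not a description of how any particular platform performs inference.

```python
# Hypothetical knowledge base: (cause, effect) -> +1 ("increases") or -1 ("decreases").
known = {
    ("drug_A", "protein_B"): -1,
    ("protein_B", "cytokine_C"): +1,
    ("cytokine_C", "disease_D"): +1,
}


def infer(kb):
    """Forward-chain X->Y, Y->W  =>  X->W (sign s1 * s2) until no new facts appear."""
    facts = dict(kb)
    changed = True
    while changed:
        changed = False
        snapshot = list(facts.items())
        for (x, y), s1 in snapshot:
            for (y2, w), s2 in snapshot:
                # Only add pairs not already known; existing entries are never overwritten,
                # even if a new chain would contradict them.
                if y == y2 and x != w and (x, w) not in facts:
                    facts[(x, w)] = s1 * s2
                    changed = True
    return facts


labels = {+1: "increases", -1: "decreases"}
for (cause, effect), sign in sorted(infer(known).items()):
    print(f"{cause} {labels[sign]} {effect}")
```

Derived facts such as "drug_A decreases disease_D" are hypotheses to be checked against evidence rather than established findings, and a real system would need an explicit policy for conflicting chains instead of simply keeping the first entry.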
Not all Causal Relationships are Created Equal
In light of the complexity of the field, some researchers have tried to categorize the different types of causal statements we can make about two events X and Y. One such categorization is Judea Pearl’s ladder of causation, popularized by his recent book, The Book of Why.⁸ Pearl distinguishes between three key levels of causal reasoning:
- Association: The lowest rung of the ladder, and the easiest to reach, involves identifying correlations and patterns in data. This has traditionally been the strength of machine learning and, by extension, LLMs.
- Intervention: Intervention involves predicting the effect of a deliberate action on a system, for example what happens to an outcome when a treatment is administered. This requires understanding the causal relationships that underlie the system.
- Counterfactuals: The highest rung of the ladder entails imagining hypothetical scenarios, such as what would have happened had a different action been taken, including scenarios that a system or model has not seen explicitly in its data. According to Pearl, this is the holy grail of causal reasoning.
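The three rungs can be illustrated with a small, hypothetical linear structural causal model in which Z affects both X and Y, and X affects Y with a true coefficient of 2: Z = U_Z, X = Z + U_X, Y = 2X + 3Z + U_Y, with all noise terms standard normal. The Python sketch below (assuming NumPy) estimates a rung-one regression slope from observational data, a rung-two effect by forcing X to fixed values, and a rung-three counterfactual for a single fully observed unit using the abduction, action, prediction recipe. The model and numbers are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 200_000


def simulate(do_x=None):
    """Sample (z, x, y) from the hypothetical SCM; if do_x is given, X is set by intervention."""
    z = rng.normal(size=n)
    x = z + rng.normal(size=n) if do_x is None else np.full(n, do_x, dtype=float)
    y = 2 * x + 3 * z + rng.normal(size=n)
    return z, x, y


# Rung 1 (association): slope of the regression of Y on X in observational data.
_, x_obs, y_obs = simulate()
assoc_slope = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs, ddof=1)

# Rung 2 (intervention): average change in Y when X is set to 1 rather than 0.
interv_effect = simulate(do_x=1.0)[2].mean() - simulate(do_x=0.0)[2].mean()


# Rung 3 (counterfactual): abduction, action, prediction for one observed unit.
def counterfactual_y(z, x, y, x_new):
    u_y = y - 2 * x - 3 * z          # abduction: recover this unit's noise term
    return 2 * x_new + 3 * z + u_y   # action + prediction under X := x_new


print(f"association slope     ~ {assoc_slope:.2f}")     # close to 3.5
print(f"interventional effect ~ {interv_effect:.2f}")   # close to 2.0
print(counterfactual_y(z=0.5, x=1.0, y=4.0, x_new=0.0))  # 2.0
```

The observational slope (about 3.5 in this model) mixes the direct effect with confounding through Z, while the intervention recovers the true effect of 2. The counterfactual answers a question about one specific unit: given that it was observed with Z = 0.5, X = 1 and Y = 4, what would Y have been had X been 0?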
Distinctions such as these are useful when thinking about both the status quo and potential of modern AI systems to carry out causal reasoning.
Causaly: Cause-Directionality-Effect
Causaly is a powerful machine-reading platform for exhaustively extracting causal relationships, in the form of cause-and-effect evidence, from biomedical research. By machine-reading the available biomedical literature, Causaly extracts relationships semantically, discerning the direction of the relationships that connect different concepts. For example, metformin (a drug used to treat type 2 diabetes)⁹ shows a downregulating relationship with polycystic ovary syndrome (Figure 2), suggesting it may act as a treatment for this disease. The user can then peruse the original sources to judge the presented evidence.
For research scientists, discerning the relationship between two concepts is just the starting point. What truly adds depth to their analyses is understanding the strength and reliability of these relationships. This requires rigorous evaluation of the evidence supporting the relationships, ensuring it’s not only compelling but also drawn from credible sources.
Causaly plays an indispensable role here, offering features that let users assess the robustness of these connections with full transparency. By collating all available evidence for a given relationship, Causaly allows users to evaluate its consistency and reliability, considering aspects such as the diversity of the sources, the quality of the studies, and the concurrence of the findings. This comprehensive scrutiny empowers research scientists to make confident decisions based on reliable and robust evidence, adding a crucial layer of assurance to their research outcomes.
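As a rough sketch of what collating evidence for a relationship can involve, the Python example below groups hypothetical cause-direction-effect records by concept pair and reports the majority direction, the share of records agreeing with it, the number of distinct sources, and the study types represented. The Evidence fields, the direction labels, and the toy records are all assumptions made for illustration; this is not Causaly's data model or API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    cause: str
    direction: str   # hypothetical labels: "increases" or "decreases"
    effect: str
    source_id: str   # hypothetical document identifier
    study_type: str  # e.g. "clinical", "in vivo", "in vitro"


def summarize(evidence):
    """Group evidence by (cause, effect) and report directional agreement and source diversity."""
    grouped = defaultdict(list)
    for ev in evidence:
        grouped[(ev.cause, ev.effect)].append(ev)
    summary = {}
    for pair, items in grouped.items():
        majority, majority_count = Counter(ev.direction for ev in items).most_common(1)[0]
        summary[pair] = {
            "majority_direction": majority,
            "agreement": majority_count / len(items),
            "distinct_sources": len({ev.source_id for ev in items}),
            "study_types": sorted({ev.study_type for ev in items}),
        }
    return summary


# Toy records with placeholder concepts and sources (not real evidence).
toy = [
    Evidence("concept_A", "decreases", "concept_B", "doc_1", "clinical"),
    Evidence("concept_A", "decreases", "concept_B", "doc_2", "in vivo"),
    Evidence("concept_A", "decreases", "concept_B", "doc_2", "in vivo"),
    Evidence("concept_A", "increases", "concept_B", "doc_3", "in vitro"),
]
print(summarize(toy)[("concept_A", "concept_B")])
```

Counting distinct sources separately from raw records matters because one document can contribute several extracted statements; in the toy data, doc_2 appears twice but counts as a single source.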
Moreover, by building, maintaining, and curating an ever-growing knowledge graph that captures this wide range of causal relationships, Causaly is well positioned to develop further capabilities for ascending the rungs of the causation ladder, giving researchers even more powerful ways to reason about biomedical research questions and gain novel, ground-breaking insights.
Conclusion
In the current era of expanding knowledge and technological advancements, the integration of causal reasoning in bioinformatics holds transformative potential for reshaping health science. By leveraging advanced platforms like Causaly, researchers can tap into the vast wealth of the world’s scientific literature, uncovering hidden connections and relationships that would otherwise remain undiscovered. This not only opens doors for breakthroughs in drug discovery but also enables precise and targeted healthcare interventions, ushering in a new era of personalized medicine and improved patient outcomes.
References
- Michoel, T., Zhang, J. D., Drug Discov. Today, 2023;28(10):103737.
- Soverini, S., Mancini, M., Bavaro, L., et al., Mol. Cancer, 2018;17(1):49.
- Lane, M. A., Zeringue, A., McDonald, J. R., Am. J. Med., 2014;127(7):657-663.
- Kim, J. H., Scialli, A. R., Toxicol. Sci., 2011;122(1):1-6.
- Gordon, D. E., Jang, G. M., Bouhaddou, M., et al., Nature, 2020;583(7816):459-468.
- Gao, S., Wang, S., Fan, R., et al., Biomed. Pharmacother., 2020;127(1):110114.
- Tummino, T. A., Rezelj, V. V., Fischer, B., et al., Science, 2021;373(6554):541-547.
- Pearl, J., Mackenzie, D., The Book of Why: The New Science of Cause and Effect, Basic Books, 2018.
- Foretz, M., Guigas, B., Viollet, B., Nat. Rev. Endocrinol., 2023;19(8):460-476.