AI and Computational Biology for Health

AI2Health: AI and Computational Biology for Health

The AI2Health research cluster is geared towards fundamental and interdisciplinary research in AI and computational biology for human health. Numerous scientific advances have emerged in recent years that are specific to the application of AI to human health. The goal of AI2Health is to leverage this momentum and develop AI methods and tools to make advances in essential problems in human health through three research areas in Computational Biology: (i) Systems and Integrative Biology, (ii) Structural and Functional Biology, and (iii) Metagenomics and Microbiome Biology.

Cluster Members

  • Lead PI: Todd Treangen (Computer Science, Bioengineering, Rice University)
  • Eric Chi (Statistics, Rice University)
  • Santiago Segarra (Electrical & Computer Engineering, Rice University)
  • Vicky Yao (Computer Science, Rice University)

Collaborators

  • Lydia Kavraki (Computer Science, Mechanical Engineering, Bioengineering, Electrical & Computer Engineering, Rice University)
  • Luay Nakhleh (William and Stephanie Sick Dean, George R. Brown School of Engineering; Computer Science, BioSciences, Rice University)
  • Fritz Sedlazeck (Computer Science, Rice University; Human Genome Sequencing Center, Baylor College of Medicine)

Research Areas

1) AI for Systems and Integrative Biology

Systems biology was one of the early adopters of computational advances and machine learning, with a heavy focus on Bayesian methods. However, as increasingly specific, high-throughput biological assays continue to grow (both in number and in measurement types), new methodological challenges arise.

Incorporating biologically inspired intuition into AI model formulation is key to building generalizable methods. One example of such an approach is building biologically meaningful embedding spaces. Embedding is an AI/ML technique that represents high-dimensional data in lower-dimensional spaces while capturing complex nonlinear relationships and intrinsic structures in the original data. Instead of using problem-agnostic embeddings, AI2Health core members Yao and Segarra have developed biologically motivated embedding methods to enable joint modeling of protein interaction networks from different organisms.

Moreover, we observed that in biology, gene set comparisons are routine (e.g., doing gene set enrichment comparing annotated genes with a collection of new genes), yet even in research areas where embeddings are used routinely, such as natural language processing, most efforts to compare sets rely on simple averages. Noticing this gap led to the development of a new, effective general-purpose set comparison method for embeddings that shows promise for broader non-biological applications. These examples highlight our vision for leveraging biological insights to innovate AI methods, which can synergistically enhance fundamental research in AI.
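To make the gap concrete, here is a minimal Python sketch of the simple-average baseline mentioned above. It is not the set comparison method developed by the cluster. The function names and the 2-D toy vectors are hypothetical and chosen only for illustration. The sketch shows why averaging can mislead: two sets with very different members can have means that point in the same direction.

```python
import math

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_set_similarity(set_a, set_b):
    """Baseline: compare two sets of embeddings by the cosine of their means."""
    return cosine(mean_vector(set_a), mean_vector(set_b))

# Hypothetical 2-D "gene embeddings" (illustrative values, not real data).
spread_set = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # members point in three directions
tight_set = [[0.0, 1.0], [0.0, 2.0]]                # members all point the same way

# The opposing members of spread_set cancel out in the mean, so the two
# sets look essentially identical under this baseline.
similarity = mean_set_similarity(spread_set, tight_set)
print(round(similarity, 6))
```

Because the mean discards how members are distributed within each set, a baseline like this cannot distinguish a tight gene set from a diffuse one, which is the kind of limitation a dedicated set comparison method would need to address.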

2) AI for Structural and Functional Biology

AlphaFold has unveiled the enormous potential of AI applied to the problem of protein folding and structure prediction, making paradigm-shifting progress on a 50-year-old grand challenge in computational biology. Despite this major success, there are two main limitations.

Building on the success of AlphaFold in protein structure prediction, we are expanding our focus to develop novel machine learning techniques in Functional Biology, particularly for assigning functions to protein-coding genes. Highlighting this area, Segarra and Treangen have developed an ensemble machine learning method to predict microbial pathogenic functions, which we plan to enhance by incorporating protein structure data and improving the handling of poorly annotated genes.

Challenges in computational pathogen screening include handling complex host interactions, virulence factor dynamics, and community-level dynamics. Cluster members Kavraki and Treangen are now exploring an LLM-inspired model that leverages protein sequences and structures to predict functions, specifically targeting virulence factors. This model integrates evolutionary features derived from the DistilProtBert language model with protein structures in a graph convolutional network, promising significant advancements in understanding and predicting protein functions that impact disease causation.
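As a rough illustration of what a graph convolutional network does with per-residue features and a structure-derived graph, the sketch below implements a single graph convolution step in plain Python. It does not reproduce the model described above. The contact graph, the 2-D feature vectors (standing in for language-model features such as those from DistilProtBert), and the fixed weight matrix are all hypothetical. The layer uses the common symmetric normalization with self-loops, which is an assumption about the variant rather than a detail taken from the cluster's work.

```python
import math

def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 · H · W).

    adjacency: n x n list of 0/1 contacts between residues (symmetric)
    features:  n x f list of per-residue feature vectors (H)
    weights:   f x o weight matrix (W); fixed here, learned in practice
    """
    n = len(adjacency)
    # Add self-loops so each residue keeps part of its own signal.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization by the degrees of both endpoints.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate features from each residue's neighborhood.
    aggregated = [[sum(norm[i][k] * features[k][f] for k in range(n))
                   for f in range(n_in)] for i in range(n)]
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(aggregated[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]

# Hypothetical three-residue chain: residue 0 contacts 1, residue 1 contacts 2.
contacts = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
residue_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy stand-in features
identity_weights = [[1.0, 0.0], [0.0, 1.0]]              # placeholder for learned W

hidden = gcn_layer(contacts, residue_features, identity_weights)
print(hidden)
```

Stacking several such layers lets information from structurally distant residues reach one another. A full model would add a readout over all residues and train the weights against known functional labels.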

3) AI for Metagenomics and Microbiome Biology

The microbiome refers to the collection of microbes (bacteria, viruses, fungi) that occupy a specific ecological niche (human gut, skin, air filters, etc.). Given the established relevance of the human microbiome to human health, there is a recent push towards applying ML to the human host microbiome (in particular, the gut microbiome) to learn signatures of microbiome health and disease states.

Our motivating example on repeat detection highlights the untapped potential of learning discriminative graph features through graph neural networks (GNNs). Unlike predefined features, GNNs generate these characteristics through trainable iterative computations, making them adaptive to specific data samples. This novel approach has shown success in fields such as wireless networks, material discovery, and molecular design, yet its application in metagenomics is still emerging. A primary challenge in genomic data analysis is that most of the data is unlabeled, particularly in distinguishing between repeat and non-repeat sequences.

Modern machine learning techniques seek to embrace this unlabeled data through self-supervised learning, which starts with initially noisy labels that are then refined through subsequent machine learning iterations and fine-tuning. Recently, we presented the first use of graph-based self-supervised learning for repeat detection in metagenomics. This serves as an illustration of the potential benefits that can be unlocked by further exploring this avenue.
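The idea of starting from noisy labels and refining them over iterations can be sketched with a deliberately simple stand-in. The Python example below replaces the trained graph model with a majority vote over each node's neighborhood, so it illustrates only the refinement loop, not the cluster's published method. The graph, node names, and initial labels are hypothetical; a label of 1 stands for "repeat" and 0 for "non-repeat".

```python
def refine_labels(neighbors, noisy_labels, rounds=3):
    """Iteratively relabel each node by majority vote over itself and its neighbors.

    neighbors:    dict mapping node -> list of adjacent nodes
    noisy_labels: dict mapping node -> initial 0/1 label (possibly wrong)
    Ties keep the label at 0. Updates are synchronous within each round.
    """
    labels = dict(noisy_labels)
    for _ in range(rounds):
        updated = {}
        for node, adjacent in neighbors.items():
            votes = [labels[node]] + [labels[other] for other in adjacent]
            updated[node] = 1 if 2 * sum(votes) > len(votes) else 0
        labels = updated
    return labels

# Hypothetical assembly-graph fragment: a dense region (a, b, c, d) joined by
# one edge to a sparser region (e, f, g).
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c", "e"],
    "e": ["d", "f", "g"],
    "f": ["e", "g"],
    "g": ["e", "f"],
}
# Initial heuristic labels with two deliberate errors: c should be 1, f should be 0.
initial = {"a": 1, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1, "g": 0}

refined = refine_labels(graph, initial)
print(refined)
```

In a self-supervised setting, the vote would be replaced by a model trained on the current labels whose predictions become the next round's labels, with fine-tuning on any trusted annotations that are available.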

Long-Term Goal

The AI2Health research cluster aims to tackle pressing health issues of our time. Toward this goal, AI2Health will leverage the research expertise of its core and affiliate members and collaborations with clinicians and scientists at the Texas Medical Center. Our research cluster will focus on transformative, health-outcome-inspired AI research in predicting, diagnosing, and treating health issues. Examples of specific goals include (i) improved cancer screening for early cancer detection and treatment, (ii) early warning systems for pathogen outbreak tracking and mitigation, and (iii) improved vaccine and drug design.

Updates will be posted to the following web page, managed by the cluster: https://treangenlab.github.io/ai2health/
