  • AI agents in biomedical data science

    At Stanford, my colleague Prof. Le Cong and his lab created CRISPR-GPT, a large language model agent that helps scientists design CRISPR experiments (see paper here). The agent helps researchers plan reagents, design protocols, and troubleshoot experiments by integrating data from various databases and the literature.

    CRISPR-GPT AI agent helps design CRISPR experiments.

    The age of agentic AI – a version of artificial intelligence that can make decisions, act autonomously, and learn from interactions – is here.

    AI agents can be quite helpful for studying chronic diseases. Dozens of databases are available online, including the PGS Catalog, the GWAS Catalog, Open Targets, Global Biobank Engine, and All of Us, with data sitting and waiting to be integrated. From 2020 to 2025, the technology was restricted to static databases. What if we had agentic AIs analyzing these databases during downtime? The lab will be working on these ideas with Prof. Le Cong, beginning with Genetics-GPT.

  • Sonification of the human genome

    We applied a Hidden Markov Model to 1 million exomes from the Regeneron Genetics Center and 250 thousand exomes from the United States All of Us project to identify regions of the genome that are likely to be constrained and to assign a probability of constraint to each position in the genome.
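
    To make the idea of a per-position probability concrete, here is a minimal sketch of a two-state Hidden Markov Model scored with the forward-backward algorithm. It is not the model we fit: the two states, the 0/1 "variant observed" encoding, and every probability below are illustrative assumptions.

```python
# Toy two-state HMM. All names and numbers are illustrative assumptions,
# not the parameters of the model described in the text.
STATES = ("constrained", "unconstrained")
START = {"constrained": 0.5, "unconstrained": 0.5}
TRANSITION = {
    "constrained": {"constrained": 0.9, "unconstrained": 0.1},
    "unconstrained": {"constrained": 0.1, "unconstrained": 0.9},
}
# Assumed P(a variant is observed at a position | state).
EMIT_VARIANT = {"constrained": 0.05, "unconstrained": 0.30}


def emission(state, observed):
    """Probability of the 0/1 observation under the given state."""
    p = EMIT_VARIANT[state]
    return p if observed else 1.0 - p


def normalize(weights):
    """Rescale weights to sum to 1 (also prevents numerical underflow)."""
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def posterior_constraint(observations):
    """Return P(state = constrained | all observations) for each position."""
    # Forward pass: evidence from the start up to and including position i.
    forward, prev = [], None
    for obs in observations:
        if prev is None:
            cur = {s: START[s] * emission(s, obs) for s in STATES}
        else:
            cur = {
                s: emission(s, obs) * sum(prev[r] * TRANSITION[r][s] for r in STATES)
                for s in STATES
            }
        prev = normalize(cur)
        forward.append(prev)

    # Backward pass: evidence from positions after i.
    backward = [None] * len(observations)
    nxt = {s: 1.0 for s in STATES}
    for i in range(len(observations) - 1, -1, -1):
        backward[i] = nxt
        obs = observations[i]
        nxt = normalize({
            r: sum(TRANSITION[r][s] * emission(s, obs) * backward[i][s] for s in STATES)
            for r in STATES
        })

    # Combine both passes and renormalize at each position.
    return [
        normalize({s: f[s] * b[s] for s in STATES})["constrained"]
        for f, b in zip(forward, backward)
    ]


# A stretch with no observed variants followed by a variant-rich stretch.
probabilities = posterior_constraint([0] * 20 + [1] * 20)
```

    In this toy setup, positions inside the variant-free stretch receive a higher probability of constraint than positions inside the variant-rich stretch.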

    Dr. Scott Oshiro, a postdoc in the laboratory with a background in music theory and quantum computing, decided to sonify constraint in the genome across these two cohorts.

    Sonification is the mapping or conversion of data to audio and/or musical elements. For example, a collection of sensor data can be mapped to represent the frequency or pitch of a musical pattern, or its rhythm. This audio file is the sonification of 64 sequences of the human genome. For each genetic sequence there is a value between 0 and 1 representing the probability of a mutation for that particular sequence. These values are mapped to specific pitches for a particular musical scale.
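
    The mapping from values to pitches can be sketched in a few lines. This is an illustration only, not the code behind the audio file: the choice of a C major scale, the two-octave range, the starting note, and all function names are assumptions.

```python
# Illustrative sketch: map values in [0, 1] to pitches of a musical scale.
# The scale, range, and names here are assumptions, not the project's settings.
C_MAJOR_SEMITONES = [0, 2, 4, 5, 7, 9, 11]  # scale degrees as semitone offsets
BASE_MIDI_NOTE = 60  # middle C in MIDI numbering (assumed starting pitch)
N_OCTAVES = 2        # assumed pitch range


def scale_pitches(base=BASE_MIDI_NOTE, octaves=N_OCTAVES):
    """List the MIDI notes of the scale across the requested octaves."""
    notes = [base + 12 * o + s for o in range(octaves) for s in C_MAJOR_SEMITONES]
    notes.append(base + 12 * octaves)  # finish on the tonic one range higher
    return notes


def probability_to_midi(p, pitches):
    """Pick the scale note whose bin contains p; higher p gives a higher pitch."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    index = min(int(p * len(pitches)), len(pitches) - 1)
    return pitches[index]


def midi_to_frequency(note):
    """Convert a MIDI note number to Hz in equal temperament (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


pitches = scale_pitches()
values = [0.0, 0.5, 1.0]  # made-up example values, one per sequence
melody = [probability_to_midi(v, pitches) for v in values]
frequencies = [midi_to_frequency(n) for n in melody]
```

    Binning the values onto scale notes, rather than mapping them to a continuous frequency, keeps every pitch inside one key, so the result is heard as a melody instead of a glide.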

    A lot of information is compressed in wave data and I’m excited about the next iteration of our project.

  • Efficiency and power in clinical trials using biomarkers and genetic risk scores

    The incorporation of biomarkers into clinical trials has the potential to improve trial efficiency, reduce required sample sizes, and increase statistical power to detect treatment effects. By identifying and measuring reliable biomarkers early in the trial, researchers can refine patient selection, stratify risk, and potentially replace or supplement traditional endpoints with more responsive or biologically relevant measures. Despite these advantages, questions remain regarding the appropriate study design, analysis plan, and methodological strategies to ensure that incorporating biomarkers truly enhances the power of a trial rather than inflating Type I error or introducing bias.

    Above, I show sample power curves for clinical trials informed by biomarkers that are prognostic and predictive in nature.

    Biomarkers can also be used to increase the event rate of clinical trials. Genetic risk scores can serve as a very powerful biomarker for clinical trials.
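
    The effect of a higher event rate on trial size can be illustrated with the standard normal-approximation sample size formula for comparing two proportions. This is a rough sketch under stated assumptions: the event rates, the 25% relative risk reduction, and the function name are hypothetical, and a real trial would need a fuller design.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(control_rate, relative_risk_reduction, alpha=0.05, power=0.80):
    """Approximate participants per arm for a two-sided two-proportion test.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2.
    """
    p1 = control_rate
    p2 = control_rate * (1.0 - relative_risk_reduction)  # treated-arm event rate
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical numbers: an unselected population with a 5% event rate versus
# a population enriched by a risk score to a 10% event rate, same treatment effect.
n_unselected = sample_size_per_arm(control_rate=0.05, relative_risk_reduction=0.25)
n_enriched = sample_size_per_arm(control_rate=0.10, relative_risk_reduction=0.25)
print(n_unselected, n_enriched)
```

    With the same relative treatment effect, the enriched population needs substantially fewer participants per arm, which is the efficiency gain that enrichment by a biomarker or genetic risk score aims for.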