Author: admin

  • Genetics of COVID-19

    Genetics could explain why some people get severe COVID-19. During the pandemic we studied the genetic predisposition to severe COVID-19. Within a few months we built a biobank at Stanford University. Most participants were admixed, i.e. of multiple ancestries. I’m still amazed at how quickly we were able to assemble a biobank to study a deadly pandemic. To read more about the Stanford Biobank for COVID-19, please see the link here.

    With our international colleagues and collaborators we combined results across biobanks and conducted an international meta-analysis of SARS-CoV-2 (virus) infection and COVID-19 (disease) severity.

    Location of the international COVID-19 biobanks.

    This resulted in over 40 loci associated with SARS-CoV-2 infection and COVID-19 hospitalization. The paper summarizing the findings is published in Nature.

  • Generating effective therapeutic hypotheses from human genetic data

    Protective genetic mutations are nature's own experiments, and they guide us to genetically validated therapeutic targets.

    For glaucoma, we discovered loss-of-function variants in ANGPTL7 that lower intraocular pressure and protect against glaucoma. This finding was an Editor’s pick in the journal Science.


    For inflammatory bowel disease, we discovered loss-of-function variants in CARD9 and RNF186 that protect against Crohn’s disease and ulcerative colitis.


    For cancers, we discovered a missense variant in WNT6 that protects against human cancers.

    For chronic kidney disease and its related biomarkers, including creatinine and eGFR, we discovered genetic variants that lower creatinine (improve eGFR) and protect against chronic kidney disease.

    These are genetically validated therapeutic hypotheses that will hopefully lead to effective therapies.

  • Efficient regression for population scale genome sequencing studies

    As we move from Common Variant Association Studies (CVAS) to Rare Variant Association Studies (RVAS) it has become increasingly obvious that the majority of our computational workload will be dedicated to analyzing rare variants.

    As a simple illustration, consider the absolute numbers of common and rare variants in a whole genome sequencing study of approximately 200,000 individuals (the All of Us dataset). Shown side by side in a barplot, the difference is quite dramatic.

    I initially analyzed the body mass index phenotype in the All of Us cohort using PLINK 2.0.

    PLINK 2.0 has computationally efficient algorithms. However, it was clear that its existing algorithms still required substantial computational resources to run across hundreds of thousands of individuals.

    As a baseline, we ran univariate regression across the exome for the body mass index (BMI) phenotype in the All of Us cohort, which required 11.6 hours using 50 threads on a single machine.

    We realized that we could make the analysis more efficient by exploiting the fact that most of the variants being analyzed are rare.

    To estimate the regression coefficients, we only need the data from the rare variant carriers once the covariates have been residualized out (a computation that only needs to be done once).
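
    A minimal sketch of why carrier data can suffice, assuming a linear model and the Frisch-Waugh-Lovell identity. It illustrates the idea and is not necessarily the implementation used in the paper. The function names `residualize` and `carrier_only_beta` and the simulated data are hypothetical. Because non-carriers have genotype 0, both the numerator (genotype times residualized phenotype) and the denominator (genotype variance after projecting out covariates) reduce to sums over carrier rows.

```python
import numpy as np


def residualize(y, covariates):
    """Project an intercept and the covariates out of y (done once per phenotype).

    Returns the residualized phenotype and an orthonormal basis q of the
    covariate space, which is reused for every variant.
    """
    n = y.shape[0]
    design = np.column_stack([np.ones(n), covariates])
    q, _ = np.linalg.qr(design)  # reduced QR: q has shape (n, k)
    y_res = y - q @ (q.T @ y)
    return y_res, q


def carrier_only_beta(carrier_idx, carrier_dosage, y_res, q):
    """Covariate-adjusted slope for one variant, touching carrier rows only."""
    g = np.asarray(carrier_dosage, dtype=float)
    # g' y_res: non-carriers contribute zero, so only carrier rows are read.
    numerator = g @ y_res[carrier_idx]
    # g' M g = g'g - ||q' g||^2, where q' g is also a sum over carriers.
    qg = q[carrier_idx].T @ g
    denominator = g @ g - qg @ qg
    return numerator / denominator


# Hypothetical simulated data: 5,000 individuals, 3 covariates, one rare variant.
rng = np.random.default_rng(0)
n = 5000
covariates = rng.normal(size=(n, 3))
genotype = rng.binomial(2, 0.005, size=n).astype(float)
genotype[:3] = 1.0  # guarantee a few carriers in this toy example
phenotype = 0.5 * genotype + covariates @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

y_res, q = residualize(phenotype, covariates)
carriers = np.flatnonzero(genotype)
beta_sparse = carrier_only_beta(carriers, genotype[carriers], y_res, q)

# Reference: full least-squares fit on all individuals.
full_design = np.column_stack([np.ones(n), covariates, genotype])
beta_full = np.linalg.lstsq(full_design, phenotype, rcond=None)[0][-1]
print(beta_sparse, beta_full)
```

    In this sketch the per-variant cost scales with the number of carriers rather than the cohort size, which is where the savings for rare variants would come from.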

    We reduced computation time from 11.6 hours to 1.6 minutes, roughly a 435-fold speedup, using 50 threads on a single machine.

    We also found that the approach maintained power and controlled type 1 error rates.

    The paper is available here on bioRxiv and is in press at Bioinformatics.