• Concrete functional analysis

    The book Concrete Functional Calculus by Richard M. Dudley and Rimas Norvaiša presents some aspects of nonlinear analysis and their applications to probability.

    I wanted to understand the book at a level at which it could be taught to a high school student. Below are observations I made while examining the book through the lens of AI tools.

    The central idea of the book is a new measure, or yardstick, for variation — a notion of wiggliness called p-variation. It raises the size of each little wiggle to the power p and adds the results up. Bigger p pays little attention to tiny jitter and more to big moves. This gives a whole family of wiggliness meters, not just one.
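    As a toy illustration (mine, not the book's): compare the p-variation-style sum \sum |\Delta x|^p for a path made of many tiny wiggles against one big move of the same total size. Note that the true p-variation takes a supremum over all partitions; this sketch only evaluates the sum on one fixed partition.

```python
def p_variation_sum(increments, p):
    """Sum of |increment|^p over one fixed partition.
    (True p-variation is the supremum of this over all partitions.)"""
    return sum(abs(dx) ** p for dx in increments)

jitter = [0.01] * 100   # 100 tiny wiggles, total movement 1.0
big_move = [1.0]        # one big move of the same total size

for p in (1, 2, 3):
    # At p = 1 both paths score 1.0; as p grows, the jitter's score
    # collapses toward zero while the big move keeps scoring 1.0.
    print(p, p_variation_sum(jitter, p), p_variation_sum(big_move, p))
```

    This is exactly the "bigger p ignores tiny jitter" effect: at p = 2 the jitter scores 0.01 versus 1.0 for the big move.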

    The book demonstrates that, with the right p, we can make sense of integrals and calculus-style operations even for quite jumpy signals. This leads to a result called the Love–Young inequality, which tells you when an integral \int f \, dg exists and how big it can be, based on the p-variation of f and the q-variation of g, provided \frac{1}{p} + \frac{1}{q} > 1.

    For Brownian motion, a Brownian path has finite p-variation only when p > 2; it is too wiggly for p \leq 2.

    If one function is “not too wiggly” in the p-variation sense and the other is “not too wiggly” in q-variation, with \frac{1}{p}+\frac{1}{q} > 1, then the integral \int f \, dg exists and is controlled (Love–Young). This is like a cousin of Cauchy–Schwarz/Hölder, but tuned to rough signals. In statistical terms: it tells you when you can safely integrate (or “accumulate”) a noisy curve against another without things blowing up.
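    A minimal numerical sketch of this (my illustration; the function choices are arbitrary): take f(t) = t and g(t) = \sqrt{t}, both of finite 1-variation on [0, 1], so \frac{1}{p}+\frac{1}{q} = 2 > 1 and the Riemann–Stieltjes sums converge — here to the exact value \int_0^1 t \, d(\sqrt{t}) = 1/3.

```python
def rs_sum(f, g, n):
    """Left-point Riemann–Stieltjes sum approximating the integral of
    f dg over [0, 1] on a uniform grid with n pieces."""
    ts = [i / n for i in range(n + 1)]
    return sum(f(ts[i]) * (g(ts[i + 1]) - g(ts[i])) for i in range(n))

# f(t) = t and g(t) = sqrt(t): 1/p + 1/q = 1/1 + 1/1 = 2 > 1, so the
# sums converge as the grid is refined — here to exactly 1/3.
approx = rs_sum(lambda t: t, lambda t: t ** 0.5, 10_000)
print(approx)  # close to 1/3
```

    For rougher g (say, a Brownian path) the same sums only make sense when the Love–Young exponent condition holds.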

    The book also studies when compositions of functions behave smoothly – like a data pipeline where you first transform your variable with G, then apply F.

    Love–Young gives a “safe integration” region where \frac{1}{p}+\frac{1}{q} > 1. Example: Brownian motion needs p > 2; pair it with something of q-variation with, say, q < 2 so that the inequality holds.

    Consider how the estimate \sum |\Delta x|^p changes as we cut the interval into more and more pieces. For a Brownian-like path:

    p=1.5 grows with refinement → too rough: p-variation “blows up” for p \le 2.

    p=2 hovers around a constant (the borderline).

    p=3 shrinks → for p > 2, the roughness is “tamed,” and p-variation is finite.
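    This refinement behavior can be checked numerically. The sketch below (my illustration, not from the book) simulates a Brownian-like path from Gaussian increments and compares \sum |\Delta x|^p over a coarse and a fine partition for each p.

```python
import math
import random

random.seed(0)
N = 2 ** 14                      # number of fine-grid steps on [0, 1]
dt = 1.0 / N

# Brownian-like path: cumulative sum of N(0, dt) increments.
w = [0.0]
for _ in range(N):
    w.append(w[-1] + random.gauss(0.0, math.sqrt(dt)))

def variation_sum(path, step, p):
    """Sum |Δx|^p over the partition that keeps every `step`-th point."""
    pts = path[::step]
    return sum(abs(b - a) ** p for a, b in zip(pts, pts[1:]))

for p in (1.5, 2.0, 3.0):
    coarse = variation_sum(w, 2 ** 8, p)   # 64 pieces
    fine = variation_sum(w, 1, p)          # 16384 pieces
    # Expect: grows for p = 1.5, roughly constant for p = 2,
    # shrinks for p = 3, matching the three cases above.
    print(p, coarse, fine)
```

    Heuristically, the sum scales like n^{1 - p/2} over n pieces, which is exactly why p = 2 is the borderline.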

  • Sonification of the human genome

    We applied a hidden Markov model to 1 million exomes from the Regeneron Genetics Center and 250 thousand exomes from the U.S. All of Us research program in order to identify regions of the genome that are likely to be constrained and to assign a probability of constraint to each position in the genome.

    Dr. Scott Oshiro, a postdoc in the laboratory with a background in music theory and quantum computing, decided to sonify constraint in the genome across these two cohorts.

    Sonification is the mapping or conversion of data to audio and/or musical elements. For example, a collection of sensor data can be mapped to the frequency or pitch of a musical pattern, or to its rhythm. This audio file is the sonification of 64 sequences of the human genome. For each genetic sequence there is a value between 0 and 1 representing the probability of a mutation for that particular sequence. These values are mapped to specific pitches of a particular musical scale.
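    A minimal sketch of such a mapping (hypothetical names and scale choice — the project's actual mapping may differ): quantize each probability in [0, 1] onto the degrees of a C major pentatonic scale expressed as MIDI note numbers.

```python
# Hypothetical scale: C major pentatonic over two octaves, as MIDI notes.
PENTATONIC = [60, 62, 64, 67, 69, 72, 74, 76]  # C4 D4 E4 G4 A4 C5 D5 E5

def value_to_pitch(v, scale=PENTATONIC):
    """Quantize a value in [0, 1] onto one of the scale degrees:
    low probabilities map to low pitches, high probabilities to high ones."""
    idx = min(int(v * len(scale)), len(scale) - 1)
    return scale[idx]

# One value per genetic sequence, mapped to one pitch each.
probs = [0.05, 0.30, 0.55, 0.80, 0.99]
print([value_to_pitch(p) for p in probs])  # → [60, 64, 69, 74, 76]
```

    Rhythm or note duration could be driven by a second data channel in the same way.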

    A lot of information is compressed in wave data and I’m excited about the next iteration of our project.

  • Efficiency and power in clinical trials using biomarkers and genetic risk scores

    The incorporation of biomarkers into clinical trials has the potential to improve trial efficiency, reduce required sample sizes, and increase statistical power to detect treatment effects. By identifying and measuring reliable biomarkers early in the trial, researchers can refine patient selection, stratify risk, and potentially replace or supplement traditional endpoints with more responsive or biologically relevant measures. Despite these advantages, questions remain regarding the appropriate study design, analysis plan, and methodological strategies to ensure that incorporating biomarkers truly enhances the power of a trial rather than inflating Type I error or introducing bias.

    Above, I show sample power curves for clinical trials informed by biomarkers that are prognostic and predictive in nature.

    Biomarkers can also be used to increase the event rate in clinical trials. Genetic risk scores, in particular, can serve as a very powerful biomarker for clinical trials.
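    To make the event-rate point concrete, here is a back-of-the-envelope sketch (my illustration, using the standard two-proportion normal-approximation sample-size formula and hypothetical rates): enriching enrollment with a genetic risk score that raises the control-arm event rate from 5% to 10% roughly halves the required sample size for the same relative risk reduction.

```python
from statistics import NormalDist

def n_per_arm(p_control, rel_risk_reduction, alpha=0.05, power=0.8):
    """Sample size per arm for a two-sided two-proportion z-test
    (normal approximation)."""
    p1 = p_control
    p2 = p_control * (1 - rel_risk_reduction)
    pbar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * (2 * pbar * (1 - pbar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p1 - p2) ** 2

# Hypothetical scenario: 25% relative risk reduction in both cases.
# Unselected population: 5% control event rate.
print(round(n_per_arm(0.05, 0.25)))
# Enriched high-genetic-risk stratum: 10% control event rate — far fewer
# patients needed per arm.
print(round(n_per_arm(0.10, 0.25)))
```

    The absolute risk difference doubles with the event rate while the variance terms grow more slowly, which is where the efficiency gain comes from.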