Stefan Groha

DPhil

AI/ML Engineer at GSK.ai
s.groha@gmail.com

I am an AI/ML engineer at GSK. Before that, I was a post-doc in Alexander Gusev's group, interested in applying statistics, causal inference, and machine learning methods to problems in precision medicine, health care, and genetics/genomics. I have expertise in developing and applying methods for survival, multi-state, and time series analysis of electronic health records and genetic data. The goal of this work is to understand the biological mechanisms of cancer therapy and its adverse events, and to predict these outcomes. I leverage my background in theoretical physics to find flexible statistical techniques that are well suited to these problems.

Publications and Preprints

Germline variants associated with immunotherapy-related adverse events

Stefan Groha, Sarah Abou Alaiwi, Wenxin Xu, et al.

medrxiv.org/content/10.1101/2022.04.10.22273627v2

Immune checkpoint inhibitors (ICIs) have yielded remarkable responses in patients across multiple cancer types, but often lead to immune-related adverse events (irAEs). Although a germline cause for irAEs has been hypothesized, no systematic genome-wide association study (GWAS) has been performed and no individual variants associated with the overall likelihood of developing irAEs have yet been identified. We carried out a GWAS of 1,751 patients on ICIs across 12 cancer types, with replication in an independent cohort of 196 patients and independent clinical trial data from 2,275 patients. We investigated two irAE phenotypes: (i) high-grade (3-5) events defined through manual curation and (ii) all detectable events (including high-grade) defined through electronic health record (EHR) diagnosis followed by manual confirmation. We identified three genome-wide significant associations (p<5×10^−8) in the discovery cohort associated with all-grade irAEs: rs16906115 near IL7 (combined p=1.6×10^−11; hazard ratio (HR)=2.1), rs75824728 near IL22RA1 (combined p=6.6×10^−9; HR=1.9), and rs113861051 on 4p15 (combined p=1.3×10^−8; HR=2.0); with rs16906115 replicating in two independent studies. The association near IL7 colocalized with the gain of a novel cryptic exon for IL7, a critical regulator of lymphocyte homeostasis. Patients carrying the IL7 germline variant exhibited significantly greater lymphocyte stability after ICI initiation than non-carriers, and this stability was predictive of downstream irAEs and improved survival.

@article{groha2022germline,
  title={Germline variants associated with immunotherapy-related adverse events},
  author={Groha, Stefan and Abou Alaiwi, Sarah and Xu, Wenxin and Naranbhai, Vivek and Nassar, Amin H and Bakouny, Ziad and Adib, Elio and Nuzzo, Pier V and Schmidt, Andrew L and Labaki, Chris and others},
  journal={medRxiv},
  year={2022},
  publisher={Cold Spring Harbor Laboratory Press}
}

Topological Data Analysis of copy number alterations in cancer

Stefan Groha*, Caroline Weis*, Alexander Gusev, Bastian Rieck
* co-first author

arxiv.org/abs/2011.11070

Identifying subgroups and properties of cancer biopsy samples is a crucial step towards obtaining precise diagnoses and being able to perform personalized treatment of cancer patients. Recent data collections provide a comprehensive characterization of cancer cell data, including genetic data on copy number alterations (CNAs). We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach that encodes each cancer sample as a persistence diagram of topological features, i.e., high-dimensional voids represented in the data. We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data and demonstrate the viability of some applications on finding substructures in cancer data as well as comparing similarity of cancer types.

@Article{groha2020topological,
          author  = {Stefan Groha and Caroline Weis and Alexander Gusev and Bastian Rieck},
          title   = {Topological Data Analysis of copy number alterations in cancer},
          journal = {arXiv preprint arXiv:2011.11070},
          url     = {https://arxiv.org/abs/2011.11070},
          year    = {2020},
        }

Neural ODEs for Multi-state Survival Analysis

Stefan Groha*, Sebastian Schmon*, Alexander Gusev
* co-first author

arxiv.org/abs/2006.04893

Survival models are a popular tool for the analysis of time to event data with applications in medicine, engineering, economics and many more. Advances like the Cox proportional hazard model have enabled researchers to better describe hazard rates for the occurrence of single fatal events, but are limited by modeling assumptions, like proportionality of hazard rates and linear effects. Moreover, common phenomena are often better described through multiple states, for example, the progress of a disease might be modeled as healthy, sick and dead instead of healthy and dead, where the competing nature of death and disease has to be taken into account. Also, individual characteristics can vary significantly between observational units, like patients, resulting in idiosyncratic hazard rates and different disease trajectories. These considerations require flexible modeling assumptions. Current standard models, however, are often ill-suited for such an analysis. To overcome these issues, we propose the use of neural ordinary differential equations as a flexible and general method for estimating multi-state survival models by directly solving the Kolmogorov forward equations. To quantify the uncertainty in the resulting individual cause-specific hazard rates, we further introduce a variational latent variable model. We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.

@Article{groha2020neural,
          author  = {Stefan Groha and Sebastian M Schmon and Alexander Gusev},
          title   = {Neural ODEs for Multi-state Survival Analysis},
          journal = {arXiv preprint arXiv:2006.04893},
          url     = {https://arxiv.org/abs/2006.04893},
          year    = {2020},
        }

Full counting statistics in the transverse field Ising chain

Stefan Groha, Fabian Essler, Pasquale Calabrese

SciPost Physics 4 (6), 2018

We consider the full probability distribution for the transverse magnetization of a finite subsystem in the transverse field Ising chain. We derive a determinant representation of the corresponding characteristic function for general Gaussian states. We consider applications to the full counting statistics in the ground state, finite temperature equilibrium states, non-equilibrium steady states and time evolution after global quantum quenches. We derive an analytical expression for the time and subsystem size dependence of the characteristic function at sufficiently late times after a quantum quench. This expression features an interesting multiple light-cone structure.

@article{groha2018full,
            title={Full counting statistics in the transverse field Ising chain},
            author={Groha, Stefan and Essler, Fabian and Calabrese, Pasquale},
            journal={SciPost Physics},
            volume={4},
            number={6},
            year={2018},
            publisher={SciPost}
        }

Full counting statistics in the spin-1/2 Heisenberg XXZ chain

Mario Collura*, Fabian HL Essler*, Stefan Groha*
* co-first author

Journal of Physics A: Mathematical and Theoretical 50 (41), 414002, 2017

The spin-1/2 Heisenberg chain exhibits a quantum critical regime characterized by quasi long-range magnetic order at zero temperature. We quantify the strength of quantum fluctuations in the ground state by determining the probability distributions of the components of the (staggered) subsystem magnetization. Some of these exhibit scaling and the corresponding universal scaling functions can be determined by free fermion methods and by exploiting a relation with the boundary sine-Gordon model.

@article{collura2017full,
            title={Full counting statistics in the spin-1/2 Heisenberg XXZ chain},
            author={Collura, Mario and Essler, Fabian HL and Groha, Stefan},
            journal={Journal of Physics A: Mathematical and Theoretical},
            volume={50},
            number={41},
            pages={414002},
            year={2017},
            publisher={IOP Publishing}
        }

Spinon decay in the spin-1/2 Heisenberg chain with weak next nearest neighbour exchange

Stefan Groha, Fabian HL Essler

Journal of Physics A: Mathematical and Theoretical 50 (33), 334002, 2017

Integrable models support elementary excitations with infinite lifetimes. In the spin-1/2 Heisenberg chain these are known as spinons. We consider the stability of spinons when a weak integrability breaking perturbation is added to the Heisenberg chain in a magnetic field. We focus on the case where the perturbation is a next nearest neighbour exchange interaction. We calculate the spinon decay rate in leading order in perturbation theory using methods of integrability and identify the dominant decay channels. The decay rate is found to be small, which indicates that spinons remain well-defined excitations even though integrability is broken.

@article{groha2017spinon,
            title={Spinon decay in the spin-1/2 Heisenberg chain with weak next nearest neighbour exchange},
            author={Groha, Stefan and Essler, Fabian HL},
            journal={Journal of Physics A: Mathematical and Theoretical},
            volume={50},
            number={33},
            pages={334002},
            year={2017},
            publisher={IOP Publishing}
          }

Thermalization and light cones in a model with weak integrability breaking

Bruno Bertini*, Fabian HL Essler*, Stefan Groha*, Neil J Robinson*
* co-first author

Physical Review B 94 (24), 245117, 2016

We employ equation-of-motion techniques to study the nonequilibrium dynamics in a lattice model of weakly interacting spinless fermions. Our model provides a simple setting for analyzing the effects of weak integrability-breaking perturbations on the time evolution after a quantum quench. We establish the accuracy of the method by comparing results at short and intermediate times to time-dependent density matrix renormalization group computations. For sufficiently weak integrability-breaking interactions we always observe prethermalization plateaus, where local observables relax to nonthermal values at intermediate time scales. At later times a crossover towards thermal behavior sets in. We determine the associated time scale, which depends on the initial state, the band structure of the noninteracting theory, and the strength of the integrability-breaking perturbation. Our method allows us to analyze in some detail the spreading of correlations and in particular the structure of the associated light cones in our model. We find that the interior and exterior of the light cone are separated by an intermediate region, the temporal width of which appears to scale with a universal power law t^(1/3).

@article{bertini2016thermalization,
                title={Thermalization and light cones in a model with weak integrability breaking},
                author={Bertini, Bruno and Essler, Fabian HL and Groha, Stefan and Robinson, Neil J},
                journal={Physical Review B},
                volume={94},
                number={24},
                pages={245117},
                year={2016},
                publisher={APS}
              }

Prethermalization and thermalization in models with weak integrability breaking

Bruno Bertini*, Fabian HL Essler*, Stefan Groha*, Neil J Robinson*
* co-first author

Physical Review Letters 115 (18), 180601, 2015

We study the effects of integrability-breaking perturbations on the nonequilibrium evolution of many-particle quantum systems. We focus on a class of spinless fermion models with weak interactions. We employ equation of motion techniques that can be viewed as generalizations of quantum Boltzmann equations. We benchmark our method against time-dependent density matrix renormalization group computations and find it to be very accurate as long as interactions are weak. For small integrability breaking, we observe robust prethermalization plateaux for local observables on all accessible time scales. Increasing the strength of the integrability-breaking term induces a “drift” away from the prethermalization plateaux towards thermal behavior. We identify a time scale characterizing this crossover.

@article{bertini2015prethermalization,
                title={Prethermalization and thermalization in models with weak integrability breaking},
                author={Bertini, Bruno and Essler, Fabian HL and Groha, Stefan and Robinson, Neil J},
                journal={Physical Review Letters},
                volume={115},
                number={18},
                pages={180601},
                year={2015},
                publisher={APS}
              }

Talks and Presentations

ML4H seminar series, Broad Institute of MIT and Harvard, 2021, invited talk.
AAAI Symposium 2021: Survival prediction, oral presentation.
Modeling & Simulation Forum, Genentech, 2021, invited talk.
Machine Learning for Health (ML4H), NeurIPS workshop 2020, poster presentation.
Learning Meaningful Representations of Life, NeurIPS workshop 2020, poster and oral presentation.
American Society of Human Genetics Conference 2020, poster presentation.
European Society of Human Genetics Conference 2020, oral presentation.
Learning Meaningful Representations of Life, NeurIPS workshop 2019, poster presentation.
Harvard PQG, Quantitative Challenges in Cancer Immunology and Immunotherapy 2019.
American Society of Human Genetics Conference 2019, poster presentation.
Erwin Schrödinger International Institute, Vienna, 2018, invited talk.
Rudolf Peierls Centre for Theoretical Physics, Oxford, 2018, invited talk.
Brookhaven National Laboratory, 2017, invited talk.