Information theory for data-driven model reduction in biology
Our work addresses a central challenge in biology: how to find minimal mathematical models that capture the essential dynamics of a system. This is especially difficult for complex non-equilibrium or living systems, where little physical intuition is available to guide us. Here, we present a data-driven pipeline grounded in a mathematical mapping between model reduction and optimal signal compression.
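To make the mapping concrete, the compression step can be sketched with a standard information-bottleneck objective (a schematic in our notation, not necessarily the exact formulation used in the paper): a stochastic encoder $p(z \mid x_t)$ compresses the current state $x_t$ into a latent variable $z$ while retaining predictive information about the future state $x_{t+\tau}$,

\[
\min_{p(z \mid x_t)} \; I(z; x_t) \;-\; \beta \, I(z; x_{t+\tau}),
\]

where $I(\cdot\,;\cdot)$ denotes mutual information and the trade-off parameter $\beta$ sets how much predictive detail the reduced description must retain.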
This information-theoretic formulation allows us to construct neural networks that identify the relevant variables based only on their information content. For example, our networks automatically extract the dominant slow collective variables from uncurated videos of atmospheric flows, and they discover an emergent synchronization order parameter in experimental videos of cyanobacteria colonies.
We stress that a key feature of these neural networks is that they are interpretable by construction. By unveiling a theoretical link between optimal compression and the operator-theoretic formalism of dynamical systems, we can (i) show analytically that the learned latent variables are eigenfunctions of the transfer operator and (ii) systematically determine when to stop increasing the complexity of a minimal model.
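In operator language, this correspondence can be sketched as follows (a schematic in our notation; $\mathcal{T}_\tau$ denotes the transfer operator over a lag $\tau$):

\[
\mathcal{T}_\tau \phi_k = \lambda_k \phi_k, \qquad 1 = \lambda_0 \ge |\lambda_1| \ge |\lambda_2| \ge \dots,
\]

so the optimal latent variables align with the leading nontrivial eigenfunctions $\phi_1, \phi_2, \dots$, whose eigenvalues measure how slowly each mode decays. A pronounced spectral gap after the $k$-th mode then signals a natural point at which to stop adding variables to the minimal model.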
NSF Award DMS-2235451
Matthew S. Schmitt, Maciej Koch-Janusz, Michel Fruchart, Daniel S. Seara, Michael Rust, Vincenzo Vitelli, arXiv:2312.06608, submitted to PNAS.