Sparse model identification enables the discovery of nonlinear dynamical systems purely from data; however, this approach is sensitive to noise, especially in the low-data limit. In this work, we propose the statistical approach of bootstrap aggregating (bagging) to robustify the sparse identification of nonlinear dynamics (SINDy) algorithm. An ensemble of SINDy models is identified from subsets of limited and noisy data, and the aggregate statistics of the ensemble are used to extract more information about the underlying model. This ensemble-SINDy (E-SINDy) algorithm produces inclusion probabilities of the candidate functions, which enable uncertainty quantification and probabilistic forecasts. We apply E-SINDy to several synthetic and real-world data sets and demonstrate substantial improvements to the accuracy and robustness of model discovery from extremely limited and noisy data. For example, E-SINDy uncovers partial differential equation models from data with more than twice as much measurement noise as has been previously reported. Similarly, E-SINDy learns the Lotka-Volterra dynamics from remarkably limited and noisy data: the yearly counts of lynx and hare pelts collected from 1900 to 1920. Further, E-SINDy is computationally efficient, with scaling similar to that of standard SINDy. We also show that the ensemble statistics from E-SINDy can be exploited for active learning and improved model predictive control.
Figure: Schematic of the E-SINDy framework. E-SINDy exploits the statistical method of bootstrap aggregating (bagging) to identify ordinary and partial differential equations that govern the dynamics of observed noisy data. First, sparse regression is performed on bootstraps of the measured data to identify an ensemble of SINDy models. The mean or median of the coefficients is then computed, coefficients with low inclusion probabilities are thresholded, and the aggregated E-SINDy model is used for probabilistic forecasting, active learning, or control.
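
The pipeline in the schematic can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the candidate library `theta` and the time derivatives `dxdt` are already available, that sparse regression is done by sequentially thresholded least squares, and that bootstraps are drawn by resampling rows with replacement. The function names `stlsq` and `ensemble_sindy`, the parameters `threshold`, `n_models`, and `inclusion_tol`, and the toy linear system are all hypothetical choices made for illustration.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (assumed sparse regressor)."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each state equation using only its surviving library terms.
        for k in range(dxdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(
                    theta[:, big], dxdt[:, k], rcond=None
                )[0]
    return xi


def ensemble_sindy(theta, dxdt, n_models=100, threshold=0.1,
                   inclusion_tol=0.6, seed=0):
    """Fit sparse models on row bootstraps and aggregate their coefficients."""
    rng = np.random.default_rng(seed)
    n_samples = theta.shape[0]
    coefs = np.empty((n_models, theta.shape[1], dxdt.shape[1]))
    for b in range(n_models):
        # Bootstrap: draw n_samples rows with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        coefs[b] = stlsq(theta[idx], dxdt[idx], threshold)
    # Inclusion probability: fraction of models in which a term is nonzero.
    inclusion = np.mean(coefs != 0.0, axis=0)
    # Aggregate by the median, then drop terms with low inclusion probability.
    xi = np.median(coefs, axis=0)
    xi[inclusion < inclusion_tol] = 0.0
    return xi, inclusion, coefs


# Toy example (hypothetical): dx/dt = -2x, dy/dt = x - y, noisy derivatives.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
x, y = X[:, 0], X[:, 1]
# Candidate library: [1, x, y, x^2, xy, y^2].
theta = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
true_xi = np.zeros((6, 2))
true_xi[1, 0] = -2.0
true_xi[1, 1] = 1.0
true_xi[2, 1] = -1.0
dxdt = theta @ true_xi + 0.01 * rng.standard_normal((200, 2))

xi, inclusion, coefs = ensemble_sindy(theta, dxdt)
```

The returned `coefs` array holds the full ensemble, so the spread of each coefficient across bootstraps can serve as a rough uncertainty estimate, and models drawn from the ensemble could be simulated individually to form a probabilistic forecast.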