Efficient spatio-temporal Gaussian regression via Kalman filtering

Abstract

We study the non-parametric reconstruction of spatio-temporal dynamical processes via Gaussian Process (GP) regression from sparse and noisy data. GPs have mainly been applied to spatial regression, where they represent one of the most powerful estimation approaches, also thanks to their universal representing properties. Their extension to dynamical processes has so far been elusive, since classical implementations lead to unscalable algorithms or require some sort of approximation. We propose a novel procedure to address this problem by coupling GP regression and Kalman filtering. In particular, assuming space/time separability of the covariance (kernel) of the process and a rational time spectrum, we build a finite-dimensional discrete-time state-space process representation amenable to Kalman filtering. With sampling over a finite set of fixed spatial locations, our major finding is that the current Kalman filter state represents a sufficient statistic to compute the minimum variance estimate of the process at any future time over the entire spatial domain. In machine learning, a representer theorem states that an important class of infinite-dimensional variational problems admits a computable, finite-dimensional exact solution. In view of this, our result can be interpreted as a novel Dynamic Representer Theorem for GPs. We then extend the study to situations where the set of spatial input locations varies over time. The proposed algorithms are tested on both synthetic and real field data, with comparisons against standard GP and truncated GP regression techniques.

Introduction

The "Big-Data" and "Machine-Learning" era of the last few decades has been fueled by the exponential growth of research fields such as statistics and optimization. Within these areas, function estimation problems play an important role, and many different regression approaches have been developed in recent years. In particular, in the Bayesian estimation context, Gaussian Process (GP) methods (O'Hagan & Kingman, 1978), also known as kriging (Cressie, 1990), have become the standard approach (Cucker and Smale, 2001, Williams and Rasmussen, 2006) in application realms such as robotic networks, biomedicine, and system identification (Pillonetto et al., 2014, Schölkopf and Smola, 2001, Todescato et al., 2017).

In the classical GP framework the process is assumed to be static, so that only spatial locations are seen as input variables. However, due to the heavy computational requirements, characterized by a cubic growth in the number of input data, many efficient approaches have been developed. Some of these rely on the notion of pseudo input locations (Lázaro-Gredilla et al., 2010, Quiñonero-Candela and Rasmussen, 2005, Snelson and Ghahramani, 2006) or "truncated" observations (Gramacy & Apley, 2015). Others use matrix factorizations (Ambikasaran, Foreman-Mackey, Greengard, Hogg, & O'Neil, 2016) and approximations of the kernel function (Bach and Jordan, 2005, Kulis et al., 2006) through the Nyström method, greedy techniques (Smola and Schölkopf, 2000, Williams and Seeger, 2001, Zhang and Kwok, 2010) or tapering (Furrer, Genton, & Nychka, 2006). Another widely used approach relies on nearest-neighbor techniques as in Datta, Banerjee, Finley and Gelfand (2016). However, while the above mentioned approaches can easily deal with very general kernel functions, they either introduce some sort of approximation or need to compute a suitable set of meaningful measurement points in the neighborhood of the desired prediction point. This can be nontrivial, computationally demanding, and can degrade prediction performance.
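To make the cubic bottleneck concrete, the following minimal sketch (function names, kernel choice, and parameter values are illustrative, not taken from the paper) computes the batch GP posterior mean; the Cholesky factorization of the n × n Gram matrix is the O(n³) step that the approximations above target.

```python
import numpy as np

def se_kernel(A, B, ell=0.2):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    d2 = (np.asarray(A)[:, None] - np.asarray(B)[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

def gp_posterior_mean(X, y, X_star, kernel=se_kernel, noise_var=1e-4):
    """Batch GP regression: posterior mean at the test inputs X_star.
    Factoring the n x n Gram matrix costs O(n^3) in the number n of data."""
    K = kernel(X, X) + noise_var * np.eye(len(X))  # n x n Gram matrix
    L = np.linalg.cholesky(K)                      # the O(n^3) bottleneck
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return kernel(X_star, X) @ alpha
```

Every new measurement forces a refactorization from scratch, which is exactly the non-recursive behavior the paper's Kalman-based scheme avoids.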

Recent research has instead focused on the use of the classical methods in dynamical contexts. In fact, to capture many interesting time-varying phenomena, such as wind, ocean currents, and meteorological fields (Datta, Banerjee, Finley and Gelfand, 2016, Datta, Banerjee, Finley, Hamm et al., 2016, Handcock and Wallis, 1994, Miller and Cane, 1989, Wunsch, 1996), it is necessary to extend the methodology to the class of spatio-temporal processes. The simplest approach is to interpret time just as an additional input feature (Williams & Rasmussen, 2006) and then devise approximate, numerically efficient solutions. However, treating time as an additional feature in the input space has important practical limitations, mainly due to: (i) heavy memory and computational requirements; and (ii) the non-iterative nature of the methodology. Indeed, the classical paradigm, being tailored to static processes, relies on batch implementations where data are processed at once, after they have all been collected. Hence, in the dynamical context, approaches to cope with computational complexity are: sparse approximations as in Furrer et al., 2006, Oh et al., 2010, Williams and Rasmussen, 2006 and Gramacy and Apley (2015); finite-memory implementations, often based on truncated observations, as in Xu, Choi, and Oh (2011) and Xu, Choi, Dass, and Maiti (2012); or the use of parametric models (Stroud, Müller, & Sansó, 2001).
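The naive construction is easy to state: stack each sample's space and time coordinates into one input and reuse any static GP routine. With n_s locations and n_t time steps the Gram matrix is (n_s·n_t) × (n_s·n_t), so batch regression costs O((n_s·n_t)³), which is limitation (i) above. A minimal sketch, with illustrative kernel choices and a separable space-time kernel of the kind assumed later in the paper:

```python
import numpy as np

def spacetime_gram(xs, ts, ks, kt):
    """Gram matrix for the naive approach where time is just another input
    feature, here with a separable kernel k((x,t),(x',t')) = ks * kt.
    xs, ts hold the space and time coordinates of each of the n samples, so
    the matrix is n x n with n = n_s * n_t when sampling on a full grid."""
    Kx = ks(xs[:, None], xs[None, :])   # spatial part, elementwise
    Kt = kt(ts[:, None], ts[None, :])   # temporal part, elementwise
    return Kx * Kt                       # separable product kernel
```

Each new time step appends n_s rows, so the cubic cost grows without bound as data streams in; this is the motivation for the recursive formulation below.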

This paper instead takes inspiration from a different idea, related to the use of the Kalman filter (Kalman, 1960). In this context, the main underlying idea is to build suitable state-space models for spatio-temporal Gaussian processes which are amenable to Kalman filtering. More specifically, a large body of work has shown that it is possible to find an equivalent representation of a Gaussian process as a stochastic partial differential equation, for which many approximate solutions are available to solve estimation and prediction problems. Along this line of research, some preliminary works (De Nicolao and Ferrari Trecate, 2001, Hartikainen and Särkkä, 2010) offer a solution for temporal-only processes. Extensions to spatio-temporal processes, for the class of kernel functions which are separable in space and time and stationary in time, are presented in multiple works (Cressie and Wikle, 2002, Hartikainen, 2013, Lindgren et al., 2011, Särkkä and Hartikainen, 2012, Särkkä et al., 2013) where the authors deal with infinite-dimensional state-space models. Alternative state-space approaches include Stroud et al. (2001), where an infinite-dimensional space is paired with a finite time-varying parametric model. Other authors proposed ad-hoc state-space strategies based on Kalman filtering for sea level and ocean modeling and estimation (Miller, 1986, Miller and Cane, 1989). Finally, in Hartikainen, Riihimäki, and Särkkä (2011) the authors mention the case of non-stationary time kernels.
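As a concrete instance of this equivalence, a GP with exponential (Matérn-1/2) temporal covariance λ²·exp(-|t-t'|/τ) has a rational power spectrum and is exactly the stationary output of a first-order stochastic differential equation; sampled every Δ seconds it obeys a scalar linear recursion, a standard result in this line of work (e.g., Hartikainen and Särkkä, 2010). The sketch below (parameter names are ours) computes the corresponding discrete-time model:

```python
import numpy as np

def exp_kernel_to_ss(tau, lam2, dt):
    """Discrete-time state-space model equivalent to the exponential
    time kernel k(t, t') = lam2 * exp(-|t - t'| / tau):
        z[k+1] = a * z[k] + w[k],   w[k] ~ N(0, q),
    whose stationary output has exactly this covariance."""
    a = np.exp(-dt / tau)          # transition coefficient over one step
    q = lam2 * (1.0 - a**2)        # noise variance keeping Var(z) = lam2
    return a, q
```

The stationary variance q/(1-a²) equals λ², and the lag-n covariance λ²·aⁿ = λ²·exp(-nΔ/τ) matches the kernel exactly, which is why no approximation is needed for this class of temporal kernels.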

However, although the previous works consider general non-separable space–time kernels, all the above results require some sort of approximation when computing the filtering procedure in the presence of measured data. More specifically, in Särkkä et al. (2013) the authors must resort to an eigenfunction expansion and approximation to compute the space–time estimator, in Lindgren et al. (2011) the Gaussian process is approximated by a Gaussian Markov random field, and in Ho, Fieguth, and Willsky (1996) a hierarchical multi-resolution approximation is adopted to speed up the computation.

In this paper, instead, extending our preliminary work on the topic presented in Carron, Todescato, Carli, Schenato, and Pillonetto (2016), we look for reasonable assumptions on the space–time kernel that allow us to find numerically efficient yet exact solutions to the infinite-dimensional space–time GP regression problem. More specifically, our contributions rely on two assumptions. First, we assume that the kernel of the process (non-stationary in space and potentially even non-stationary in time, as shown in Appendix D) is separable in space/time, with a rational temporal power spectral density. This is a strong assumption that might limit its use for classical diffusion processes, but it has been shown to be realistic in other fields, such as biological imaging applied to fMRI (Noh and Solo, 2007, Noh and Solo, 2013), and in meteorology. Second, we assume that measurements are collected only on a finite set of fixed spatial locations (this hypothesis is, however, removed in the second part of the paper). Under these two assumptions, our novel contributions are twofold.

First: Our proposed estimation procedure is exact, i.e., it returns the exact minimum variance estimate at any spatio-temporal prediction location and can be computed recursively via a finite-dimensional time-varying Kalman filter whose complexity scales with the cube of the number of distinct measurement locations and only linearly with the number of prediction locations. This stems from a novel result which we refer to as the Dynamic Representer Theorem. In the static scenario, classical representer theorems state that the optimizers of a wide class of variational problems admit a finite-dimensional representation (Kimeldorf and Wahba, 1970, Schölkopf et al., 2001). In particular, in the case of regularization networks the function estimate belongs to the subspace generated by the kernel sections centered at the observed input locations, with coefficients given by a linear output transformation (Evgeniou, Pontil, & Poggio, 2000). The Dynamic Representer Theorem derived here shows that, in a dynamic scenario, the estimate at instant t_k is still a combination of the spatial kernel sections centered at the sampled spatial locations, but with time-varying coefficients which now depend linearly on a finite-dimensional Kalman filter state. Although in the following we present the case of stationary time kernels (Assumption 1), it is of particular relevance that both the space and time kernels may be non-stationary (Appendix D).
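To illustrate the idea, here is a sketch in our own notation (not the paper's exact algorithm): with a separable kernel Ks(x,x')·exp(-|t-t'|/τ) and m fixed locations, the process restricted to those locations is a finite-dimensional Gauss-Markov model on which a standard Kalman filter runs, and the estimate anywhere combines spatial kernel sections with coefficients taken from the filter state:

```python
import numpy as np

def kalman_gp_step(z, P, y, H, Ks, a, R):
    """One predict/update step over the m fixed locations.
    For the separable kernel Ks(x,x') * exp(-|t-t'|/tau), a = exp(-dt/tau),
    the process restricted to the locations obeys the Gauss-Markov model
        z[k+1] = a*z[k] + w[k],  Cov(w[k]) = (1 - a**2) * Ks,
    whose stationary covariance is Ks."""
    z = a * z                                  # predict
    P = a**2 * P + (1 - a**2) * Ks
    S = H @ P @ H.T + R                        # innovation covariance
    G = np.linalg.solve(S, H @ P).T            # Kalman gain P H^T S^{-1}
    z = z + G @ (y - H @ z)                    # measurement update
    P = P - G @ H @ P
    return z, P

def predict_at(ks_row, Ks, z):
    """Dynamic-representer form: the estimate at any location x combines the
    spatial kernel sections Ks(x, .) with coefficients c = Ks^{-1} z_hat."""
    return ks_row @ np.linalg.solve(Ks, z)
```

Note that `predict_at` touches only an m-vector, which is why prediction cost grows linearly with the number of prediction locations while the cubic cost is confined to the m measurement locations.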

Second: We address the situation where the set of sampling locations is adaptive and changes over time. This set-up is of interest in many applications, such as wind estimation from aerial vehicles and multi-robot exploration, where only a finite set of locations and measurements can be kept in memory due to storage capacity limits. When memory limits are hit, old locations can be discarded (following a policy that depends on the specific application) and the state space can be modified accordingly. We design a new approach to perform such operations. In addition, we show that, after any change in the sampling set, if no other perturbations occur, our suboptimal estimate converges (with an exponential rate) to the optimal minimum variance estimate (obtainable only by storing all past measurements).
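One plausible way to grow the location set, sketched here for illustration and not necessarily the paper's exact construction, is to append the new location to the filter state, initializing it at its conditional GP mean and extending the covariance matrices consistently:

```python
import numpy as np

def add_location(z, P, Ks, k_new, k_new_var):
    """Augment the filter with a new input location x_new (illustrative).
    k_new = Ks(x_new, I) and k_new_var = Ks(x_new, x_new).
    The new entry starts at its conditional mean c @ z, with kernel-section
    coefficients c = Ks^{-1} k_new; the conditional prior variance
    k_new_var - c @ k_new is added on the diagonal so the augmented
    covariance remains consistent with the spatial kernel."""
    c = np.linalg.solve(Ks, k_new)
    z_aug = np.append(z, c @ z)
    cross = P @ c
    var_new = c @ cross + (k_new_var - c @ k_new)
    P_aug = np.block([[P, cross[:, None]],
                      [cross[None, :], np.array([[var_new]])]])
    Ks_aug = np.block([[Ks, k_new[:, None]],
                       [k_new[None, :], np.array([[k_new_var]])]])
    return z_aug, P_aug, Ks_aug
```

Discarding a location is the reverse operation (deleting the corresponding row and column); the filter then forgets the discarded information, which is the source of the transient suboptimality that vanishes exponentially.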

The remainder of the paper is organized as follows. Section 2 formulates the problem together with the necessary assumptions. Section 3 proposes the solution to the estimation problem over the finite set of sampling locations. Section 4 extends the result to the prediction problem at any spatio-temporal location. Section 5 addresses the problem of time-varying sampling locations. Section 6 presents numerical simulations. Finally, Section 7 concludes the paper. All the necessary preliminaries on GP regression, Kalman filtering and spectral factorization theory can be found in Carron et al. (2016), while all the technical proofs are collected in Appendix A and Appendix B.

Section snippets

Problem formulation

Here, we formally state the main problem at hand and introduce the necessary assumptions. As clarified later, the main assumption restricts our analysis to a particular yet sufficiently rich class of kernel functions separable in space and time. We stress that, while the authors of Särkkä and Hartikainen (2012) and Särkkä et al. (2013) consider the same class of kernels, since they deal with infinite-dimensional state-space systems, they must resort to approximated approaches in

Kalman regression on I

Here we formally show how to build an exact state-space representation for a certain class of GPs, and we bridge GP regression and Kalman filtering over the observable finite-dimensional space I, providing a clear and systematic methodology to implement the filter.

To implement the Kalman filter (Kalman, 1960), the first step is to build a state-space representation for the Gaussian process f. In particular, we are interested in reconstructing f over the "observable" set I. To compactly represent the

Kalman regression on X

In Section 3 we have shown how to build an estimate f̂ of the process f over the observable finite-dimensional set I. Here, we are interested in extending the result of Proposition 4 to build the minimum variance estimate f̂(x, t) = E[f(x, t) | D] of the process f at any desired spatio-temporal location (x, t) ∈ X × R+, t ≥ t_k, t_k being the last time instant at which measurements have been collected; see Fig. 1, with D defined as in (5).

To state our result we first introduce the following additional symbols: Γ_x = Cov(f(

Adaptive input location space I ( k )

In Proposition 4 we have assumed that the input locations always fall in a fixed set I. In other words, even if the measurements collected at t_k might come from a time-varying subset M(k), one has M(k) ⊆ I for all k. In this section we remove this constraint and allow measurements of f to be collected over an adaptive set of input locations, i.e., the location set now becomes a function of time, I(k). See Fig. 5 for an illustrative representation of the time evolution of I(k). This scenario is

Simulations

We test both Algorithm 1 and Proposition 6 on synthetic and real field data. Finally, we test the proposed adaptive strategy on synthetic data only. All simulations are run in MATLAB® on a 2.7 GHz Intel Core i5 processor with 16 GB of RAM.

Conclusions & future works

In this work we focused on building an efficient GP estimator for spatio-temporal dynamical Gaussian processes. The main idea was to couple Kalman filtering and GP regression. In particular, assuming space/time separability of the covariance (kernel) of the process and a rational time spectrum, we built a finite-dimensional discrete-time state-space process representation over a finite-dimensional set of input locations. Our major finding is that the Kalman filter state at instant t_k represents a


References (54)

  • Todescato, M., et al. Multi-robots Gaussian estimation and coverage control: from server-based to distributed architecture. Automatica (2017)

  • Pillonetto, G., et al. Kernel methods in system identification, machine learning and function estimation: a survey. Automatica (2014)

  • Neve, M., et al. Nonparametric identification of population models via Gaussian processes. Automatica (2007)

  • Ambikasaran, S., et al. Fast direct methods for Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence (2016)

  • Anderson, B.D.O., et al. Detectability and stabilizability of time-varying discrete-time linear systems. SIAM Journal on Control and Optimization (1981)

  • Bach, F.R., et al. Predictive low-rank decomposition for kernel methods

  • Carron, A., et al. Machine learning meets Kalman filtering

  • Cressie, N. The origins of kriging. Mathematical Geology (1990)

  • Cressie, N., et al. Space-time Kalman filter. Encyclopedia of Environmetrics (2002)

  • Cucker, F., et al. On the mathematical foundations of learning. Bulletin of the American Mathematical Society (2001)

  • Datta, A., et al. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association (2016)

  • Datta, A., et al. Nonseparable dynamic nearest neighbor Gaussian process models for large spatio-temporal data with an application to particulate matter analysis. The Annals of Applied Statistics (2016)

  • De Nicolao, G., et al. Regularization networks: fast weight calculation via Kalman filtering. IEEE Transactions on Neural Networks (2001)

  • Evgeniou, T., et al. Regularization networks and support vector machines. Advances in Computational Mathematics (2000)

  • Furrer, R., et al. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics (2006)

  • Gramacy, R.B., et al. Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics (2015)

  • Handcock, M.S., et al. An approach to statistical spatial-temporal modeling of meteorological fields. Journal of the American Statistical Association (1994)

  • Hartikainen, J. Sequential inference for latent temporal Gaussian process models (2013)

  • Hartikainen, J., et al. Sparse spatio-temporal Gaussian processes with general likelihoods

  • Hartikainen, J., et al. Kalman filtering and smoothing solutions to temporal Gaussian process regression models

  • Ho, T.T., et al. Multiresolution stochastic models for the efficient solution of large-scale space-time estimation problems

  • Jazwinski, A.H. Stochastic processes and filtering theory, Vol. 64 (1970)

  • Kalman, R.E. A new approach to linear filtering and prediction problems. Journal of Basic Engineering (1960)

  • Kimeldorf, G., et al. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics (1970)

  • Kulis, B., et al. Learning low-rank kernel matrices

  • Lázaro-Gredilla, M., et al. Sparse spectrum Gaussian process regression. Journal of Machine Learning Research (JMLR) (2010)

  • Lindgren, F., et al. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, Series B. Statistical Methodology (2011)


    Marco Todescato received his Ph.D. degree in Information Engineering in 2016 from the Department of Information Engineering (DEI), University of Padova, Italy. During his doctoral studies he has been a Visiting Researcher at the Max Planck Institute for Biological Cybernetics, Tübingen, Germany, at the Laboratory for Information and Decision Systems, MIT, at the Department of Electrical Engineering, UCLA, at the Center for Control, Dynamical Systems and Computation, UCSB, and at the Institut für Automatik, ETH Zürich. After a Postdoctoral Fellowship with the control group at DEI, he is now a Research Scientist at the Bosch Center for Artificial Intelligence, Renningen, Germany. His research interests include distributed optimization, nonparametric estimation, multiagent robotics and reinforcement learning.

    Andrea Carron is a Senior Lecturer at the Institute of Dynamical Systems and Control at ETH Zurich, Switzerland. He received his bachelor, master, and Ph.D. in Control Engineering from the University of Padova, Italy. During his master and PhD, he spent three stays abroad as a visiting researcher: the first at the University of California Riverside, USA, the second at the Max Planck Institute in Tübingen, Germany, and the third at the University of California Santa Barbara, USA. In October 2016, he joined the Intelligent Control Systems Group of ETH Zurich and the NCCR Digital Fabrication. His research interests include safe learning-based control, multi-agent systems and robotics.

    Ruggero Carli received the Laurea Degree in Computer Engineering and the Ph.D. degree in Information Engineering from the University of Padova, Padova, Italy, in 2004 and 2007, respectively. From 2008 through 2010, he was a Post-doctoral Fellow with the Department of Mechanical Engineering, University of California at Santa Barbara. He is currently an Associate Professor with the Department of Information Engineering, University of Padova. His research interests include control theory and, in particular, control under communication constraints, cooperative control, and distributed estimation.

    Gianluigi Pillonetto was born on January 1, 1975 in Montebelluna (TV), Italy. He received the Doctoral degree in Computer Science Engineering cum laude from the University of Padova in 1998 and the Ph.D. degree in Bioengineering from the Polytechnic of Milan in 2002 . In 2000 and 2002 he was visiting scholar and visiting scientist, respectively, at the Applied Physics Laboratory, University of Washington, Seattle. From 2002 to 2005 he was Research Associate at the Department of Information Engineering, University of Padova, becoming an Assistant Professor in 2005. He is currently an Associate Professor of Control and Dynamic Systems at the Department of Information Engineering, University of Padova. His research interests are in the field of system identification, estimation and machine learning. On these topics, he has published around 70 papers on peer reviewed international journals. He currently serves as Associate Editor for Automatica and IEEE Transactions on Automatic Control. In 2003 he received the Paolo Durst award for the best Italian Ph.D. thesis in Bioengineering. He was the 2017 recipient of the Automatica Prize and Plenary Speaker at System Identification IFAC Symposium in 2018.

    Luca Schenato received the Dr. Eng. degree in electrical engineering from the University of Padova in 1999 and the Ph.D. degree in Electrical Engineering and Computer Sciences from the UC Berkeley, in 2003. He held a post-doctoral position in 2004 and a visiting professor position in 2013–2014 at U.C. Berkeley. Currently he is an Associate Professor at the Information Engineering Department at the University of Padova. His interests include networked control systems, multi-agent systems, wireless sensor networks, smart grids and cooperative robotics. Luca Schenato has been awarded the 2004 Researchers Mobility Fellowship by the Italian Ministry of Education, University and Research (MIUR), the 2006 Eli Jury Award in U.C. Berkeley and the EUCA European Control Award in 2014, and IEEE Fellow in 2017. He served as Associate Editor for IEEE Trans. on Automatic Control from 2010 to 2014 and he is currently Senior Editor for IEEE Trans. on Control of Network Systems and Associate Editor for Automatica.
