State-Space Models: Learning the Kalman Filter
Translating Equations
Recently, Mamba has provoked a lot of excited discussion about potentially supplanting Transformer-based architectures for large language models. Mamba is merely one of a large class of state-space models, which have demonstrated utility in modeling not only language but also gene expression, neural activity, animal movement, and macroeconomic and other time series.
While I’ve read plenty of articles explaining Mamba’s architecture, I wanted to dive deeper into the basics of state-space models, which inevitably led me to Kalman filters. However, I found various sources confusing because they used different notation in specifying the filter’s equations. The control research literature uses one set of variable names, while the statistical time series literature uses another set. To reconcile these for my own understanding, I created a table of correspondences. My two main sources were:
Kalman and Bayesian Filters in Python, 2020
Roger R Labbe Jr
Time Series Analysis by State Space Methods, 2nd Edition, 2012
J. Durbin, S.J. Koopman
First, the Kalman filter equations as they appear in Labbe (2020) Chapter 6:
$$ \begin{aligned} \mathbf{\bar x} &= \mathbf{F x} + \mathbf{B u} \\ \mathbf{\bar P} &= \mathbf{F P F}^\mathsf T + \mathbf Q \\ \\ \mathbf{S} &= \mathbf{H \bar P H}^\mathsf T + \mathbf R \\ \mathbf K &= \mathbf{\bar P H}^\mathsf T \mathbf{S}^{-1} \\ \mathbf{y} &= \mathbf z - \mathbf{H \bar x} \\ \mathbf x &= \mathbf{\bar x} + \mathbf{K y} \\ \mathbf P &= (\mathbf{I} - \mathbf{K H})\mathbf{\bar P} \end{aligned} $$
Second, the equations from Durbin and Koopman (2012), (4.24) on page 85, but in the same order as above:
$$ a_{t+1} = T_t a_t + K_t v_t $$ $$ P_{t+1} = T_t P_t (T_t - K_t Z_t)' + R_t Q_t R_t' $$ $$ F_t = Z_t P_t Z_t' + H_t $$ $$ K_t = T_t P_t Z_t' F_t^{-1} $$ $$ v_t = y_t - Z_t a_t $$ $$ a_{t|t} = a_t + P_t Z_t' F_t^{-1} v_t $$ $$ P_{t|t} = P_t - P_t Z_t' F_t^{-1} Z_t P_t $$
Now, the table specifying how the variables correspond:
Labbe (2020) | Durbin and Koopman (2012) | Terminology |
---|---|---|
$\mathbf x$ | $a_t$ | state estimate |
$\mathbf{\bar x}$ | $a_{t+1}$ | predicted state at next time step |
$\mathbf F$ | $T_t$ | process model / state transition matrix |
$\mathbf B$ | (none) | control input model / control function |
$\mathbf u$ | (none) | control input |
$\mathbf P$ | $P_t$ | state covariance estimate |
$\mathbf{\bar P}$ | $P_{t+1}$ | predicted state covariance at next time step |
(none) | $R_t$ | selection matrix |
$\mathbf Q$ | $Q_t$ | process noise / state disturbance covariance matrix |
$\mathbf S$ | $F_t$ | system uncertainty / innovation covariance |
$\mathbf H$ | $Z_t$ | measurement function / design matrix |
$\mathbf R$ | $H_t$ | measurement noise / observation disturbance covariance matrix |
$\mathbf K$ | $K_t$ | Kalman gain / scaling factor |
$\mathbf z$ | $y_t$ | measurement / observation / data point |
$\mathbf y$ | $v_t$ | residual between predicted state and measurement |
$\mathbf x$ | $a_{t|t}$ | updated state estimate |
$\mathbf P$ | $P_{t|t}$ | updated state covariance estimate |
Testing Implementation Code
In addition to the table above, I implemented the Kalman filter in both notations and tested each against several published examples (input data, parameters, and expected results) to verify that the implementations were correct. The two versions in Python/NumPy are below:
Go here for the full code.
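In case you just want the gist, here is a minimal single predict/update step in each notation. Variable names follow the table above; this is an illustrative sketch of the correspondence, not the full, tested implementation linked to:

```python
import numpy as np

def kalman_step_labbe(x, P, z, F, B, u, H, Q, R):
    """One predict + update cycle in Labbe's (2020) notation."""
    # Predict
    x_bar = F @ x + B @ u
    P_bar = F @ P @ F.T + Q
    # Update
    S = H @ P_bar @ H.T + R              # system uncertainty
    K = P_bar @ H.T @ np.linalg.inv(S)   # Kalman gain (no transition factor)
    y = z - H @ x_bar                    # residual
    x_new = x_bar + K @ y
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_bar
    return x_new, P_new

def kalman_step_dk(a, P, y_obs, T, Z, R_sel, Q, H):
    """One step of the Durbin and Koopman (2012) recursion (4.24).

    Here H is the observation noise covariance (Labbe's R). Note that
    D&K's gain K_t = T_t P_t Z_t' F_t^{-1} folds the transition matrix
    into the gain, unlike Labbe's K."""
    v = y_obs - Z @ a                     # innovation v_t
    F_t = Z @ P @ Z.T + H                 # innovation covariance F_t
    F_inv = np.linalg.inv(F_t)
    K = T @ P @ Z.T @ F_inv               # gain K_t
    a_filt = a + P @ Z.T @ F_inv @ v      # filtered state a_{t|t}
    P_filt = P - P @ Z.T @ F_inv @ Z @ P  # filtered covariance P_{t|t}
    a_next = T @ a + K @ v                # predicted state a_{t+1}
    P_next = T @ P @ (T - K @ Z).T + R_sel @ Q @ R_sel.T
    return a_next, P_next, a_filt, P_filt
```

Feeding both functions the same model, with Labbe's prior $\mathbf{\bar x}, \mathbf{\bar P}$ identified with D&K's $a_t, P_t$, produces identical filtered estimates, which is a quick consistency check on the table.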
The tested examples came from:
Kalman Filter For Dummies, 2009
Bilgin Esme
Kalman and Bayesian Filters in Python, 2020
Roger R Labbe Jr
Time Series Analysis by State Space Methods, 2nd Edition, 2012
J. Durbin, S.J. Koopman
The Durbin and Koopman (2012) example using the classic Nile River data set didn’t provide exact numbers for its results, but the results do appear in plots in Figure 2.1 on page 16. The plots that I obtained appear to match their plots very closely.
First, the data points and the filtered state shown in Figure 2.1 (i):
Second, the filter variance shown in Figure 2.1 (ii):
Finally, the residuals shown in Figure 2.1 (iii):
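The qualitative behavior of those three panels can be sketched with a scalar local-level filter. The noise variances below are, as I recall, the maximum-likelihood estimates D&K report for the Nile series ($\sigma^2_\varepsilon \approx 15099$, $\sigma^2_\eta \approx 1469.1$), but the data here are synthetic, since the actual Nile measurements aren't embedded in this post:

```python
import numpy as np

# Local level model in D&K notation: y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t.
sigma2_eps, sigma2_eta = 15099.0, 1469.1   # assumed D&K Nile ML estimates
rng = np.random.default_rng(0)
n = 100
alpha = 1120.0 + np.cumsum(rng.normal(0.0, np.sqrt(sigma2_eta), n))
y = alpha + rng.normal(0.0, np.sqrt(sigma2_eps), n)  # synthetic stand-in for the Nile data

# Scalar Kalman filter with a large (near-diffuse) initial variance.
a, P = 0.0, 1e7
a_filt = np.empty(n); P_filt = np.empty(n); v_resid = np.empty(n)
for t in range(n):
    v = y[t] - a                 # innovation, as in Figure 2.1 (iii)
    F_t = P + sigma2_eps         # innovation variance
    a_filt[t] = a + P / F_t * v  # filtered state a_{t|t}, as in Figure 2.1 (i)
    P_filt[t] = P - P * P / F_t  # filtered variance P_{t|t}, as in Figure 2.1 (ii)
    v_resid[t] = v
    a = a_filt[t]                # T_t = 1: prediction for t+1
    P = P_filt[t] + sigma2_eta
```

The filtered variance `P_filt` drops steeply from its large initial value and settles at a steady state within a handful of steps, which is exactly the shape of Figure 2.1 (ii).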