State-Space Models: Learning the Kalman Filter

Translating Equations

Recently, Mamba has provoked a lot of excited discussion about potentially supplanting Transformer-based architectures for large language models. Mamba is merely one of a large class of state-space models, which have demonstrated utility in modeling not only language but also gene expression, neural activity, animal movement, and macroeconomic and other time series.

While I’ve read plenty of articles explaining Mamba’s architecture, I wanted to dive deeper into the basics of state-space models, which inevitably led me to Kalman filters. However, I found various sources confusing because they used different notation in specifying the filter’s equations: the control research literature uses one set of variable names, while the statistical time series literature uses another. To reconcile these for my own understanding, I created a table of correspondences. My two main sources were Labbe (2020) and Durbin and Koopman (2012).

First, the Kalman filter equations as they appear in Labbe (2020) Chapter 6:

$$
\begin{aligned}
\mathbf{\bar x} &= \mathbf{Fx} + \mathbf{Bu} \\
\mathbf{\bar P} &= \mathbf{FPF}^\mathsf{T} + \mathbf{Q} \\
\\
\mathbf{S} &= \mathbf{H \bar P H}^\mathsf{T} + \mathbf{R} \\
\mathbf{K} &= \mathbf{\bar P H}^\mathsf{T} \mathbf{S}^{-1} \\
\mathbf{y} &= \mathbf{z} - \mathbf{H \bar x} \\
\mathbf{x} &= \mathbf{\bar x} + \mathbf{Ky} \\
\mathbf{P} &= (\mathbf{I} - \mathbf{KH}) \mathbf{\bar P}
\end{aligned}
$$
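As a notational sanity check, here is a minimal NumPy sketch of one predict/update cycle written directly from these equations. It is my own transliteration rather than Labbe's code, and it assumes time-invariant matrices passed in as NumPy arrays of compatible shapes:

```python
import numpy as np

def kf_step_labbe(x, P, z, F, B, u, Q, H, R):
    """One Kalman filter predict/update cycle, Labbe (2020) notation."""
    # Predict
    x_bar = F @ x + B @ u                          # prior (predicted) state
    P_bar = F @ P @ F.T + Q                        # prior (predicted) covariance
    # Update
    S = H @ P_bar @ H.T + R                        # innovation covariance
    K = P_bar @ H.T @ np.linalg.inv(S)             # Kalman gain
    y = z - H @ x_bar                              # residual (innovation)
    x_new = x_bar + K @ y                          # posterior state estimate
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_bar   # posterior covariance
    return x_new, P_new
```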

Second, the equations from Durbin and Koopman (2012), equation (4.24) on page 85, rearranged to follow the same order as above:

$$
\begin{aligned}
a_{t+1} &= T_t a_t + K_t v_t \\
P_{t+1} &= T_t P_t (T_t - K_t Z_t)' + R_t Q_t R_t' \\
\\
F_t &= Z_t P_t Z_t' + H_t \\
K_t &= T_t P_t Z_t' F_t^{-1} \\
v_t &= y_t - Z_t a_t \\
a_{t|t} &= a_t + P_t Z_t' F_t^{-1} v_t \\
P_{t|t} &= P_t - P_t Z_t' F_t^{-1} Z_t P_t
\end{aligned}
$$
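And the same kind of sketch for the Durbin and Koopman form, again my own transliteration; for simplicity the system matrices are treated as time-invariant, so the t subscripts are dropped from the argument names:

```python
import numpy as np

def kf_step_dk(a, P, y_obs, T, Z, R_sel, Q, H):
    """One Kalman filter step, Durbin and Koopman (2012) notation.

    Returns the predicted quantities for t+1 together with the
    filtered (t|t) quantities.
    """
    v = y_obs - Z @ a                                      # innovation v_t
    F = Z @ P @ Z.T + H                                    # innovation covariance F_t
    F_inv = np.linalg.inv(F)
    K = T @ P @ Z.T @ F_inv                                # Kalman gain K_t
    a_filt = a + P @ Z.T @ F_inv @ v                       # a_{t|t}
    P_filt = P - P @ Z.T @ F_inv @ Z @ P                   # P_{t|t}
    a_next = T @ a + K @ v                                 # a_{t+1}
    P_next = T @ P @ (T - K @ Z).T + R_sel @ Q @ R_sel.T   # P_{t+1}
    return a_next, P_next, a_filt, P_filt
```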

Now, the table specifying how the variables correspond:

| Labbe (2020) | Durbin and Koopman (2012) | Terminology |
| --- | --- | --- |
| $\mathbf{x}$ | $a_t$ | state estimate |
| $\mathbf{\bar x}$ | $a_{t+1}$ | predicted state at next time step |
| $\mathbf{F}$ | $T_t$ | process model / state transition matrix |
| $\mathbf{B}$ |  | control input model / control function |
| $\mathbf{u}$ |  | control input |
| $\mathbf{P}$ | $P_t$ | state covariance estimate |
| $\mathbf{\bar P}$ | $P_{t+1}$ | predicted state covariance at next time step |
|  | $R_t$ | selection matrix |
| $\mathbf{Q}$ | $Q_t$ | process noise / state disturbance covariance matrix |
| $\mathbf{S}$ | $F_t$ | system uncertainty / innovation covariance |
| $\mathbf{H}$ | $Z_t$ | measurement function / design matrix |
| $\mathbf{R}$ | $H_t$ | measurement noise / observation disturbance covariance matrix |
| $\mathbf{K}$ | $K_t$ | Kalman gain / scaling factor |
| $\mathbf{z}$ | $y_t$ | measurement / observation / data point |
| $\mathbf{y}$ | $v_t$ | residual (innovation) between the measurement and the predicted measurement |
| $\mathbf{x}$ | $a_{t \mid t}$ | updated state estimate |
| $\mathbf{P}$ | $P_{t \mid t}$ | updated state covariance estimate |

Testing Implementation Code

In addition to the table above, I coded the Kalman filter in both sets of notation and tested both implementations against example inputs, parameters, and expected results from several sources to verify that they were correct. The two versions in Python/NumPy are below:

[Image: the two Python/NumPy implementations of the Kalman filter]

Go here for the full code.
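Since the original listing survives here only as an image, the sketches above can stand in for it. One quick consistency check in the same spirit: run the two step functions side by side on a toy local-level (random walk plus noise) model, wiring the arguments together according to the correspondence table (F with T_t, H with Z_t, Q with Q_t, R with H_t; B and u set to zero and R_t to the identity), and confirm that the filtered estimates agree. The numbers below are arbitrary placeholders, not taken from any of the test sources:

```python
import numpy as np
# Uses kf_step_labbe and kf_step_dk from the sketches above.

rng = np.random.default_rng(0)

# Toy local-level model: the state is a random walk observed with noise.
q, r = 0.5, 2.0                        # placeholder process / measurement variances
F = H = np.array([[1.0]])              # state transition and measurement matrices
Q, R = np.array([[q]]), np.array([[r]])
B, u = np.zeros((1, 1)), np.zeros(1)   # no control input
I = np.eye(1)                          # selection matrix R_t taken as the identity

# Simulate a short series.
states = np.cumsum(rng.normal(0.0, np.sqrt(q), 50))
zs = states + rng.normal(0.0, np.sqrt(r), 50)

# The Labbe-style step carries the posterior (x, P); the DK-style step
# carries the predicted (a_t, P_t), so it starts one prediction ahead.
x, P = np.zeros(1), np.array([[10.0]])
a, P_dk = F @ x, F @ P @ F.T + Q

for z in zs:
    x, P = kf_step_labbe(x, P, np.array([z]), F, B, u, Q, H, R)
    a, P_dk, a_filt, P_filt = kf_step_dk(a, P_dk, np.array([z]), F, H, I, Q, R)
    assert np.allclose(x, a_filt) and np.allclose(P, P_filt)
```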

The tested examples came from several sources, including Durbin and Koopman (2012), whose example uses the classic Nile River data set. The book doesn’t provide exact numbers for the results, but they do appear in the plots of Figure 2.1 on page 16, and the plots that I obtained appear to match theirs very closely.

First, the data points and the filtered state shown in Figure 2.1 (i):

[Figure: Nile data points and filtered state]

Second, the filter variance shown in Figure 2.1 (ii):

[Figure: filter variance]

Finally, the residuals shown in Figure 2.1 (iii):

[Figure: residuals]
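For reference, the loop that produces those three panels might look something like the sketch below. This is not the original code: `nile.csv` is a hypothetical file holding the annual flow series, and `var_eps` / `var_eta` are placeholder values standing in for the observation and state disturbance variances estimated in the book.

```python
import numpy as np
import matplotlib.pyplot as plt
# Uses kf_step_dk from the sketch above.

# Placeholders: supply the Nile series and the estimated variances from
# Durbin and Koopman (2012) yourself; the values below are not theirs.
nile = np.loadtxt("nile.csv", delimiter=",")   # hypothetical data file
var_eps, var_eta = 15000.0, 1500.0             # observation / state variances

T = Z = R_sel = np.array([[1.0]])              # local level model, DK notation
Q, H = np.array([[var_eta]]), np.array([[var_eps]])

a, P = np.array([nile[0]]), np.array([[1e7]])  # crude diffuse-style start
filtered, variances, residuals = [], [], []
for y_t in nile:
    v_t = y_t - a[0]                           # one-step prediction residual
    a, P, a_filt, P_filt = kf_step_dk(a, P, np.array([y_t]), T, Z, R_sel, Q, H)
    filtered.append(a_filt[0])
    variances.append(P_filt[0, 0])
    residuals.append(v_t)

fig, axes = plt.subplots(3, 1, figsize=(8, 9))
axes[0].plot(nile, ".", label="data")
axes[0].plot(filtered, label="filtered state")
axes[0].legend()
axes[1].plot(variances)
axes[1].set_title("filter variance")
axes[2].plot(residuals)
axes[2].set_title("residuals")
plt.tight_layout()
plt.show()
```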