Updating Our Predictions with New Data: Visualizing 3 Choices

How do we handle more than 2 outcomes?

Using basketball as an example, we previously looked at how we can start with estimates of the probability of an outcome and then update those estimates as new information becomes available. Specifically, we could start the basketball season with our estimate of how many games our team would win, and as wins and losses pile up through the course of the season, we could update our estimates, which converge towards our final answer of how good our team is.

In that example, we used beta distributions to represent our estimates and our confidence in those estimates, and binomial data represented our two possible outcomes: win or loss. But what do we do when we have more than two possible outcomes?

Fortunately, the answer is easy: the beta-binomial model is a special case of the Dirichlet-multinomial model, which can handle three or more outcomes.
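To make the "special case" claim concrete, here is a minimal sketch that compares the two log-densities directly. The function names and the parameter values are ours, chosen only for illustration. With exactly two categories, the Dirichlet density evaluated at (x, 1 - x) should match the beta density evaluated at x.

```python
import math


def beta_log_pdf(x, a, b):
    """Log-density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)


def dirichlet_log_pdf(probs, alpha):
    """Log-density of Dirichlet(alpha) at a probability vector `probs`."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1) * math.log(p) for a, p in zip(alpha, probs))


# Hypothetical parameters: 7 pseudo-wins and 3 pseudo-losses.
a, b = 7.0, 3.0
x = 0.65  # a candidate win probability

beta_value = beta_log_pdf(x, a, b)
dirichlet_value = dirichlet_log_pdf([x, 1 - x], [a, b])

# The two values agree up to floating-point error.
print(math.isclose(beta_value, dirichlet_value))
```

The same `dirichlet_log_pdf` function accepts three or more categories without modification, which is all the generalization amounts to.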

Dirichlet-Multinomial Model: Example Applications

For example, we could extend our outcomes for basketball games to include wins, losses, and ties. However, those three outcomes are not merely nominal data; ties could be situated between wins and losses, giving the data an ordinal character. While we certainly could use a Dirichlet-multinomial model in that case, we would be ignoring the ranks of the outcomes, which might be crucial information for our modeling.

Airline safety might be a useful example to consider. Each scheduled flight might

land safely,
not land safely, or
never take off in the first place.

We can start with our estimates of each outcome and how confident we are in those estimates by setting parameters of the Dirichlet distribution, just like we did for the beta distribution in the two-outcome situation. Then as we obtain data about the outcome of each scheduled flight, we can update our estimates by updating our Dirichlet parameters.
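The update itself is just addition: each Dirichlet parameter grows by the number of times its outcome was observed. Below is a minimal sketch of that step. The prior pseudo-counts and the observed counts are made-up numbers for illustration, not real airline data, and the function names are ours.

```python
def update_dirichlet(alpha, counts):
    """Conjugate update: add the observed counts to the Dirichlet parameters."""
    return [a + c for a, c in zip(alpha, counts)]


def dirichlet_mean(alpha):
    """Expected probability of each outcome under Dirichlet(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Outcome order: lands safely, does not land safely, never takes off.
# Hypothetical prior: roughly 100 flights' worth of belief.
prior_alpha = [98.0, 1.0, 1.0]

# Hypothetical observed outcomes for 5,000 scheduled flights.
observed_counts = [4950, 2, 48]

posterior_alpha = update_dirichlet(prior_alpha, observed_counts)

print("prior estimate:    ", dirichlet_mean(prior_alpha))
print("posterior estimate:", dirichlet_mean(posterior_alpha))
```

The sum of the parameters acts like a sample size. A prior that sums to 100 is quickly outweighed by 5,000 observations, while a prior that sums to 100,000 would barely move.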

As another example, perhaps our city wants to prepare a budget for its program to spay and neuter stray animals. The city might start with estimates of the proportions of cats and dogs that are

privately owned,
stray and already spayed or neutered, and
stray and not already spayed or neutered.

Using a Dirichlet-multinomial model lets the city department incorporate its (un)certainty about its estimates, which will let it prepare “best-case” and “worst-case” budgets. As city personnel count and classify the animals, they can update their estimates, increase their certainty in their estimates, and thus improve their budget estimates.
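One way to turn that uncertainty into "best-case" and "worst-case" figures is to sample from the Dirichlet distribution and read off percentiles. The sketch below does this with only the Python standard library. Every number in it (the parameters, the animal population, the cost per procedure, and the 5th/95th percentile cutoffs) is a hypothetical placeholder, and the helper names are ours.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def percentile(sorted_values, q):
    """Approximate percentile of a pre-sorted list, with q between 0 and 1."""
    index = round(q * (len(sorted_values) - 1))
    return sorted_values[index]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Category order: privately owned, stray and fixed, stray and not fixed.
alpha = [60.0, 15.0, 25.0]   # hypothetical current Dirichlet parameters
TOTAL_ANIMALS = 10_000       # hypothetical population size
COST_PER_PROCEDURE = 75.0    # hypothetical cost per animal

samples = [sample_dirichlet(alpha, rng) for _ in range(5000)]
unfixed_share = sorted(s[2] for s in samples)

best_case_share = percentile(unfixed_share, 0.05)
worst_case_share = percentile(unfixed_share, 0.95)

best_case_budget = best_case_share * TOTAL_ANIMALS * COST_PER_PROCEDURE
worst_case_budget = worst_case_share * TOTAL_ANIMALS * COST_PER_PROCEDURE

print(f"best-case budget:  {best_case_budget:,.0f}")
print(f"worst-case budget: {worst_case_budget:,.0f}")
```

As more animals are counted and the parameters grow, the sampled shares cluster more tightly and the gap between the two budgets narrows.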

Dirichlet-Multinomial Model: Visualization

The Dirichlet-multinomial model can handle more than three outcomes or categories, but focusing on only three categories allows us to easily visualize the Dirichlet distribution in 2-dimensional plots. We won’t go through any examples step-by-step here. Suffice it to say, they all work the same way as the basketball example, just with three outcomes/categories instead of two.
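Three categories fit in two dimensions because the three probabilities must sum to 1, so only two of them are free. One common way to draw this (an assumption on our part, not necessarily how the animations here were produced) is to place each probability triple inside an equilateral triangle, with one corner per category. The sketch below shows that mapping; the vertex placement and function names are ours.

```python
import math

# One corner of the triangle per category.
TRIANGLE_VERTICES = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def simplex_to_xy(probs):
    """Map probabilities (p1, p2, p3) summing to 1 onto a point in the triangle."""
    x = sum(p * vx for p, (vx, _) in zip(probs, TRIANGLE_VERTICES))
    y = sum(p * vy for p, (_, vy) in zip(probs, TRIANGLE_VERTICES))
    return x, y


def simplex_grid(steps):
    """All triples whose entries are multiples of 1/steps and sum to 1."""
    return [
        (i / steps, j / steps, (steps - i - j) / steps)
        for i in range(steps + 1)
        for j in range(steps + 1 - i)
    ]


# Plot-ready coordinates for a coarse grid of probability triples.
grid_xy = [simplex_to_xy(point) for point in simplex_grid(4)]

# Certainty about one category lands on a corner; equal probabilities land in the center.
print(simplex_to_xy((1.0, 0.0, 0.0)))
print(simplex_to_xy((1 / 3, 1 / 3, 1 / 3)))
```

Evaluating the Dirichlet density at each grid point and coloring the triangle accordingly would give a heat map of where the model currently places its belief.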

Below are some videos of animations of 3-category Dirichlet distributions being updated with multinomial data. Viewing the animations can provide an intuition about how much data the distributions need to converge towards an accurate estimate and how quickly the model’s confidence increases as it receives new data. Further details about the animation and the software that produces it are available at this repository.