Topic Modeling
We want to use topic modeling to learn which topics the comic strip Peanuts discussed or touched upon during its 50-year run. To do this, we train a Latent Dirichlet Allocation (LDA) model on documents from Wikipedia; the trained model defines the topics. We then apply the model to the Peanuts comic strips to infer which topics each strip is associated with.
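The core idea is to learn topics from one corpus and then estimate a topic mixture for documents the model never saw. The following is a minimal, self-contained sketch of that idea, not the project's own code: it implements LDA with collapsed Gibbs sampling in plain Python and then infers a new document's topic mixture while holding the learned topic-word distributions fixed. The toy documents, the stopword list, the helper names `tokenize`, `train_lda` and `infer_topics`, and the hyperparameters are all illustrative assumptions. A real run over Wikipedia would more likely use a library such as gensim, whose `LdaModel` uses online variational Bayes rather than Gibbs sampling.

```python
import random
import re

# Illustrative stopword list (assumption); a real pipeline would use a much larger one.
STOPWORDS = {"a", "an", "and", "at", "between", "his", "in", "is", "of", "on", "the", "to", "while"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the word index and topic-word distributions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # topic counts per document
    nkw = [[0] * V for _ in range(K)]   # word counts per topic
    nk = [0] * K                        # total tokens per topic
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return word_id, phi


def infer_topics(doc, word_id, phi, alpha=0.1, iterations=100, seed=0):
    """Estimate a new document's topic mixture with the topic-word distributions held fixed."""
    rng = random.Random(seed)
    K = len(phi)
    tokens = [word_id[w] for w in doc if w in word_id]  # words unseen in training are ignored
    if not tokens:
        return [1.0 / K] * K
    z = [rng.randrange(K) for _ in tokens]
    ndk = [z.count(k) for k in range(K)]
    for _ in range(iterations):
        for i, v in enumerate(tokens):
            ndk[z[i]] -= 1
            weights = [(ndk[t] + alpha) * phi[t][v] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            ndk[z[i]] += 1
    total = len(tokens) + K * alpha
    return [(ndk[k] + alpha) / total for k in range(K)]


# Toy stand-ins for Wikipedia documents and one comic strip's text (hypothetical data).
wiki_docs = [
    "Baseball is a bat and ball game played between two teams of nine players.",
    "A pitcher throws the ball to the batter at home plate in baseball.",
    "The beagle is a breed of small scent hound dog.",
    "A dog house is a small shed built to shelter a dog.",
]
strip_text = "Charlie Brown stands on the pitcher mound while Snoopy sleeps on his dog house."

word_id, phi = train_lda([tokenize(d) for d in wiki_docs], num_topics=2)
theta = infer_topics(tokenize(strip_text), word_id, phi)
print([round(p, 2) for p in theta])
```

Words that never occur in the training corpus (here "snoopy" or "mound") carry no topic information and are skipped, which is one reason the choice of training documents matters so much for this approach.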
The first step is to create the Wikipedia corpus.
The next step is to train the LDA model on the corpus, though it helps to run some preliminary tests first.
Then we use the trained model to infer which topics appear in each comic strip.
Finally, we can plot and assess the results.
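As a rough illustration of the plotting step, here is a minimal sketch that turns per-strip topic distributions into the two kinds of summaries plotted below: average topic proportions per year (a timeline) and per character. It assumes each strip already has an inferred topic distribution plus hypothetical `year` and `characters` fields; the helper name `mean_topics_by` and the sample values are illustrative, and simple averaging is an assumption rather than a description of how the actual plots were produced.

```python
from collections import defaultdict


def mean_topics_by(strips, key_fn):
    """Average each strip's topic distribution over every key that key_fn returns for it."""
    sums = {}
    counts = defaultdict(int)
    for strip in strips:
        for key in key_fn(strip):
            vec = sums.setdefault(key, [0.0] * len(strip["topics"]))
            for k, p in enumerate(strip["topics"]):
                vec[k] += p
            counts[key] += 1
    return {key: [s / counts[key] for s in vec] for key, vec in sorted(sums.items())}


# Hypothetical strips: topic vectors would come from the inference step.
strips = [
    {"year": 1950, "characters": ["Charlie Brown"], "topics": [0.8, 0.2]},
    {"year": 1950, "characters": ["Charlie Brown", "Snoopy"], "topics": [0.6, 0.4]},
    {"year": 1960, "characters": ["Snoopy"], "topics": [0.1, 0.9]},
]

timeline = mean_topics_by(strips, lambda s: [s["year"]])
by_character = mean_topics_by(strips, lambda s: s["characters"])
print(timeline)
print(by_character)
```

Because a strip with several characters contributes to each of them, the per-character averages overlap; the resulting dictionaries can be passed to any plotting library.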
Below are the resulting plots that we discussed on the main page.
Here’s the topics timeline:
Here are the topics associated with each character: