Preparing the Data

The data were derived from text descriptions of the Peanuts comic strips.

The first step was to generate a list of the original publication dates of the Peanuts comics, so that those dates could be used to identify the comic strips. Here is the code to do that.
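A minimal sketch of this step is shown below. It assumes that the dates only need to be listed day by day in the YYYY-MM-DD form used as page names in the tables that follow. The function name publication_dates is hypothetical, and the example range is the made-up one from the sample data, not the strip's actual run.

```python
from datetime import date, timedelta


def publication_dates(start, end):
    """Return every date from start to end (inclusive) as a 'YYYY-MM-DD' string.

    Hypothetical helper: it assumes one strip per day, so a real list may
    need to skip dates on which no strip was published.
    """
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


# Example with the made-up dates used in the sample data
dates = publication_dates(date(2015, 3, 22), date(2015, 3, 25))
print(dates)  # ['2015-03-22', '2015-03-23', '2015-03-24', '2015-03-25']
```

Each string can then serve as the identifier (pagename) for one comic strip.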

The next step was to extract the descriptions from their webpages. Here is the code. Once this step was finished, the data looked like the table below, except that the Peanuts descriptions and dates have been replaced with data that I made up, and the ‘BR’ HTML tags are shown in parentheses:

pagename text
2015-03-22 Moofles does something or other. He says,"Hi!"(BR)(BR) Miggles does something else.(BR)(BR) Moofles looks confused.(BR)(BR) Miggles says,"Hi!" She smiles.(BR)(BR)
2015-03-23 Moofles does something or other. He exclaims, "Howdy!"(BR) (BR) He then says, "Oh, well!"(BR) (BR) Miggles says, "What?" Miggles replies, "Maybe?"(BR) (BR) Miggles then says,"Okay."(BR) (BR)
2015-03-24 Miggles walks. Moofles says, "Okay, I’ll look."(BR)(BR) Moofles walks.(BR)(BR) Miggles walks some more.(BR)(BR) Miggles says, "Yep!"(BR)(BR)
2015-03-25 Miggles says, "Bravo!"(BR)(BR) Moofles says, "Yay!"(BR)(BR) Miggles then says, "Meow."(BR)(BR) Moofles says, "That’s right."(BR)(BR)
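The extraction step could look roughly like the sketch below, which uses the standard library's html.parser. The page structure is an assumption: the sketch supposes that each description sits in a <div class="description"> element, and that class name is hypothetical. The sketch keeps the <br> tags as literal <BR> markers, so the panel boundaries survive into the text as they do in the table above. Fetching the pages themselves (for example with urllib.request.urlopen) is left out because the site and its URL pattern aren't given here.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Collect the text inside <div class="description">, keeping <br> tags.

    The "description" class is a hypothetical stand-in; the real pages may
    mark up their descriptions differently.
    """

    def __init__(self, target_class="description"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0   # > 0 while inside the target <div>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "br":
                self.parts.append("<BR>")   # keep panel boundaries as markers
            elif tag == "div":
                self.depth += 1             # track nested <div>s
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_description(html):
    """Return the description text from one page's HTML."""
    parser = DescriptionParser()
    parser.feed(html)
    parser.close()   # flush any buffered text
    return "".join(parser.parts).strip()


page = ('<div class="description">Moofles walks.<br><br> '
        'Miggles says, "Hi!"<br/><br/></div><p>Ad</p>')
print(extract_description(page))
# Moofles walks.<BR><BR> Miggles says, "Hi!"<BR><BR>
```

Self-closing <br/> tags are handled too, because HTMLParser passes them to handle_starttag by default.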

Next, the descriptions needed to be cleaned: repetitions were removed, each description was split at the HTML tags that marked the boundaries between panels, and extra leading and trailing punctuation and whitespace were stripped. The number of panels in each strip was also counted. Here is the code. The number of panels and the cleaned descriptions were added to the data as new columns. This is how those columns looked:

num_panels text_by_panels
4 [‘Moofles does something or other. He says,"Hi!"’, ‘Miggles does something else.’, ‘Moofles looks confused.’, ‘Miggles says,"Hi!" She smiles.’]
4 [‘Moofles does something or other. He exclaims, "Howdy!"’, ‘He then says, "Oh, well!"’, ‘Miggles says, "What?" Miggles replies, "Maybe?"’, ‘Miggles then says,"Okay."’]
4 [‘Miggles walks. Moofles says, "Okay, I’ll look."’, ‘Moofles walks.’, ‘Miggles walks some more.’, ‘Miggles says, "Yep!"’]
4 [‘Miggles says, "Bravo!"’, ‘Moofles says, "Yay!"’, ‘Miggles then says, "Meow."’, ‘Moofles says, "That’s right."’]
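The splitting and counting could be sketched as follows. This sketch assumes that a run of one or more <BR> tags, with any whitespace between them, marks a single panel boundary. The set of stray end characters in STRAY is a hypothetical choice. The sketch does not reproduce the repetition-removal step, whose details aren't described here.

```python
import re

# One or more <BR> tags, with any whitespace around or between them,
# count as a single panel boundary.
PANEL_BREAK = re.compile(r"(?:\s*<br\s*/?>\s*)+", re.IGNORECASE)

# Hypothetical set of stray characters trimmed from the ends of each panel.
STRAY = " \t\r\n,;"


def split_panels(description):
    """Split a description into panels and return (num_panels, panels)."""
    panels = [p.strip(STRAY) for p in PANEL_BREAK.split(description)]
    panels = [p for p in panels if p]   # drop empty pieces, e.g. after the last <BR>
    return len(panels), panels


text = ('Moofles does something or other. He exclaims, "Howdy!"<BR> <BR> '
        'He then says, "Oh, well!"<BR> <BR> Miggles says, "What?" '
        'Miggles replies, "Maybe?"<BR> <BR> Miggles then says,"Okay."<BR> <BR>')
num_panels, panels = split_panels(text)
print(num_panels)  # 4
print(panels[1])   # He then says, "Oh, well!"
```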

Next, the descriptions were spell-corrected. Most misspellings could be handled with a standard English dictionary, but some words, such as the characters’ names, required custom handling. Here is the code. The spell-corrected descriptions, which were also converted to lowercase, were added to the data as a new column. This is how it looked:

text_spell_corrected
[‘moofles does something or other. he says,"hi!"’, ‘miggles does something else.’, ‘moofles looks confused.’, ‘miggles says,"hi!" she smiles.’]
[‘moofles does something or other. he exclaims, "howdy!"’, ‘he then says, "oh, well!"’, ‘miggles says, "what?" miggles replies, "maybe?"’, ‘miggles then says,"okay."’]
[‘miggles walks. moofles says, "okay, i’ll look."’, ‘moofles walks.’, ‘miggles walks some more.’, ‘miggles says, "yep!"’]
[‘miggles says, "bravo!"’, ‘moofles says, "ya!"’, ‘miggles then says, "meow."’, ‘moofles says, "that’s right."’]
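The sketch below shows dictionary-based correction with custom additions. It substitutes the standard library's difflib.get_close_matches for whatever spell checker was actually used, and the tiny vocabulary is made up for illustration. Each word is lowercased and kept if it is in the vocabulary. Otherwise it is replaced with its closest match above a similarity cutoff, and punctuation is left in place.

```python
import re
from difflib import get_close_matches

# Words are runs of letters, optionally joined by an apostrophe (i'll, that’s).
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def spell_correct(text, vocabulary, cutoff=0.8):
    """Lowercase text and replace unknown words with their closest vocabulary match.

    Punctuation and spacing are left untouched; words with no match at or
    above `cutoff` are kept as they are.
    """
    def fix(match):
        word = match.group()
        if word in vocabulary:
            return word
        candidates = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        return candidates[0] if candidates else word

    return WORD.sub(fix, text.lower())


# Made-up vocabulary: a few "dictionary" words plus custom names.
english = {"does", "something", "or", "other", "he", "says", "hi", "she", "smiles"}
custom = {"moofles", "miggles"}   # characters' names added by hand
vocab = english | custom

print(spell_correct('Moofles does somthing or other. He says,"Hi!"', vocab))
# moofles does something or other. he says,"hi!"
```

A cutoff that is too permissive can also "correct" valid words that happen to be missing from the vocabulary, so the custom word list matters as much as the dictionary.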

The next step was to distinguish the characters’ speech (and thoughts) from the rest of each description. Speech was identified by its enclosure in quotation marks. Sometimes a panel had an odd number of quotation marks, which prevented speech and thought from being identified reliably. The number of panels in each comic that had no quotation marks, and the number that had an odd number of quotation marks, were counted. A simple heuristic was also used to identify which character spoke (or thought), but it was not reliable enough for analysis. The two counts and the identified speakers were added as columns to the data table and looked like the made-up data below. Here is the code.

no_quotes_n odd_quotes_n comics_speakers
2 0 [[[‘moofles’, 41, 45]], [], [], [[‘miggles’, 13, 17]]]
0 0 [[[‘moofles’, 46, 53]], [[‘moofles’, 14, 24]], [[‘miggles’, 14, 20], [‘miggles’, 40, 47]], [[‘miggles’, 18, 24]]]
2 0 [[[‘moofles’, 29, 46]], [], [], [[‘miggles’, 14, 19]]]
0 0 [[[‘miggles’, 14, 21]], [[‘moofles’, 14, 18]], [[‘miggles’, 19, 25]], [[‘moofles’, 14, 28]]]
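The counting and a speaker heuristic could be sketched as follows. The sketch assumes straight double quotes, a hypothetical hard-coded cast list, and one particular heuristic: the speaker is the last character name that appears before the opening quote, and if no name appears, the previous speaker is reused. That fallback is how a line such as "he then says" gets attributed. Each entry follows the [name, start, end] layout of comics_speakers above, with start and end taken to be the positions of the opening and closing quote marks. That reading is inferred from the example data, not confirmed by the source.

```python
import re

CHARACTERS = ("moofles", "miggles")   # hypothetical cast list


def analyze_quotes(panels, characters=CHARACTERS):
    """Count quote problems and guess speakers for one comic's panels.

    Returns (no_quotes_n, odd_quotes_n, speakers), where speakers holds one
    list per panel of [name, open_quote_index, close_quote_index] entries.
    """
    name_re = re.compile(r"\b(?:" + "|".join(map(re.escape, characters)) + r")\b")
    no_quotes_n = odd_quotes_n = 0
    speakers = []
    last_speaker = None   # carried across panels for "he"/"she" lines

    for panel in panels:
        quotes = [i for i, ch in enumerate(panel) if ch == '"']
        panel_speakers = []
        if not quotes:
            no_quotes_n += 1
        elif len(quotes) % 2:
            odd_quotes_n += 1   # unpaired quote: speech can't be located reliably
        else:
            prev_end = 0
            for start, end in zip(quotes[::2], quotes[1::2]):
                # Heuristic: last name mentioned between the previous quote and
                # this one; otherwise fall back to the previous speaker.
                names = name_re.findall(panel, prev_end, start)
                if names:
                    last_speaker = names[-1]
                panel_speakers.append([last_speaker, start, end])
                prev_end = end + 1
        speakers.append(panel_speakers)

    return no_quotes_n, odd_quotes_n, speakers


comic = ['moofles does something or other. he says,"hi!"',
         'miggles does something else.',
         'moofles looks confused.',
         'miggles says,"hi!" she smiles.']
print(analyze_quotes(comic))
# (2, 0, [[['moofles', 41, 45]], [], [], [['miggles', 13, 17]]])
```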

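Finally, the data table with the comic descriptions was expanded so that each row represented a single panel instead of an entire comic. At the same time, new columns were added with the speech (and thought) and non-speech parts of the descriptions, and the per-comic quote counts were replaced by per-panel columns: quotes_n (the number of quotation marks in the panel) and the 0/1 flags no_quotes and odd_quotes. Here is the code. This is what the entire table looked like (a doubled quotation mark, "", stands for a single one, following CSV quoting rules):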

pagename text num_panels text_by_panels text_spell_corrected quotes_n no_quotes odd_quotes comics_speakers text_nontalk text_talk
2015-03-22 "Moofles does something or other. He says,""Hi!""(BR)(BR) Miggles does something else.(BR)(BR) Moofles looks confused.(BR)(BR) Miggles says,""Hi!"" She smiles.(BR)(BR)" 4 "Moofles does something or other. He says,""Hi!""" "moofles does something or other. he says,""hi!""" 2 0 0 [[‘moofles’, 41, 45]] [‘moofles does something or other. he says,’] [‘hi!’]
2015-03-22 "Moofles does something or other. He says,""Hi!""(BR)(BR) Miggles does something else.(BR)(BR) Moofles looks confused.(BR)(BR) Miggles says,""Hi!"" She smiles.(BR)(BR)" 4 Miggles does something else. miggles does something else. 0 1 0 [] [[]] [[]]
2015-03-22 "Moofles does something or other. He says,""Hi!""(BR)(BR) Miggles does something else.(BR)(BR) Moofles looks confused.(BR)(BR) Miggles says,""Hi!"" She smiles.(BR)(BR)" 4 Moofles looks confused. moofles looks confused. 0 1 0 [] [[]] [[]]
2015-03-22 "Moofles does something or other. He says,""Hi!""(BR)(BR) Miggles does something else.(BR)(BR) Moofles looks confused.(BR)(BR) Miggles says,""Hi!"" She smiles.(BR)(BR)" 4 "Miggles says,""Hi!"" She smiles." "miggles says,""hi!"" she smiles." 2 0 0 [[‘miggles’, 13, 17]] [‘miggles says,’, ‘she smiles.’] [‘hi!’]
2015-03-23 "Moofles does something or other. He exclaims, ""Howdy!""(BR) (BR) He then says, ""Oh, well!""(BR) (BR) Miggles says, ""What?"" Miggles replies, ""Maybe?""(BR) (BR) Miggles then says,""Okay.""(BR) (BR)" 4 "Moofles does something or other. He exclaims, ""Howdy!""" "moofles does something or other. he exclaims, ""howdy!""" 2 0 0 [[‘moofles’, 46, 53]] [‘moofles does something or other. he exclaims,’] [‘howdy!’]
2015-03-23 "Moofles does something or other. He exclaims, ""Howdy!""(BR) (BR) He then says, ""Oh, well!""(BR) (BR) Miggles says, ""What?"" Miggles replies, ""Maybe?""(BR) (BR) Miggles then says,""Okay.""(BR) (BR)" 4 "He then says, ""Oh, well!""" "he then says, ""oh, well!""" 2 0 0 [[‘moofles’, 14, 24]] [‘he then says,’] [‘oh, well!’]
2015-03-23 "Moofles does something or other. He exclaims, ""Howdy!""(BR) (BR) He then says, ""Oh, well!""(BR) (BR) Miggles says, ""What?"" Miggles replies, ""Maybe?""(BR) (BR) Miggles then says,""Okay.""(BR) (BR)" 4 "Miggles says, ""What?"" Miggles replies, ""Maybe?""" "miggles says, ""what?"" miggles replies, ""maybe?""" 4 0 0 [[‘miggles’, 14, 20], [‘miggles’, 40, 47]] "[‘miggles says,’, ‘"" miggles replies,’]" [‘what?’, ‘maybe?’]
2015-03-23 "Moofles does something or other. He exclaims, ""Howdy!""(BR) (BR) He then says, ""Oh, well!""(BR) (BR) Miggles says, ""What?"" Miggles replies, ""Maybe?""(BR) (BR) Miggles then says,""Okay.""(BR) (BR)" 4 "Miggles then says,""Okay.""" "miggles then says,""okay.""" 2 0 0 [[‘miggles’, 18, 24]] [‘miggles then says,’] [‘okay.’]
2015-03-24 "Miggles walks. Moofles says, ""Okay, I’ll look.""(BR)(BR) Moofles walks.(BR)(BR) Miggles walks some more.(BR)(BR) Miggles says, ""Yep!""(BR)(BR)" 4 "Miggles walks. Moofles says, ""Okay, I’ll look.""" "miggles walks. moofles says, ""okay, i’ll look.""" 2 0 0 [[‘moofles’, 29, 46]] [‘miggles walks. moofles says,’] "[""okay, i’ll look.""]"
2015-03-24 "Miggles walks. Moofles says, ""Okay, I’ll look.""(BR)(BR) Moofles walks.(BR)(BR) Miggles walks some more.(BR)(BR) Miggles says, ""Yep!""(BR)(BR)" 4 Moofles walks. moofles walks. 0 1 0 [] [[]] [[]]
2015-03-24 "Miggles walks. Moofles says, ""Okay, I’ll look.""(BR)(BR) Moofles walks.(BR)(BR) Miggles walks some more.(BR)(BR) Miggles says, ""Yep!""(BR)(BR)" 4 Miggles walks some more. miggles walks some more. 0 1 0 [] [[]] [[]]
2015-03-24 "Miggles walks. Moofles says, ""Okay, I’ll look.""(BR)(BR) Moofles walks.(BR)(BR) Miggles walks some more.(BR)(BR) Miggles says, ""Yep!""(BR)(BR)" 4 "Miggles says, ""Yep!""" "miggles says, ""yep!""" 2 0 0 [[‘miggles’, 14, 19]] [‘miggles says,’] [‘yep!’]
2015-03-25 "Miggles says, ""Bravo!""(BR)(BR) Moofles says, ""Yay!""(BR)(BR) Miggles then says, ""Meow.""(BR)(BR) Moofles says, ""That’s right.""(BR)(BR)" 4 "Miggles says, ""Bravo!""" "miggles says, ""bravo!""" 2 0 0 [[‘miggles’, 14, 21]] [‘miggles says,’] [‘bravo!’]
2015-03-25 "Miggles says, ""Bravo!""(BR)(BR) Moofles says, ""Yay!""(BR)(BR) Miggles then says, ""Meow.""(BR)(BR) Moofles says, ""That’s right.""(BR)(BR)" 4 "Moofles says, ""Yay!""" "moofles says, ""ya!""" 2 0 0 [[‘moofles’, 14, 18]] [‘moofles says,’] [‘ya!’]
2015-03-25 "Miggles says, ""Bravo!""(BR)(BR) Moofles says, ""Yay!""(BR)(BR) Miggles then says, ""Meow.""(BR)(BR) Moofles says, ""That’s right.""(BR)(BR)" 4 "Miggles then says, ""Meow.""" "miggles then says, ""meow.""" 2 0 0 [[‘miggles’, 19, 25]] [‘miggles then says,’] [‘meow.’]
2015-03-25 "Miggles says, ""Bravo!""(BR)(BR) Moofles says, ""Yay!""(BR)(BR) Miggles then says, ""Meow.""(BR)(BR) Moofles says, ""That’s right.""(BR)(BR)" 4 "Moofles says, ""That’s right.""" "moofles says, ""that’s right.""" 2 0 0 [[‘moofles’, 14, 28]] [‘moofles says,’] "[""that’s right.""]"
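The expansion and the speech/non-speech split might be sketched as follows, using plain dictionaries for the rows. Column names follow the table above, and the speaker column is left out for brevity. The split uses paired straight quotes. One choice here is an assumption that differs from the table: a panel with no quotes is treated as entirely non-speech, whereas the table leaves both columns empty for such panels. With pandas, DataFrame.explode (which accepts a list of columns in pandas 1.3 and later) is another way to turn list-valued columns into one row per panel.

```python
def split_talk(panel):
    """Split a panel into (nontalk, talk) lists using paired straight quotes.

    Assumption: panels without quotes are treated as all non-speech, and
    panels with an odd number of quotes are left empty because speech
    can't be located reliably.
    """
    quotes = [i for i, ch in enumerate(panel) if ch == '"']
    if not quotes:
        return [panel], []
    if len(quotes) % 2:
        return [], []
    nontalk, talk = [], []
    prev = 0
    for start, end in zip(quotes[::2], quotes[1::2]):
        nontalk.append(panel[prev:start])      # text before this quote
        talk.append(panel[start + 1:end])      # text inside the quotes
        prev = end + 1
    nontalk.append(panel[prev:])               # text after the last quote
    nontalk = [seg.strip() for seg in nontalk if seg.strip()]
    return nontalk, talk


def expand_to_panels(comic):
    """Turn one comic (a dict of per-comic columns) into one dict per panel."""
    rows = []
    for raw, corrected in zip(comic["text_by_panels"], comic["text_spell_corrected"]):
        n_quotes = corrected.count('"')
        nontalk, talk = split_talk(corrected)
        rows.append({
            "pagename": comic["pagename"],
            "num_panels": comic["num_panels"],
            "text_by_panels": raw,
            "text_spell_corrected": corrected,
            "quotes_n": n_quotes,
            "no_quotes": int(n_quotes == 0),
            "odd_quotes": n_quotes % 2,
            "text_nontalk": nontalk,
            "text_talk": talk,
        })
    return rows


# Made-up two-panel comic
comic = {
    "pagename": "2015-03-22",
    "num_panels": 2,
    "text_by_panels": ['Moofles looks confused.', 'Miggles says,"Hi!" She smiles.'],
    "text_spell_corrected": ['moofles looks confused.', 'miggles says,"hi!" she smiles.'],
}
for row in expand_to_panels(comic):
    print(row["quotes_n"], row["text_nontalk"], row["text_talk"])
# 0 ['moofles looks confused.'] []
# 2 ['miggles says,', 'she smiles.'] ['hi!']
```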