tidy text mining --- sentiment and most frequent words (MFW) analyses of Star Trek DS9 first episode, "The Nagus"
A few months ago, the Revolution Analytics newsletter directed me to the 'tidy data' approach to text mining by Julia Silge and David Robinson. I began trying out their tidytext R package on The Federalist Papers, attempting to replicate an analysis along the lines of Mosteller and Wallace (1964), and then on inaugural addresses by U.S. presidents. The Federalist analysis ended up morphing into an application of Burrows's 'rolling delta' and the use of a different R text analytics package. More on that in a subsequent post.
Silge and Robinson's text mining examples include a lexicon-based sentiment analysis of Jane Austen's novels. One example plots the net change in positive versus negative sentiment over the progression of each novel. While mulling over what to do next with tidy text mining, I was re-watching the pilot episode of my favorite Star Trek series, Deep Space Nine. I wondered to what extent the dialogue spoken by characters in screenplays such as DS9's would contain sentiment-laden terms, and whether sentiment over the course of an episode would follow a pattern, perhaps tracing the contour of the DS9 crew and residents confronting and resolving conflict. Of course, the screenplay for the first episode of the series ("The Nagus", written by Ira Steven Behr and directed by David Livingston) might be something of an outlier. The episode mostly features Quark, the Ferengi owner of a bar on DS9, and doesn't feature the kind of conflict with alien species that other episodes do. Still, I decided to process the screenplay for the first episode, which aired on January 7, 1993: converting it to a tidy text format, filtering stop (or function) words, creating a word cloud, and visualizing episode sentiment. Below are some highlights.
There are various websites with DS9 scripts. The screenplays are somewhat interesting in themselves: each begins with writing and production credits, tables of characters and sets, and a "pronunciation guide". The episode unfolds over six parts, beginning with a "Teaser" (the opening scenes, prior to the first commercial break) and continuing through Acts One to Five.
Excerpt: Star Trek DS9 Screenplay
STAR TREK: DEEP SPACE NINE
"The Nagus"
(fka "Friends & Foes")
#40511-411
Story by
David Livingston
Teleplay by
Ira Steven Behr
Directed by
David Livingston
[cast, sets, and pronunciation guide excluded.]
FADE IN:
1 EXT. SPACE - DEEP SPACE NINE (OPTICAL)
2 INT. AIRLOCK CORRIDOR - MORNING
The door opens, out steps a Ferengi, KRAX. Tough, authoritative, arrogant. He checks the corridor. Satisfied that no danger lurks, he gestures back into the airlock. Out steps ZEK, an ancient hunched-over Ferengi, features mysteriously obscured by a hooded cloak. In one hand he clutches the supporting arm of Maihar'du, a tall bald humanoid alien. In his other hand, he carries a staff, the handle of which is a smiling Ferengi head made of gold-press latinum. Together Krax and Maihar'du help maneuver the old fellow slowly down the corridor.
3 INT. SISKO'S QUARTERS - JAKE'S ROOM
JAKE is hurrying to get ready for school. He's sitting on his bed, putting on his shoes, when SISKO ENTERS from the living room grinning with anticipation.
SISKO
Hey Jake, I've got a terrific surprise for you.
JAKE
(smiling) Oh yeah, what is it?
SISKO
The two of us are going to Bajor for the start of the Gratitude Festival.
JAKE
What's the Gratitude Festival?
Pre-processing the screenplay
Prior to preparing a 'tidy' version of the screenplay, tokenized on words, I cleaned it up a bit. I removed page headers (such as "DEEP SPACE: "The Nagus" - REV. 1/07/93 - ACT ONE 8."), replacing the first header that identifies a section with just the name of that section, such as "ACT ONE". Numbers that appear to identify a scene within each Act, such as the "11" in "11 CONTINUED:", were deleted as well.
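The clean-up steps described above could be sketched in base R roughly as follows. This is a minimal sketch under stated assumptions, not the code actually used for the post: the sample lines (apart from the quoted header), the header regular expression, and the helper name clean_screenplay() are all hypothetical, and the scene-number rule assumes only the leading number is dropped.

```r
# Hypothetical sample standing in for the raw screenplay, one element per line.
raw_lines <- c(
  "DEEP SPACE: \"The Nagus\" - REV. 1/07/93 - ACT ONE 8.",
  "11 CONTINUED:",
  "QUARK",
  "(exploding) You worthless, tiny eared fool!",
  "DEEP SPACE: \"The Nagus\" - REV. 1/07/93 - ACT ONE 9.",
  "12 INT. QUARK'S",
  "ROM",
  "But brother..."
)

# Assumed shape of a page header: show name, title, revision date,
# section name (captured), and page number.
header_pattern <- "^DEEP SPACE.*REV\\. [0-9/]+ - (TEASER|ACT [A-Z]+) [0-9]+\\.$"

clean_screenplay <- function(lines) {
  is_header <- grepl(header_pattern, lines)

  # Section named in each header (e.g. "ACT ONE"); NA for non-header lines.
  section <- sub(header_pattern, "\\1", lines)
  section[!is_header] <- NA

  # The first header of each section becomes a bare section label;
  # every later header for the same section is dropped.
  first_of_section <- is_header & !duplicated(section)
  lines[first_of_section] <- section[first_of_section]
  lines <- lines[!is_header | first_of_section]

  # Strip leading scene numbers, e.g. "11 CONTINUED:" becomes "CONTINUED:".
  # Note this would also strip a number that opens a line of dialogue.
  sub("^[0-9]+ +", "", lines)
}

cleaned <- clean_screenplay(raw_lines)
print(cleaned)
```

On a real script the header pattern would need to be checked against every section, since the Teaser and the Acts may not all be labelled the same way.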
Tidying the screenplay and MFWs
After converting the screenplay to a tibble (a tidyverse data frame) and tokenizing on words, I followed the procedures in section 1.3 of Text Mining with R to produce a bar graph of the most frequent words, filtered for n > 30. One difference from Silge and Robinson's bar plot is the color palette. I used the Wes Anderson color palette package to select 11 colors based on set designs from my personal favorite Anderson film, Rushmore.
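For readers who want to try the same steps, here is a minimal sketch using dplyr, tidytext, and ggplot2, assuming those packages are installed. The sample lines are a hypothetical stand-in for the cleaned screenplay (only the "worthless, tiny eared fool" line comes from the script), and the threshold is lowered from n > 30 because the sample is tiny. The Rushmore palette is applied only if the wesanderson package happens to be available.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Hypothetical stand-in for the cleaned screenplay, one element per line.
screenplay_lines <- c(
  "QUARK",
  "(exploding) You worthless, tiny eared fool!",
  "ROM",
  "But brother, the Nagus is here.",
  "QUARK",
  "The Nagus? Here? On the station?"
)

screenplay_df <- tibble(
  line = seq_along(screenplay_lines),
  text = screenplay_lines
)

# One word per row (lower-cased by unnest_tokens), with stop words removed.
tidy_screenplay <- screenplay_df %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

word_counts <- tidy_screenplay %>%
  count(word, sort = TRUE)

# The post filters on n > 30 for the full screenplay; 1 suits this toy sample.
min_n <- 1

mfw_plot <- word_counts %>%
  filter(n > min_n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word, fill = word)) +
  geom_col(show.legend = FALSE) +
  labs(x = "count", y = NULL)

# Optional: interpolate the Rushmore palette to one color per bar.
if (requireNamespace("wesanderson", quietly = TRUE)) {
  n_bars <- sum(word_counts$n > min_n)
  pal <- wesanderson::wes_palette("Rushmore1", n_bars, type = "continuous")
  mfw_plot <- mfw_plot + scale_fill_manual(values = pal)
}

print(word_counts)
```

Because character names appear as cast identifiers on their own lines, they are counted along with the dialogue, which is why a count like this tends to track who speaks most.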
Quark appears almost twice as frequently as other characters. The screenplay text included both cast identifiers and scene directions, such as Quark speaking to Rom: "QUARK (exploding) You worthless, tiny eared fool! Don't you know the First Rule of Acquisition?" With stop words removed, an MFW bar graph appears to be a fairly good proxy for the appearances of characters, although of course the 11th most frequent term is "Ferengi", which is not a character. (In the figure, the words are in lower case due to text processing.)
The next visualization is a bar graph of sentiment based on Saif Mohammad's NRC lexicon as implemented in the syuzhet package. (The color scheme is admittedly wrong and should at least be re-arranged, but is from Moonrise Kingdom, a close second favorite Anderson film.) Counting the frequencies of sentiment-scored words across the screenplay reveals a tendency toward more positive than negative terms overall. Given that the first episode sets up plot developments in subsequent episodes, it would seem to make sense that the most common positive sentiment is anticipation.
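As a rough illustration of NRC scoring with syuzhet, the sketch below assumes the package is installed and uses two lines of dialogue quoted earlier in this post as input. Whether the original figure scored whole lines, sentences, or single words is not stated, so treat the choice of input here as an assumption.

```r
library(syuzhet)

# Two lines of dialogue quoted elsewhere in this post.
dialogue <- c(
  "Hey Jake, I've got a terrific surprise for you.",
  "You worthless, tiny eared fool!"
)

# One row per input string; columns are the eight NRC emotions
# plus "negative" and "positive".
nrc_scores <- get_nrc_sentiment(dialogue)

# Total count per category across all input strings.
nrc_totals <- colSums(nrc_scores)

# Simple base R bar graph of the totals, largest first.
barplot(sort(nrc_totals, decreasing = TRUE), las = 2,
        main = "NRC sentiment counts")
```

Running get_nrc_sentiment() over the whole cleaned screenplay and summing the columns would give totals of the kind shown in the figure.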
Lastly, I have an analysis of net sentiment (positive minus negative) over each Act of the first episode. This bar graph is modeled after the Tidy Text Mining comparison of Jane Austen's novels, which computes net sentiment over sections of 80 lines of text. I broke up the comparison by Acts instead, starting with the preliminary material in the screenplay, then Act Zero (the "Teaser") through Act Five.
It's somewhat intriguing that positive sentiment declines through the middle acts of the episode and then, in Act Five, ends at a level higher than in the opening Act.
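A per-act net sentiment calculation along these lines might look like the sketch below, which adapts the Silge and Robinson pattern by grouping on act rather than on 80-line chunks. It is only a sketch: it assumes the Bing lexicon bundled with tidytext (the lexicon behind the figure is not stated), the sample lines are invented, and act headings are assumed to sit on their own lines as "TEASER", "ACT ONE", and so on.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(ggplot2)

# Invented sample lines; headings mark where each section starts.
script_lines <- c(
  "STAR TREK: DEEP SPACE NINE",
  "TEASER",
  "What a wonderful, happy morning.",
  "ACT ONE",
  "You worthless fool!",
  "This is terrible.",
  "ACT TWO",
  "A good plan and a great idea."
)

screenplay_df <- tibble(text = script_lines) %>%
  mutate(
    is_heading = grepl("^(TEASER|ACT [A-Z]+)$", text),
    # -1 = preliminary material, 0 = Teaser ("Act Zero"), 1 = Act One, ...
    act = cumsum(is_heading) - 1
  )

net_by_act <- screenplay_df %>%
  filter(!is_heading) %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(act, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(net = positive - negative)

# Bar graph of net sentiment, one bar per act.
net_plot <- ggplot(net_by_act, aes(factor(act), net)) +
  geom_col() +
  labs(x = "act", y = "net sentiment (positive - negative)")

print(net_by_act)
```

Sections with no lexicon matches, such as the credits line in this sample, simply produce no row, so they would need to be filled in with zeros if every section should appear in the plot.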
Later, I'll post the R code, along with an analysis of the entire series. I'm still in the process of automating the code to do so. Once the visualizations are prepared, I'll post analyses of changes in sentiment over time.