Friday, December 09, 2016

Visualization: Classic Books in a Word Cloud #HavingFunWithR #23

In preparation for a new module on Data Visualization next semester, I have been experimenting with some R code to visualize classic books in different ways. Last Monday, I took a look at books without letters and words. Another way to visualize the content of a book is to create a Word Cloud, which shows the most common words in the text.

Word Clouds are fun to make, and they are a great way to compare books. For this experiment I have chosen the following books from Project Gutenberg:
  • "Ulysses" by James Joyce
  • "A Tale of Two Cities" by Charles Dickens
  • "Pride and Prejudice" by Jane Austen
  • "The Last of the Mohicans" by James Fenimore Cooper
Below are the Word Clouds showing the 100 most used words in each book (the larger the word in the cloud, the more often it is used):


A Tale of Two Cities.

Pride and Prejudice.

Last of the Mohicans.
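
The clouds above rest on a simple word count: split the text into words, tally them, and keep the 100 most frequent. The R code behind the clouds is not shown in this post, so what follows is only a minimal sketch of that counting step, written in Python for illustration. The function name `top_words`, the tiny stop-word list, and the sample sentence are all assumptions made for the example, as is the idea that common words such as "the" and "and" were filtered out before the clouds were drawn.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real run would use a much fuller list.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "it", "was", "he", "she"}

def top_words(text, n=100, stop_words=STOP_WORDS):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    # Lower-case the text and keep runs of letters, allowing one inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Made-up sample text, not a quotation from any of the four books.
sample = "Elizabeth said no. Darcy said yes, and Elizabeth said nothing."
print(top_words(sample, n=2))
```

In a word cloud, each of these counts would then drive the font size of its word, so the word with the largest count is drawn largest.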

One word stands out in all four clouds: "said". As all four are novels, the dialogue needs this word to show what each character said. Interestingly, the lead character's name does not appear as a top word in every cloud. "Elizabeth" (Bennet) is the most used word in Pride and Prejudice; she is very much the lead character in this book. "Bloom" (Leopold) stands out in Ulysses, but "Carton" (Sydney), the lead character in A Tale of Two Cities, shows up as only a small word. In The Last of the Mohicans, the lead character "Hawkeye" is not dominant: Fenimore Cooper constantly refers to him as the "scout" throughout the book, so the character is essentially divided between two names. Major Heyward, a significant though lesser character in the book, shows up as one of the most used words. Other characters such as Cora, Alice, and Uncas also make the top 100 words, but the very last of the Mohicans himself, Chingachgook, does not make the list. The word "savage" appears more often.
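
A character who goes by two names, like Hawkeye and the "scout", has his count split across two words. One way to see his combined weight is to fold the alternative names into a single name before ranking. The sketch below is a rough Python illustration of that idea, not something done for the clouds above; the alias table and the counts are invented for the example, and it assumes every use of "scout" refers to Hawkeye, which may not hold for the whole novel.

```python
from collections import Counter

# Hypothetical alias table: each alternative name maps to one canonical name.
ALIASES = {"scout": "hawkeye"}

def merge_aliases(counts, aliases=ALIASES):
    """Add the count of each alias to the count of its canonical name."""
    merged = Counter()
    for word, count in counts.items():
        merged[aliases.get(word, word)] += count
    return merged

# Made-up counts for illustration only, not measured from the novel.
counts = Counter({"heyward": 5, "scout": 4, "hawkeye": 3})
print(merge_aliases(counts).most_common(1))  # [('hawkeye', 7)]
```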
