Plays and Poems: Data Preview (cat, cut, head and csvlook)

In this project, we use a text corpus of plays and poems from the Shakespearean era (16th and 17th centuries) to find the words most frequently used by some of the known authors of that time (e.g., Shakespeare).

You may be surprised to learn that, so far, there is no comprehensive collection of electronic texts of these works in the public domain. Instead, a portion of the plays and poems is held in a machine-readable archive at the Centre for Literary and Linguistic Computing at The University of Newcastle.

This archive has been assembled over many years by editing versions available from commercial online collections or from other sources, such as keyboarding from early printed versions. Later, by developing and using a software tool called Intelligent Archive (IA), Craig and Whipp identified a total of 66,907 unique words in the 256 texts. IA calculated the frequency of each of these 66,907 words in each work and stored the result as a 66,907×256 matrix, which can be downloaded from the link below:

http://educative.io/udata/k8mQvDgxL7v/Shakespeare.zip
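
A matrix with 66,907 rows and 256 columns is far too large to read in full, so a preview usually looks at only a few rows and columns. The sketch below illustrates that idea with head, cut and csvlook on a tiny stand-in file. The words, column names, counts, and the comma-separated layout with a header row are all invented for illustration; the actual files inside the archive may be named and laid out differently.

```shell
#!/bin/sh
# Hypothetical miniature stand-in for the real matrix: the words, work names,
# and counts below are invented purely for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
word,work_a,work_b,work_c
the,120,98,143
and,87,91,76
love,12,30,9
EOF

# head: keep only the first three lines (the header plus two words).
first_rows=$(head -n 3 "$sample")
echo "$first_rows"

# cut: from those lines, keep only the first two comma-separated columns.
first_cols=$(head -n 3 "$sample" | cut -d ',' -f 1-2)
echo "$first_cols"

# csvlook (part of csvkit) renders the same slice as an aligned table.
# It is skipped here when csvkit is not installed.
if command -v csvlook >/dev/null 2>&1; then
    head -n 3 "$sample" | cut -d ',' -f 1-2 | csvlook
fi

# Remove the temporary sample file.
rm -f "$sample"
```

On the real data, the same pipeline would apply with the extracted file's path in place of `$sample`, and with the `-d` delimiter adjusted if the file turns out not to be comma-separated.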
