Nation, Genre & Gender: Data
The Nation, Genre & Gender project is a unique collaboration between researchers in literature and data analytics. The research team combines UCD’s strengths in cultural criticism and social network analysis, traditional humanities and new computational approaches, and established and early-stage researchers.
As part of this project, we have manually annotated a corpus of 19th- and 20th-century Irish and British novels. Sample data for three case study novels is provided below. A full downloadable corpus will follow in the second phase of the project.
Data Format
The data for each annotated novel consists of the following files and directories:
- fulltext.txt: a single file containing a version of the novel text with manual annotations to aid character identification.
- dictionary.txt: a file containing a list of all unique characters in the novel, along with their aliases.
- stopwords.txt: a file containing a list of words and phrases which should not be identified as characters.
- attributes.txt: a file containing a list of attributes for the characters in the novel.
- notes.txt: a file containing notes on the edition used and the annotation process for the specific novel.
- networks: a directory containing the individual chapter networks and the overall character network for the novel, in GEXF format.
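Since GEXF is an XML-based graph format, the network files can be read with standard tools. The following is a minimal sketch in Python, using only the standard library, of how the files in a networks directory might be loaded. The function names (local_name, read_gexf, load_networks) are illustrative and not part of the project's own tooling, and the sketch assumes that nodes carry a label attribute and edges an optional weight attribute, as is common in GEXF files. The actual file names and attributes in the archives may differ.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def local_name(tag):
    """Strip any XML namespace, e.g. '{http://...}node' -> 'node'."""
    return tag.rsplit("}", 1)[-1]


def read_gexf(path):
    """Return (nodes, edges) read from a single GEXF file.

    nodes maps node id -> label; edges is a list of
    (source id, target id, weight) tuples.
    """
    nodes, edges = {}, []
    for elem in ET.parse(path).iter():
        name = local_name(elem.tag)
        if name == "node":
            # Assumption: fall back to the id when no label is present.
            nodes[elem.get("id")] = elem.get("label", elem.get("id"))
        elif name == "edge":
            # Assumption: a missing weight is treated as 1.0.
            weight = float(elem.get("weight", 1.0))
            edges.append((elem.get("source"), elem.get("target"), weight))
    return nodes, edges


def load_networks(directory):
    """Read every .gexf file in a directory, keyed by file name without extension."""
    return {p.stem: read_gexf(p) for p in sorted(Path(directory).glob("*.gexf"))}
```

Calling load_networks("networks") from inside an unpacked archive would then return one (nodes, edges) pair per GEXF file found. Graph libraries that support GEXF can also open these files directly.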
Downloads
Data for each of the three case study novels is provided in a separate ZIP archive:
- Pride and Prejudice by Jane Austen
- Phineas Finn by Anthony Trollope
- A Portrait of the Artist as a Young Man by James Joyce
This data, which emanates from the Nation, Genre, Gender SNA Project by Gerardine Meaney, Derek Greene, Karen Wade, Maria Mulvany, Siobhan Grayson, and Jennie Rothwell, is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available here.
For further information on our research, or if you have any queries, please feel free to drop us a line at nationgenregender@gmail.com.