Data and code are provided here to replicate the results from the paper:
D. Greene, J. P. Cross. "Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach". Political Analysis, 2016.
All data is provided for personal use or for further non-commercial use, and all rights, including copyright, are © European Union, 2016 (Source: European Parliament). The EP releases this data under the following terms:
As a general rule, the reuse (reproduction or use) of textual data and multimedia items which are the property of the European Union (identified by the words '© European Union, [year(s)] – Source: European Parliament' or '© European Union, [year(s)] – EP') or of third parties (© External source, [year(s)]), and for which the European Union holds the rights of use, is authorised, for personal use or for further non-commercial or commercial dissemination, provided that the entire item is reproduced and the source is acknowledged.
Data downloads are made available under the Open Database License (ODbL) v1.0:
Python code specifically for applying topic modeling to the above European Parliament speech data is provided below, made available under the Apache 2.0 license.
The README file in the archive describes the steps required to replicate our results.
This code has been tested with Python 2 (version 2.7.11) and the following third-party modules, which can be installed via pip or Anaconda:
Please note that NMF was re-implemented in version 0.17 of scikit-learn, which can produce marginally different results for the topic models described in our paper. We recommend using version 0.16 if you wish to reproduce our results exactly.
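Because the exact results depend on the scikit-learn release, it can help to check the installed version before running the replication scripts. The following is a minimal sketch; the helper names (`parse_major_minor`, `is_sklearn_016`, `installed_sklearn_version`, `warn_if_not_sklearn_016`) are hypothetical and are not part of the provided archive. It relies only on the standard `sklearn.__version__` attribute and runs under both Python 2.7 and Python 3.

```python
import importlib
import re
import warnings


def parse_major_minor(version):
    """Extract (major, minor) as integers from a version string such as '0.16.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def is_sklearn_016(version):
    """True if the version belongs to the 0.16.x series (before NMF was re-implemented)."""
    return parse_major_minor(version) == (0, 16)


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if it is not installed."""
    try:
        sklearn = importlib.import_module("sklearn")
    except ImportError:
        return None
    return sklearn.__version__


def warn_if_not_sklearn_016():
    """Emit a warning when exact reproduction of the NMF results is unlikely."""
    version = installed_sklearn_version()
    if version is None:
        warnings.warn("scikit-learn is not installed")
    elif not is_sklearn_016(version):
        warnings.warn("scikit-learn %s found; 0.16.x is recommended to "
                      "reproduce the paper's NMF results exactly" % version)
```

Calling `warn_if_not_sklearn_016()` at the top of a script only warns rather than aborting, since later versions still produce very similar topic models.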
For more general-purpose dynamic topic modeling on other text datasets, the dynamic-nmf package is also available; it is compatible with both Python 2.x and Python 3.x.
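The dynamic topic modeling approach in the paper works in two layers: NMF is first applied separately to the speeches in each time window to produce window topics, and NMF is then applied again to the stacked window-topic term vectors to find dynamic topics that span several windows. The following is a minimal sketch of that idea under simplifying assumptions. It uses a plain NumPy multiplicative-update NMF (Lee & Seung) rather than scikit-learn, it assumes every window shares the same term vocabulary, and the function names (`nmf`, `top_terms_only`, `dynamic_topics`) are hypothetical. It is not the API of the provided code or of the dynamic-nmf package, and it omits preprocessing and model selection (for example, choosing the number of topics).

```python
import numpy as np


def nmf(A, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (m x n) as W (m x k) times H (k x n),
    using Lee & Seung multiplicative updates for the squared Frobenius loss."""
    rng = np.random.RandomState(seed)
    m, n = A.shape
    W = rng.rand(m, k) + eps
    H = rng.rand(k, n) + eps
    for _ in range(n_iter):
        # Update H, then W; eps guards against division by zero.
        H *= W.T.dot(A) / (W.T.dot(W).dot(H) + eps)
        W *= A.dot(H.T) / (W.dot(H).dot(H.T) + eps)
    return W, H


def top_terms_only(H, t):
    """Zero all but the t highest-weighted terms in each topic row,
    then scale each row to unit L2 length."""
    B = np.zeros_like(H)
    for i, row in enumerate(H):
        top = np.argsort(row)[::-1][:t]
        B[i, top] = row[top]
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return B / np.where(norms > 0, norms, 1.0)


def dynamic_topics(window_matrices, k_window, k_dynamic, t=10):
    """Two-layer NMF sketch.

    window_matrices: list of non-negative document-term matrices, one per
    time window, all sharing the same term vocabulary (same columns).
    Returns the dynamic topic-term matrix (k_dynamic x n_terms) and, for
    every window topic, the index of the dynamic topic it is assigned to.
    """
    window_topics = []
    for A in window_matrices:
        _, H = nmf(A, k_window)          # layer 1: topics within one window
        window_topics.append(top_terms_only(H, t))
    B = np.vstack(window_topics)         # rows: all window topics, columns: terms
    U, V = nmf(B, k_dynamic)             # layer 2: topics across windows
    assignment = U.argmax(axis=1)        # strongest dynamic topic per window topic
    return V, assignment
```

For example, `dynamic_topics(windows, k_window=4, k_dynamic=2, t=5)` on three toy windows returns a 2 x n_terms matrix of dynamic topics and twelve assignments, one per window topic. Keeping only the top `t` terms of each window topic before the second layer reduces the influence of low-weight noise terms when window topics are compared.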
Figure 1 in the paper and the figures found in Appendix A can be replicated from the following file:
Figure 2 in the paper was produced using the following data:
Stata code for replicating the analysis in Sections 6.2 and 6.3, along with the relevant data derived from our topic model, can be found in the following file: