These files are part of "Replication Data for: Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach".
See https://doi.org/10.7910/DVN/LVHLZK/IBZAXJ
- europarl-data-speeches.zip: Archive of 211,302 English-language European Parliament speeches in plain text format, one file per speech, where the filename is the unique speech identifier from the Europarl website. Speeches are arranged in sub-directories based on their date.
- europarl-documents-metadata.tsv: Tab-separated file containing metadata for all speeches, linked by the unique speech identifiers.
- europarl-meps-metadata.tsv: Tab-separated file containing metadata for MEPs who delivered the speeches.
- europarl-word2vec-model.bin: A pre-trained Word2vec model, trained on the set of all speech text files with Gensim using its default parameter values (i.e. CBOW model with a window size of 5 and a vector size of 100). The model is stored in the standard Word2vec binary format.
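
Since the speech files and the metadata rows are linked by the unique speech identifier, the two can be joined with a few lines of Python. The following is a minimal sketch, not part of the original replication code. It assumes that the archive has been extracted, that the speech files are UTF-8 text with a ".txt" extension, and that the TSV file has a header row. The function names and the id_column argument are hypothetical; check the header of the TSV file for the actual name of the identifier column.

```python
import csv
from pathlib import Path


def load_metadata(tsv_path, id_column):
    """Index the rows of a tab-separated metadata file by speech identifier.

    id_column is the header name of the identifier column. The real name
    is not stated in this README, so it is passed in by the caller.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column]: row for row in reader}


def iter_speeches(root):
    """Yield (speech_id, text) for each speech file below root.

    Assumes a ".txt" extension and UTF-8 encoding (both are assumptions).
    The identifier is taken to be the filename without its extension.
    Date sub-directories are searched recursively.
    """
    for path in sorted(Path(root).rglob("*.txt")):
        yield path.stem, path.read_text(encoding="utf-8")
```

For example, after building an index with load_metadata("europarl-documents-metadata.tsv", id_column), where id_column is whatever the identifier column is called in the header, each speech_id yielded by iter_speeches can be used as a key into that index to look up the metadata of the speech.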