In this project, we crawled over 180,000 articles from German newspapers and analyzed their sentiment towards political parties and politicians (in total over 740,000 entities) using machine learning models for sentiment analysis. The project aimed to reveal biases within the political landscape and to make public discussions more quantifiable.
The first step was to crawl, process and clean huge amounts of newspaper articles. We scraped data from all large German newspapers during the election year 2017/2018. Then, we extracted entities, i.e. words from the articles that can be associated with political parties, such as the names of politicians or of the party itself. Lastly, we extracted a contextual sentiment towards these entities. The sentiment is just a simple “positive” or “negative” judgement (on a continuous scale). This is arguably a rather simple kind of sentiment (compared to, say, “anger”), which has the advantage that the atomic decisions of the algorithm can be easily verified with “common sense”.
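To make the entity and sentiment steps a bit more concrete, here is a minimal Python sketch. It is not our actual pipeline: the alias table `ENTITY_TO_PARTY`, the tiny word list `TOY_LEXICON` and all function names are assumptions made up for illustration, and the word list only stands in for the trained sentiment models we really used.

```python
import re
from dataclasses import dataclass

# Hypothetical alias table: surface form in the text -> political party.
ENTITY_TO_PARTY = {
    "Angela Merkel": "CDU",
    "CDU": "CDU",
    "Martin Schulz": "SPD",
    "SPD": "SPD",
}

# Toy polarity word list; a stand-in for a trained sentiment model.
TOY_LEXICON = {"praised": 1.0, "success": 0.5, "criticized": -1.0, "scandal": -0.8}


@dataclass
class Mention:
    entity: str
    party: str
    sentence: str
    sentiment: float  # continuous score, negative to positive


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence):
    """Average the polarity of known words; 0.0 if none are found."""
    words = re.findall(r"\w+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def extract_mentions(text):
    """Find known entities per sentence and attach the sentence's score."""
    mentions = []
    for sentence in split_sentences(text):
        for alias, party in ENTITY_TO_PARTY.items():
            if re.search(r"\b" + re.escape(alias) + r"\b", sentence):
                mentions.append(
                    Mention(alias, party, sentence, score_sentence(sentence))
                )
    return mentions


if __name__ == "__main__":
    text = "Commentators praised Angela Merkel. Martin Schulz was criticized after the vote."
    for m in extract_mentions(text):
        print(m.entity, m.party, m.sentiment)
```

Each resulting record links an entity, its party, the surrounding sentence and a continuous score, which is roughly the shape of a sentiment-labeled dataset.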
To give an example, here is the sentiment analysis applied to a particular sentence:
This sentiment-labeled dataset was the basis of our subsequent analysis. We started by getting a better understanding of the dataset. The following plot, a Sankey diagram, visualizes the “flow” of mentions of particular entities as they occur in the different newspapers and also shows which political parties these entities belong to. It was interesting to see that a few politicians (like Angela Merkel or Martin Schulz) make up the majority of mentions for a political party. Or in short: a few politicians seem to dominate the political landscape.
We also analyzed the sentiments on multiple levels. First, take a look at the sentiment distribution on average over time. Isn’t it fascinating to observe how major political events influence the overall sentiment? In other words, sentiment could be an indicator for a political situation and political trends.
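For readers who want to try a similar view on their own data, averaging sentiment over time can be sketched as below. The weekly bucketing, the function name `weekly_mean_sentiment` and the `(date, sentiment)` record format are assumptions for this sketch, not necessarily the aggregation behind our plot.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_mean_sentiment(records):
    """Average sentiment per ISO week.

    records: iterable of (date, sentiment) pairs.
    Returns a dict mapping (iso_year, iso_week) -> mean sentiment,
    ordered chronologically.
    """
    buckets = defaultdict(list)
    for day, sentiment in records:
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(sentiment)
    return {week: mean(scores) for week, scores in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up scores, purely to show the input format.
    sample = [
        (date(2017, 9, 18), 0.5),
        (date(2017, 9, 20), -0.5),
        (date(2017, 9, 25), 0.25),
    ]
    for week, avg in weekly_mean_sentiment(sample).items():
        print(week, avg)
```

Plotting these weekly averages as a line and marking the dates of major political events next to it gives a chart in the same spirit.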
Next, consider this barplot illustrating positive and negative mentions of specific politicians. If I told you that Annegret Kramp-Karrenbauer and Malu Dreyer both won state elections during our observation period, isn’t it an interesting coincidence that these are exactly the entities with the most positive sentiment? Also, considering that German media are often accused of “AfD bashing”, does it surprise you that the AfD’s politicians are at the very bottom of the sentiment scale?
The following chart is a treemap which you can use to dive deeper into the sentiments (with left and right click).
Finally, we used the insights gained from the newspapers’ conditional sentiment distributions to build a two-dimensional framework that brings newspapers and parties together; we call it the Sentiment Political Compass. One can use this metric to better understand newspapers’ attitudes towards political parties.
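As a rough illustration of what “conditional” means here, the sketch below computes the mean sentiment for every newspaper and party pair. It does not reproduce the compass itself, whose two axes are defined in the paper; the function name, the `(newspaper, party, sentiment)` record format and the placeholder newspaper names are assumptions for this example.

```python
from collections import defaultdict
from statistics import mean


def conditional_mean_sentiment(mentions):
    """Mean sentiment per party, conditioned on the newspaper.

    mentions: iterable of (newspaper, party, sentiment) triples.
    Returns {newspaper: {party: mean sentiment}}.
    """
    grouped = defaultdict(list)
    for newspaper, party, sentiment in mentions:
        grouped[(newspaper, party)].append(sentiment)

    table = defaultdict(dict)
    for (newspaper, party), scores in grouped.items():
        table[newspaper][party] = mean(scores)
    return {paper: dict(row) for paper, row in table.items()}


if __name__ == "__main__":
    # Placeholder newspapers and made-up scores, only to show the shape.
    sample = [
        ("Paper A", "CDU", 0.5),
        ("Paper A", "CDU", 0.25),
        ("Paper A", "SPD", -0.5),
        ("Paper B", "SPD", 1.0),
    ]
    for paper, row in conditional_mean_sentiment(sample).items():
        print(paper, row)
```

A table like this, with one row per newspaper and one column per party, is the kind of input from which a low-dimensional placement of newspapers relative to parties could then be derived.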
We were happy to have a paper on this work accepted at The Internet, Policy & Politics Conference 2018 at the University of Oxford, UK. Feel free to read the paper for further illustrations and the whole story. Together with my co-author Julian, I presented the paper in front of a diverse audience. Speaking with social scientists and journalists about the media and politics was a fascinating experience that brought up many interesting thoughts we had never come up with before!
In addition, we were featured by speakerpolitics.co.uk! Find their article here.
None of this work would have been possible without my fabulous co-authors and advisors, especially Julian Marstaller, Niklas Stoehr, Sören Maucher, Jeana Ren, Andreas Thalhammer, Achim Rettinger and Rudi Studer, who all put huge amounts of work into this project! We redid the whole data crawling and analysis twice, the second time alongside our “regular” full-time duties as students. So I could not be more proud of them for sticking with me for so long :-)
For further reading, check out the following resources:
Website: Sentiment Political Compass website
Paper: Sentiment Political Compass: A Data-driven Analysis of Online Newspapers regarding Political Orientation
Dataset: Entity sentiment dataset
Code: Sentiment Political Compass code