Deconstructing Political Echo Chambers with NLP-Based Recommender Systems

Typically, the goal of a recommender system is to serve users content they will like. Such systems can be useful tools for showing users interesting products or new shows, but they can become extremely problematic in political contexts, where social media users are served only content that confirms their existing beliefs.

Many do not realize that while we all use the same social media services, the content liberals and conservatives see is so vastly divergent that, despite living on the same planet, we exist in completely different media realities (the Wall Street Journal did a fantastic piece showing blue vs. red Facebook feeds). This kind of dynamic creates environments where disinformation campaigns and hyperpartisan, emotionally charged clickbait go largely unchecked and unchallenged, and it contributes to the perilous social and political upheaval that is challenging democratic societies around the world.

This project envisions a content-based recommender system that breaks down, rather than reinforces, political filter bubbles and echo chambers. (Credit for the image below: the WSJ article.)

The basic idea is simple: a user clicks some political content, for example a pro-trade-war Facebook ad. The algorithm then attempts to recommend content that is on the same topic but comes from the opposite end of the political spectrum. In this example, it would recommend an anti-trade-war piece of content. To do this, we need to start with a large corpus of political content to build, train, and test models.
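The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the `Ad` class, the `"left"`/`"right"` label values, and the word-overlap (`jaccard`) scorer are all hypothetical stand-ins, and the ad texts are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Ad:
    text: str
    lean: str  # "left" or "right" (hypothetical label values)


OPPOSITE = {"left": "right", "right": "left"}


def jaccard(a: str, b: str) -> float:
    """Crude topical overlap: shared words / all words (stand-in scorer)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def recommend_opposite(
    clicked: Ad,
    corpus: List[Ad],
    similarity: Callable[[str, str], float] = jaccard,
) -> Optional[Ad]:
    """Return the most topically similar ad from the other side, if any."""
    candidates = [ad for ad in corpus if ad.lean == OPPOSITE[clicked.lean]]
    if not candidates:
        return None
    return max(candidates, key=lambda ad: similarity(clicked.text, ad.text))


# Made-up toy corpus, for illustration only.
corpus = [
    Ad("tariffs protect american jobs support the trade war", "right"),
    Ad("the trade war tariffs hurt american farmers", "left"),
    Ad("expand access to affordable health care", "left"),
]
pick = recommend_opposite(corpus[0], corpus)
print(pick.text)
```

The similarity function is passed in as a parameter so that a stronger topical scorer can replace the word-overlap placeholder without changing the selection logic.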

To build the corpus, I pulled several hundred thousand political ads from Facebook’s political ad library through its Graph API and then labeled each ad as left- or right-wing depending on the ad buyer. I then trained an LSTM neural net to distinguish left-wing from right-wing ads. The trained neural net is used in conjunction with a TF-IDF model with cosine similarity scoring to search the corpus for ads that come from the opposite end of the political spectrum but have similar topical content. Here is the high-level architecture.

High Level Architecture
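To make the retrieval step concrete, here is a minimal, standard-library-only sketch of TF-IDF vectors scored with cosine similarity and filtered to the opposite label. It rests on several assumptions: the write-up does not say which library or TF-IDF variant was used, so the smoothed IDF formula, the whitespace tokenizer, and the function names (`tfidf_vectors`, `cosine`, `opposite_matches`) are illustrative choices. The hard-coded `labels` list stands in for the LSTM's predictions, and the ad texts are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight), one per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def opposite_matches(query_idx: int, docs: List[str], labels: List[str],
                     top_k: int = 1) -> List[int]:
    """Indices of the top_k most similar docs whose label differs from the query's."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_idx]
    scored = [
        (cosine(query, vectors[i]), i)
        for i in range(len(docs))
        if labels[i] != labels[query_idx]
    ]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]


# Invented ad texts; labels stand in for the classifier's output.
docs = [
    "vote against candidate smith he failed texas families",
    "candidate smith fights for texas families vote smith",
    "protect our national parks and clean water",
    "cut taxes for small business owners",
]
labels = ["right", "left", "left", "right"]
print(opposite_matches(0, docs, labels))
```

In this toy corpus the query ad shares no words with the national parks ad, so the opposite-side ad about the same candidate is the one returned. With several hundred thousand ads, recomputing every vector per query as this sketch does would be wasteful; precomputing the vectors once is the obvious first optimization.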

The project delivers results like the one below. In this real example, the user clicks on an anti-Joaquin Castro ad and the model serves a pro-Joaquin Castro ad.

This was an interesting experiment that let me work through how to build this kind of system. There are many technical challenges that would need to be addressed to make it deployable. How do you discern political from non-political content at scale in the real world? Is it just ads? Does it include unpaid content?

The engine still needs additional work to make it more efficient, and the LSTM model needs some optimization as well. To be clear, this does not stop the spread of disinformation, extremist content, etc., but the hope is that this kind of engine might blunt the effects of hyperpartisan content by making sure users see content that might challenge disinformation or provide additional issue context.

To actually tackle the core issues posed by political filter bubbles and echo chambers, companies like Facebook (and especially Facebook) need to explore ways to modify their algorithms so that, when it comes to political content, users see both red and blue content, not just one or the other, and to eliminate micro-targeted ad placements. Google and Twitter have both recently taken steps in this vein to reduce political interference. Your move, Facebook!

Tech Stack

  • Python

About the Engineer

Joe is a data scientist, engineer, and problem solver with a passion for learning and an obsession with turning disorganized information into interesting and useful products. He loves all things statistics, analytics, machine learning, natural language processing, and remote sensing.

Joe McAllister
Data Scientist and Machine Learning Engineer
