Biased news:
Analyzing Google News: Introduction
Greg Coppola
Jul 25
I have been suspended from my job at Google for saying in an interview that I believe Google News and Google Search results have a political bias. I want to explore this question in a series of posts using data science, with only publicly available information and tools.
We begin by replicating and extending an experiment originally run by Paula Boylard. I scraped Google News for the query “donald trump” once a minute, 5,000 times. Each scrape returned 105 stories on average.
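The post does not include the scraper itself. As a minimal sketch of the collection protocol only, assuming a hypothetical `fetch_stories(query)` function that returns one scrape as a list of `(site, headline)` pairs, the polling loop and the per-scrape average could look like this:

```python
import time
from typing import Callable, List, Tuple

# One story as a (site, headline) pair; this representation is an assumption.
Story = Tuple[str, str]


def collect_scrapes(
    fetch_stories: Callable[[str], List[Story]],  # hypothetical fetcher, not shown in the post
    query: str = "donald trump",
    n_scrapes: int = 5000,
    interval_s: float = 60.0,
) -> List[List[Story]]:
    """Call the fetcher n_scrapes times, pausing interval_s seconds between calls."""
    scrapes: List[List[Story]] = []
    for i in range(n_scrapes):
        scrapes.append(fetch_stories(query))
        if i < n_scrapes - 1:  # no need to wait after the final scrape
            time.sleep(interval_s)
    return scrapes


def mean_stories_per_scrape(scrapes: List[List[Story]]) -> float:
    """Average number of stories returned per scrape."""
    return sum(len(s) for s in scrapes) / len(scrapes)
```

Passing the fetcher in as a parameter keeps the loop testable without network access; the actual fetching and parsing of Google News results is left out because the post does not describe it.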
Power-Law Distribution Over Sites
First, we look at the distribution of publications (or websites) that make up our new Google/Trump corpus. Specifically, we look at the probability that a randomly selected story comes from each news site. The results are depicted here:
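One way to estimate this probability is to pool every scrape into a single corpus and divide each site's story count by the total. The sketch below assumes the same hypothetical `(site, headline)` representation as before, and it counts every appearance in every scrape (no deduplication); the post does not say whether repeated stories were merged, and doing so would change the numbers.

```python
from collections import Counter
from typing import List, Tuple

# Assumed representation: a scrape is a list of (site, headline) pairs.
Scrape = List[Tuple[str, str]]


def site_distribution(scrapes: List[Scrape]) -> List[Tuple[str, float]]:
    """Empirical P(site) over the pooled corpus, most frequent site first.

    Every story in every scrape counts once, so a story that stays in the
    results for many minutes is counted many times.
    """
    counts = Counter(site for scrape in scrapes for site, _headline in scrape)
    total = sum(counts.values())
    return [(site, n / total) for site, n in counts.most_common()]
```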
Note the shape: a power-law (or 80/20, or rich-get-richer) distribution. The most common site, CNN, is the source of 20% of all stories! In other words, even with millions of sites on the Internet, 1 out of every 5 stories about “donald trump” from Google News is from CNN.
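A quick, informal way to check for power-law shape is to sort the site shares from largest to smallest and fit a straight line to log(share) against log(rank): if share ≈ C · rank^(−a), the points fall roughly on a line with slope −a. This is only a heuristic sketch, not the method used in the post; maximum-likelihood estimation is the more reliable way to test for a power law.

```python
import math
from typing import List


def rank_share_slope(shares: List[float]) -> float:
    """Least-squares slope of log(share) versus log(rank).

    Shares are sorted in descending order so rank 1 is the largest site.
    A power law share ~ rank**(-a) gives a roughly straight line with slope -a.
    """
    ordered = sorted((s for s in shares if s > 0), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two positive shares")
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(s) for s in ordered]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Because the fit works on logarithms, normalizing the shares so they sum to 1 shifts every point by the same constant and leaves the slope unchanged.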
Cumulative Distribution
In power-law style, 50% of all stories come from the top 5 sites (CNN, USA Today, NYT, Politico, Guardian), and 83% of all stories come from the top 20.
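The top-k figures are running totals over the shares sorted from largest to smallest. A minimal sketch, assuming `shares` is a plain list of per-site probabilities (for example, the second element of each pair from a per-site distribution like the one above):

```python
from itertools import accumulate
from typing import List


def cumulative_shares(shares: List[float]) -> List[float]:
    """Running total of shares, largest site first: entry k-1 is the share of the top k sites."""
    return list(accumulate(sorted(shares, reverse=True)))


def sites_needed(shares: List[float], fraction: float) -> int:
    """Smallest k such that the top k sites account for at least `fraction` of stories."""
    for k, total in enumerate(cumulative_shares(shares), start=1):
        if total >= fraction:
            return k
    raise ValueError("fraction exceeds the total share")
```

With the corpus shares, `cumulative_shares(shares)[4]` would correspond to the top-5 figure (50% in the post) and `cumulative_shares(shares)[19]` to the top-20 figure (83%).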
To be continued…
Does this list of web-sites look politically neutral to you? We’ll explore further in a future post!
