- Motivation: Social Media in the Philippines
- Process: Topic modeling
- News landscape of the Philippines
- Topic trends in 2016
- News page topic distribution
- Reactions to topics
- Final remarks
- Technical documentation
Process: Topic modeling
To answer the question, we need data, and the most complete source is Facebook, by far the most commonly used social media platform in the Philippines, specifically through the Facebook Graph API. We extract data from this API for the top news pages in the country, then apply topic modeling to the text of the headlines, captions, and posts.
Topic modeling, in natural language processing and machine learning, is a way to take unstructured text such as headlines, captions, and posts, and discover the latent “topics” or themes that underlie the corpus. For this article, we use Latent Dirichlet Allocation (LDA)[2]. In other words, based on the words used, we define a set of topics and then assign each post to the topic it most likely belongs to. Strictly speaking, LDA models each post as a mixture of topics and each topic as a distribution over words, so each post is assigned to its most probable topic. For more information, please read the technical documentation.
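To make the idea concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the code from the notebooks: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy posts are illustrative assumptions, and in practice one would typically use a library implementation (for example gensim or scikit-learn) on the full corpus.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] estimates P(topic k | post d) and topic_word[k][v]
    estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    topics = range(K)
    ndk = [[0] * K for _ in docs]        # tokens in post d assigned to topic k
    nkw = [[0] * V for _ in topics]      # times word v is assigned to topic k
    nk = [0] * K                         # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Take this token out of the counts...
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # ...and resample its topic from the full conditional, which is
                # proportional to (n_dt + alpha) * (n_tv + beta) / (n_t + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    doc_topic = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
                 for d, doc in enumerate(docs)]
    topic_word = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
                  for k in topics]
    return doc_topic, topic_word, vocab


# Toy "posts", already lowercased and tokenized (made-up examples).
posts = [
    "duterte drug war police operation",
    "police drug suspects operation",
    "finals game team score",
    "volleyball team wins game",
]
docs = [p.split() for p in posts]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
# Label each post with its dominant (most probable) topic.
dominant = [max(range(2), key=row.__getitem__) for row in doc_topic]
```

With a real corpus, the number of topics, the priors, and the number of iterations all need tuning, and stop words are usually removed before fitting.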
News landscape of the Philippines
News ‘atlas’ of the Philippines
Once we perform the topic modeling, we can generate some pretty interesting visualizations. The news ‘atlas’ gives an overview of the news landscape of the Philippines: each group is composed of the most relevant words for a topic, plus a manually determined label. The distance between topics reflects their semantic distance, that is, how different the words that characterize each topic are.
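The post doesn't say exactly how the distances in the atlas were computed. One common choice, used by the LDAvis visualization, is the Jensen-Shannon divergence between the topics' word distributions, followed by a projection to two dimensions. The sketch below computes only the pairwise Jensen-Shannon distance matrix, on made-up topic-word probabilities; the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error


def topic_distance_matrix(topic_word):
    """Pairwise JS distances between topic-word distributions (rows sum to 1)."""
    K = len(topic_word)
    return [[js_distance(topic_word[a], topic_word[b]) for b in range(K)]
            for a in range(K)]


# Made-up topic-word distributions over a 4-word vocabulary.
topics = [
    [0.5, 0.5, 0.0, 0.0],  # e.g. a "drug war" topic
    [0.4, 0.4, 0.1, 0.1],  # a related politics topic
    [0.0, 0.0, 0.5, 0.5],  # e.g. a "sports" topic
]
dist = topic_distance_matrix(topics)
```

Topics that share no words end up at the maximum distance of 1, while overlapping topics sit closer together. A projection such as multidimensional scaling on this matrix would then give 2-D coordinates for a map like the atlas.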
As you can see, there is a central mass of mostly English-language news on both local and international topics. On the periphery are mostly lifestyle and entertainment topics, sports news sits on the far top right, and Filipino-language news settles at the bottom.
At the center of the mass is what (or who) was clearly the focus of most news in 2016: newly elected President Duterte. Spanning out from that topic are his policies and programs, the War on Drugs, his campaign, law enforcement, the Marcos Burial, and the drug-related charges filed against his critic, Senator Leila de Lima.
To the northwest of President Duterte is more general nationwide news: the electoral process, transportation, finance and the economy, and the weather.
Further northwest is a slightly separated island covering international news, foreign policy, and the outgoing Aquino Administration, whose main focus in the final years of Aquino’s term was securing an arbitral ruling against China over its claims in the South China Sea.
One thing to note is that Women’s Volleyball and Beauty Pageants seem to have been lumped together. Why? I have no clue.
But wait, how can we tell whether these topic classifications actually make sense? Let’s pull out sample articles classified by the model and see whether they fit their labels.
Inspecting the headlines seems to show that the topics are well classified. For more details about how the model was trained, you are welcome to view the technical documentation.
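A simple way to run this kind of spot check is to rank posts by their probability under a given topic and read the top few. This sketch assumes a document-topic probability matrix like the one LDA produces; the headlines are placeholders rather than real data, and `top_documents` is a hypothetical helper.

```python
def top_documents(doc_topic, headlines, topic, n=3):
    """Return the n headlines with the highest probability under `topic`.

    doc_topic[d][k] is the model's probability of topic k for post d.
    """
    ranked = sorted(range(len(headlines)), key=lambda d: doc_topic[d][topic], reverse=True)
    return [headlines[d] for d in ranked[:n]]


# Placeholder headlines and a made-up document-topic matrix (2 topics).
headlines = ["headline 0", "headline 1", "headline 2", "headline 3"]
doc_topic = [
    [0.90, 0.10],
    [0.20, 0.80],
    [0.70, 0.30],
    [0.05, 0.95],
]
sample = top_documents(doc_topic, headlines, topic=1, n=2)
print(sample)  # ['headline 3', 'headline 1']
```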
Topic trends in 2016
Now that we are confident the topics are classified reasonably well, let’s take a look at the trends over time.
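A minimal sketch of the aggregation behind a trend chart like this, assuming each post has already been labeled with its dominant topic: count posts per topic per ISO week. The toy dates and labels are illustrative only (May 9, 2016 was election day and July 12, 2016 the date of the South China Sea ruling), and the original notebooks may aggregate differently.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_topic_counts(posts):
    """Count posts per dominant topic for each ISO week.

    posts: iterable of (date, topic_label) pairs.
    Returns {(iso_year, iso_week): Counter({topic: count})}.
    """
    weeks = defaultdict(Counter)
    for day, topic in posts:
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)][topic] += 1
    return dict(weeks)


# Toy labeled posts (not real data).
posts = [
    (date(2016, 5, 9), "Elections"),
    (date(2016, 5, 10), "Elections"),
    (date(2016, 5, 11), "Duterte"),
    (date(2016, 7, 12), "South China Sea"),
]
trends = weekly_topic_counts(posts)
```

Plotting each topic's weekly count as a line then makes spikes, such as election week, easy to spot.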
News page topic distribution
News page distribution
A common accusation leveled against traditional news media is bias in how they cover various topics. To explore this, we look at how each news page’s articles are distributed across topics.
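The per-page distribution can be sketched as each page's share of posts in each dominant topic. The page names and labels here are hypothetical, and `page_topic_shares` is an illustrative helper rather than the notebooks' code.

```python
from collections import Counter, defaultdict


def page_topic_shares(posts):
    """Share of each page's posts that falls under each dominant topic.

    posts: iterable of (page_name, topic_label) pairs.
    Returns {page: {topic: share}}, with each page's shares summing to 1.
    """
    counts = defaultdict(Counter)
    for page, topic in posts:
        counts[page][topic] += 1
    result = {}
    for page, topic_counts in counts.items():
        total = sum(topic_counts.values())
        result[page] = {topic: n / total for topic, n in topic_counts.items()}
    return result


# Hypothetical pages and topic labels.
posts = [
    ("Page A", "Politics"),
    ("Page A", "Politics"),
    ("Page A", "Showbiz"),
    ("Page B", "Sports"),
]
shares = page_topic_shares(posts)
```

Normalizing by each page's total keeps large and small pages comparable.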
News page topic concentration
We know how each page’s articles are distributed across topics, but how do we compare pages in terms of topic concentration and potential “bias”? One way to measure concentration across topics is the Herfindahl-Hirschman Index (HHI), the sum of the squared shares that each topic takes of a page’s posts. On the scale used here, the index ranges from 0 to 100, with 100 meaning perfect concentration (every post in a single topic).
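A sketch of the HHI computation for a single page, given the fraction of its posts in each topic. With fractional shares the raw HHI runs from 1/N (all N topics equally represented) to 1, so the post's 0 to 100 scale implies some rescaling that isn't spelled out. This sketch assumes a normalized HHI, where equal shares across all topics give 0 and a single topic gives 100; `normalized_hhi_0_100` is a hypothetical name.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index of fractional shares that sum to 1.

    Ranges from 1/N (all N topics equally represented) to 1 (one topic only).
    """
    return sum(s * s for s in shares)


def normalized_hhi_0_100(shares, num_topics):
    """HHI rescaled so equal shares over num_topics give 0 and one topic gives 100.

    ASSUMPTION: one plausible reading of a 0-100 scale; the original may differ.
    """
    if num_topics < 2:
        raise ValueError("need at least two topics")
    floor = 1 / num_topics
    return 100 * (hhi(shares) - floor) / (1 - floor)


# Made-up topic shares for three pages over 4 topics.
even = normalized_hhi_0_100([0.25, 0.25, 0.25, 0.25], 4)
focused = normalized_hhi_0_100([1.0, 0.0, 0.0, 0.0], 4)
mixed = normalized_hhi_0_100([0.7, 0.1, 0.1, 0.1], 4)
```

Shares should include zeros for topics a page never covers, and `num_topics` should be the model's full topic count, so that pages are compared on the same scale.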
Reactions to topics
If news pages are relatively well distributed across topics, then how do we explain the perception of biased media? Well, Facebook’s timeline is heavily geared toward showing content related to topics you have already interacted with. Compare the number of news articles published per topic over time with the reactions they receive over time:
This suggests that the perception of bias might simply come from people over-indexing on the topics that are most popular or that interest them most.
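One way to make “over-indexing” concrete is an index of each topic’s share of reactions relative to its share of posts, where 100 means a topic draws reactions in proportion to how often it is published. This is an illustrative assumption, not necessarily the measure used in the original analysis, and the counts below are made up. Computing it per week would follow the comparison over time.

```python
def over_index(post_counts, reaction_counts):
    """Reaction share divided by post share for each topic, times 100.

    100 means reactions are proportional to output; above 100 means
    readers react to the topic more than its share of posts would suggest.
    """
    total_posts = sum(post_counts.values())
    total_reactions = sum(reaction_counts.values())
    index = {}
    for topic, n_posts in post_counts.items():
        if n_posts == 0:
            continue  # a topic with no posts has no meaningful index
        post_share = n_posts / total_posts
        reaction_share = reaction_counts.get(topic, 0) / total_reactions
        index[topic] = 100 * reaction_share / post_share
    return index


# Made-up totals for three topics.
posts = {"Politics": 500, "Showbiz": 300, "Sports": 200}
reactions = {"Politics": 40_000, "Showbiz": 50_000, "Sports": 10_000}
idx = over_index(posts, reactions)
```

In this toy example Showbiz is published less often than Politics but draws a larger share of reactions, so its index lands well above 100.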
Final remarks
Using natural language processing and topic modeling, we are able to take unstructured text and uncover latent topics in the corpus, so we can learn about the entire online news landscape in the country, not just what pops up in our newsfeeds. Some key takeaways from this exercise:
- Despite 2016 being a politics-driven year, we are still obsessed with entertainment: Movies, Television, and Showbiz have all taken top spots in the news.
- There are definite spikes in activity during certain weeks. Some of the most newsworthy events were the Elections in May, the South China Sea dispute in July, President Duterte in May and again in June, and the Marcos Burial in November.
- Despite what Facebook commenters may say, news pages are well distributed in terms of topics and do not focus solely on particular topics. What actually appears on people’s newsfeeds, which depends on what users interact with, is another story.
Technical documentation
In the interest of reproducible research, here are the notebooks containing the code, results, and commentary behind the post. You may download the resulting code and run it yourself, provided:
- You have the proper API keys to access the Facebook Graph API, and
- You agree that this is provided as-is and with no warranty.
You may also view the GitHub Repository here for the complete code and analysis that went into this post. You may also choose to collaborate with me in producing more parts to this series!
We extract data from the Facebook Graph API. This process ran on a Google Compute Engine instance for over a week.
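For readers who want to reproduce the extraction step, here is a minimal sketch of paging through a Page’s posts with the Graph API using only the Python standard library. The API version, the requested `fields`, the `limit` value, and the helper names are assumptions, not taken from the original notebooks; a real run also needs a valid access token and must respect the API’s rate limits. The stub fetcher at the end lets the pagination logic run offline.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder API version; use whichever version your token supports.
GRAPH_URL = "https://graph.facebook.com/v2.8"


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_page_posts(page_id, access_token, fields="message,created_time", fetch=fetch_json):
    """Yield every post on a Page, following the paging.next links."""
    query = urllib.parse.urlencode(
        {"fields": fields, "limit": 100, "access_token": access_token}
    )
    url = f"{GRAPH_URL}/{page_id}/posts?{query}"
    while url:
        payload = fetch(url)
        yield from payload.get("data", [])
        # The last page of results has no "next" link, which ends the loop.
        url = payload.get("paging", {}).get("next")


def fake_fetch(url):
    """Hypothetical two-page response sequence, without touching the network."""
    if url == "page-2":
        return {"data": [{"id": "3"}]}
    return {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page-2"}}


posts = list(iter_page_posts("some-page-id", "TOKEN", fetch=fake_fetch))
```

Passing the fetcher in as a parameter keeps the pagination loop testable, and makes it easy to swap in retries or throttling for a long-running crawl.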
We model the topics using Latent Dirichlet Allocation and try to explore trends in the allocations.