Social Media Log-in Pages

When was the last time you actually saw the Facebook log-in page? Other than those few times you might be accessing the social network on another computer, there's a good chance that you don't really see this page all that often. I know that for me, between mobile apps and just staying logged in on …

The Tumblr Porn Ban

In early December, the micro-blogging website and popular social media platform Tumblr announced that it was making a significant change to its website content policies. In a blog post titled "A better, more positive Tumblr" the company explained that effective December 17, posts containing adult content would no longer be allowed on the Tumblr platform. …

Google Cloud Compute Housekeeping

I've been passively gathering data from 4chan's /pol/ board to keep tabs on the "Qanon" conspiracy and the communities that were promoting it. I had started out with a "set it and forget it" sort of deal for my 4chan /pol/ scraper. The problem is, though, I set it and then I forgot it.

Read on about some of the challenges of studying online communities and ephemeral content...

Automated Shitposting – Creating a 4chan Frontpage Scraper

Of course, researching and studying online communities can be incredibly difficult. Contrary to popular belief, once something is posted on the Internet, it isn't necessarily "there forever." When I was in elementary school, I was constantly told that once something was online, it was impossible for it to ever be removed. The reasoning behind this is sound: encouraging young people to be cognizant of what information they share is incredibly important. However, the truth is that there is plenty of online content that has simply disappeared. People stop paying their web hosting bills, links fail to get updated, or old content simply gets forgotten among the countless petabytes of data. And in the case of 4chan, threads are regularly pruned and "content is usually available for only a few hours or days before it is removed." This ephemerality, combined with the anonymity afforded by the website, challenges traditional conventions of research. There is no guarantee that someone who visits the same URL later will find the same content.

Given these challenges, I decided to work on creating an automated system to scrape 4chan content and save a local copy.
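
To make the idea concrete, here is a minimal sketch of what "scrape and save a local copy" could look like. It is not the scraper described in this post. It assumes 4chan's public read-only JSON API serves a board catalog at `https://a.4cdn.org/<board>/catalog.json`, and the function names, the `snapshots` directory, and the `User-Agent` string are all hypothetical choices made for illustration.

```python
import json
import time
import urllib.request
from pathlib import Path

# Assumption: the public read-only JSON API exposes a board's catalog at this URL.
CATALOG_URL = "https://a.4cdn.org/{board}/catalog.json"


def fetch_catalog(board):
    """Download the current catalog for a board and return it as parsed JSON."""
    request = urllib.request.Request(
        CATALOG_URL.format(board=board),
        headers={"User-Agent": "archive-sketch/0.1"},  # hypothetical identifier
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def thread_numbers(catalog):
    """Collect thread numbers, assuming the catalog is a list of pages of threads."""
    return [
        thread["no"]
        for page in catalog
        for thread in page.get("threads", [])
    ]


def save_snapshot(catalog, board, out_dir, timestamp=None):
    """Write the catalog to a timestamped JSON file and return the file path."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    out_path = Path(out_dir) / f"{board}-{timestamp}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(catalog), encoding="utf-8")
    return out_path


def run_once(board="pol", out_dir="snapshots"):
    """Fetch one catalog snapshot and store it locally."""
    catalog = fetch_catalog(board)
    return save_snapshot(catalog, board, out_dir)
```

Calling `run_once()` on a schedule (for example from cron) would build up a series of timestamped snapshots, which is one way to keep a record of threads after they have been pruned from the site. Because every run writes a new file, a job like this also needs some housekeeping so that it does not quietly fill the disk.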