By Stephanie Kim, Algorithmia.
Who swears more? Do Twitter users who mention Donald Trump swear more than those who mention Hillary Clinton? Let’s find out by analyzing tweets with natural language processing (NLP).
This walkthrough is a basic introduction to help developers of all backgrounds and abilities get started with the NLP microservices available on Algorithmia. We’ll show you how to chain them together to perform light analysis on unstructured text. Unfamiliar with NLP? Our gentle introduction to NLP will help you get started.
We know that getting started with a new platform or developer tool is an investment in time and energy. Sometimes it can be hard to find the information you need in order to start exploring on your own. That’s why we’ve centralized all our information in the Algorithmia Developer Center and API Docs, where users will find helpful hints, code snippets, and getting started guides. These guides are designed to help developers integrate algorithms into applications and projects, learn how to host their trained machine learning models, or build their own algorithms for others to use via an API endpoint.
Now let’s tackle a project that uses algorithms to retrieve content and then analyze it with NLP. What better place to start than Twitter, and analyzing our favorite presidential candidates?
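Chaining steps just means feeding one step’s output into the next. The sketch below illustrates the idea locally. It assumes each step is an ordinary Python function; in the actual project each step would be a call to an Algorithmia algorithm. The `count_profanity` helper and its `PLACEHOLDER_PROFANITY` word list are hypothetical stand-ins, not Algorithmia’s profanity detection.

```python
from functools import reduce
from typing import Callable, Iterable

# Hypothetical placeholder word list; a real analysis would call a proper
# profanity-detection algorithm instead of checking this toy set.
PLACEHOLDER_PROFANITY = {"darn", "heck"}


def chain(*steps: Callable) -> Callable:
    """Compose steps left to right: each step's output feeds the next step."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


def lowercase_all(tweets: Iterable[str]) -> list[str]:
    """Lowercase every tweet so word matching is case-insensitive."""
    return [t.lower() for t in tweets]


def tokenize_all(tweets: list[str]) -> list[list[str]]:
    """Split each tweet on whitespace and trim surrounding punctuation."""
    return [[w.strip(".,!?;:\"'") for w in t.split()] for t in tweets]


def count_profanity(token_lists: list[list[str]]) -> float:
    """Return the average number of placeholder swear words per tweet."""
    if not token_lists:
        return 0.0
    hits = sum(w in PLACEHOLDER_PROFANITY for tokens in token_lists for w in tokens)
    return hits / len(token_lists)


# The whole pipeline: raw tweets in, swear words per tweet out.
swear_rate = chain(lowercase_all, tokenize_all, count_profanity)
```

For example, `swear_rate(["Heck, what a day!", "Nice rally"])` gives `0.5`: one placeholder word across two tweets. Running the same pipeline on each candidate’s tweets gives two comparable rates. Swapping any local step for a hosted algorithm doesn’t change how the chain is composed.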
Twitter, Trump, and Profanity: An NLP Approach
First, let’s find the Twitter-related algorithms on Algorithmia. Go to the search bar at the top of the navigation and type in “Twitter”.
You’ll get quite a few results. Find the one called Retrieve Tweets with Keyword and open its algorithm page, which shows the algorithm’s description, pricing, and permissions.
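Once you’ve found the algorithm, you call it through the Algorithmia client. The sketch below assumes the Python client’s `client.algo(path).pipe(input).result` call pattern. The algorithm path `ALGO_PATH` and the payload keys (`query`, `numTweets`, `auth`) are hypothetical placeholders, so copy the exact path and input format from the algorithm page. The client is passed in as a parameter so the function can be tried out without network access.

```python
from typing import Any

# Hypothetical path: replace with the real owner/name/version shown on the algorithm page.
ALGO_PATH = "owner/RetrieveTweetsWithKeyword/x.y.z"


def build_payload(query: str, num_tweets: int, auth: dict) -> dict:
    """Assemble the algorithm input (key names are assumptions, not the documented API)."""
    return {"query": query, "numTweets": num_tweets, "auth": auth}


def fetch_tweets(client: Any, query: str, num_tweets: int, auth: dict) -> Any:
    """Call the tweet-retrieval algorithm through an Algorithmia client and return its result."""
    response = client.algo(ALGO_PATH).pipe(build_payload(query, num_tweets, auth))
    return response.result
```

With the real library, you would create the client with `Algorithmia.client("YOUR_API_KEY")` and pass it in as `client`. The `auth` dictionary would hold your own Twitter API credentials, in whatever shape the algorithm page specifies.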
The finished project is organized like this:
+-- profanity_demo
|   +-- data
|   |   +-- Donald-Trump-OR-Trump.csv
|   |   +-- Hillary-Clinton-OR-Hillary.csv
|   +-- logs
|   |   +-- twitter_data_pull.log
|   +-- profanity_analysis.py
|   +-- twitter_pull_data.py
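The data file names appear to mirror each search query with spaces replaced by hyphens, so “Donald Trump OR Trump” becomes Donald-Trump-OR-Trump.csv. Below is a minimal sketch of that naming convention, plus saving one tweet per row with Python’s standard `csv` module. Both the naming rule and the single `text` column are assumptions inferred from the file listing, not the author’s script.

```python
import csv
import os


def csv_path_for_query(query: str, data_dir: str = "data") -> str:
    """Map a search query to a data file, e.g. 'Donald Trump OR Trump' -> data/Donald-Trump-OR-Trump.csv."""
    return os.path.join(data_dir, "-".join(query.split()) + ".csv")


def save_tweets(query: str, tweets: list[str], data_dir: str = "data") -> str:
    """Write one tweet per row under a 'text' header and return the file path."""
    os.makedirs(data_dir, exist_ok=True)
    path = csv_path_for_query(query, data_dir)
    # newline="" lets the csv module handle line endings and quoted commas correctly.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        writer.writerows([t] for t in tweets)
    return path
```

Keeping one file per query makes it easy to run the same analysis over each candidate’s tweets separately.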
You’ll need a free Algorithmia account to complete this project; signing up also gets you an extra 10,000 credits. Overall, the project processes roughly 700 tweets, with emoticons and other special characters stripped out. This means that a tweet containing only URLs and emoticons won’t be analyzed. Once we pull our