The dictionary I use in my code
If you write code that deals with natural language, then at some point, you will need to use data from a dictionary. You have to make a choice at this point.
- You can either choose one of the big names (Oxford, Merriam-Webster, Macmillan, etc.) and use their API to get the data.
- Or you can choose WordNet.
I have tried both and find WordNet to be the best tool for the job.
For those who don’t know, WordNet is a machine readable database of words which can be accessed from most popular programming languages (C, C#, Java, Ruby, Python etc.). I have several reasons for preferring WordNet over the other options.
- Many of the big company APIs require payment. WordNet is free.
- Many of the big company APIs are online only. WordNet can be downloaded and used offline.
- WordNet is many times more powerful than any other dictionary or thesaurus out there.
The last point requires some explanation.
WordNet is not like your everyday dictionary. While a traditional dictionary features a list of words and their definitions, WordNet focuses on the relationship between words (in addition to definitions). The focus on relationships makes WordNet a network instead of a list. You might have guessed this already from the name WordNet.
In the WordNet network, the words are connected by linguistic relations. These linguistic relations (hypernym, hyponym, meronym, pertainym and other fancy-sounding stuff) are WordNet’s secret sauce. They give you powerful capabilities that are missing in ordinary dictionaries/thesauri.
We will not go deep into linguistics in this article because that is beside the point. But I do want to show you what you can achieve in your code using WordNet. So let’s look at the two most common use cases (which any dictionary or thesaurus should be able to do) and some advanced use cases (which only WordNet can do) with example code.
Let’s start with the simplest use case, i.e. word lookups. We can look up the meaning of any word in WordNet in three lines of code (examples are in Python).
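Here is a minimal sketch of such a lookup, assuming NLTK's WordNet interface (installed with `pip install nltk`, with the corpus fetched once via `nltk.download("wordnet")`). The helper name `lookup` and its optional `wn` argument are illustrative additions, so the sketch runs a little longer than three lines; the lookup itself is the single `wn.synsets(word)` call.

```python
def lookup(word, wn=None):
    """Return (synset name, definition) pairs for every sense of `word`."""
    if wn is None:
        # Lazy import: assumes nltk is installed and the "wordnet" corpus is downloaded.
        from nltk.corpus import wordnet as wn
    # wn.synsets() returns one Synset per sense of the word, across all parts of speech.
    return [(synset.name(), synset.definition()) for synset in wn.synsets(word)]
```

Calling `lookup("dog")` should return one pair per sense of the word, with names in WordNet's `lemma.pos.number` format (for example `dog.n.01`).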
Synonym and Antonym lookup
WordNet can function as a thesaurus too, making it easy to find synonyms and antonyms. To get the synonyms of the word beloved, for instance, I can type the following line in Python…
… and get the synonyms dear, dearest, honey and love, as expected. Antonyms can be obtained just as simply.
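A rough sketch of both lookups, again assuming NLTK's WordNet interface. The function names `synonyms` and `antonyms` are hypothetical helpers, not part of NLTK. Because this version pools the lemmas of every sense of the word, it may return more entries than the handful listed above.

```python
def synonyms(word, wn=None):
    """Collect the lemma names of every synset containing `word`."""
    if wn is None:
        # Lazy import: assumes nltk is installed and the "wordnet" corpus is downloaded.
        from nltk.corpus import wordnet as wn
    names = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            names.add(lemma.name())
    names.discard(word)  # the word itself is not a useful synonym
    return sorted(names)


def antonyms(word, wn=None):
    """Collect antonyms, which WordNet stores on lemmas rather than on synsets."""
    if wn is None:
        from nltk.corpus import wordnet as wn
    names = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for opposite in lemma.antonyms():
                names.add(opposite.name())
    return sorted(names)
```

Note that WordNet writes multi-word entries with underscores, so the returned names may need a `replace("_", " ")` before being shown to users.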
Cross Part of Speech lookup
WordNet can do things that dictionaries/thesauri can’t. For example, WordNet knows about cross Part of Speech relations. This kind of relation connects a noun (e.g. president) with its derived verb (preside), derived adjective (presidential) and derived adverb (presidentially). The following snippet displays this functionality of WordNet (using a WordNet based Python package called word_forms).
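As a rough illustration of the underlying idea, the sketch below does not use word_forms. Instead it reads WordNet's derivational links directly through NLTK (`Lemma.derivationally_related_forms()`), which is a swapped-in and simpler technique. The helper name `related_forms` is hypothetical, and it follows links only one hop from the input word, so it will generally find fewer forms than a dedicated package.

```python
def related_forms(word, wn=None):
    """Group derivationally related lemma names by WordNet part-of-speech tag."""
    if wn is None:
        # Lazy import: assumes nltk is installed and the "wordnet" corpus is downloaded.
        from nltk.corpus import wordnet as wn
    forms = {}
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for related in lemma.derivationally_related_forms():
                # Tags are 'n' (noun), 'v' (verb), 'a'/'s' (adjective), 'r' (adverb).
                pos = related.synset().pos()
                forms.setdefault(pos, set()).add(related.name())
    return forms
```

The result is a dictionary mapping each part-of-speech tag to a set of related words, so `related_forms("president")` would be the call to try for the example above.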
Being able to generate these relations is particularly useful for Natural Language Processing and for English learners.
In addition to being a dictionary and thesaurus, WordNet is also a taxonomical classification system. For instance, WordNet classifies dog as a domestic animal, a domestic animal as an animal, and an animal as an organism. All words in WordNet have been similarly classified, in a way that reminds me of taxonomical classifications in biology.
The following snippet shows what happens if we follow this chain of relationships till the very end.
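One way to walk that chain, sketched with NLTK's WordNet interface. The helper name `hypernym_chain` is hypothetical, and the sketch makes two simplifying assumptions: it starts from the first noun sense of the word, and where a synset has several parents it follows only the first one listed.

```python
def hypernym_chain(word, wn=None):
    """Follow hypernym ("is a kind of") links from a word up to the root."""
    if wn is None:
        # Lazy import: assumes nltk is installed and the "wordnet" corpus is downloaded.
        from nltk.corpus import wordnet as wn
    senses = wn.synsets(word, pos="n")  # restrict to noun senses
    if not senses:
        return []
    current = senses[0]  # assumption: the first listed sense is the one we want
    chain = [current.name()]
    while current.hypernyms():
        current = current.hypernyms()[0]  # follow only the first parent
        chain.append(current.name())
    return chain
```

Because a synset can have more than one parent, the path this returns for a given word may differ from the one described in the text while still ending at the same root. NLTK's `Synset.hypernym_paths()` returns every such path if you need them all.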
To visualize the classification model, it is helpful to look at the following picture, which shows a small part of WordNet.
Semantic word similarity
The classification model of WordNet has been used for many useful applications. One such application computes the similarity between two words based on the distance between words in the WordNet network. The smaller the distance, the more similar the words. In this way, it is possible to quantitatively figure out that a cat and a dog are similar, a phone and a computer are similar, but a cat and a phone are not similar!
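A minimal sketch of a distance-based score, assuming NLTK's WordNet interface and its `Synset.path_similarity()` method, which is 1 / (1 + length of the shortest path between the two synsets). This is only one of several similarity measures built on WordNet. The helper name `word_similarity` is hypothetical, and comparing only the first noun sense of each word is a simplification.

```python
def word_similarity(word_a, word_b, wn=None):
    """Path-based similarity between the first noun senses of two words."""
    if wn is None:
        # Lazy import: assumes nltk is installed and the "wordnet" corpus is downloaded.
        from nltk.corpus import wordnet as wn
    senses_a = wn.synsets(word_a, pos="n")
    senses_b = wn.synsets(word_b, pos="n")
    if not senses_a or not senses_b:
        return None  # at least one word has no noun sense in WordNet
    # Scores lie in (0, 1]; a higher score means a shorter path, i.e. more similar.
    return senses_a[0].path_similarity(senses_b[0])
```

With this helper, `word_similarity("cat", "dog")` would be expected to score higher than `word_similarity("cat", "phone")`. For words with many senses, taking the maximum score over all sense pairs is a common refinement.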
WordNet has comprehensive coverage of the English language. Currently, it has 155,287 English words. The complete Oxford English Dictionary has nearly the same number of modern words (171,476). WordNet was last updated in 2011. Some contemporary English words like bromance or chillax seem to be missing for this reason, but this should not be a deal breaker for most of us.
If you want to know more about WordNet, the following references are very helpful.
- The original WordNet paper does an excellent job of explaining the linguistic relations defined in WordNet.
- The excellent NLTK reference on WordNet provides a great tutorial on accessing WordNet in Python.
- If you want to try WordNet in a different programming language, e.g. C, C#, Java, Ruby, PHP, MySQL etc., you can find the required package on this page.
- You can also try out the web based online demo.
Thanks for reading. I hope this quick post about WordNet was helpful to you.
Do you have some thoughts, questions or opinions on WordNet? Please let me know in the comments!