Get Real with NLP
Learning Objectives
- Define intrinsic and extrinsic evaluation.
- Describe the advantages and disadvantages of intrinsic and extrinsic evaluation.
- Perform sentiment analysis on a corpus of text.
Evaluating Word Vectors
Over the past couple of modules, we’ve talked a lot about how various algorithms for training word vectors perform, but how do you actually evaluate a word vector? What makes a good word vector? And how do we know when our model outperforms other models?
There are two basic categories of word vector evaluations: intrinsic and extrinsic.
Intrinsic evaluations are usually made on a specific sub-task that’s part of a larger end-task. For example, determining how well word vector similarities correlate with human ideas about word similarity is an intrinsic evaluation that you could make on a larger text classification system (the end-task).
Extrinsic evaluations are ones you make on real, external tasks. How well a particular set of word vectors works for machine translation or sentiment analysis is an extrinsic evaluation.
There are advantages and disadvantages to both methods. Intrinsic evaluations are generally fast to compute and give insight into your specific word vectors. However, it’s not clear from an intrinsic evaluation how your word vectors actually work in a real application. Improvements you can measure using an intrinsic evaluation don’t necessarily guarantee better performance on a real task.
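To make the idea of an intrinsic evaluation concrete, here's a minimal sketch in Python. The word vectors and the "human" similarity ratings are made-up toy values for illustration only; a real evaluation would use a trained embedding set and a published human-judgment dataset.

```python
import math

# Toy 3-dimensional word vectors -- hypothetical values for illustration only.
vectors = {
    "cat":   [0.9, 0.1, 0.2],
    "dog":   [0.8, 0.2, 0.2],
    "tiger": [0.7, 0.1, 0.5],
    "car":   [0.1, 0.9, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical human similarity ratings (0-10) for the same word pairs.
human = {("cat", "dog"): 9.0, ("cat", "tiger"): 7.0, ("cat", "car"): 1.0}

# The intrinsic question: does the model rank word pairs the same way people do?
model = {(a, b): cosine(vectors[a], vectors[b]) for (a, b) in human}

human_rank = sorted(human, key=human.get, reverse=True)
model_rank = sorted(model, key=model.get, reverse=True)
print(human_rank == model_rank)  # True: the model agrees with the human ordering
```

In practice you'd score the agreement with a rank correlation over thousands of pairs rather than a simple ordering check, but the shape of the evaluation is the same.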
Extrinsic evaluations, on the other hand, take longer to compute. And most final tasks, like machine translation, involve more than one system. It’s possible for a factor other than your word vectors to influence how well or poorly the task performs. With an extrinsic evaluation, you only know that your word vectors are an improvement if replacing a previous set of word vectors with your new vectors (without any other changes) improves task performance.
Who’s up for some examples?
Example: Intrinsic Evaluation
Probability & Ratio | k = solid | k = gas | k = water |
P(k | ice) | 1.9 × 10⁻⁴ | 6.6 × 10⁻⁵ | 3.0 × 10⁻³ |
P(k | steam) | 2.2 × 10⁻⁵ | 7.8 × 10⁻⁴ | 2.2 × 10⁻³ |
P(k | ice) / P(k | steam) | 8.9 | 8.5 × 10⁻² | 1.36 |
- GloVe is much more likely (8.9 times more) to predict the word “solid” after seeing the word “ice” than seeing the word “steam.”
- GloVe is much more likely to predict the word “gas” after seeing the word “steam” than seeing the word “ice.”
- The probability of seeing the word “water” is roughly equal after seeing the word “ice” or the word “steam.”
These results are pretty intuitive. Ice is solid, steam is gaseous, and both ice and steam are made up of water. This intrinsic evaluation tells us that GloVe performed well on this set of word vectors.
Notice, however, that this doesn’t tell us anything about whether we’d see results in a final task, like search or machine translation.
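You can reproduce the ratio row of the table yourself. The sketch below plugs in the rounded probabilities as printed; because the inputs are rounded, the solid ratio comes out near 8.6 rather than the table's 8.9, which was computed from unrounded counts.

```python
# Probabilities from the table above (rounded values as printed).
p_ice   = {"solid": 1.9e-4, "gas": 6.6e-5, "water": 3.0e-3}
p_steam = {"solid": 2.2e-5, "gas": 7.8e-4, "water": 2.2e-3}

# The ratio P(k | ice) / P(k | steam) is what separates the words:
# much greater than 1 -> k relates to ice; much less than 1 -> k relates
# to steam; close to 1 -> k relates to both equally (like "water").
ratios = {k: p_ice[k] / p_steam[k] for k in p_ice}
for k, r in ratios.items():
    print(f"{k}: {r:.2f}")
```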
Example: Extrinsic Evaluation
To evaluate word vectors extrinsically, we need to look at how a model performs on a real NLP task using one set of vectors compared to another.
Let’s think about sentiment analysis. When we do sentiment analysis, we’re trying to identify the general sentiment, or feeling, of a particular text. In the real world, a company could use sentiment analysis to do something like parse a bunch of Tweets to see whether people are talking about their brand in a positive or negative light.
- “I’m having a blast learning about NLP on Trailhead!”—positive
- “Writing code for NLP is very hard :(”—negative
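A real sentiment model learns from data, but you can get a feel for the classification task with a toy lexicon-based scorer like the sketch below. The word lists are hypothetical stand-ins, not a real sentiment lexicon.

```python
# A minimal lexicon-based sentiment scorer -- a toy stand-in for a real model.
# These word lists are hypothetical; production systems learn such weights.
POSITIVE = {"blast", "fun", "great", "love"}
NEGATIVE = {"hard", "bad", "hate", ":("}

def sentiment(text):
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I'm having a blast learning about NLP on Trailhead!"))  # positive
print(sentiment("Writing code for NLP is very hard :("))                 # negative
```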
A system that can make these types of classifications has a lot of components. There’s a pre-processing phase, where data is collected and cleansed. There could be systems to segment tweets into different parts, a feature extraction stage, and, of course, the model itself.
In these cases, it’s often unclear which of these sub-systems is causing problems in your larger system. Generally, if you replace a sub-system and your results improve, then you made a good change.
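That swap-one-thing principle is the heart of an extrinsic evaluation. Here's a toy sketch: the classifier, training data, and test data stay fixed, and only the word vectors change. All vectors and example texts are made up for illustration; the second embedding set is deliberately degenerate so the comparison is visible.

```python
import math

def avg_vector(text, vectors, dim=2):
    """Represent a text as the average of its word vectors."""
    vs = [vectors[w] for w in text.lower().split() if w in vectors]
    if not vs:
        return [0.0] * dim
    return [sum(col) / len(vs) for col in zip(*vs)]

def accuracy(vectors, train, test):
    """Nearest-centroid classifier built on a given embedding set."""
    centroids = {}
    for label in {y for _, y in train}:
        docs = [avg_vector(x, vectors) for x, y in train if y == label]
        centroids[label] = [sum(c) / len(docs) for c in zip(*docs)]

    def predict(x):
        v = avg_vector(x, vectors)
        return min(centroids, key=lambda lbl: math.dist(v, centroids[lbl]))

    return sum(predict(x) == y for x, y in test) / len(test)

train = [("love this", "pos"), ("great fun", "pos"),
         ("hate this", "neg"), ("very bad", "neg")]
test  = [("love fun", "pos"), ("bad hate", "neg")]

# Two hypothetical embedding sets: vectors_a separates sentiment words,
# vectors_b maps every word to the same point (no information at all).
vectors_a = {"love": [1.0, 0.0], "great": [0.9, 0.1], "fun": [0.8, 0.2],
             "hate": [0.0, 1.0], "bad": [0.1, 0.9]}
vectors_b = {w: [0.5, 0.5] for w in vectors_a}

print(accuracy(vectors_a, train, test))  # 1.0 -- informative vectors
print(accuracy(vectors_b, train, test))  # 0.5 -- degenerate vectors
```

Because everything else is held constant, the accuracy gap can only come from the vectors themselves; that's exactly the isolation a real extrinsic evaluation needs.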
Get Hands On with Natural Language Processing
In this trail, you’ll complete problem sets using a Google product called Colaboratory, so you need a Google account to complete the challenges. If you don’t have a Google account, or you want to use a separate one, you can create an account here.
- Download the source code.
- Make sure you’re logged in to your Google account.
- Go to Colaboratory.
- In the dialog menu, click Upload.
- Choose the source code file (.ipynb) and click Open.
Now you’re ready to start coding! The code is organized into cells. When you click into a cell, a play button appears; click it to run the code in that cell.
Have fun!