Understand Deep Learning
- Explain how deep learning relates to machine learning.
- Describe the advantages of using deep learning for natural language processing.
Deep learning is a subfield of machine learning. But what sets it apart?
Machine learning runs on data. Most machine learning relies on human beings to identify and describe specific features of a data set. For example, data scientists building a machine learning solution to identify place names in text might write code describing specific features to look for, like:
- Capitalization of the target word
- Words to the left and the right of the target word
- Specific substrings in the target word that often indicate companies or people
- Hyphens in the target word
- And so forth.
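To make this concrete, a hand-built feature extractor for the place-name example might look like the following sketch. The feature names and the helper function are illustrative inventions, not from any particular library:

```python
# Sketch of hand-designed features for detecting place names in text.
# Every feature here must be thought up and coded by a human.

def extract_features(tokens, i):
    """Return hand-designed features for the token at index i."""
    word = tokens[i]
    return {
        "is_capitalized": word[:1].isupper(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
        "has_hyphen": "-" in word,
        "suffix_3": word[-3:].lower(),  # substrings hinting at entity type
    }

tokens = "She flew from New York to Paris".split()
features = extract_features(tokens, 4)  # features for "York"
```

A real system would multiply this pattern across thousands of such features, which is exactly the manual effort described above.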
A typical machine learning solution like this would eventually have thousands or even millions of hand-designed features. So what’s left for the machine to do, once humans have done all this feature identification work by hand? The machine’s job in this type of solution is mostly to use a learning algorithm to adjust the weighting of each feature to optimize prediction accuracy. Computers are great at this kind of numeric optimization, but these solutions still rely heavily on a human being thinking and learning about the problem.
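As a toy illustration of that numeric optimization step, here is a minimal sketch of gradient descent adjusting two feature weights, in the style of logistic regression. The data and learning rate are invented for the demo; real systems would optimize millions of weights this way:

```python
# The machine's job: repeatedly nudge each feature's weight to improve
# prediction accuracy on labeled examples (toy data, invented for the demo).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# (feature vector, label) pairs: feature 0 signals the positive class,
# feature 1 signals the negative class.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
weights = [0.0, 0.0]
lr = 0.5

for _ in range(100):  # many small corrections
    for x, y in data:
        pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        error = y - pred
        weights = [w + lr * error * xi for w, xi in zip(weights, x)]
```

After training, the weight on the first feature ends up positive and the second negative: the algorithm found good weightings, but a human still had to supply the features.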
So how can we actually help a machine to do the learning on its own, without using so much human intervention? We can use representation learning. In representation learning, computers identify the features in data on their own, without humans manually describing what to look for.
Simple forms of representation learning include things you might have seen in introductory machine learning materials. Clustering algorithms, like k-means and expectation maximization, are a type of representation learning that take unlabeled data and look for patterns to group it into clusters. Dimensionality reduction, where an algorithm works to "flatten" data with a large number of dimensions into fewer dimensions, is also a good example of representation learning.
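The k-means idea can be sketched in a few lines. This is a minimal NumPy-only version on synthetic 2-D data invented for the demo, with deterministic seeding so the result is reproducible:

```python
# Minimal k-means sketch: the algorithm groups unlabeled points into
# clusters without a human describing what the groups look like.
import numpy as np

rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2)),  # blob A
    rng.normal(loc=[5.0, 5.0], scale=0.1, size=(20, 2)),  # blob B
])

k = 2
centers = points[[0, 20]]  # seed one center per blob for a deterministic demo
for _ in range(10):
    # Assign each point to its nearest center...
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # ...then move each center to the mean of its assigned points.
    centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
```

The cluster structure is a learned representation of the data: nobody told the algorithm where the two blobs were.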
Deep learning builds on this idea by stacking multiple layers of learned representations, a structure loosely inspired by the human brain that has outperformed other learning methods on many tasks. With deep learning, you feed a large data set into a model, which produces a learned representation. The model then feeds that representation into the next layer, which produces a new learned representation of its own. This pattern repeats, layer after layer, with the number of repetitions depending on how "deep" the model is. Each successive layer uses the previous layer's output as its input data.
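The layer-by-layer flow can be sketched as a tiny forward pass. The layer sizes are arbitrary and the weights are untrained random values, chosen only to show how each layer's output becomes the next layer's input:

```python
# Sketch of "deep" layered representations: each layer transforms the
# previous layer's output into a new representation.
import numpy as np

rng = np.random.default_rng(42)

def layer(x, w, b):
    # Linear transform followed by a nonlinearity (ReLU)
    return np.maximum(0.0, x @ w + b)

x = rng.normal(size=(1, 8))   # raw input features
sizes = [8, 16, 16, 4]        # input -> two hidden layers -> output
h = x
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    w = rng.normal(scale=0.5, size=(n_in, n_out))
    b = np.zeros(n_out)
    h = layer(h, w, b)        # one layer's output feeds the next layer

# h now holds the final layer's representation, shape (1, 4)
```

Training would adjust every `w` and `b` so that each layer learns a useful representation; the sketch shows only the stacked data flow.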
This layered structure produces a network. Several families of deep learning models fall under this larger umbrella of layered representations; in current practice, most deep learning uses neural networks.
Deep learning is an exciting technique for natural language processing. Previous attempts at natural language processing with hand-designed features were often over-specified and incomplete, and they took a very long time to validate and improve. Deep learning, by contrast, is comparatively fast and flexible enough to adapt to new data, avoiding much of that long design and validation cycle.
Because deep learning lets the computer build features for data on its own, it is a nearly universal framework for learning all kinds of information. This includes linguistic information, visual information, and contextual information about the world.
But the best reason to explore deep learning for natural language processing is that it works, and it works much better than other techniques researchers have tried.
Deep learning has made huge strides since its first successes with natural language processing around 2010. However, the basic techniques behind deep learning first emerged in the 1980s and 1990s. So why did they only start paying off in the last decade or so?
First, and probably most importantly, we have vastly more data available now than we did in the '80s and '90s. The popularity and pervasiveness of the Internet means we've collected an unprecedented amount of data on just about everything, from the products we buy to how we socialize. The Internet also contains huge samples of language data, including casual writing from sources like Twitter and blogs. When it comes to machine learning, and especially deep learning, having large datasets to work with is key.
At the same time, faster machines with multicore CPUs and GPUs have emerged, supplying the compute power deep learning requires. Deep learning is particularly well-suited to parallel processing, which GPUs have made cheap and efficient.
Finally, new models, algorithms, and ideas have made deep learning more effective and flexible. This includes better, more flexible learning for intermediate representations, more effective learning methods for using context and transferring between tasks, and more effective end-to-end joint system learning.