
Get to Know Natural Language Processing

Learning Objectives

After completing this unit, you’ll be able to:

  • Describe natural language processing.
  • Discuss everyday uses of natural language processing.
  • Explain how it has evolved since the 1950s.
  • Differentiate between natural language processing, natural language understanding, and natural language generation.

Before You Start

This badge contains terms like neural networks and deep learning that are described in detail in the Artificial Intelligence Fundamentals and Generative AI Basics badges. We recommend that you earn those badges first.

What Is Natural Language Processing?

Natural language processing (NLP) is a field of artificial intelligence (AI) that combines computer science and linguistics to give computers the ability to understand, interpret, and generate human language in a way that’s meaningful and useful to humans.

NLP helps computers perform useful tasks like understanding the meaning of sentences, recognizing important details in text, translating languages, answering questions, summarizing text, and generating human-like responses.

NLP is already so commonplace in our everyday lives that we usually don’t even think about it when we interact with it or when it does something for us. For example, maybe your email or document creation app automatically suggests a word or phrase you could use next. You may ask a virtual assistant, like Siri, to remind you to water your plants on Tuesdays. Or you might ask Alexa to tell you details about the last big earthquake in Chile for your daughter’s science project.

The chatbots you engage with when you contact a company’s customer service use NLP, and so does the translation app you use to order a meal in another country. Spam detection, personalized news feeds, and much more rely on NLP.

A Very Brief History of NLP

It’s worth mentioning that NLP is not new. In fact, its roots reach back to the 1950s, when researchers began using computers to understand and generate human language. One of the first notable contributions to NLP was the Turing Test. Proposed by Alan Turing in 1950, this test assesses whether a machine can carry on a conversation in a way that’s indistinguishable from a human. Shortly after that, the first machine translation systems were developed. These were sentence- and phrase-based translation experiments that didn’t progress very far because they relied on very specific patterns of language, like predefined phrases or sentences.


By the 1960s, researchers were experimenting with rule-based systems that allowed users to ask the computer to complete tasks or have conversations.

The 1970s and 1980s saw more sophisticated knowledge-based approaches using linguistic rules, rule-based reasoning, and domain knowledge for tasks like executing commands and diagnosing medical conditions.

Statistical approaches to NLP (that is, approaches that learn from data) were popular in the 1990s and early 2000s, leading to advances in speech recognition, machine translation, and machine learning algorithms. During this period, the growth of the World Wide Web, which entered the public domain in 1993, made vast amounts of text-based data readily available for NLP research.


Since about 2009, neural networks and deep learning have dominated NLP research and development. Machine translation and natural language generation, including generative AI chatbots like ChatGPT, have improved dramatically and continue to evolve rapidly.

Note

For more information about these and other important NLP advances, check out the Resources section.

Human Language Is “Natural” Language

What is natural language anyway? Natural language refers to the way humans communicate with each other using words and sentences. It’s the language we use in conversation and when we read, write, or listen. Natural language is the way we convey information, express ideas, ask questions, tell stories, and engage with each other. While NLP models are being developed for many different human languages, this module focuses on NLP in the English language.

If you completed the Artificial Intelligence Fundamentals badge, you learned about unstructured data and structured data. These are important terms in NLP, too. Natural language (the way we actually speak) is unstructured data. That means that while we humans can usually derive meaning from it, it doesn’t give a computer the kind of detail it needs to make sense of it. The following paragraph about an adoptable shelter dog is an example of unstructured data.

Tala is a 5-year-old spayed, 65-pound female husky who loves to play in the park and take long hikes. She is very gentle with young children and is great with cats. This blue-eyed sweetheart has a long gray and white coat that will need regular brushing. You can schedule a time to meet Tala by calling the Troutdale shelter.

For a computer to understand what we mean, this information needs to be well-defined and organized, similar to what you might find in a spreadsheet or a database. This is called structured data. The algorithms used by the end application ultimately determine which information the structured data includes and how that data is formatted. For example, data for a translation app is structured differently than data for a chatbot. Here’s how the data in the paragraph above might look as structured data for an app that helps match dogs with potential adopters.

  • Name: Tala
  • Age: 5
  • Spayed or Neutered: Spayed
  • Sex: Female
  • Breed: Husky
  • Weight: 65 lbs.
  • Color: Gray and white
  • Eye color: Blue
  • Good with children: Yes
  • Good with cats: Yes
  • Favorite activities: Parks, hikes
  • Location: Troutdale
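To make the idea of structured data concrete, here’s a minimal sketch of the record above as a Python dataclass. The class name DogProfile, its field names, and the helper family_friendly are hypothetical and were chosen only for illustration. A real matching app would define its own schema. The point is that every fact has a named, typed slot, so a program can use it without interpreting any sentences.

```python
from dataclasses import dataclass


@dataclass
class DogProfile:
    """Hypothetical schema for an adoptable-dog record (illustration only)."""
    name: str
    age_years: int
    spayed_or_neutered: str
    sex: str
    breed: str
    weight_lbs: int
    color: str
    eye_color: str
    good_with_children: bool
    good_with_cats: bool
    favorite_activities: list[str]
    location: str


tala = DogProfile(
    name="Tala",
    age_years=5,
    spayed_or_neutered="Spayed",
    sex="Female",
    breed="Husky",
    weight_lbs=65,
    color="Gray and white",
    eye_color="Blue",
    good_with_children=True,
    good_with_cats=True,
    favorite_activities=["Parks", "Hikes"],
    location="Troutdale",
)


def family_friendly(dogs: list[DogProfile]) -> list[str]:
    """Return the names of dogs that are good with both children and cats."""
    # Each fact lives in its own typed field, so filtering is a simple comparison.
    return [d.name for d in dogs if d.good_with_children and d.good_with_cats]


print(family_friendly([tala]))  # ['Tala']
```

The original paragraph carries the same facts, but a program can’t filter on "very gentle with young children" until something turns that phrase into a field like good_with_children.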

Natural Language Understanding and Natural Language Generation

Modern NLP has matured into two subfields: natural language understanding (NLU) and natural language generation (NLG). Processing data from unstructured to structured form is called natural language understanding. NLU uses many techniques to interpret written or spoken language and understand the meaning and context behind it. You learn about these techniques in the next unit.
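To illustrate the unstructured-to-structured direction, here’s a deliberately simple, rule-based sketch. It is not how modern NLU works. It assumes the shelter description always uses predictable phrasings like "5-year-old" or "65-pound". The names PATTERNS, FLAGS, and extract_profile are hypothetical, and each regular expression is hand-written for this one paragraph.

```python
import re

DESCRIPTION = (
    "Tala is a 5-year-old spayed, 65-pound female husky who loves to play in the "
    "park and take long hikes. She is very gentle with young children and is great "
    "with cats. This blue-eyed sweetheart has a long gray and white coat that will "
    "need regular brushing. You can schedule a time to meet Tala by calling the "
    "Troutdale shelter."
)

# Hypothetical hand-written patterns: the first capture group holds the field's value.
PATTERNS = {
    "name": r"\b([A-Z][a-z]+) is a\b",
    "age": r"\b(\d+)-year-old\b",
    "weight_lbs": r"\b(\d+)-pound\b",
    "sex": r"\b(female|male)\b",
    "spayed_or_neutered": r"\b(spayed|neutered)\b",
    "breed": r"\b(husky|beagle|poodle|labrador)\b",
    "eye_color": r"\b(\w+)-eyed\b",
    "location": r"calling the (\w+) shelter",
}

# Yes/no fields: True if the phrase appears anywhere in the text.
FLAGS = {
    "good_with_children": r"\b(?:good|great|gentle) with (?:young )?children\b",
    "good_with_cats": r"\b(?:good|great|gentle) with cats\b",
}


def extract_profile(text: str) -> dict:
    """Turn free text into a structured record using fixed patterns (toy NLU)."""
    profile = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:  # fields with no matching phrase are simply left out
            value = match.group(1)
            profile[field] = int(value) if value.isdigit() else value
    for field, pattern in FLAGS.items():
        profile[field] = re.search(pattern, text) is not None
    return profile


print(extract_profile(DESCRIPTION))
```

This works only as long as the text matches the patterns exactly. A sentence like "Tala weighs 65 pounds" yields no weight at all. That brittleness is why NLU moved from hand-written rules toward statistical and neural methods that learn these mappings from data.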

Processing data the reverse way, from structured to unstructured, is called natural language generation. NLG is what enables computers to generate human-like language. It involves developing algorithms and models that convert structured data or information into meaningful, contextually appropriate, natural-sounding text or speech. NLG also includes generating code in a programming language, such as a Python function for sorting strings.
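The simplest way to go from structured to unstructured data is template filling. This sketch turns a structured record for Tala back into an English sentence. The function name describe_dog and the dictionary keys are hypothetical. Modern NLG systems typically produce text with neural models rather than fixed templates, but the direction of the transformation is the same.

```python
def describe_dog(profile: dict) -> str:
    """Render a structured dog record as English text (template-based NLG)."""
    traits = []
    if profile.get("good_with_children"):
        traits.append("good with children")
    if profile.get("good_with_cats"):
        traits.append("good with cats")

    sentence = (
        f"{profile['name']} is a {profile['age']}-year-old {profile['sex'].lower()} "
        f"{profile['breed'].lower()} weighing {profile['weight_lbs']} pounds"
    )
    if traits:  # only mention compatibility facts that are true
        sentence += " who is " + " and ".join(traits)
    return sentence + f". Meet {profile['name']} at the {profile['location']} shelter."


tala = {
    "name": "Tala", "age": 5, "sex": "Female", "breed": "Husky",
    "weight_lbs": 65, "good_with_children": True, "good_with_cats": True,
    "location": "Troutdale",
}
print(describe_dog(tala))
# Tala is a 5-year-old female husky weighing 65 pounds who is good with children
# and good with cats. Meet Tala at the Troutdale shelter.
```

A template always produces grammatical output for the fields it knows about, but every sentence reads the same way. Learned models trade that predictability for more varied, natural-sounding text.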

In the past, NLU and NLG tasks relied on explicit structured linguistic representations like parse trees. While NLU and NLG are still critical to NLP today, most of the apps, tools, and virtual assistants we communicate with have evolved to use deep learning or neural networks to perform tasks end to end. For instance, a neural machine translation system may translate a sentence from, say, Chinese directly into English without explicitly creating any kind of intermediate structure. Neural networks learn to recognize patterns in words and phrases, which makes language processing faster and more contextually accurate.

In the next unit, you learn more about the methods and techniques that enable computers to make sense of what we say and respond accordingly.

Resources
