
Learn About Natural Language Parsing

Learning Objectives

After completing this unit, you’ll be able to:

  • Discuss the basic elements of natural language.
  • Describe several important techniques used when parsing natural language.
  • Explain how sentiment, intent, and context analysis contribute to NLP.

Basic Elements of Natural Language

Understanding and processing natural language is a fundamental challenge for computers. That's because it involves not only recognizing individual words, but also comprehending their relationships, their context, and their meaning.

Natural language, in text and speech, is full of complexity, nuance, ambiguity, and mistakes. In our everyday communication, we encounter words with several meanings; words that sound the same but are spelled differently and have different meanings; misplaced modifiers; misspellings; and mispronunciations. We also encounter people who speak fast, who mumble, or who take forever to get to the point, and people whose accents or dialects are different from ours.

Take this sentence for example:

“We saw six bison on vacation in Yellowstone National Park.”

You might giggle a little as you imagine six bison in hats and sunglasses posing for selfies in front of Old Faithful. But, most likely, you understand what actually happened–that is, that someone who was on vacation in Yellowstone National Park saw six bison.

Or this:

“They swam out to the buoy.”

If you heard this sentence spoken without any context, you might think the people involved swam out to a male child when, in fact, they swam out to a marker in the water. In some accents, “boy” and “buoy” are pronounced slightly differently, but speakers don't always enunciate the difference clearly.

While humans are able to flex and adapt to language fairly easily, training a computer to consider these kinds of nuances is quite difficult.

Elements of natural language in English include:

  • Vocabulary: The words we use
  • Grammar: The rules governing sentence structure
  • Syntax: How words are combined to form sentences according to grammar
  • Semantics: The meaning of words, phrases, and sentences
  • Pragmatics: The context and intent behind cultural or geographic language use
  • Discourse and dialogue: Units larger than a single phrase or sentence, including documents and conversations
  • Phonetics and phonology: The sounds we make when we communicate
  • Morphology: How parts of words can be combined or uncombined to make new words

Parsing Natural Language

Teaching a computer to read and derive meaning from words is a bit like teaching a child to read–they both learn to recognize words, their sounds, meaning, and pronunciation. But when a child learns to read, they usually have the advantage of context from a story; visual cues from illustrations; and relationships to things they already know, like trees or animals. They also often get assistance and encouragement from experienced readers, who help explain what they’re learning. These cues help new readers identify and attach meaning to words and phrases that they can generalize to other things they read in the future.

Natural Language Processing robot pointing to symbols of tasks it can process: lists, information, customer service, spam detection.

We know that computers are a different kind of smart, so while a computer needs to understand the elements of natural language described above, its approach needs to be much more systematic. NLP uses algorithms and methods such as large language models (LLMs), statistical models, machine learning, deep learning, and rule-based systems to process and analyze text. A central step, called parsing, breaks text or speech into smaller parts and classifies them for NLP. Parsing includes syntactic parsing, which analyzes the elements of natural language to identify the underlying grammatical structure, and semantic parsing, which derives meaning.

As mentioned in the last unit, natural language is parsed in different ways to match intended outcomes. For example, natural language that’s parsed for a translation app uses different algorithms or models and is parsed differently than natural language intended for a virtual assistant like Alexa.

Syntactic parsing may include:

  • Segmentation: Larger texts are divided into smaller, meaningful chunks. Segmentation usually happens at sentence-ending punctuation marks, which helps organize text for further analysis.
  • Tokenization: Sentences are split into individual words, called tokens. In the English language, tokenization is a fairly straightforward task because words are usually broken up by spaces. In languages like Thai or Chinese, tokenization is much more complicated and relies heavily on an understanding of vocabulary and morphology to accurately tokenize language.
  • Stemming: Words are reduced to their root form, or stem. For example, breaking, breaks, and unbreakable are all reduced to break. Stemming reduces the number of word-form variations, but because it ignores context, it doesn't always produce the most accurate stem. Look at these two examples that use stemming:

“I’m going outside to rake leaves.”

Stem = leave 

“He always leaves the key in the lock.”

Stem = leave 

  • Lemmatization: Similar to stemming, lemmatization reduces words to their root, but it takes the part of speech into account to arrive at a more accurate root word, or lemma. Here are the same two examples using lemmatization:

“I’m going outside to rake leaves.”

Lemma = leaf 

“He always leaves the key in the lock.”

Lemma = leave 

  • Part of speech tagging: Assigns grammatical labels or tags to each word based on its part of speech, such as a noun, adjective, verb, and so on. Part of speech tagging is an important function in NLP because it helps computers understand the syntax of a sentence.
  • Named entity recognition (NER): Uses algorithms to identify and classify named entities–like people, dates, places, organizations, and so on–in text to help with tasks like answering questions and information extraction.
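
To make the first few steps concrete, here is a minimal sketch of segmentation, tokenization, stemming, and lemmatization using only Python's standard library. It is a toy under stated assumptions, not how a production NLP library works: the function names, the two suffix rules, and the tiny `LEMMAS` lookup table are all hypothetical, and the part-of-speech tags are supplied by hand rather than predicted by a tagger.

```python
import re

def segment(text):
    """Segmentation: split text into sentences after ., !, or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Tokenization: lowercase word tokens; apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())

def toy_stem(token):
    """Stemming (toy): strip a common suffix, ignoring part of speech."""
    for suffix in ("ing", "s"):
        # Only strip when at least three letters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

# Hypothetical mini-lexicon keyed by (token, part of speech).
LEMMAS = {
    ("leaves", "NOUN"): "leaf",
    ("leaves", "VERB"): "leave",
}

def toy_lemmatize(token, pos):
    """Lemmatization (toy): look up the root using the part of speech."""
    return LEMMAS.get((token, pos), token)

text = "I'm going outside to rake leaves. He always leaves the key in the lock."
sentences = segment(text)
tokens = [tokenize(s) for s in sentences]

print(sentences)                        # two sentences
print(tokens[0])                        # tokens of the first sentence
print(toy_stem("leaves"))               # leave (same stem for noun and verb)
print(toy_lemmatize("leaves", "NOUN"))  # leaf
print(toy_lemmatize("leaves", "VERB"))  # leave
```

Notice that the stemmer gives the same answer for both uses of "leaves," while the lemmatizer can tell them apart only because it was handed the part of speech. That is why part of speech tagging usually comes before lemmatization in a real pipeline.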

Semantic Analysis

Parsing natural language using some or all of the steps we just described does a pretty good job of capturing the structure of text or speech. But it misses the softer nuances that make human language, well, human. Semantic parsing involves analyzing the grammatical structure of sentences and the relationships between words and phrases to find a representation of their meaning. Extracting how people feel, why they are engaging, and details about the circumstances surrounding an interaction all play a crucial role in accurately deciphering text or speech and forming an appropriate response.

Here are several common analysis techniques that are used in NLP. Each of these techniques can be powered by a number of different algorithms to get the desired level of understanding depending on the specific task and the complexity of the analysis.

Sentiment analysis: Involves determining whether a piece of text (such as a sentence, a social media post, a review, or a tweet) expresses a positive, negative, or neutral sentiment. A sentiment is a feeling or an attitude toward something. For example, sentiment analysis can determine if this customer review of a service is positive or negative: “I had to wait a very long time for my haircut.” Sentiment analysis identifies and classifies emotions or opinions in text to help businesses understand how people feel about their products, services, or experiences.
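
One simple way to picture sentiment analysis is a word-score lookup. The sketch below is only an illustration: the `LEXICON` scores and the `sentiment` function are hypothetical, invented for this example, and real systems typically rely on trained models rather than a hand-written word list.

```python
import re

# Hypothetical word scores, chosen for illustration only.
LEXICON = {
    "favorite": 2, "love": 2, "great": 1,
    "wait": -1, "waiting": -1, "long": -1, "slow": -1,
}

def sentiment(text):
    """Sum the scores of known words and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I had to wait a very long time for my haircut."))  # negative
print(sentiment("This is my favorite pizza, ever!"))                # positive
```

A word list like this can't handle negation or sarcasm ("Oh great, another long wait"), which is one reason more sophisticated algorithms are used in practice.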

A happy-looking woman with a speech bubble that says, “This is my favorite pizza, ever!” and a grumpy-looking man with a speech bubble that says, “I’m still waiting for my haircut.”

Intent analysis: Intent helps us understand what someone wants or means based on what they say or write. It’s like deciphering the purpose or intention behind their words. For example, if someone types, “I can’t log in to my account,” into a customer support chatbot, intent analysis would recognize that the person’s intent is to get help to access their account. The chatbot might reply with details about resetting a password or other means the user can try to access their account. Virtual assistants, customer support systems, or chatbots often use intent analysis to understand user requests and provide appropriate responses or actions.
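
As a rough sketch of the idea, intent detection can be imagined as matching a message against keywords for each known intent. Everything here is an assumption made for illustration: the intent names, the keyword sets, and the `detect_intent` function are hypothetical, and production chatbots generally use trained classifiers instead of keyword matching.

```python
import re

# Hypothetical intents and trigger keywords, for illustration only.
INTENT_KEYWORDS = {
    "account_access": {"log", "login", "password", "locked"},
    "billing": {"invoice", "charge", "refund"},
}

def detect_intent(message):
    """Return the intent whose keywords overlap most with the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    best_intent, best_hits = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(tokens & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent

print(detect_intent("I can't log in to my account"))  # account_access
print(detect_intent("What time do you open?"))        # unknown
```

Once an intent is identified, the chatbot can route the request, for example by replying with password reset steps for an account access intent.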

Context (discourse) analysis: Natural language relies heavily on context. The interpretation of a statement might change based on the situation, the details provided, and any shared understanding that exists between the people communicating. Context analysis involves understanding this surrounding information to make sense of a piece of text. For example, if someone says, “They had a ball,” context analysis can determine if they are talking about a fancy dance party, a piece of sports equipment, or a whole lot of fun. It does this by considering the previous conversation or the topic being discussed. Context analysis helps NLP systems interpret words more accurately by taking into account the broader context, the relationships between words, and other relevant information.
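
The “They had a ball” example can be sketched as a cue-word lookup over the surrounding conversation. This is a deliberately simple illustration under stated assumptions: the three senses come from the example above, but the cue words in `SENSE_CUES` and the `disambiguate_ball` function are hypothetical, and real context analysis uses far richer signals than word overlap.

```python
import re

# Hypothetical cue words for each sense of "ball", for illustration only.
SENSE_CUES = {
    "dance party": {"gown", "danced", "orchestra", "tuxedo", "waltz"},
    "sports equipment": {"kicked", "threw", "goal", "field", "game"},
    "a lot of fun": {"laughed", "vacation", "enjoyed", "fun"},
}

def disambiguate_ball(context):
    """Pick the sense whose cue words appear most often in the context."""
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclear"

print(disambiguate_ball("The orchestra played and everyone danced. They had a ball."))
# dance party
print(disambiguate_ball("The kids kicked it across the field. They had a ball."))
# sports equipment
print(disambiguate_ball("They had a ball."))
# unclear
```

Without any surrounding sentences, the function returns "unclear," which mirrors the point above: the statement alone doesn't carry enough information to interpret it.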

These three analysis techniques–sentiment analysis, intent analysis, and context analysis–play important roles in extracting valuable insights from text and speech data. Together, they give NLP applications a more sophisticated and accurate understanding of language, so they can engage with it more effectively.

Summary

In this module, you’ve learned about NLP at a very high level, and as it relates to the English language. To date, most NLP research has been conducted in English, but you can also find a great deal of research done in Spanish, French, Farsi, Urdu, Chinese, and Arabic. NLP is a rapidly evolving field of AI, and advancements in NLP are quickly leading to more sophisticated language understanding, cross-language capabilities, and integration with other AI fields.
