Train the Dataset and Create a Model

Learning Objectives

After completing this unit, you’ll be able to:
  • Call the Einstein Intent API to train the dataset and create a model.
  • Use the API to query the status of the training.
  • Explain some of the basic stats returned by the training status.

What Does Training Mean?

In machine learning, training is the process by which algorithms are combined with data to create a model. The algorithms “learn” from the data you provide, and that learning is encapsulated in the model.

In our scenario, the model is the artifact that returns an answer to the question, “What type of service is the user requesting?” The model bases the answer on the text users enter in the service request form.

Train the Dataset

The best way to learn about training is to jump in and do it. Now that you have a dataset that contains labeled examples, it’s time to train it.
  1. In the following cURL command, replace <TOKEN> with your token and <DATASET_ID> with the dataset ID. Then run the command in the command line window.
    curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "name=Service Request Routing Model" -F "datasetId=<DATASET_ID>" https://api.einstein.ai/v2/language/train
    The API response looks similar to this JSON. Training a dataset can take a while, depending on how much data there is.
    {
      "datasetId": 1010060,
      "datasetVersionId": 0,
      "name": "Service Request Routing Model",
      "status": "QUEUED",
      "progress": 0,
      "createdAt": "2017-08-18T21:39:53.000+0000",
      "updatedAt": "2017-08-18T21:39:53.000+0000",
      "learningRate": 0,
      "epochs": 0,
      "queuePosition": 1,
      "object": "training",
      "modelId": "3XVRF4KPA4522DWDRDCQ4D4BEQ",
      "trainParams": null,
      "trainStats": null,
      "modelType": "text-intent"
    }

    The important fields to note are status and modelId. The status field value is QUEUED, and that tells you that the training process hasn’t started. The queuePosition field tells you that it’s first in line. The modelId field contains the ID of the model. Make a note of the modelId, because you use this ID any time you refer to the model in your code.
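    The same request can also be issued from a script. The following is a minimal Python sketch that uses only the standard library. The helper names build_train_request and summarize_training are hypothetical, and the multipart body is assembled by hand on the assumption that the API accepts plain form fields in the same way the cURL -F flags send them.

```python
import json
import urllib.request
import uuid

TRAIN_URL = "https://api.einstein.ai/v2/language/train"


def build_train_request(token, dataset_id, name):
    """Build (but do not send) the POST request that the cURL command issues."""
    boundary = uuid.uuid4().hex
    fields = {"name": name, "datasetId": str(dataset_id)}
    chunks = []
    for key, value in fields.items():
        # One multipart section per form field, mirroring each -F flag.
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{key}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    chunks.append(f"--{boundary}--\r\n")
    headers = {
        "Authorization": f"Bearer {token}",
        "Cache-Control": "no-cache",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    body = "".join(chunks).encode("utf-8")
    return urllib.request.Request(TRAIN_URL, data=body, headers=headers, method="POST")


def summarize_training(response_text):
    """Pull out the fields this unit calls important from the JSON response."""
    info = json.loads(response_text)
    return {
        "modelId": info["modelId"],
        "status": info["status"],
        # queuePosition appears only in the QUEUED sample, so it may be absent.
        "queuePosition": info.get("queuePosition"),
    }
```

    To send the request, pass the result of build_train_request to urllib.request.urlopen, decode the response body, and hand it to summarize_training. Store the returned modelId for the status check in the next section.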

Get the Training Status

You know that the training process is queued up and ready to go. But you want to know when the model is ready.
  1. In the following cURL command, replace <TOKEN> with your token and <MODEL_ID> with the model ID. Then run the command in the command line window.
    curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" https://api.einstein.ai/v2/language/train/<MODEL_ID>
    The API response looks similar to the following JSON. Training takes a while to complete depending on the size of the dataset. The status of RUNNING means that the training process is still running.
    {
      "datasetId": 1010060,
      "datasetVersionId": 0,
      "name": "Service Request Routing Model",
      "status": "RUNNING",
      "progress": 0,
      "createdAt": "2017-09-18T19:59:33.000+0000",
      "updatedAt": "2017-09-18T19:59:33.000+0000",
      "learningRate": 0,
      "epochs": 0,
      "object": "training",
      "modelId": "3XVRF4KPA4522DWDRDCQ4D4BEQ",
      "trainParams": null,
      "trainStats": null,
      "modelType": "text-intent"
    }
    A status field value of SUCCEEDED indicates that the training was successful and the model is ready to use. When the model is ready, the API response looks similar to the following JSON. Be sure that status is SUCCEEDED before you proceed to the next unit.
    {
      "datasetId": 1010060,
      "datasetVersionId": 6189,
      "name": "Service Request Routing Model",
      "status": "SUCCEEDED",
      "progress": 1,
      "createdAt": "2017-08-18T21:39:53.000+0000",
      "updatedAt": "2017-08-18T21:42:23.000+0000",
      "learningRate": 0,
      "epochs": 1000,
      "object": "training",
      "modelId": "3XVRF4KPA4522DWDRDCQ4D4BEQ",
      "trainParams": null,
      "trainStats": {
        "labels": 5,
        "examples": 150,
        "totalTime": "00:02:28:577",
        "transforms": null,
        "trainingTime": "00:02:25:646",
        "earlyStopping": true,
        "lastEpochDone": 49,
        "modelSaveTime": "00:00:00:579",
        "testSplitSize": 32,
        "trainSplitSize": 118,
        "datasetLoadTime": "00:00:02:931",
        "preProcessStats": null,
        "postProcessStats": null
      },
      "modelType": "text-intent"
    }
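    Rather than rerunning the status command by hand, you can poll until status becomes SUCCEEDED. The following Python sketch is one way to do that under stated assumptions: the function names, the 15-second interval, and the limit of 40 checks are illustrative choices, not API requirements. Because this unit shows only the QUEUED, RUNNING, and SUCCEEDED values, the sketch treats any other status as a failure.

```python
import json
import time
import urllib.request

STATUS_URL = "https://api.einstein.ai/v2/language/train/{model_id}"


def fetch_training_status(token, model_id):
    """Issue the same GET request as the cURL command and return the parsed JSON."""
    request = urllib.request.Request(
        STATUS_URL.format(model_id=model_id),
        headers={"Authorization": f"Bearer {token}", "Cache-Control": "no-cache"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_model(fetch, interval_seconds=15, max_checks=40, sleep=time.sleep):
    """Call fetch() repeatedly until the training reports SUCCEEDED.

    fetch is any zero-argument callable that returns the status JSON as a dict.
    """
    for attempt in range(max_checks):
        info = fetch()
        status = info["status"]
        if status == "SUCCEEDED":
            return info
        if status not in ("QUEUED", "RUNNING"):
            # Assumption: anything else means the training did not succeed.
            raise RuntimeError(f"Training ended with status {status}")
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    raise TimeoutError("Training did not finish within the allotted checks")
```

    A typical call would be wait_for_model(lambda: fetch_training_status(token, model_id)), where token and model_id hold your own values. Passing the fetch function in as an argument keeps the polling logic separate from the network call.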

    The trainStats object reports how long each task in the training process took. Use these timings to estimate how long training takes for a given amount of data.
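    To compare these timings, it helps to convert them to numbers. The following Python sketch assumes that the duration strings use an hours:minutes:seconds:milliseconds layout, which is inferred from the sample values rather than stated by the API. The function name duration_to_ms is hypothetical.

```python
def duration_to_ms(text):
    """Convert a string such as "00:02:28:577" to whole milliseconds.

    Assumes the layout is hours:minutes:seconds:milliseconds.
    """
    hours, minutes, seconds, millis = (int(part) for part in text.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


# Timing fields copied from the sample SUCCEEDED response.
train_stats = {
    "totalTime": "00:02:28:577",
    "trainingTime": "00:02:25:646",
    "modelSaveTime": "00:00:00:579",
    "datasetLoadTime": "00:00:02:931",
}

timings = {name: duration_to_ms(value) for name, value in train_stats.items()}
for name, millis in sorted(timings.items(), key=lambda item: -item[1]):
    print(f"{name}: {millis / 1000:.3f} s")
```

    Under that assumption, the sample shows that nearly all of the total time went to training itself: trainingTime (145,646 ms) plus datasetLoadTime (2,931 ms) equals totalTime (148,577 ms).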

    Another interesting field is epochs. An epoch is one training iteration, that is, one full pass through all the examples in the dataset. When you train a dataset with the command we used, the API selects a default number of epochs based on the amount of data in the dataset. However, you can specify the number of epochs you want the training process to use.
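    As a sketch of how you might pass that value, the following Python helper builds the form fields for the train request and adds an epoch count only when you supply one. The form field name epochs is an assumption based on the field name in the response, so confirm it against the API reference before relying on it. Both function names are hypothetical.

```python
def train_form_fields(dataset_id, name, epochs=None):
    """Return the form fields for a train request as a dict of strings."""
    fields = {"name": name, "datasetId": str(dataset_id)}
    if epochs is not None:
        if epochs < 1:
            raise ValueError("epochs must be a positive integer")
        # Assumption: the request field has the same name as the response field.
        fields["epochs"] = str(epochs)
    return fields


def as_curl_flags(fields):
    """Render the fields as -F flags in the style of the cURL command."""
    return " ".join(f'-F "{key}={value}"' for key, value in fields.items())


print(as_curl_flags(train_form_fields("<DATASET_ID>", "Service Request Routing Model", epochs=100)))
```

    Leaving epochs out produces the same two fields as the original command, so the API still picks its default.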

    In our dataset, the API chose 1,000 epochs or training iterations. But the earlyStopping field reports that the training stopped before it completed the 1,000 epochs. And the lastEpochDone field reports that the training stopped after 49 epochs.

    Why did it stop before completing 1,000 epochs? During the training process, if the API determines that further training won’t improve the model accuracy, it stops the training early. This is just one way that the API handles a lot of the details for you.
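    To see these fields together, the following Python sketch reads them from a parsed status response. It uses only field names that appear in the sample JSON. The function name describe_epochs and the keys of the returned dictionary are hypothetical.

```python
def describe_epochs(training):
    """Summarize how much of the planned training actually ran."""
    stats = training["trainStats"]
    planned = training["epochs"]
    last_done = stats["lastEpochDone"]
    return {
        "epochs_planned": planned,
        "last_epoch_done": last_done,
        "stopped_early": stats["earlyStopping"],
        # Whether lastEpochDone counts from 0 or 1 is not stated, so treat
        # this fraction as approximate.
        "fraction_of_plan": last_done / planned,
        # The train and test splits should account for every example.
        "splits_add_up": stats["trainSplitSize"] + stats["testSplitSize"] == stats["examples"],
    }


# Values copied from the sample SUCCEEDED response.
sample = {
    "epochs": 1000,
    "trainStats": {
        "examples": 150,
        "earlyStopping": True,
        "lastEpochDone": 49,
        "testSplitSize": 32,
        "trainSplitSize": 118,
    },
}

summary = describe_epochs(sample)
print(summary)
```

    For the sample response, the summary shows that training ended after roughly 5 percent of the planned epochs, and that the 118 training examples and 32 test examples together account for all 150 examples in the dataset.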
