Turn Data into Models

Learning Objectives

After completing this unit, you’ll be able to:

Explain differences between hand-coded algorithms and trained models.
Define machine learning and explain how it relates to AI.
Distinguish between structured and unstructured data, and explain how each affects training.


The Trick Behind the Magic

What AI can do may seem like magic. And like magic, it’s natural to want a peek behind the curtain to see how it’s all done. What you’ll find is that computer scientists and researchers are using lots of data, math, and processing power in place of mirrors and misdirection. Learning how AI actually works will help you use it to its fullest potential, while avoiding pitfalls due to its limitations.

The Shift from Crafting to Training

For decades, programmers have written code that takes an input, processes it using a set of rules, and returns an output. For example, here's how to find the average of a set of numbers.

Input: 5, 8, 2, 9
Process: Add the values [5 + 8 + 2 + 9] then divide by the number of inputs [4]
Output: 6
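To make "a set of rules" concrete, here's a minimal sketch of that same hand-coded algorithm in Python (the function name `average` is just an illustrative choice). Every rule is written out by a person ahead of time; nothing is learned from data.

```python
def average(values):
    """Hand-coded rules: add the values, then divide by how many there are."""
    if not values:
        raise ValueError("need at least one value to average")
    total = sum(values)          # add the values
    return total / len(values)   # divide by the number of inputs


print(average([5, 8, 2, 9]))  # prints 6.0
```

The output is 6.0 rather than 6 only because Python's `/` operator always returns a floating-point number.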

This simple set of rules for turning an input into an output is an example of an algorithm. Algorithms have been written to perform some pretty sophisticated tasks. But some tasks have so many rules (and exceptions) that it’s impossible to capture them all in a hand-crafted algorithm. Swimming is a good example of a task that is hard to encapsulate as a set of rules. You might get some advice before jumping in the pool, but you only really figure out what works once you’re trying to keep your head above water. Some things are learned best by experience.

What if we could train a computer in the same way? Not by tossing it into a pool, of course, but by letting it figure out for itself what works to succeed at a task. But just as learning to swim is very different from learning to speak a foreign language, the right kind of training depends on the task. Let's check out a few of the ways AI is trained.

Experience Required

Imagine that every time you went to the store to pick up milk, you tracked details of the trip in a spreadsheet. It’s a little weird, but go with it. You set up the following columns.

Is it the weekend?
Time of day
Is it raining or not?
Distance to store
Total minutes of trip
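If you wanted a computer to read this spreadsheet, each row could be stored as a labeled record. The sketch below is one possible layout, not a required format: the class name `MilkRun`, the 24-hour clock for time of day, kilometers for distance, and the sample rows are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MilkRun:
    """One row of the hypothetical milk-run spreadsheet; each field is a labeled column."""
    is_weekend: bool
    time_of_day: int       # hour on a 24-hour clock (assumed encoding)
    is_raining: bool
    distance_km: float     # unit assumed; the spreadsheet doesn't specify one
    total_minutes: float   # the output we eventually want to estimate

    def inputs(self) -> tuple[float, float, float, float]:
        """Return the four inputs as numbers, mapping True/False to 1.0/0.0."""
        return (float(self.is_weekend), float(self.time_of_day),
                float(self.is_raining), self.distance_km)


# Two made-up trips, just to show the shape of the data.
log = [
    MilkRun(is_weekend=False, time_of_day=8, is_raining=False, distance_km=2.0, total_minutes=18.0),
    MilkRun(is_weekend=True, time_of_day=11, is_raining=True, distance_km=2.0, total_minutes=22.0),
]
print(log[1].inputs())  # prints (1.0, 11.0, 1.0, 2.0)
```

Turning yes/no answers into 1 and 0 is what lets the "weights" described below multiply them like any other number.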

After several trips you start getting a feel for how conditions affect how long it’ll take. Like, rain makes the drive longer, but it also means fewer people are shopping. Your brain makes connections between the inputs (weekend [W], time [T], raining [R], distance [D]) and the output (minutes [M]).

Diagram of inputs [W, T, R, D] connected to an output [M].

But how can we get a computer to notice trends in the data so it can estimate too? One way is the guess-and-check method. Here’s how you do it.

Step 1: Assign all of your inputs a “weight.” This is a number that represents how strongly an input should affect the output. It’s OK to start with the same weight for everything.

Step 2: Use the weights with your existing data (and some clever math we won't get into here) to estimate the minutes for a milk run. We can compare the estimate to the historical data. It'll be way off, but that's OK.

Step 3: Let the computer guess a new weight for each input, making some a little more important than others. For example, the time of day might be more important than whether or not it’s raining.

Step 4: Rerun the calculations to check whether the new weights produce a better estimate. If they do, the weights are a better fit and are changing in the right direction.

Step 5: Repeat steps 3 and 4, letting the computer tweak weights until its estimates aren’t getting any better.
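The five steps above can be sketched as a short Python program. Treat it as an illustration under stated assumptions, not as how any particular AI product is trained. The "clever math" in Step 2 is assumed to be a plain weighted sum of the inputs, "way off" is measured as the average number of minutes the estimates miss by, and Step 3's guessing is done by nudging each weight a small random amount (a simple random-search, or hill-climbing, approach; real systems usually use more efficient gradient-based methods). The trip data, function names, and settings such as `patience` and `step` are all hypothetical.

```python
import random

# Hypothetical milk-run records: (weekend, hour_of_day, raining, distance_km) -> minutes.
# These numbers are made up for illustration; they are not real data.
TRIPS = [
    ((0, 8, 0, 2.0), 18.0),
    ((1, 11, 0, 2.0), 25.0),
    ((0, 17, 1, 3.5), 31.0),
    ((1, 14, 1, 1.0), 17.0),
    ((0, 20, 0, 3.0), 21.0),
]


def estimate(weights, inputs):
    """Step 2 (assumed form): the estimate is a weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def error(weights, trips):
    """Average number of minutes the estimates miss the recorded trip times by."""
    return sum(abs(estimate(weights, x) - minutes) for x, minutes in trips) / len(trips)


def train(trips, patience=500, step=0.1, seed=0):
    rng = random.Random(seed)  # seeded so the run is repeatable
    # Step 1: start every input with the same weight.
    weights = [1.0] * len(trips[0][0])
    best = error(weights, trips)
    misses = 0
    # Step 5: keep tweaking until many guesses in a row fail to improve.
    while misses < patience:
        # Step 3: guess new weights by nudging each one a little at random.
        candidate = [w + rng.uniform(-step, step) for w in weights]
        # Step 4: keep the guess only if it gives better estimates.
        candidate_error = error(candidate, trips)
        if candidate_error < best:
            weights, best = candidate, candidate_error
            misses = 0
        else:
            misses += 1
    return weights, best


weights, final_error = train(TRIPS)
print("start error:", error([1.0] * 4, TRIPS))
print("learned weights:", [round(w, 2) for w in weights])
print("final error:", round(final_error, 2))
```

Because a new guess is kept only when it lowers the error, the final error can never be worse than where training started. The trade-off of this simple approach is that it may settle on weights that are merely "good enough" rather than the best possible.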

At this point the computer has settled on weights for each input. If you think of weight as how strongly an input is connected to the output, you can make a diagram that uses line-thickness to represent the weight of a connection.

Diagram of input nodes connected to an output.

For this example it looks like the time of day has the strongest connection, but apparently rain doesn’t make much of a difference.

This process of guess-and-check has created a model of our milk runs. And like a model boat, we can take it to the pool to see if it floats, so to speak. That means testing it in the real world. So for your next several milk runs, before you leave, have the model estimate how long it’ll take. If it’s right enough times in a row, you can confidently let it do the estimating for every future trip.
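The "right enough times in a row" test can also be written down as a rule. In this sketch, the helper name `ready_to_trust` and both thresholds (within 3 minutes counts as "right," and 5 correct trips in a row earns trust) are hypothetical choices; pick whatever fits how much accuracy you actually need.

```python
def ready_to_trust(predicted, actual, tolerance=3.0, streak_needed=5):
    """Return True once the model's estimate has been within `tolerance`
    minutes of the real trip time for `streak_needed` trips in a row."""
    streak = 0
    for guess, real in zip(predicted, actual):
        if abs(guess - real) <= tolerance:
            streak += 1
            if streak >= streak_needed:
                return True
        else:
            streak = 0  # a miss resets the run of correct estimates
    return False


# Estimates made before each trip, and the minutes each trip really took.
predicted = [20, 24, 19, 30, 22, 18, 25]
actual = [21, 29, 18, 31, 22, 17, 24]
print(ready_to_trust(predicted, actual))                   # prints True
print(ready_to_trust(predicted, actual, streak_needed=6))  # prints False
```

The second trip misses by 5 minutes, which resets the streak; the five trips after it are all close enough, so the default rule is satisfied but a stricter six-in-a-row rule is not.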


[AI-generated image using DreamStudio at stability.ai with the prompt, “A robot is at a workbench putting together the pieces of a small model sailboat. The picture is drawn in 2D vector art style.”]

Use the Right Data for the Right Job

This is a very simple example of using training to make an AI model, but it touches on some important ideas. First, it’s an example of machine learning (ML), which is the process of using large amounts of data to train a model to make predictions, instead of handcrafting an algorithm.

Second, not all data is the same. In our milk run example, the spreadsheet is what we would call structured data. It is well organized, with labels on every column so you know the significance of every cell. In contrast, unstructured data would be something like a news article, or an unlabeled image file. The kind of data that you have available will affect what kind of training you can do.

Third, the structured data from our spreadsheet lets computers do supervised learning. It's considered supervised because we can make sure every piece of input data has a matching, expected output that we can verify. Conversely, unstructured data is often used for unsupervised learning, which is when AI tries to find connections in the data without really knowing what it's looking for.
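Here's a small sketch of why labels matter for supervised learning. With labeled trips, you can score any model by comparing its estimates to the expected outputs; with unlabeled data such as a pile of article text, there's no answer to compare against. The sample data and the deliberately naive `constant_model` are hypothetical, made up only to show the difference.

```python
# Supervised: every input is paired with its expected output (a label),
# so a model's estimate can be checked against a known answer.
labeled_trips = [
    ((0, 8, 0, 2.0), 18.0),   # (weekend, hour, raining, distance) -> minutes
    ((1, 11, 0, 2.0), 25.0),
]

# Unlabeled: inputs only. There is no expected output to score a model against.
unlabeled_articles = [
    "Local grocery extends weekend hours.",
    "Storm expected to bring heavy rain on Friday.",
]


def mean_abs_error(model, labeled):
    """Only possible with labels: average gap between each estimate and its expected output."""
    return sum(abs(model(x) - y) for x, y in labeled) / len(labeled)


def constant_model(inputs):
    """A deliberately naive hypothetical model that always guesses 20 minutes."""
    return 20.0


print(mean_abs_error(constant_model, labeled_trips))  # prints 3.5
```

There's no equivalent call for `unlabeled_articles`, because there's no `y` to subtract. Learning from that kind of data means looking for patterns within the inputs themselves.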

Letting the computer figure out a single weight for each input is just one kind of training regimen. But interconnected systems are often more complicated than 1-to-1 weighting can represent. Thankfully, as you'll learn in the next unit, there are other ways to train!
