
Remove Bias from Your Data and Algorithms

Learning Objectives

After completing this unit, you’ll be able to:

  • Identify factors that are excluded from or overrepresented in your dataset.
  • Explain the benefit of holding premortems to reduce interaction bias.
  • Set a plan to ensure new bias hasn’t been introduced into your results.

Manage Risks of Bias

We've discussed the different kinds of bias to consider while working with AI. Now for the hard part: how to prevent or manage the risks those biases create. You can’t magically de-bias your training data. Removing exclusion is both a social and a technical problem: you can take precautions as a team in how you plan and execute your product, in addition to modifying your data.

Conduct Premortems

As we discussed in the first unit, creating a product responsibly starts with building an ethical culture. One way to do this is by incorporating premortems into your workflow.

A premortem is the opposite of a postmortem: it’s an opportunity to catch the “what went wrong” before it happens. Often, team members are hesitant to share reservations in the planning phase of a project. In a sensitive area like AI, it’s paramount that you and your team are open about whatever misgivings you have and are willing to get uncomfortable. Holding such a meeting sets measured, realistic expectations and tempers the desire to throw caution to the wind in the initial enthusiasm over a project.

Identify Excluded or Overrepresented Factors in Your Dataset

Consider the deep social and cultural factors that are reflected in your dataset. As we detailed in the previous unit, any bias at the level of your dataset can impact your AI’s recommendation system, and can result in the over- or underrepresentation of a group.

From a technical perspective, here are a couple of ways you can address bias in your data. These techniques are by no means comprehensive.

What: Statistical patterns that apply to the majority may be invalid within a minority group.

How: Consider creating different algorithms for different groups rather than one size fits all.
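One way to explore this is to fit a separate model per group and compare it against a single global model. Here’s a minimal sketch in Python with pandas and scikit-learn; the DataFrame, the group column, and the feature and label names are hypothetical placeholders, and per-group models carry their own trade-offs that warrant review.

# Sketch: train one model per group instead of a single one-size-fits-all model.
# The "group", feature, and label column names are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_per_group_models(df: pd.DataFrame, feature_cols, label_col, group_col="group"):
    """Return a dict mapping each group value to a model trained only on that group."""
    models = {}
    for group_value, group_df in df.groupby(group_col):
        model = LogisticRegression(max_iter=1000)
        model.fit(group_df[feature_cols], group_df[label_col])
        models[group_value] = model
    return models

def predict_with_group_models(models, df, feature_cols, group_col="group"):
    """Route each row to the model that was trained on its own group."""
    parts = []
    for group_value, group_df in df.groupby(group_col):
        preds = pd.Series(
            models[group_value].predict(group_df[feature_cols]),
            index=group_df.index,
        )
        parts.append(preds)
    return pd.concat(parts).sort_index()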

What: People are excluded from your dataset, and that exclusion has an impact on your users. Context and culture matter, but it may be impossible to see the effects in the data.

How: Look for what researchers call unknown unknowns: errors that happen when a model is highly confident about a prediction that is actually wrong. Unknown unknowns stand in contrast to known unknowns, incorrect predictions that the model makes with low confidence. They’re similar to hallucinations, where a model generating content confidently produces information that isn’t factual. One way to surface them is sketched below.
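A practical way to look for unknown unknowns is to collect a labeled audit sample (for example, through human review) and flag the cases the model got wrong while it was highly confident. Here’s a minimal sketch in Python; the 0.9 confidence threshold and the variable names are illustrative assumptions.

# Sketch: split a model's errors into unknown unknowns (confident and wrong)
# and known unknowns (unsure and wrong) on an audited, labeled sample.
import numpy as np

def find_unknown_unknowns(probabilities, true_labels, confidence_threshold=0.9):
    """Return indices of high-confidence errors and low-confidence errors."""
    probabilities = np.asarray(probabilities)   # shape (n_samples, n_classes), e.g. from predict_proba
    true_labels = np.asarray(true_labels)       # audited ground truth, shape (n_samples,)

    predicted = probabilities.argmax(axis=1)    # the class the model chose
    confidence = probabilities.max(axis=1)      # how sure the model was
    wrong = predicted != true_labels

    unknown_unknowns = np.where(wrong & (confidence >= confidence_threshold))[0]
    known_unknowns = np.where(wrong & (confidence < confidence_threshold))[0]
    return unknown_unknowns, known_unknowns

Reviewing the unknown-unknown cases with people from the affected communities can help reveal who the dataset leaves out.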

Regularly Evaluate Your Training Data

As we’ve said before, developing an AI system starts at the level of your training data. You should be scrupulous about addressing data quality issues as early as possible in the process. Make sure to address extremes, duplicates, outliers, and redundancy in CRM Analytics or other data preparation tools.
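If you prepare data outside CRM Analytics, the same checks can be scripted. Here’s a minimal sketch in Python with pandas; the column names and the three-standard-deviation outlier rule are illustrative assumptions, not fixed recommendations.

# Sketch: drop exact duplicate rows and flag extreme numeric values for human review.
import pandas as pd

def basic_quality_report(df: pd.DataFrame, numeric_cols):
    """Return deduplicated data, rows flagged as extreme, and a summary report."""
    deduped = df.drop_duplicates()
    report = {"duplicate_rows_removed": len(df) - len(deduped)}

    flagged = pd.Series(False, index=deduped.index)
    for col in numeric_cols:
        mean, std = deduped[col].mean(), deduped[col].std()
        # Flag values more than 3 standard deviations from the column mean.
        flagged |= (deduped[col] - mean).abs() > 3 * std
    report["rows_flagged_as_extreme"] = int(flagged.sum())

    return deduped, deduped[flagged], report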

Before you release your models, run prerelease trials to test that your system doesn’t make biased predictions or judgments that harm people in the real world. Confirm that your product works across different communities so that you don’t get any surprises upon release.
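One way to make those trials concrete is to score a held-out test set, attach group labels, and compare outcomes across groups before launch. The sketch below, in Python with pandas, is a starting point rather than a complete fairness audit; the column names and the 80% (“four-fifths”) screen are assumptions.

# Sketch: summarize per-group outcomes so disparities surface before release.
# Assumes binary 0/1 predictions and labels in the "prediction" and "actual" columns.
import pandas as pd

def prerelease_group_report(results: pd.DataFrame, group_col="group",
                            pred_col="prediction", label_col="actual"):
    """Return per-group positive-prediction rates, accuracy, and sample sizes."""
    summary = results.groupby(group_col).agg(
        positive_rate=(pred_col, "mean"),   # share of positive predictions per group
        accuracy=(label_col, lambda s: (s == results.loc[s.index, pred_col]).mean()),
        n=(pred_col, "size"),
    )
    # A common (but not sufficient) screen: flag groups whose positive-prediction
    # rate falls below 80% of the most favored group's rate.
    summary["below_four_fifths"] = (
        summary["positive_rate"] < 0.8 * summary["positive_rate"].max()
    )
    return summary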

After you release a model, develop a system for periodically checking the data your algorithms are learning from and the recommendations your system is making. Think of your data as having a half-life: it won’t work for everyone indefinitely. On the technical side, the more data that enters a system, the more the algorithm learns, which can lead the system to identify and match patterns that those developing the product didn’t foresee or want.
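A simple way to operationalize this is to keep a baseline sample of the scores or recommendations your system produced at launch and periodically compare new output against it, ideally per community or group. The sketch below uses the population stability index, a standard drift measure; the 0.2 alert threshold is an assumption you should tune to your product.

# Sketch: measure how far recent model scores have drifted from a launch baseline.
import numpy as np

def population_stability_index(baseline_scores, recent_scores, bins=10):
    """Compute the population stability index between two score distributions."""
    edges = np.histogram_bin_edges(baseline_scores, bins=bins)
    base_pct = np.histogram(baseline_scores, bins=edges)[0] / len(baseline_scores)
    new_pct = np.histogram(recent_scores, bins=edges)[0] / len(recent_scores)
    # Floor the proportions so empty bins don't divide by zero or take log(0).
    base_pct = np.clip(base_pct, 1e-6, None)
    new_pct = np.clip(new_pct, 1e-6, None)
    return float(np.sum((new_pct - base_pct) * np.log(new_pct / base_pct)))

def needs_data_review(baseline_scores, recent_scores, threshold=0.2):
    """Return True when drift is large enough to trigger a human review of the data."""
    return population_stability_index(baseline_scores, recent_scores) > threshold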

On the social side, cultural values change over time, and your algorithms’ output may no longer suit the value systems of the communities it serves. Two ways to address these challenges are paid community review processes to correct oversights, and mechanisms in your product that let individuals opt out of, or correct, data about themselves. Community review processes should include people from the communities that may be impacted by the algorithmic system you’re developing. You should also hold sessions with the people who will implement, manage, and use the system to meet their organization’s goals. Head over to our UX Research Basics module to learn more about methods for conducting community review processes and user research to understand the contexts your tool will be used in.

Conclusion

AI can be a force for good: it can potentially detect tumors that humans can’t, spot signs of Alzheimer’s before a patient’s family notices, or help preserve indigenous languages. Throughout this module, we’ve shown the power of AI systems, but also their opacity. If we want AI to benefit society more than it harms it, we have to acknowledge the risks and take action to ensure AI systems are designed, developed, and used responsibly.

Even when we’re conscientious and deliberate in our approach as technologists, there will be surprises along the way. We can’t always predict the interactions between datasets, models, and their cultural context. Datasets often contain biases that we’re not aware of, and it’s our responsibility to evaluate training data and our models’ predictions to ensure they don’t yield damaging results.

Developing ethical AI systems is a sociotechnical process. Look at it not only in terms of its technical implementation, but also through the way it’s developed across teams and the social contexts it will be used in. What’s more, assess who’s involved in the process—how are gender, race, ethnicity, and age represented? The people building AI products and the bias engendered by these systems are interconnected.

To realize safe, socially beneficial AI, we need to remember that humans are at the heart of it. AI is a tool and we choose how to use it. Regardless of someone’s role, their minor decisions can have serious, lasting consequences. At Salesforce, we strongly believe that we can do well and do good. You can make a profit without harming others, and, in fact, make a positive impact in the process. 
