Learn Data Fundamentals
After completing this unit, you’ll be able to:
- Describe what data is.
- Describe the various types of data sources.
You know that data literacy is the ability to explore, understand, and communicate with data. But what exactly is data?
Data consists of individual facts, statistics, or items of information. A collection of data is a collection of facts. For a more specific definition, consider the one from Jeffrey Leek, a data scientist working as a professor at Johns Hopkins Bloomberg School of Public Health, who started with Wikipedia’s definition of data and expanded it:
Data is comprised of [sic] values of qualitative or quantitative variables, belonging to a set of items.
Let’s break that down and define its terms.
Set of items
Sometimes called the population, this is the group of objects you are interested in.
Variable
A measurement, property, or characteristic of an item that may vary or change (as opposed to a constant measurement, such as pi, that does not vary).
A qualitative variable describes qualities or characteristics, such as country of origin, gender, name, or hair color.
A quantitative variable describes measurable characteristics, such as height, weight, or temperature.
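To make these terms concrete, here is a minimal Python sketch. The people, their values, and the helper name `variable_kind` are all hypothetical and invented for illustration. The sketch treats a variable as quantitative when every value is a number, which is a simplifying assumption rather than a general rule.

```python
from numbers import Number

# Hypothetical set of items (the "population"): three made-up people.
people = [
    {"name": "Ana", "hair_color": "brown", "height_cm": 162.0},
    {"name": "Ben", "hair_color": "black", "height_cm": 180.5},
    {"name": "Chen", "hair_color": "black", "height_cm": 171.2},
]


def variable_kind(items, variable):
    """Return 'quantitative' if every value is numeric, else 'qualitative'.

    Assumption: numeric values mean measurable characteristics.
    """
    values = [item[variable] for item in items]
    is_numeric = all(
        isinstance(value, Number) and not isinstance(value, bool)
        for value in values
    )
    return "quantitative" if is_numeric else "qualitative"


# Classify every variable recorded for the items.
kinds = {variable: variable_kind(people, variable) for variable in people[0]}
print(kinds)
# {'name': 'qualitative', 'hair_color': 'qualitative', 'height_cm': 'quantitative'}
```

The numeric check is only a rough heuristic. A postal code, for example, is stored as digits but describes a characteristic rather than a measurement, so in practice the classification depends on what the variable means, not just how it is stored.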
Note: In this module, we treat the word data as singular, not plural. However, there is ongoing debate as to whether the word is singular or plural. The Cambridge Dictionary, for example, designates data as both singular and plural.
How Is Data Collected?
We have various tools and techniques for collecting data, such as questionnaires, interviews, observations, analysis of documents, web scraping, and machine measurements. Received or collected data is called raw data. Raw data, which is also known as source data or primary data, hasn’t been processed in any way. This means it hasn’t been run through any software, had any variables modified, had any data removed, or been summarized in any way. Raw data allows the most comprehensive data analysis, because no data has been removed or summarized.
Some examples of raw data include:
- A bacteria specimen viewed under a microscope
- Binary files produced by measurement machines
- Unformatted spreadsheet files
- JSON data scraped from the Twitter API
- Numbers collected and recorded manually
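The difference between raw and processed data can be sketched in a few lines of Python. The sensor names and temperature readings below are made up for illustration, and averaging is just one assumed example of processing. Parsing the JSON text changes only its form, while summarizing it discards the individual readings.

```python
import json
from statistics import mean

# Hypothetical raw data: made-up temperature readings, exactly as received.
raw_json = (
    '[{"sensor": "A", "temp_c": 21.5},'
    ' {"sensor": "A", "temp_c": 22.1},'
    ' {"sensor": "B", "temp_c": 19.8}]'
)

# Parsing changes the form, not the content: no values are modified or removed.
raw_records = json.loads(raw_json)

# Processing step: group the readings by sensor, then average each group.
by_sensor = {}
for record in raw_records:
    by_sensor.setdefault(record["sensor"], []).append(record["temp_c"])

summary = {sensor: round(mean(temps), 2) for sensor, temps in by_sensor.items()}

print(len(raw_records))  # 3 individual readings in the raw data
print(summary)           # one averaged value per sensor
```

From `summary` alone there is no way to recover the two separate readings for sensor A, which is why keeping the raw data allows the most comprehensive analysis.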
Types of Data Sources
A data source contains the data used for exploration, understanding, and communication. In Tableau, for example, every chart you see has a connected data source that supplies the data.
Resources
- Tableau blog: Find hidden insights in your data: Ask why and why again
- Book: Few, S. (2021). Now You See It: An Introduction to Visual Data Sensemaking (2nd ed.). Analytics Press, pp. 29-32.
- Website: Perceptual Edge, Stephen Few’s professional website
- Coursera: The Data Scientist’s Toolbox (course registration required)
- Tableau: Mission
You now understand what data literacy means, how important questions are, and which traits are useful for working effectively with data. You also know how data is defined, how it’s collected, and where it’s located.