Optimize Google BigQuery Performance
After completing this unit, you’ll be able to:
- Explain what Google BigQuery is.
- Explain best practices for optimizing data storage and queries in BigQuery.
- Explain how Tableau and Google BigQuery work together with machine learning (ML) to make business predictions.
Understand the Challenges of Storing Big Data
In the previous unit, you learned about optimizing data queries using Tableau. Let’s take a step back—how are you storing that big data in the first place? Data warehouses have become the common solution for data storage in business, as warehouses integrate and store data from multiple sources, then serve that data up for analytics.
The next question: Is your storage solution optimized for big data analytics? Many more challenges present themselves.
- It’s very difficult to design, build, maintain, and scale data warehouses for big data.
- Common data warehouses become expensive quickly as you scale.
- They’re even more expensive if you need high availability.
- Some data warehouses struggle with parallel queries.
And so on.
Use Google BigQuery to Solve Big Data Challenges
These are some of the fundamental challenges Google BigQuery answers. It’s a cloud data warehouse, part of the Google Cloud product suite, designed to scale and keep costs low. It’s optimized for running Structured Query Language (SQL) queries against massive, structured, and semistructured data sets. Use cases include:
- Ad hoc analytics
- Web logs
- Machine/server logs
- Internet of Things (IoT) data sets
- Ecommerce customer behavior
- Mobile app data
- Retail analytics
- Gaming telemetry
- Google Analytics Premium data
- Any data set that’s too large for standard relational database management system queries
As with Tableau, there are a set of best practices you can use with BigQuery to optimize your storage for analytics, and get the most out of your investment.
Know the Techniques for BigQuery Optimization
|Rely on native tables.
||With BigQuery, you optimize with iterative querying or by going to the source to get your data and repeating. Avoid using federated tables (creating a local table to represent a foreign or remote table) and instead use the Query API. This ensures high performance when you then use Tableau to query against BigQuery.
|Denormalize your data sets.
||You denormalize data when you take individual tables and combine them or create copies based on similar or duplicate fields. Denormalizing improves data querying performance. While it does mean you’re repeating and nesting fields and increasing storage needs in the process, the high performance you get with denormalization in BigQuery outweighs this tax.
|Shard your tables by date where you can.
Shard refers to the act of dividing a table into smaller partitions. This is especially useful when you have data that supports partition by date, like log data, or any data that is frequently timestamped. This simplifies the data structure and enables you to focus queries by date.
Note, when you query sharded data, use custom SQL in Tableau.
When you optimize your data and analytics, special things happen. Managers can make better informed decisions about their business, faster. People in any position of the business can do their job better, with knowledge that they’re making an impact and where they should focus.
BigQuery and Tableau can help take analytics even further with machine learning (ML).
Use Tableau to Visualize Google BigQuery ML Results
Machine learning (ML) refers to the development practice of coding a learning model for a computer and giving it thousands to millions of data points. After the model learns about the information, it can make predictions based off of that data.
ML is famous for such things as swapping the faces of famous people in viral videos, and creating speech when coupled with natural language processing. There are many useful things that businesses can use it for.
- Predicting real estate prices.
- Recommending products on a retail site.
- Predicting resource needs in areas experiencing poverty around the globe.
- Advising on sales quotas and business targets.
And so on. You can integrate BigQuery ML into your data to train models that can help make business decisions. Then, use Tableau to visualize the results.
In this case, use Tableau’s native connector whenever possible. This optimizes performance as described in the previous unit. However, you always have the option of using custom SQL if you need to.
Good Data Analytics for Everyone
As you dig into your data and analytics, you might have questions. It’s good to be curious. It’s even better to be able to answer these questions with data at your fingertips. Use these best practices to build an analytics-rich environment everyone at your company can benefit from.