Understand Common Data Analysis Use Cases
After completing this unit, you’ll be able to:
- Explain why data analytics is relevant in modern business.
- Explain how data analytics tools are used in common scenarios.
Use Data Analytics for a Complex World
What do gaming, commerce, and social media have in common? These verticals each produce a lot of data that organizations use to improve their services, as well as detect and troubleshoot issues. In the following video, Raf explores the common verticals and use cases where data analysis is present in everyday life.
Are we talking about 100 lines of data? 1,000? In some cases, it can be hundreds of thousands—or even millions! How do you work with it all?
The quiz at the end of this unit asks questions about the content of this video. Be sure to watch so you get the information you need to answer the questions at the end of this unit.
[Raf] Now that you know the difference between the different types of data analysis, let me show you some examples of how data analytics is very likely to be present right now in your life, both as a consumer, and perhaps professionally. Data analytics is widely present in many verticals today, such as gaming; social media feeds; ecommerce and online stores; website statistics, also known as clickstream data; recommendation engines; Internet of Things, or IoT; log processing; and much more.
Let me give you a couple of examples of where data analytics is valuable in some of those scenarios, so you grasp the exact purpose of data analytics in these contexts. Let's say you like to play computer games, like me. Who doesn't, right?
So, if you like to play games, either on your phone, your computer, or a game console, you may be familiar with the check box you may need to check before you start playing. That check box usually says something like "Send anonymous data statistics for game developers to improve gaming experiences," or similar. What this does is basically allow the game to collect information about the way you play in order to detect potential crashes, design failures, and other data. It is clear in this case that real-life data, such as you playing your game, is being transformed into information that helps developers circumvent potential issues and enhance the gaming experience. That is exactly why data analytics exists, and why it is so relevant for the modern world.
You may ask, why is this a thing nowadays? I've been playing games since childhood, and that was not the case. Games were sold on cartridges. We used to buy and play, right? Well, yes. But if you think with me, those games were not as complex as the games we have today. And that's what I want to conclude.
Analytics helps people develop insights, and those insights help them deal with complex problem solving. No matter if it is regarding gaming, stock markets, real-estate data, traffic information, fashion, computer systems, web server or security logs, data analytics helps to provide answers to complex scenarios.
With storage prices going down day after day, companies often collect data that they may currently not have a use for. However, if a question arises tomorrow, the answer can be in the data they had previously collected.
The world nowadays is becoming more complex than it was 10 years ago. And having the help of computer systems is instrumental for two main reasons: scalability and data-driven decision making. Another major part of data analysis is log analytics. Let me dig a little deeper into this one, because that is where I will mainly focus during this course, specifically regarding security logs.
When we talk about log analysis, we're usually talking about the information produced by computer systems, based on events. That event can be an HTTP request made to a webpage, user login information, API calls, or any other type of request. API is the acronym for application programming interface, which is basically a computing interface that defines interactions between multiple software intermediaries.
It defines the kinds of calls or requests that can be made, how to make them, the data formats that should be used, the conventions to follow, etc. And from the data analytics standpoint, it is very common to log all those activities somewhere.
A classic example of data analytics is using web server logs to extract insights about visitors to a website. Let's say every request made to an HTTP server is logged into files in a file system. Those are mostly called access logs. If you have one new line added to the access log for every visit to your website, you can say that the number of lines in this log is equivalent to the number of requests that were served by the web server.
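To make the line-counting idea concrete, here is a minimal Python sketch. It assumes one log line per served request, as the transcript describes. The sample log text, its IP addresses, and the `count_requests` helper are hypothetical and exist only for illustration.

```python
# Hypothetical access log sample: one line per request served.
SAMPLE_ACCESS_LOG = """\
203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
203.0.113.9 - - [10/Oct/2023:13:55:41 +0000] "GET /about.html HTTP/1.1" 200 1045
198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET /missing HTTP/1.1" 404 512
"""


def count_requests(log_text: str) -> int:
    """Count non-empty lines, treating each one as a single served request."""
    return sum(1 for line in log_text.splitlines() if line.strip())


request_count = count_requests(SAMPLE_ACCESS_LOG)
print(request_count)
```

For a real file, the same count could be taken by iterating over the open file object line by line instead of loading the whole log into memory.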
If you just have one server and a small website with a couple of visits per minute, you can use some basic tools, like text editors, to parse those files and extract what you're looking for. But if you want to do something slightly more useful than just summing up lines in the log file, the usage of a data analysis tool is key.
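As an example of going one step beyond counting lines, the sketch below parses each line and tallies requests per HTTP status code and per path. It assumes the logs follow the widely used Common Log Format. The regular expression, the `summarize` helper, and the sample lines are illustrative assumptions, not something taken from the video.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# ip ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical sample lines for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:55:41 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET /missing HTTP/1.1" 404 512',
    'this line does not match the expected format',
]


def summarize(lines):
    """Return (requests per status code, requests per path), skipping unparseable lines."""
    statuses = Counter()
    paths = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # ignore lines that do not follow the assumed format
        statuses[match.group("status")] += 1
        paths[match.group("path")] += 1
    return statuses, paths


status_counts, path_counts = summarize(SAMPLE_LINES)
print(status_counts.most_common())
print(path_counts.most_common(1))
```

Dedicated data analysis tools do this kind of parsing and grouping at much larger scale, but the underlying idea is the same.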
We encourage using data analysis tools everywhere, but we need professional ones that handle scale when we want to do log aggregation and visualization. Imagine you have tens of web servers serving thousands of users per second. You can estimate that each log file on each server will fill up pretty quickly. So, you need to have all that data concentrated somewhere.
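The aggregation step can be sketched in a few lines of Python: logs from several servers are merged into a single requests-per-minute count. The server names, sample lines, and helper functions below are hypothetical, and the timestamp format is assumed to be the Common Log Format one. A production setup would typically ship logs to a central store rather than hold them in a dictionary.

```python
import re
from collections import Counter
from datetime import datetime, timezone

TIMESTAMP = re.compile(r"\[([^\]]+)\]")

# Hypothetical logs collected from two web servers.
LOGS_BY_SERVER = {
    "web-1": [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
        '203.0.113.9 - - [10/Oct/2023:13:56:02 +0000] "GET /a HTTP/1.1" 200 10',
    ],
    "web-2": [
        '198.51.100.7 - - [10/Oct/2023:13:55:59 +0000] "GET / HTTP/1.1" 200 2326',
    ],
}


def minute_bucket(line):
    """Return the request time truncated to the minute (UTC), or None if absent."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    moment = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")


def aggregate(logs_by_server):
    """Combine every server's log lines into one requests-per-minute counter."""
    totals = Counter()
    for lines in logs_by_server.values():
        for line in lines:
            bucket = minute_bucket(line)
            if bucket is not None:
                totals[bucket] += 1
    return totals


requests_per_minute = aggregate(LOGS_BY_SERVER)
for minute, count in sorted(requests_per_minute.items()):
    print(minute, count)
```

Converting every timestamp to UTC before bucketing keeps counts comparable when servers log in different time zones.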
In addition, you may need to have a way to visualize that data in a line chart, which could easily help you with identifying spikes, also called deviations or outliers. Another big use of data analysis nowadays is data security. If you have systems producing security logs in a way you can quickly get to in order to extract analytics, you are at a clear advantage if you need to pinpoint when a request was made, by whom, from where, and what the system's response to that request was.
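A spike that stands out visually in a line chart can also be flagged numerically. The sketch below uses a simple z-score rule: a value is treated as an outlier when it lies more than a chosen number of standard deviations above the mean. This rule is just one common heuristic picked for illustration, and the per-minute counts and the 2.5 threshold are made-up assumptions.

```python
from statistics import mean, pstdev

# Hypothetical requests-per-minute series with one obvious spike.
REQUESTS_PER_MINUTE = [120, 118, 125, 122, 119, 480, 121, 117, 124, 123]


def find_spikes(values, threshold=2.5):
    """Return the indexes of values more than `threshold` standard deviations above the mean."""
    if len(values) < 2:
        return []
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # a perfectly flat series has no spikes
    return [
        index
        for index, value in enumerate(values)
        if (value - average) / spread > threshold
    ]


spike_indexes = find_spikes(REQUESTS_PER_MINUTE)
print(spike_indexes)
```

On short series a single extreme value inflates the standard deviation, so the threshold usually needs tuning against real traffic before it is trusted for alerting.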
If you reach the level of doing predictive analysis on top of this data, you can even reach a state where you automatically block bad requests to computer systems before they occur, or create a self-healing architecture that starts building a failover environment when the current environment is showing degradation.
That can be achieved with the help of infrastructure automation tools in the cloud. There is an AWS service called AWS CloudTrail, which logs API activity made to an AWS account, and another AWS service called Amazon S3, which is a storage service. Let me briefly talk about them.
This is what CloudTrail stores every time you or someone logs in to your AWS account by using the AWS Management Console. That is stored in Amazon S3, and contains information such as who made the request, from which IP address, what was the request for, what was the answer to that request, and some other useful compliance information that can quickly turn into evidence, if needed. Because of that nature, CloudTrail is a service that enables infrastructure governance, operational auditing, and risk auditing for your AWS account.
But if you need to dig into CloudTrail text data every time, it may be hard to achieve. So learning data analytics helps a lot to unleash what you can do with all this compliance data. If you have data visualization tools on top of the information produced by CloudTrail, you can have security dashboards containing graphics and alerts of unusual activities. If you suddenly start seeing lots of login-failure activities, it may be because someone is trying to log in to your AWS account, or because you changed the password and forgot it.
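As a rough illustration of turning that log data into a signal, the sketch below counts failed console sign-ins per source IP address. It assumes records shaped like CloudTrail's JSON log files, with a top-level `Records` list and `eventName`, `sourceIPAddress`, and `responseElements` fields. The sample events are fabricated for the example and trimmed to a few fields, and the `failed_logins_by_ip` helper is hypothetical.

```python
import json
from collections import Counter

# Fabricated, heavily trimmed sample in the assumed CloudTrail log file layout.
SAMPLE_LOG_FILE = json.loads("""
{
  "Records": [
    {"eventName": "ConsoleLogin", "eventTime": "2023-10-10T13:55:36Z",
     "sourceIPAddress": "203.0.113.5",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventName": "ConsoleLogin", "eventTime": "2023-10-10T13:55:50Z",
     "sourceIPAddress": "203.0.113.5",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventName": "ConsoleLogin", "eventTime": "2023-10-10T13:57:12Z",
     "sourceIPAddress": "198.51.100.7",
     "responseElements": {"ConsoleLogin": "Success"}},
    {"eventName": "ListBuckets", "eventTime": "2023-10-10T13:58:01Z",
     "sourceIPAddress": "198.51.100.7",
     "responseElements": null}
  ]
}
""")


def failed_logins_by_ip(records):
    """Count failed ConsoleLogin events per source IP address."""
    failures = Counter()
    for record in records:
        if record.get("eventName") != "ConsoleLogin":
            continue
        # responseElements may be null for some events, so fall back to {}.
        outcome = (record.get("responseElements") or {}).get("ConsoleLogin")
        if outcome == "Failure":
            failures[record.get("sourceIPAddress", "unknown")] += 1
    return failures


failure_counts = failed_logins_by_ip(SAMPLE_LOG_FILE["Records"])
for ip_address, count in failure_counts.most_common():
    print(ip_address, count)
```

A dashboard or alert could be driven by the same counts, for example by raising a warning when one address exceeds a chosen number of failures in a short window.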
I usually say that data security analytics is not only good for compliance reports, but also very useful for troubleshooting. If you apply that concept to firewall packets, networking activity, load balancer and server logs, and other kinds of infrastructure topics, you can easily identify outliers and turn yourself into a quick problem solver. But always think about what else you could be using data analytics for, and how it helps you get stronger insights about what's going on, no matter if it is regarding security, product improvement, better customer experience, or any other part of the data analysis spectrum.
Since the sky is the limit, in the next video, I will be talking about why doing all of this in the cloud gives you some serious advantages and how it helps enable data analytics everywhere, anytime, for everyone.
Did You Watch the Video?
Remember, the quiz asks about the video in this unit. If you haven't watched it yet, go back and do that now. Then you'll be ready to take the quiz.