Data is like fish … it decays over time
Yes, this is a ridiculous metaphor, but I chose it because it’s memorable. Data certainly has a time value: if we don’t act on it fast enough, we can lose the opportunity it presents. In this blog post, we’ll look at the difference between streaming data and batch data, and at what drives business analytics.
What’s the difference?
Batch Data is data that is grouped together and processed at a timed interval. It is often a large volume of data, and it is what we traditionally think of when we talk about using data. The data are loaded into storage first and processed afterward. Batch data is used when you don’t need real-time analytics. Examples include anything produced at regular intervals, such as weekly payroll, monthly sales reports, and quarterly financial reports.
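As a rough illustration of the batch pattern, the Python sketch below computes a weekly payroll in a single pass over records that are assumed to have already been loaded into storage. The list `TIMESHEETS`, its fields, and the function `run_weekly_payroll` are hypothetical; a real job would read from a database or files on a schedule.

```python
from collections import defaultdict

# Hypothetical timesheet records, standing in for a week of data already in storage.
TIMESHEETS = [
    {"employee": "ana", "hours": 38.0, "rate": 30.0},
    {"employee": "ben", "hours": 40.0, "rate": 25.0},
    {"employee": "ana", "hours": 4.0, "rate": 30.0},
]


def run_weekly_payroll(records):
    """Batch job: process the whole stored week in one scheduled run."""
    pay = defaultdict(float)
    for record in records:
        pay[record["employee"]] += record["hours"] * record["rate"]
    return dict(pay)


if __name__ == "__main__":
    print(run_weekly_payroll(TIMESHEETS))  # {'ana': 1260.0, 'ben': 1000.0}
```

Nothing happens until the whole week’s data is available, which is fine because nobody needs the payroll total before payday.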
Streaming Data is a continuous flow of data. It gives near-instant results: analytics are run as the data is received, and there is less need for data storage. Streaming data is important when time matters and you have to make a business decision right away. Examples include monitoring for fraudulent activity, machine sensor notifications, and responses to social media campaigns, all cases where you want to respond to the activity immediately. With streaming data, we consider volume, velocity, and variety.
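By contrast, a streaming job reacts to each record as it arrives. The sketch below assumes a deliberately simple fraud rule, chosen only for illustration: flag a card transaction that is larger than `factor` times that card’s running average. `flag_suspicious` and its parameters are hypothetical, and a plain Python iterator stands in for a real event stream. It keeps only a running count and mean per card rather than the full history, which mirrors the reduced need for storage.

```python
from typing import Dict, Iterable, Iterator, Tuple


def flag_suspicious(
    events: Iterable[Tuple[str, float]],
    factor: float = 5.0,
    min_history: int = 3,
) -> Iterator[Tuple[str, float]]:
    """Yield (card, amount) for unusual transactions as soon as each one arrives.

    Only a running count and mean per card are kept, not the full history.
    """
    count: Dict[str, int] = {}
    mean: Dict[str, float] = {}
    for card, amount in events:
        n = count.get(card, 0)
        m = mean.get(card, 0.0)
        if n >= min_history and amount > factor * m:
            yield card, amount  # act now, e.g. hold the payment for review
        # Incrementally update the running mean with this transaction.
        count[card] = n + 1
        mean[card] = m + (amount - m) / (n + 1)


if __name__ == "__main__":
    # A plain iterator stands in for a live event stream.
    stream = iter([("c1", 20.0), ("c1", 25.0), ("c1", 15.0), ("c1", 500.0), ("c2", 10.0)])
    for alert in flag_suspicious(stream):
        print("suspicious:", alert)  # suspicious: ('c1', 500.0)
```

Because `flag_suspicious` is a generator, each alert is produced the moment the offending event is read, without waiting for the stream to end.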
How did we get here?
Data is booming. A combination of new connected devices and sensors, online shopping and advertising, social media and digital platforms, and media consumption going digital makes for a very rich data world.
A 2018 whitepaper from IDC-Seagate predicts “that nearly 30% of the Global Datasphere will be real-time by 2025.”
They also predict that the Global Datasphere will reach 175 zettabytes by 2025. What the heck is a zettabyte? It is the equivalent of a trillion gigabytes. They helpfully point out that if you could store all of that data on DVDs, the stack would reach the moon … 23 times.
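The “trillion gigabytes” figure is straightforward unit arithmetic. This sketch assumes decimal (SI) prefixes, where 1 GB is 10^9 bytes and 1 ZB is 10^21 bytes, so the ratio is 10^12. (With binary prefixes, gibibytes and zebibytes, the ratio would be 2^40 instead.) The function name `zettabytes_to_gigabytes` is just for illustration.

```python
# Decimal (SI) prefixes: 1 GB = 10**9 bytes, 1 ZB = 10**21 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def zettabytes_to_gigabytes(zb: int) -> int:
    """Convert whole zettabytes to gigabytes using exact integer arithmetic."""
    return zb * BYTES_PER_ZB // BYTES_PER_GB


if __name__ == "__main__":
    print(f"1 ZB = {zettabytes_to_gigabytes(1):,} GB")      # 1 ZB = 1,000,000,000,000 GB
    print(f"175 ZB = {zettabytes_to_gigabytes(175):,} GB")  # 175 ZB = 175,000,000,000,000 GB
```

So the 2025 forecast works out to 175 trillion gigabytes.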
What’s the best option?
A better question is: what do we want the data to do?
If we consider the three types of business analytics, the greater value lies in moving the business away from plain old descriptive analytics and toward predictive and prescriptive analytics.
Descriptive Analytics uses reports to describe what happened in the past, but it doesn’t tell you why it happened or what will happen in the future.
Predictive Analytics models past data to predict what will happen in the future. By identifying associations and relationships in the data, it discovers patterns that estimate the probability of a future action. For example, it can estimate which customers will respond to a type of advertising.
Prescriptive Analytics uses analytics to suggest a course of action, helping people make decisions in their jobs. This can include optimization, such as finding the best price to charge for a product.
(See HBR’s “Business Analytics Defined” for a brief video on business analytics definitions.)
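To make the three types concrete, here is a toy Python sketch on hypothetical monthly sales figures. The straight-line trend used for prediction and the linear demand curve used for pricing are simplifying assumptions picked for readability, not methods the HBR material prescribes; real predictive and prescriptive work would use richer models. All names and numbers are illustrative.

```python
from statistics import mean

# Hypothetical unit sales for the last six months.
SALES = [100, 110, 120, 130, 140, 150]


def describe(sales):
    """Descriptive: summarize what already happened."""
    return {"total": sum(sales), "average": mean(sales)}


def predict_next(sales):
    """Predictive: fit a least-squares line to past months and extrapolate one month."""
    xs = range(len(sales))
    x_bar, y_bar = mean(xs), mean(sales)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(sales)


def best_price(candidates, base_demand, price_sensitivity):
    """Prescriptive: recommend the price with the highest revenue,
    assuming demand falls linearly as price rises."""
    def revenue(price):
        units = max(base_demand - price_sensitivity * price, 0)
        return price * units
    return max(candidates, key=revenue)


if __name__ == "__main__":
    print(describe(SALES))
    print(predict_next(SALES))
    print(best_price([5, 10, 15, 20], base_demand=200, price_sensitivity=10))
```

With this sample data, `describe` reports a total of 750 units and an average of 125, `predict_next` extrapolates the trend to 160 units next month, and `best_price` recommends 10 from the candidate prices because it yields the highest assumed revenue (10 × 100 units).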
Even in this new data world, batch data is not going away. Batch operations suit strategic decisions where time sensitivity is low and human involvement is required, such as developing long-term plans or product strategy. Not every use case requires fresh data.
To take the fish metaphor a step too far: if you’re going to make fish sticks, you don’t need Alaskan salmon plucked straight out of Bristol Bay.
So we should consider a unified system that handles both batch and streaming data.
5 Ws and 1 H
Why — Speed and responsiveness, with the ability to scale.
What — A continuous data pipeline that can also handle batches, on a robust platform that gives consistent and correct answers to queries.
When — Continuous with batch
How — A scalable, cloud-based platform.
Where — In the cloud, which means you can scale up and down without fixed limits on the users or workloads you can run. Keep storage and compute separate so each can scale independently… more users, more data … more, er, fish.
Who — Democratize data access within security and privacy guidelines. Users can range from Data Scientists to business analysts running Tableau or Excel.
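At its core, a unified system means writing the processing logic once and running it over both bounded batches and unbounded streams. Engines such as Apache Beam and Spark Structured Streaming are built around this idea; the sketch below does not use their APIs and only imitates the idea with plain Python generators. `tag_large_orders`, `simulated_stream`, and the 100.0 threshold are hypothetical.

```python
import itertools
from typing import Iterable, Iterator


def tag_large_orders(orders: Iterable[dict], threshold: float = 100.0) -> Iterator[dict]:
    """One pipeline definition, applied unchanged to batch and streaming input."""
    for order in orders:
        yield {**order, "large": order["amount"] >= threshold}


def simulated_stream() -> Iterator[dict]:
    """Hypothetical unbounded source standing in for a message queue."""
    for i in itertools.count():
        yield {"id": i, "amount": (i * 37) % 150}


if __name__ == "__main__":
    # Batch: a bounded collection, processed to completion.
    nightly_batch = [{"id": "a", "amount": 250.0}, {"id": "b", "amount": 40.0}]
    print(list(tag_large_orders(nightly_batch)))

    # Stream: an unbounded source; here we only look at the first five results.
    for result in itertools.islice(tag_large_orders(simulated_stream()), 5):
        print(result)
```

The design choice is that the pipeline depends only on "an iterable of orders", so whether the orders arrive all at once or one at a time forever is a property of the source, not of the business logic.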
Our end goal is a data-driven organization. Who wouldn’t want that? Well, maybe that’s not always enough. We can certainly use the data to deepen our understanding to a level never before possible, but we should never stop questioning. What is the user behavior that is driving the data? Even if the data is arriving in real-time, can we tell if this is a leading or lagging indicator? What are we not measuring that could lead to an inaccurate result?
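One way to start probing the leading-or-lagging question is to correlate one series against time-shifted copies of another. The sketch below reports the shift with the strongest correlation; the names `pearson` and `best_lead` and the data are hypothetical. It assumes two evenly spaced series of the same length, and a strong lagged correlation only hints at a lead; it does not prove that one metric drives the other.

```python
from statistics import mean
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def best_lead(x: Sequence[float], y: Sequence[float], max_lag: int) -> int:
    """Return the lag k in 0..max_lag at which x[t] correlates most strongly with y[t + k].

    A best k > 0 hints that x moves ahead of y (a leading indicator).
    """
    scores: Dict[int, float] = {
        k: pearson(x[: len(x) - k], y[k:]) for k in range(max_lag + 1)
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical data: sales echo site visits two periods later.
    visits = [5, 9, 3, 8, 2, 7, 4, 10, 6, 1]
    sales = [0, 0] + visits[:-2]
    print(best_lead(visits, sales, max_lag=3))  # 2
```

Swapping the arguments asks the opposite question, namely whether `y` leads `x`.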
HBR: “Business Analytics Defined” by Thomas Davenport