What is the difference between batch and stream processing?
The usual answer is that batch data is collected, stored for a period of time, and then processed and put to use at regular intervals (e.g., payroll, bank statements), while streaming data is processed and put to use as close as possible to the moment it is generated (e.g., alerts from sensor data).
While accurate, this answer fails to capture why the difference is important and why companies are moving decisively toward stream processing architectures.
We experience the world as a constant stream of events. We make decisions by comparing this stream of information to our experiences and memories. We perceive and react to threats or recognize and seize opportunities. And often reacting in a timely fashion is rewarding – we avoid the snake bite or grab the best seat at the movie theater. Stream processing more closely reflects this very human mode of experience.
Enterprises ingest as many streams of information as they can handle, look for patterns in the data that represent threats or opportunities as it flows past, and when said patterns emerge, they act. The cost of not acting could be a data breach or a lost revenue opportunity.
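To make the "look for patterns as the data flows past, then act" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `Event` type, the `detect_bursts` function, the `failed_login` event kind, and the threshold and window values are all hypothetical and do not come from any particular product.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical event record used only for this illustration."""
    timestamp: float  # seconds since some starting point
    user: str
    kind: str


def detect_bursts(
    events: Iterable[Event],
    kind: str = "failed_login",
    threshold: int = 3,
    window_seconds: float = 60.0,
) -> Iterator[tuple[float, str]]:
    """Yield (timestamp, user) as soon as a user produces `threshold`
    events of the given kind within `window_seconds`."""
    recent: dict[str, deque[float]] = {}
    for event in events:  # each event is examined once, as it arrives
        if event.kind != kind:
            continue
        times = recent.setdefault(event.user, deque())
        times.append(event.timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and event.timestamp - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            yield event.timestamp, event.user  # act immediately
            times.clear()  # start counting afresh for this user
```

Because `detect_bursts` is a generator, the caller receives each alert at the moment the pattern completes rather than after the whole input has been collected. In a real deployment the `events` iterable would be fed by a message broker or sensor feed rather than an in-memory list.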
Batch processing still works well when you need to process huge amounts of data and the results can be delivered at regular intervals. But if recent trends hold, more of these jobs will move to streaming, because companies can no longer absorb the hidden cost of batch, the delay between an event and the response to it, and remain competitive.
A great example is insider trading. Detecting someone who is about to execute an insider trade now costs far less than trying to unwind that trade later, after a batch job picks it up. Even if the batch job runs every five minutes, that only means you will find the trade sooner; it does not mean you will stop it. Ultimately, the choice between stream and batch will show up in the balance sheet and the stock price.
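The five-minute point can be checked with simple arithmetic. The sketch below is a simplified model, not a measurement: it assumes batch runs fire at fixed multiples of the interval and see every event up to that moment, and that a stream processor adds a fixed per-event latency. The function names and the example numbers are hypothetical.

```python
import math


def batch_detection_delay(event_time: float, interval: float) -> float:
    """Seconds between an event and the first batch run at or after it.

    Assumes batch runs fire at t = 0, interval, 2 * interval, ... and that
    each run sees every event that has occurred up to that moment.
    """
    next_run = math.ceil(event_time / interval) * interval
    return next_run - event_time


def stream_detection_delay(processing_latency: float) -> float:
    """Seconds between an event and its detection when every event is
    handled on arrival; the delay is just the processing latency."""
    return processing_latency
```

Under these assumptions, a trade placed 130 seconds into a 300-second batch cycle gives `batch_detection_delay(130.0, 300.0) == 170.0`, so the trade has almost three minutes to execute before anyone sees it. With an assumed half-second processing latency, the streaming delay is `0.5` seconds, which leaves room to intervene before the trade completes.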
The one potential argument against streaming is that it might not handle very large data volumes as cost-effectively as batch does. However, with the advent of systems like Apache Kafka, Apache Flink, and their cloud analogues, such cases are becoming rare.