Data Analytics Overview (OLTP vs. OLAP)
You’ve likely heard about the lucrative careers of high-level Data Analysts, but have you ever wondered how they actually handle "Big Data"? Of course, they aren’t just using Microsoft Excel. To understand the technology behind it, we must first distinguish between the two primary ways we process information: OLTP and OLAP.
1. OLTP
OLTP (Online Transactional Processing) is the database behind a company's daily operations. Whenever you post on Reddit or place an order on Amazon, you are interacting with an OLTP database. Each transaction is triggered by a customer and handled by the backend server so the business can deliver its service.
- Fast: It handles millions of requests with millisecond response times.
- Small & frequent: It focuses on small, frequent transactions, usually just a few rows at a time.
- Write-intensive: Its primary job is to record new data or update existing records as they happen.
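To make the OLTP pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table and `place_order` helper are invented for illustration; a real OLTP system would run on a server database such as PostgreSQL or MySQL, but the shape of the work is the same: one small, fast write per transaction.

```python
import sqlite3

# In-memory database standing in for an OLTP store (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        amount     REAL    NOT NULL,
        created_at TEXT    NOT NULL
    )
""")

def place_order(user_id, amount, created_at):
    """A typical OLTP transaction: a single small write, committed immediately."""
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (user_id, amount, created_at) VALUES (?, ?, ?)",
            (user_id, amount, created_at),
        )

place_order(42, 19.99, "2024-03-15T10:30:00")
```

Each call touches only a handful of rows and returns in milliseconds, which is exactly the workload profile described above.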
2. OLAP
OLAP (Online Analytical Processing) is the very technology we're looking for. These systems aren't built for the general customer; instead, they are designed for Data Analysts and Data Scientists.
- Complex: Queries are often much more complex, involving functions like `EXTRACT` or `GROUP BY` to summarize vast datasets.
- Read-intensive: It rarely writes or updates; it almost exclusively reads.
- Patience: Unlike the instant nature of OLTP, OLAP workflows can take seconds, minutes, or even hours. This is expected when processing historical data at scale.
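The kind of analytical query described above can be sketched with Python's `sqlite3`. The `orders` table and its rows are invented for illustration, and since SQLite has no `EXTRACT` function, `strftime('%m', ...)` stands in for `EXTRACT(MONTH FROM ...)`:

```python
import sqlite3

# A tiny table standing in for a warehouse fact table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (10.0, "2024-01-05"),
        (20.0, "2024-01-20"),
        (30.0, "2024-02-03"),
    ],
)

# An OLAP-style query: scan everything, group, and summarize.
rows = conn.execute("""
    SELECT strftime('%m', created_at) AS month,
           COUNT(*)                   AS n_orders,
           SUM(amount)                AS revenue
    FROM orders
    GROUP BY month
    ORDER BY month
""").fetchall()

print(rows)  # [('01', 2, 30.0), ('02', 1, 30.0)]
```

Note the contrast with the OLTP pattern: instead of touching a few rows, this query reads the whole table to produce a summary, which is why such workloads can take much longer on real datasets.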
3. Where is the Data Kept?
Since the requirements of an analyst differ so much from those of a customer, we use specialized storage architectures: Data Warehouses and Data Lakes.
| Feature | Data Warehouse | Data Lake |
| --- | --- | --- |
| Data format | Highly structured (schema-based) | Raw (structured, semi-, or unstructured) |
| Optimization | Built for fast analytical performance | Built for cost-effective, massive scale |
| Flexibility | Strict table definitions; high quality | Multiple engines can access the same files |
| Access | Managed via internal tools/governance | Open access; requires metadata catalogs |
The Data Warehouse
A Data Warehouse is like a meticulously organized library. Every piece of data must fit a specific "schema" (a table definition) before it enters. This makes exploration easy and ensures high data quality, though it is often less accessible to external, third-party tools.
The Data Lake
A Data Lake is more like a vast reservoir. It stores data in its native, raw format. This is highly cost-effective and flexible, as it allows various processing engines to study the same files. However, it requires careful metadata management and lineage tracking to prevent it from becoming an unorganized "data swamp."
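The "raw format, schema applied at read time" idea can be sketched with a plain directory of JSON files standing in for a lake. All names here (the `events.jsonl` file, the event fields) are invented for illustration:

```python
import json
import pathlib
import tempfile

# A directory of raw files standing in for a data lake (illustrative only).
lake = pathlib.Path(tempfile.mkdtemp())

# Ingest: dump events in their native format; no schema is enforced up front,
# so records with different fields can live side by side.
events = [
    {"user": "a", "action": "click"},
    {"user": "b", "action": "purchase", "amount": 19.99},  # extra field is fine
]
(lake / "events.jsonl").write_text("\n".join(json.dumps(e) for e in events))

# Schema-on-read: each consumer applies its own structure when reading.
purchases = []
for line in (lake / "events.jsonl").read_text().splitlines():
    event = json.loads(line)
    if event["action"] == "purchase":
        purchases.append(event)

print(purchases)  # [{'user': 'b', 'action': 'purchase', 'amount': 19.99}]
```

A warehouse would have rejected the mismatched records at load time; the lake accepts everything and relies on the reader (and good metadata) to impose structure, which is exactly why untracked files can degrade into a "data swamp."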