Data Analytics Overview (OLTP vs OLAP)

You’ve likely heard about the lucrative careers of high-level Data Analysts, but have you ever wondered how they actually handle "Big Data"? Of course, they aren’t just using Microsoft Excel. To understand the technology behind it, we must first distinguish between the two primary ways we process information: OLTP and OLAP.

1. OLTP

OLTP (Online Transactional Processing) is the database that powers a company's daily operations. Whenever you post on Reddit or place an order on Amazon, you are interacting with an OLTP database. Each transaction is triggered by one of the business's customers and handled by the backend server, which is how the business keeps running minute to minute. (A small sketch of this pattern follows the list below.)

  • Fast: It handles millions of requests with millisecond response times.

  • Small & frequent: It focuses on small, frequent transactions, usually touching just a few rows at a time.

  • Write-intensive: Its primary job is to record new data or update existing records as events happen.
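
To make this concrete, here is a minimal sketch of an OLTP-style write using Python's built-in sqlite3 module as a stand-in for a production transactional database; the orders table and its columns are invented for illustration.

```python
import sqlite3

# Stand-in for a production OLTP database sitting behind a backend server.
conn = sqlite3.connect("shop.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders ("
    "  id INTEGER PRIMARY KEY,"
    "  user_id INTEGER,"
    "  item TEXT,"
    "  amount REAL,"
    "  created_at TEXT)"
)

# One small, write-intensive transaction: record a single order the moment it happens.
with conn:  # commits on success, rolls back on error
    conn.execute(
        "INSERT INTO orders (user_id, item, amount, created_at) "
        "VALUES (?, ?, ?, datetime('now'))",
        (42, "mechanical keyboard", 59.99),
    )

conn.close()
```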

2. OLAP

OLAP (Online Analytical Processing) is the very technology we're looking for. These systems aren't built for the general customer; instead, they are designed for Data Analysts and Data Scientists.

Unlike transactional systems, there is no application "shield" here. These professionals interact with the data directly using SQL editors or Business Intelligence (BI) dashboards to ask deep questions about the business.

  • Complex: Queries are often much more complex, involving functions like EXTRACT or GROUP BY to summarize vast datasets (see the sketch after this list).

  • Read-intensive: It rarely writes or updates; it reads almost exclusively.

  • Patient: Unlike the instant responses of OLTP, OLAP workloads can take seconds, minutes, or even hours. This is expected when processing historical data at scale.
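
Continuing the toy example above, here is a minimal sketch of an OLAP-style read over the same illustrative orders table: one aggregate question asked across many rows at once. (In SQLite, strftime plays the role that EXTRACT plays in other SQL dialects.)

```python
import sqlite3

conn = sqlite3.connect("shop.db")

# One analytical question over the whole history: monthly order count and revenue.
# strftime('%Y-%m', ...) stands in for EXTRACT(YEAR/MONTH FROM ...) in other dialects.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', created_at) AS month,
           COUNT(*)                      AS num_orders,
           SUM(amount)                   AS revenue
    FROM orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for month, num_orders, revenue in rows:
    print(month, num_orders, revenue)

conn.close()
```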

3. Where is the Data Kept?

Since the requirements of an analyst differ so much from those of a customer, we rely on specialized storage architectures: Data Warehouses and Data Lakes.

Feature      | Data Warehouse                           | Data Lake
Data Format  | Highly structured (schema-based)         | Raw (structured, semi-structured, or unstructured)
Optimization | Built for fast analytical performance    | Built for cost-effective, massive scale
Flexibility  | Strict table definitions; high quality   | Multiple engines can access the same files
Access       | Managed via internal tools/governance    | Open access; requires metadata catalogs

The Data Warehouse

A Data Warehouse is like a meticulously organized library. Every piece of data must fit a specific "schema" (a table definition) before it enters. This makes exploration easy and ensures high data quality, though it is often less accessible to external, third-party tools.
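
As a rough sketch of this "schema on write" idea (again using Python's sqlite3 purely as an illustration, with a hypothetical fact_sales table), every column is declared up front and a row that violates the schema is rejected at load time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema on write: the table definition exists before any data arrives.
conn.execute(
    """
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        sale_date  TEXT    NOT NULL,
        product_id INTEGER NOT NULL,
        amount     REAL    NOT NULL
    )
    """
)

# A row that does not fit the schema is rejected, which keeps data quality high.
try:
    conn.execute("INSERT INTO fact_sales (sale_id, sale_date) VALUES (1, '2024-01-15')")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)  # e.g. "NOT NULL constraint failed: fact_sales.product_id"
```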

The Data Lake

A Data Lake is more like a vast reservoir. It stores data in its native, raw format. This is highly cost-effective and flexible, as it allows various processing engines to study the same files. However, it requires careful metadata management and lineage tracking to prevent it from becoming an unorganized "data swamp."
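
As a toy sketch of the contrast, the snippet below drops raw JSON events into a "raw zone" folder and writes a tiny metadata record next to them; the paths and field names are invented, and real lakes rely on proper catalogs (for example Hive Metastore or AWS Glue) rather than a hand-written file.

```python
import json
from pathlib import Path

# Raw zone of a toy data lake: files stay in their native format (JSON lines here),
# so any engine that understands JSON can process the same files later.
raw_zone = Path("lake/raw/clicks/2024-01-15")
raw_zone.mkdir(parents=True, exist_ok=True)

events = [
    {"user_id": 42, "page": "/home", "ts": "2024-01-15T10:00:00Z"},
    {"user_id": 7,  "page": "/cart", "ts": "2024-01-15T10:00:03Z"},
]
(raw_zone / "part-0000.json").write_text("\n".join(json.dumps(e) for e in events))

# Minimal metadata entry: without some record like this, the lake turns into a swamp.
catalog_entry = {
    "dataset": "clicks",
    "format": "json-lines",
    "location": str(raw_zone),
    "source": "web frontend",
}
Path("lake/clicks_catalog_entry.json").write_text(json.dumps(catalog_entry, indent=2))
```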
