Metadata
Full Title Database Internals Author Alex Petrov Goodreads URL Goodreads Link
Part 1 - Storage Engines
- Primary role of a Database Management System is to reliably store data and make it available to users. It should centralize data storage, enhance application logic by abstracting infrastructure concerns.
- Databases are modular components and includes:
- Transport Layer accepting requests
- Query Processor determining the most efficient way to run queries
- Execution Engine carrying out the operations
- Storage Engine: Storage engines manage data storage, retrieval, and manipulation, acting as the foundational layer for database management systems.
1. Introduction and Overview
- Online Transaction Processing Database: These handle a large number of user-facing requests and transactions. Queries are often predefined and short-lived.
- Online Analytical Processing Database: These handle complex aggregations. OLAP databases are often used for analytics and data warehousing, and are capable of handling complex, long-running ad-hoc queries.
- Hybrid Transactional and Analytical Processing (HTAP): These databases combine properties of both OLTP and OLAP stores.
DBMS Architecture
- The design of database management system is highly customized and lacks a universal blueprint.
- Some sources have tried to describe the architecture:
- Below architecture demonstrates some of the common theme in these representations:
-
Query Processing
- Transport Subsystem
- Receives and hands over queries to the query processor.
- Responsible for communication within the database cluster.
- Query Processor
- Parses, interprets, and validates queries.
- Performs access control checks.
- Query Optimizer
- Eliminates impossible and redundant query parts.
- Finds the most efficient execution plan based on internal statistics and data placement.
- Handles relational operations and optimizations like index ordering and choosing access methods.
- Execution Engine
- Executes the chosen execution plan.
- Manages results from local and remote operations, including data replication.
- Transport Subsystem
-
Storage Engine Components
- Transaction Manager
- Schedules transactions.
- Ensures logical consistency.
- Lock Manager
- Manages locks on database objects.
- Ensures physical data integrity.
- Access Methods (storage structures)
- Buffer Manager
- Caches data pages in memory.
- Recovery Manager
- Maintains operation logs.
- Restores system state after failures.
- Transaction Manager
-
Transaction and Lock Managers are responsible for Concurrency Control
- Ensures logical and physical data integrity.
- Optimizes concurrent operations.
Memory-Versus Disk-Based DBMS
- Types of Storage Systems
- In-Memory DBMS
- Stores data primarily in memory, uses disk for recovery and logging.
- RAM is the main storage medium, offering faster access times.
- Find example of this type of database
- Disk-Based DBMS
- Holds most data on disk, uses memory for caching and temporary storage.
- More reliant on persistent storage devices like SSDs and HDDs.
- Find example of this type of database
- In-Memory DBMS
- Memory vs. Disk Access
- Memory access is significantly faster than disk access.
- Decreasing memory prices make in-memory storage more feasible, though RAM remains more expensive than persistent storage.
- Differences in Design and Optimization
- In-Memory DBMS
- Optimized for performance, lower access costs, and access granularity.
- Simpler programming due to OS abstracting memory management.
- Disk-Based DBMS
- Requires manual management of data references, serialization, and fragmentation.
- More complex due to varied data structures and organization techniques.
- In-Memory DBMS
- Challenges and Limitations
- In-Memory DBMS
- RAM volatility poses durability issues (e.g., data loss due to crashes or power outages).
- Solutions like uninterrupted power supplies and battery-backed RAM add complexity and cost.
- Disk-Based DBMS
- Easier to maintain with lower costs compared to in-memory solutions.
- In-Memory DBMS
- Future Trends
- Non-Volatile Memory (NVM)
- Promises to reduce or eliminate asymmetry between read and write latencies.
- Enhances performance with byte-addressable access.
- Potential to change the balance between in-memory and disk-based systems as NVM technology evolves.
- What is the state as of 2024? Is there anything better than NVM?
- Non-Volatile Memory (NVM)
- Summarizing Memory-Versus Disk-Based DBMS
- In-memory databases offer superior performance due to faster access times, but are limited by RAM volatility and cost.
- Disk-based databases, while slower, are easier to maintain and more cost-effective.
- The rise of Non-Volatile Memory (NVM) could significantly impact the future of database storage, potentially bridging the gap between memory and disk storage.
