ScaleDB Technology Overview
ScaleDB is a pluggable storage engine for MySQL that delivers enterprise-class scalability, dynamic cloud elasticity, and high availability, while maintaining compatibility with MySQL applications and tools. ScaleDB uses a shared-data architecture in which each node in the cluster has access to the entire database. As a result, a ScaleDB cluster keeps operating when database nodes fail or are removed. As your database usage increases, you can add nodes to scale performance. In addition to shared-data clustering, ScaleDB employs an innovative Multi-Table Index that delivers the further advantages described below.
ScaleDB for MySQL provides the following capabilities:
- Cloud elasticity (adding/removing nodes on the fly)
- Enterprise-class high-availability
- Compatibility with MySQL applications and tools
- ACID compliance
- Innovative high-speed indexing
- Graceful fault-tolerance
- Automatic data recovery
- Plug-and-Cluster™ simplicity
- Shared-data clustering architecture
- High-performance transaction processing
- Row-level locking
- Multi-node concurrency control
In keeping with the demands of the enterprise market, ScaleDB rigorously adheres to the principles of ACID compliance.
Shared-data clustering has long been associated with mission-critical applications. IBM’s IMS® and Oracle’s RAC® databases run many banks, ATM networks, and Global 2000 companies. Shared-data clustering delivers high-availability through redundant nodes that share the data and provide inherent fail-over and recovery. Unlike shared-nothing, which imposes a rigid design for static workloads, shared-data is able to adapt to the changing needs of the users. For this reason, shared-data is the architecture of choice for Online Transaction Processing (OLTP) applications. Because it enables multiple nodes to process data from a single data store, shared-data is also used with some Online Analytical Processing (OLAP) applications where the data is stored in a central location.
Until now, the shared-data architecture has only been available for extremely expensive commercial databases. ScaleDB brings this high-end architecture to MySQL, delivering it for pennies on the dollar compared to commercial databases.
Vern Watts collaborated with Moshe Shadmon on ScaleDB’s architecture. Vern is known as the father of shared-data clustering. In his 48-year career at IBM, Vern was the architect of IBM’s IMS® database, which released a shared-data implementation in 1980, a full twenty years before Oracle released RAC. IBM IMS remains the workhorse of IT systems around the world.
The Main Benefits of ScaleDB’s Shared-Data Clustering Include:
- Eliminates data partitioning or sharding (and repartitioning)
- Elasticity: Database nodes can be added/removed on the fly
- Inherent and seamless fault-tolerance
- Simple Plug-and-Cluster expansion of your cluster
- Enables powerful clusters based on low-cost commodity hardware
- Dynamically adapts to changing usage demands
- Scales CPU performance by adding compute nodes
- Scales I/O performance by adding storage nodes
B-tree indexing indexes only the information inside individual tables. ScaleDB’s Multi-Table Index indexes both the information inside the tables and the relationships between those tables. Because the index understands the relationships between tables, it is able to perform multi-table joins with the performance of a single-table lookup.
Main Benefits include:
- The performance of materialized views, without the set-up, maintenance or stale data inherent in materialized views
- Referential integrity is inherently maintained with minimal overhead
- Small memory footprint (typically 15% – 25% of B-tree equivalent, depending on key length), reduced storage requirements
- Built and manipulated using SQL
- No special coding requirement in application design and development
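The idea of indexing relationships as well as rows can be illustrated with a toy sketch. This is not ScaleDB's patented index structure: the class, method, and table names below are hypothetical, and plain dictionaries stand in for the real index. It only shows why a join becomes a single probe once the parent-child relationship is stored in the index itself.

```python
from collections import defaultdict


class ToyMultiTableIndex:
    """Toy index mapping a parent key to the parent row and its child rows.

    Hypothetical illustration only; not ScaleDB's actual data structure.
    """

    def __init__(self):
        self._parents = {}                  # parent key -> parent row
        self._children = defaultdict(list)  # parent key -> list of child rows

    def insert_parent(self, key, row):
        self._parents[key] = row

    def insert_child(self, parent_key, row):
        # Rejecting orphan rows here keeps referential integrity as a
        # side effect of maintaining the index.
        if parent_key not in self._parents:
            raise KeyError(f"no parent row with key {parent_key!r}")
        self._children[parent_key].append(row)

    def join_lookup(self, key):
        """Return the parent row merged with each child row, using one probe."""
        parent = self._parents[key]
        return [{**parent, **child} for child in self._children.get(key, [])]


index = ToyMultiTableIndex()
index.insert_parent(1, {"customer_id": 1, "name": "Ada"})
index.insert_child(1, {"order_id": 100, "total": 25})
index.insert_child(1, {"order_id": 101, "total": 40})
joined = index.join_lookup(1)  # customer 1 joined with both of its orders
```

In this sketch the join result is assembled from a single key lookup, with no scan of a separate orders table and no precomputed view to refresh.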
ScaleDB’s Multi-Table Index technology leverages the indexing technologies utilized in routing and some in-memory databases. Significant improvements on these indexing technologies, developed over a 15-year period and patented by ScaleDB, enable ScaleDB to deliver a significant advance to general-purpose disk-based databases.
ScaleDB provides an open source API which can be found here. License type: GPL v2.
ScaleDB provides a clustered, shared-data storage engine. The shared-data architecture enables multiple nodes to share the same physical data, eliminating the need to partition your data.
The diagram below shows a multi-node cluster. Each node runs the ScaleDB storage engine in conjunction with a DBMS like MySQL. The ScaleDB Cluster Manager then orchestrates the interaction of these nodes.
The Cache Accelerator Server (CAS) provides both storage and shared cache. One role of the CAS is to provide a high-speed cache for sharing data between database nodes. Another role is to provide a highly available, low-cost set of storage devices. The final role of the CAS is to provide tight integration between storage and computing. If a database node requires some filtered or processed data, the CAS can process the data locally and send only the much more compact result set to the node, instead of moving the entire data set over the network. This sort of local processing at the storage tier is similar to what Oracle Exadata does.
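The benefit of processing at the storage tier can be shown with a minimal sketch. The `ToyStorageNode` class and its methods are hypothetical stand-ins, not the CAS interface; the sketch only compares how many rows cross the network with and without filtering at the storage tier.

```python
class ToyStorageNode:
    """Hypothetical stand-in for a storage-tier server that holds rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetch_all(self):
        """Ship every row to the caller, which must filter on its own CPU."""
        return list(self._rows)

    def fetch_filtered(self, predicate):
        """Apply the filter at the storage tier and ship only the matches."""
        return [row for row in self._rows if predicate(row)]


def is_eu(row):
    return row["region"] == "EU"


# One row in ten belongs to the "EU" region.
rows = [{"id": i, "region": "EU" if i % 10 == 0 else "US"} for i in range(1000)]
storage = ToyStorageNode(rows)

# Without local processing: all rows are shipped, then filtered on the node.
shipped_unfiltered = storage.fetch_all()
result_on_node = [row for row in shipped_unfiltered if is_eu(row)]

# With local processing: only the matching rows are shipped.
shipped_filtered = storage.fetch_filtered(is_eu)
```

Both paths produce the same result set, but the second ships a tenth of the rows in this example.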
All data sent to the CAS is mirrored to two or more CAS servers. This provides the redundancy of high-end storage, while running on commodity servers or cloud instances. ScaleDB CAS servers are designed to share the data, in conjunction with the Cluster Manager, among all of the database nodes. This eliminates the cost, complexity and overhead of a cluster file system.
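A minimal sketch of mirrored storage follows. It assumes the simplest possible policy (write to every mirror, read from the first mirror that responds); the text above does not specify ScaleDB's actual write or failover protocol, and all class names here are hypothetical.

```python
class ToyCasServer:
    """Hypothetical storage server holding blocks in memory."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.alive = True

    def write(self, block_id, data):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.blocks[block_id] = data

    def read(self, block_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.blocks[block_id]


class MirroredVolume:
    """Writes every block to all mirrors; reads from the first live mirror."""

    def __init__(self, servers):
        if len(servers) < 2:
            raise ValueError("mirroring needs two or more servers")
        self.servers = list(servers)

    def write(self, block_id, data):
        for server in self.servers:
            server.write(block_id, data)

    def read(self, block_id):
        for server in self.servers:
            try:
                return server.read(block_id)
            except ConnectionError:
                continue  # this mirror is down, try the next one
        raise ConnectionError("all mirrors are down")


cas_a, cas_b = ToyCasServer("cas-a"), ToyCasServer("cas-b")
volume = MirroredVolume([cas_a, cas_b])
volume.write("block-1", b"payload")
cas_a.alive = False                  # simulate the failure of one server
recovered = volume.read("block-1")   # served by the surviving mirror
```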
Finally, as your data grows, or if the storage layer becomes a performance bottleneck, you can add additional pairs of CAS servers. ScaleDB dynamically redistributes the data across the new set of CAS servers, providing improved storage throughput. In short: if your database application is CPU bound, add more database nodes; if it is I/O bound, add more CAS nodes.
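To make the redistribution step concrete, here is a sketch that assumes blocks are placed on CAS pairs by a simple modulo rule. The placement rule is an assumption chosen for brevity; the text above does not describe how ScaleDB actually assigns or moves data, and the function names are hypothetical.

```python
from collections import Counter


def place(block_id, pair_count):
    """Assumed placement rule: block number modulo the number of CAS pairs."""
    return block_id % pair_count


def redistribute(block_ids, old_pairs, new_pairs):
    """Return {block_id: (old_pair, new_pair)} for every block that must move."""
    moves = {}
    for block_id in block_ids:
        old = place(block_id, old_pairs)
        new = place(block_id, new_pairs)
        if old != new:
            moves[block_id] = (old, new)
    return moves


blocks = range(12)
moves = redistribute(blocks, old_pairs=2, new_pairs=3)

# Number of blocks held by each CAS pair after a third pair is added.
load_after = Counter(place(block_id, 3) for block_id in blocks)
```

With twelve blocks, growing from two pairs to three leaves four blocks on each pair, so each pair serves a smaller share of the I/O. A modulo rule moves many blocks on every expansion; placement schemes that move less data exist, but which one ScaleDB uses is not stated here.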
This architecture provides:
- Dynamic Compute Scalability: Additional nodes can be added to the cluster without the need to repartition your data
- Dynamic Storage Scalability: Add more CAS pairs, as needed, to expand your storage capacity and throughput
- High-Availability: There is no single point of failure. If a node fails, the remaining nodes gracefully take over uncommitted transactions. If the Cluster Manager fails, the standby Cluster Manager takes over. The data on disk and in the cache is mirrored to protect against disk failure
- Cluster-Level Load Balancing: Since each node can address any database request in a master-only model, the load can be balanced across the entire cluster
- Lower Total Cost of Ownership (TCO): The load-balanced, master-only architecture eliminates the complex process of data partitioning and the synchronization between masters and slaves. This significantly reduces total cost of ownership
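Because every node can serve any request, cluster-level load balancing can be as simple as rotating through the node list. The round-robin dispatcher below is a hypothetical sketch of that idea, not a ScaleDB component; the node names are placeholders.

```python
from collections import Counter


class RoundRobinBalancer:
    """Hands out database nodes in rotation; any node can take any request."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def add_node(self, node):
        # No repartitioning is needed: the new node simply joins the rotation.
        self._nodes.append(node)

    def remove_node(self, node):
        self._nodes.remove(node)

    def pick(self):
        if not self._nodes:
            raise RuntimeError("no database nodes available")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node


balancer = RoundRobinBalancer(["db1", "db2"])
first_picks = [balancer.pick() for _ in range(4)]

balancer.add_node("db3")  # scale out on the fly
later_picks = Counter(balancer.pick() for _ in range(6))
```

In a master/slave design the dispatcher would also have to know which node accepts writes; in the master-only model described above it does not.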
The following diagram describes the various modules of the ScaleDB architecture:
For Each Node:
- ScaleDB API – Exposes the ScaleDB functions to DBMS and applications that leverage ScaleDB.
- Transaction Manager – Ensures that the transactions are safe, guaranteeing completion or else rolling-back the uncommitted transaction in case of failure.
- Index Manager – This leverages our Multi-Table Indexing engine to facilitate rapid access to data.
- Data Manager – Coordinates reading and writing of data to files.
- Buffer Manager – Manages the machines’ local cache on each node to improve efficiency and performance. It coordinates with the Global Buffer Manager on the Cluster Manager to ensure that each node is aware of changes made by other nodes.
- Log Manager – Maintains the local log, which is used for rolling back uncommitted transactions and also for failure recovery.
- Lock Manager – Handles local locking and coordinates with global locking. This ensures, among other things, that no two nodes are changing the same information at the same time.
- Recovery Manager – Coordinates with the Global Recovery Manager to ensure that upon failure of the node, it can be recovered gracefully.
- Storage Manager – This coordinates the flushing of the data to disk.
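The interplay between the Transaction Manager and the Log Manager can be sketched with a toy undo log: each write records the previous value so that an uncommitted transaction can be rolled back. This is a generic illustration of undo logging under simplifying assumptions (one in-memory store, one transaction at a time), with hypothetical names; it is not ScaleDB's log format.

```python
_MISSING = object()  # marks keys that did not exist before the transaction


class ToyTransaction:
    """Applies writes to a dict and keeps an undo log for rollback."""

    def __init__(self, store):
        self._store = store
        self._undo_log = []  # list of (key, previous value or _MISSING)

    def write(self, key, value):
        # Log the old value before changing anything.
        self._undo_log.append((key, self._store.get(key, _MISSING)))
        self._store[key] = value

    def commit(self):
        # Once committed, the undo records are no longer needed.
        self._undo_log.clear()

    def rollback(self):
        # Undo the writes in reverse order.
        while self._undo_log:
            key, previous = self._undo_log.pop()
            if previous is _MISSING:
                self._store.pop(key, None)
            else:
                self._store[key] = previous


store = {"balance": 100}
txn = ToyTransaction(store)
txn.write("balance", 40)
txn.write("pending", True)
txn.rollback()  # the store returns to its state before the transaction
```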
For Each Centralized Cluster Manager:
- Global Buffer Manager – Orchestrates the interactions of the various local Buffer Managers on each node in the cluster.
- Global Lock Manager – Coordinates the various nodes in the cluster as they establish and release locks on the data. ScaleDB implements row-level locking.
- Global Recovery Manager – Orchestrates the recovery process in case of a node failure such that the data integrity is maintained throughout and after the recovery process.
- CAS Storage Expansion – Manages the addition of CAS pairs and the redistribution of data among the nodes
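Row-level locking across nodes can be sketched as a table of lock owners keyed by table and row. The class below is a hypothetical illustration with exclusive locks only and no queuing or deadlock handling; it is not the Global Lock Manager's real protocol.

```python
class ToyGlobalLockManager:
    """Tracks which node holds the exclusive lock on each (table, row)."""

    def __init__(self):
        self._owners = {}  # (table, row_id) -> name of the owning node

    def acquire(self, node, table, row_id):
        """Grant the lock if it is free or already held by this node."""
        owner = self._owners.setdefault((table, row_id), node)
        return owner == node

    def release(self, node, table, row_id):
        key = (table, row_id)
        if self._owners.get(key) != node:
            raise PermissionError(f"{node} does not hold the lock on {key}")
        del self._owners[key]

    def release_all(self, node):
        """Drop every lock held by a node, for example after it fails."""
        for key in [k for k, owner in self._owners.items() if owner == node]:
            del self._owners[key]


locks = ToyGlobalLockManager()
granted_first = locks.acquire("node1", "accounts", 7)   # lock is free
granted_second = locks.acquire("node2", "accounts", 7)  # same row: refused
granted_other = locks.acquire("node2", "accounts", 8)   # different row: granted

locks.release_all("node1")                              # node1 fails
granted_after = locks.acquire("node2", "accounts", 7)   # row 7 is free again
```

Because the lock key includes the row, two nodes can update different rows of the same table at the same time, while updates to the same row are serialized.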
For Each Cache Accelerator Server (CAS):
- CAS Interface – Provides the interface between the Cluster Manager and the various database nodes in the cluster, without the need for a cluster file system
- CAS Cache – Provides the shared cache, which enables the database nodes to share data while minimizing disk utilization. Both the cache and storage in the CAS nodes are mirrored for high-availability
- CAS Storage – Handles the storage for the CAS and interfaces with the cluster manager for redistributing data when additional CAS pairs are added to the storage tier.
- CAS Processing – Provides local processing of data in a manner that is analogous to Oracle Exadata