Deploying ScaleDB

This page describes ScaleDB from an operational and deployment perspective. It explains the components that make up a ScaleDB cluster and how they work together to provide a highly available and elastic cluster.

Description

ScaleDB is a complete and integrated platform that extends MySQL, turning it into a clustered database suited to both online transaction processing (OLTP) and data warehouse (DW) applications. ScaleDB uses a shared-data architecture, delivering high performance and high availability across a wide range of applications running either on commodity servers or in the cloud. ScaleDB is tuned to support large data sets. It is a disk-based database and storage platform that leverages caching at multiple tiers to enhance performance.

ScaleDB is deployed as a cluster of nodes connected by a network. It operates at multiple tiers. At the database tier, multiple database engines operate over shared data. At the storage tier, the data is managed by a collection of storage nodes. All database operations are synchronized by a distributed lock manager.

ScaleDB enables sharing data among multiple MySQL server instances. Each of the MySQL server instances considers ScaleDB as a local storage engine. ScaleDB coordinates the processes at a cluster level, ensuring that all servers in the cluster maintain an identical view of the entire data set at all times.

ScaleDB Comprises Three Components

  1. Database Nodes: MySQL passes data requests to ScaleDB via MySQL’s Storage Engine API. These low-level data requests are passed to ScaleDB’s open API, which is available here. Developers can also interact directly with the ScaleDB platform using this API.
  2. Cluster Manager Nodes: The Cluster Manager provides distributed locking functionality. It synchronizes the processes across the cluster, ensuring that conflicts are resolved while maintaining the highest possible performance.
  3. Storage Nodes: The Storage Nodes provide data persistence, pooled caching, parallel processing and high availability. The data is stored in blocks, which are distributed and mirrored across the Storage Nodes. Each Storage Node maintains a cache of the data it owns, and together the Storage Nodes provide a pool of cache to all the Database Nodes at the database tier. Disk I/O is parallelized across the Storage Nodes, which improves throughput. This architecture enables ScaleDB to ship certain processes from the Database Nodes to the Storage Nodes, which process the data in parallel and send only the results back to the Database Node.
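
The idea of shipping work to the Storage Nodes can be illustrated with a toy model. The sketch below is not ScaleDB's API: the in-memory `STORAGE_NODES` list, the function names, and the sample rows are all hypothetical, and threads stand in for separate machines. It only shows the shape of the technique: each storage node filters its own data, and only matching rows travel back to the database node.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy model: each "storage node" holds a slice of the rows.
STORAGE_NODES = [
    [{"id": 1, "amount": 40}, {"id": 2, "amount": 250}],
    [{"id": 3, "amount": 900}, {"id": 4, "amount": 15}],
    [{"id": 5, "amount": 120}],
]


def scan_on_storage_node(rows, predicate):
    """Runs on a storage node: filter locally, return only matching rows."""
    return [row for row in rows if predicate(row)]


def shipped_query(predicate):
    """Runs on a database node: send the filter to every storage node in
    parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(STORAGE_NODES)) as pool:
        partials = pool.map(
            lambda rows: scan_on_storage_node(rows, predicate), STORAGE_NODES
        )
        matches = [row for partial in partials for row in partial]
    return sorted(matches, key=lambda row: row["id"])


if __name__ == "__main__":
    result = shipped_query(lambda row: row["amount"] > 100)
    print([row["id"] for row in result])  # prints [2, 3, 5]
```

In this model the database node never sees the rows that fail the filter, which is the network saving the description above refers to.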

High Availability

ScaleDB has no single point of failure; each storage node is mirrored, so that if a storage node fails, the system continues to operate. The cluster manager has a standby image so that if it fails, the standby kicks in to manage the cluster operations. If a database node fails, its uncommitted transactions are rolled back and the queries are sent to any one of the surviving database nodes in the cluster.
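
As a rough illustration of why a mirrored pair keeps serving data after one member fails, consider the following minimal sketch. The `StorageNode` and `MirroredPair` classes are hypothetical and exist only for this example; the sketch does not model how ScaleDB detects failures or resynchronizes a node that comes back.

```python
class StorageNode:
    """Hypothetical stand-in for one storage node."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.blocks = {}


class MirroredPair:
    """Hypothetical model of a main storage node and its mirror."""

    def __init__(self, main, mirror):
        self.main = main
        self.mirror = mirror

    def _survivors(self):
        nodes = [node for node in (self.main, self.mirror) if node.alive]
        if not nodes:
            raise RuntimeError("both members of the pair are down")
        return nodes

    def write(self, block_id, data):
        # Write to every surviving member so the copies stay identical.
        for node in self._survivors():
            node.blocks[block_id] = data

    def read(self, block_id):
        # Prefer the main node; fall back to the mirror if the main is down.
        return self._survivors()[0].blocks[block_id]


if __name__ == "__main__":
    pair = MirroredPair(StorageNode("main"), StorageNode("mirror"))
    pair.write(7, b"payload")
    pair.main.alive = False  # simulate a failure of the main node
    print(pair.read(7))      # prints b'payload'
```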

ScaleDB Cluster Diagram

  1. Database Nodes (3): Each of the Database Nodes has a MySQL server instance and a ScaleDB database instance.
  2. Cluster Manager Node and a Standby Cluster Manager for high availability.
  3. ScaleDB Storage Nodes (5+5): Each Storage Node is a mirrored pair for high performance and high availability.

Sizing Your ScaleDB Cluster

The number of nodes depends on several factors: data volume, number of users, types of queries, transaction volume, and whether your database is CPU-bound or I/O-bound. A cluster can start with two database nodes, a single pair of storage nodes (main and mirror) and a pair of cluster managers (main and standby). For CPU-bound workloads that need more processing at the database tier, add database nodes so that the processing is distributed among them. For I/O-bound databases, add storage nodes so that I/O is distributed over the additional nodes.

Configuring & Initiating a ScaleDB Cluster

Each node uses two configuration files: a public configuration file and a private configuration file. The public configuration file is common to all nodes in the cluster and defines the universal aspects of the cluster. The private configuration file defines the characteristics unique to the function of the specific node. It contains the IP address and the port of the initial storage node in the cluster. When a node is added to the cluster, it connects to the initial storage node, from which it receives the public configuration file. The private configuration file also contains any node-specific modifications of the public configuration file. For example, the public configuration file might define the standard cache size as 100GB for each node in the cluster. A specific node might have less memory available for caching, so its private configuration file might set the cache size to 50GB for that node.
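
The override behavior described above can be sketched as a simple merge in which private settings win over public ones. This is a minimal illustration, not ScaleDB's actual file format: the key names (`cache_size_gb`, `initial_storage_node`, `overrides`), the address, and the port are all assumptions made up for the example.

```python
def effective_config(public, private):
    """Return the settings a node would run with: the cluster-wide public
    values, with any node-specific overrides applied on top."""
    merged = dict(public)
    merged.update(private.get("overrides", {}))
    return merged


# Hypothetical public configuration shared by every node.
public = {"cache_size_gb": 100, "lock_timeout_s": 30}

# Hypothetical private configuration for one node with less memory.
private = {
    "initial_storage_node": ("192.0.2.10", 9000),  # made-up address and port
    "overrides": {"cache_size_gb": 50},
}

if __name__ == "__main__":
    print(effective_config(public, private))
    # prints {'cache_size_gb': 50, 'lock_timeout_s': 30}
```

A node with no overrides simply runs with the public values unchanged.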

When bringing up a cluster, the storage nodes must be initialized first, the cluster manager second and the database nodes third. When a node joins, it connects to the running storage node, gets the public configuration file, connects to the cluster manager using the parameters in the public configuration file (provided by the storage node) and joins the cluster.

Initiating a cluster is a three-step process:

  1. Start the Storage Layer: The cluster is initiated by starting the storage nodes. The first storage node, and its mirror, are configured to read the public configuration file.
  2. Start One or Two Cluster Managers: The cluster manager starts by reading a private configuration file, which includes the IP address and port of the first storage node, where it gets the public configuration file.
  3. Start One or More Database Nodes: MySQL initiates ScaleDB as the storage engine. When ScaleDB starts, it reads a private configuration file, which directs it to the first storage node, where it receives the public configuration file.
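
The ordering constraint in the three steps above can be sketched as a small state model. The `Cluster` class and its method names are hypothetical and are not part of ScaleDB; the sketch only shows that each tier depends on the tier started before it, because every later node must first fetch the public configuration from a running storage node.

```python
class Cluster:
    """Hypothetical model of the three-step startup order."""

    def __init__(self, public_config):
        self.public_config = public_config
        self.storage_started = False
        self.manager_started = False
        self.database_nodes = []

    def start_storage(self):
        # Step 1: the first storage node and its mirror read the public
        # configuration file directly.
        self.storage_started = True

    def fetch_public_config(self):
        # Later nodes obtain the public configuration from a storage node.
        if not self.storage_started:
            raise RuntimeError("no storage node is running yet")
        return dict(self.public_config)

    def start_cluster_manager(self):
        # Step 2: the cluster manager gets the public configuration first.
        self.fetch_public_config()
        self.manager_started = True

    def start_database_node(self, name):
        # Step 3: a database node needs both storage and a cluster manager.
        config = self.fetch_public_config()
        if not self.manager_started:
            raise RuntimeError("the cluster manager is not running yet")
        self.database_nodes.append(name)
        return config


if __name__ == "__main__":
    cluster = Cluster({"cache_size_gb": 100})
    cluster.start_storage()
    cluster.start_cluster_manager()
    cluster.start_database_node("db1")
    print(cluster.database_nodes)  # prints ['db1']
```

Starting a database node before the storage tier or the cluster manager raises an error in this model, which mirrors the required order.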

Status

ScaleDB is in beta now and the General Availability release will be in 2014. ScaleDB is commercial software; it is not open source. However, it is free to download and try. You can download and try ScaleDB here. Here is a link to our Product Manual if you would like to learn more.

ScaleDB provides professional services to assist you with designing, developing, deploying, and delivering applications built on the ScaleDB platform. ScaleDB professional services combine deep expertise in all aspects of database development with direct access to ScaleDB’s research and development team for resolving any issues that might arise. For more information about ScaleDB professional services, email us at support@scaledb.com or call (650) 587-8787.