What is DBaaS?

Database-as-a-Service (DBaaS) is a database service managed by a cloud operator (public or private) that supports applications without the application team assuming responsibility for traditional database administration functions. With a DBaaS, application developers should not need to be database experts, nor should they have to hire a database administrator (DBA) to maintain the database. True DBaaS nirvana will be achieved when application developers can simply call a database service and it works, without their even having to consider the database. The database would scale seamlessly, and it would be maintained, upgraded, backed up and recovered from server failure, all without impacting the developer in any way. From the developer’s perspective, this is the definition of DBaaS.

What is DBaaS to a Cloud Provider?

Cloud providers want to offer the “DBaaS nirvana” described above. In order to provide a complete DBaaS solution across large numbers of customers, cloud providers need a high degree of automation. Functions that have a regular time-based interval, like backups, can be scheduled and batched. Many other functions, such as elastic scale-out, can be automated based on business rules. For example, providing a certain quality of service (QoS) according to the service level agreement (SLA) might require limiting each database to a certain number of connections, a peak level of CPU utilization, or some other criterion. When such a limit is exceeded, the DBaaS might automatically add a new database instance to share the load. The cloud provider also needs the ability to automate the creation and configuration of database instances. Much of the database administration process can be automated in this fashion, but to achieve this level of automation, the database management system (DBMS) underlying the DBaaS must expose these functions via an application programming interface (API).
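
The scale-out rule described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's actual policy engine: the class names, the `instances_to_add` helper, and the threshold values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class InstanceMetrics:
    """Point-in-time load figures for one database instance (hypothetical shape)."""
    connections: int
    cpu_percent: float


@dataclass
class ScalingPolicy:
    """SLA-derived limits. The defaults are placeholders, not recommendations."""
    max_connections: int = 500
    max_cpu_percent: float = 80.0

    def is_exceeded(self, metrics: InstanceMetrics) -> bool:
        # Either limit being crossed counts as a violation of the rule.
        return (metrics.connections > self.max_connections
                or metrics.cpu_percent > self.max_cpu_percent)


def instances_to_add(policy: ScalingPolicy, fleet: Sequence[InstanceMetrics]) -> int:
    """Return 1 if any instance in the fleet is over a limit, otherwise 0."""
    return 1 if any(policy.is_exceeded(m) for m in fleet) else 0


if __name__ == "__main__":
    policy = ScalingPolicy(max_connections=100, max_cpu_percent=75.0)
    fleet = [InstanceMetrics(connections=50, cpu_percent=40.0),
             InstanceMetrics(connections=140, cpu_percent=35.0)]
    print(instances_to_add(policy, fleet))
```

A real operator would evaluate such a rule on a schedule and hand the result to whatever API creates new database instances.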

DBaaS: Automating Database Administration

Cloud operators must work on hundreds, thousands or even tens of thousands of databases at the same time. This requires automation. In order to automate these functions in a flexible manner, the DBaaS solution must provide an API to the cloud operator. The standard API model used for cloud functions is REST (representational state transfer). One such interface that is starting to gain popularity is Trove for OpenStack, based on RedDwarf from Brian Aker’s team at HPCloud. Trove is still evolving, but it provides a standard mechanism for OpenStack cloud providers to automate many of the DBaaS functions exposed via MySQL.
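
To make the REST model concrete, here is a minimal sketch of how an operator's tooling might build a "create database instance" request. The path and body are modeled loosely on the shape of Trove's create-instance call and should be checked against the Trove API reference before use. The host name, tenant ID, token and flavor value are hypothetical placeholders, and the request is only constructed here, never sent.

```python
import json
import urllib.request


def build_create_instance_request(base_url, tenant_id, token, name, flavor_ref, volume_gb):
    """Build (but do not send) a POST request asking the DBaaS to create an instance."""
    # Body layout is an assumption modeled on Trove's documented create call.
    body = {
        "instance": {
            "name": name,
            "flavorRef": flavor_ref,
            "volume": {"size": volume_gb},
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/v1.0/{tenant_id}/instances",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Token": token,
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_instance_request(
        base_url="https://dbaas.example.com",  # hypothetical endpoint
        tenant_id="tenant-123",                # hypothetical tenant
        token="not-a-real-token",
        name="orders-db",
        flavor_ref="1",
        volume_gb=2,
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Because every administrative action becomes a call like this one, the same script can be run against thousands of databases without manual intervention.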

For a more complete list of the functions that are handled by the DBaaS, see the following graphic.

 

[Figure: traditional DBA functions handled by the DBaaS operator]

The diagram above shows the lengthy list of traditional DBA functions that are provided by the DBaaS operator.

DBaaS: The Database Requirements

Virtualization has enabled tremendous benefits—it is the foundation for cloud computing—but running a database in a virtual machine is NOT database virtualization…not even close. The best way to describe one big difference between server virtualization and database virtualization is with an example.

[Figures: section-4-1, section-4-2]

The following diagram shows an example of a bank where the user has $10M on deposit. It assumes that the database is running on a virtual machine that provides the typical cloud-based bursting whenever usage spikes. A spike in usage causes the database to scale elastically to additional virtual machines using the elastic cloud methods. The problem is that these virtual machines do not provide any mechanism for sharing the data and distributing locks. As a result, when requests to wire or withdraw money hit the three different copies of the database, each copy believes that the $10M is still available to satisfy these requests. Each database fulfills its request, removing a total of $30M from an account that contained only $10M. The bank just lost $20M. This is just one example of why database elasticity, a key component of DBaaS, requires a virtualized database.

[Figure: three unsynchronized database copies each approve a $10M withdrawal from the same $10M account]

As the diagram above demonstrates, simply running a database in a virtual machine does not provide database virtualization. In order to provide database virtualization you also need: (a) shared data; and (b) distributed locking.
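
The arithmetic of the bank example can be reproduced with a toy simulation. This is only a sketch: `IsolatedReplica` is a hypothetical stand-in for a database copy running in its own virtual machine, with a private view of the balance and no coordination with the other copies.

```python
class IsolatedReplica:
    """A database copy with its own private balance: no shared data, no locks."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Each replica checks only its own stale copy of the balance.
        if amount <= self.balance:
            self.balance -= amount
            return True
        return False


opening_balance = 10_000_000

# A usage spike causes the database to burst onto three virtual machines.
replicas = [IsolatedReplica(opening_balance) for _ in range(3)]

# One $10M withdrawal request lands on each copy, and each copy approves it.
approved = [replica.withdraw(10_000_000) for replica in replicas]

total_paid_out = 10_000_000 * sum(approved)
loss = total_paid_out - opening_balance

print(total_paid_out)  # 30000000
print(loss)            # 20000000
```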

Shared Data: The various database instances must operate across a shared set of data so that each instance has a consistent view of the data. In other words, each database instance must see the exact same data at any point in time.

Distributed Locking: Whenever one database instance attempts to write to the database (a bank withdrawal, for example), the other database instances must wait for this change to take effect. A distributed lock manager is required to coordinate these changes.
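
A minimal sketch of the two requirements working together follows. It is an in-process analogy only: `SharedAccount` stands in for the shared data, `threading.Lock` stands in for a distributed lock manager, and each `DatabaseNode` plays the role of one database instance. None of these names come from a real product.

```python
import threading


class SharedAccount:
    """The single shared copy of the data that every node reads and writes."""

    def __init__(self, balance):
        self.balance = balance
        # Stand-in for a distributed lock manager (an assumption for this sketch).
        self.lock = threading.Lock()


class DatabaseNode:
    """One database instance operating over the shared data."""

    def __init__(self, account):
        self.account = account

    def withdraw(self, amount):
        # Other nodes wait here until the current write has taken effect.
        with self.account.lock:
            if amount <= self.account.balance:
                self.account.balance -= amount
                return True
            return False


account = SharedAccount(10_000_000)
nodes = [DatabaseNode(account) for _ in range(3)]
results = []


def attempt(node):
    results.append(node.withdraw(10_000_000))


threads = [threading.Thread(target=attempt, args=(node,)) for node in nodes]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(sum(results))     # number of approved withdrawals
print(account.balance)  # remaining balance
```

Whichever node acquires the lock first approves its withdrawal; the other two see the updated balance and decline, so exactly $10M leaves the account.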

DBaaS: Ecosystem Isn’t Important, It’s Everything!

Recognizing that an elastic cloud requires an elastic database, Amazon created the NoSQL wave with SimpleDB. However, Amazon underestimated the importance of the ecosystem to the database market. When it comes to databases, ecosystems aren’t just important; they are everything.

Amazon created SimpleDB in order to provide an elastic database for its Elastic Compute Cloud (EC2). Then it realized that its customers wanted MySQL, because MySQL already had an ecosystem: applications, development tools, monitoring and tuning tools, books, trained DBAs and application developers, and much more. It turned out that the vast majority of Amazon’s EC2 users were getting raw instances and loading MySQL. Amazon quickly realized that the key to cloud database success was providing better support for MySQL.

The NoSQL and NewSQL databases provide cloud elasticity, but they simply do not have an ecosystem with sufficient critical mass to win. It seems that there is always a new database technology that is poised to unseat “King SQL”. Each and every one has been ground to dust by the SQL ecosystem. The database graveyard includes object databases, graph databases, XML databases, object-relational databases, in-memory databases, and now NoSQL and NewSQL. These technologies tend to find a niche market, but they never achieve escape velocity because of a chicken-and-egg problem: you cannot achieve market leadership until you have an ecosystem, and you don’t get an ecosystem until you achieve market leadership.

MySQL was able to achieve market leadership based upon a confluence of once-in-a-lifetime advantages. MySQL benefited from a new and rapidly growing market that big databases were ignoring (the Web), an innovative free business model (open source), and becoming part of the standard “stack” for web development (the “M” in LAMP). NoSQL has a small chance of duplicating this feat in the cloud, but MySQL isn’t ignoring this platform. In fact, MySQL is embracing it. Now with database virtualization for MySQL, the flicker of hope for NoSQL databases to own the cloud is all but extinguished. The dominant ecosystem wins again.

Beware the “MySQL Compatible” trap. There are databases that support an interface that they claim is MySQL compatible. This presents some problems. First, these are almost invariably “almost 100% compatible”, meaning they aren’t 100% compatible. Second, the ecosystem includes tools that work with MySQL but don’t work with these databases, because they aren’t MySQL. Similarly, the processes that people rely on to administer MySQL are not the same as those for these other databases. If the goal is to leverage the MySQL ecosystem, then you really should use MySQL (or MariaDB) itself. Finally, MySQL is constantly improving and refining its capabilities. This makes it a fountain of technological innovation. It is far more fruitful to benefit from this innovation than to be constantly playing catch-up with a different database.

The Database Management System (DBMS) for DBaaS

The ultimate DBaaS requires full virtualization of the database and 100% compatibility with MySQL. There is only one solution that accomplishes both of these goals: ScaleDB. Because ScaleDB plugs into MySQL via the standard storage engine API, it exploits the entire MySQL ecosystem, without modification. ScaleDB also enables database nodes to scale elastically, while enabling them to operate over shared data under the guidance of a distributed lock manager.

In order to provide true DBaaS capabilities, the underlying database must also support the following capabilities:

  1. The ability to move database instances on the fly: Whether you need to move a database instance because of a busy server or network segment—to provide QoS—or to move to a larger or smaller instance, this is a core requirement of the DBMS upon which a DBaaS is built.
  2. High-Availability: A DBaaS must simply work. If it fails, users will lose faith in having someone else run their database and will go back to running their own.
  3. Elasticity: The database must be able to scale onto additional servers. If the database cannot burst onto other servers, then the cloud is forced to dedicate servers to each database and run them at low utilization. This undermines the core business model of the cloud.
  4. No Change to Applications: If the DBaaS requires modifications to the application, as is required with sharding to handle routing and cross-shard functions like joins, then scaling is not seamless. Seamless scaling is a requirement for a DBaaS solution.
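
To illustrate the last requirement, the following sketch shows the kind of routing and cross-shard logic that sharding pushes into the application. It is a toy model under stated assumptions: the shards are plain in-memory dictionaries, and the modulo routing rule and all function names are hypothetical.

```python
NUM_SHARDS = 3

# Each shard stands in for a separate database server.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(customer_id):
    """Routing rule the application must carry: pick a shard from the key."""
    return shards[customer_id % NUM_SHARDS]


def insert_order(customer_id, order_id, amount):
    shard_for(customer_id).setdefault(customer_id, []).append((order_id, amount))


def orders_for_customer(customer_id):
    # A single-key lookup touches exactly one shard.
    return shard_for(customer_id).get(customer_id, [])


def total_revenue():
    # A cross-shard aggregate: the application fans out to every shard
    # and merges the results itself.
    return sum(amount
               for shard in shards
               for orders in shard.values()
               for _, amount in orders)


insert_order(customer_id=1, order_id=101, amount=250)
insert_order(customer_id=2, order_id=102, amount=100)
insert_order(customer_id=4, order_id=103, amount=50)

print(orders_for_customer(1))
print(total_revenue())
```

With a database that scales without sharding, the application would issue ordinary queries against one logical database and none of this routing code would exist.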

DBaaS: The Ultimate Goal

The ultimate goal of a DBaaS is that the customer doesn’t have to think about the database. Today, cloud users don’t have to think about server instances, storage and networking; they just work. Virtualization enables clouds to provide these services to customers while automating much of the traditional pain of buying, installing, configuring and managing these capabilities. Now database virtualization is doing the same thing for the cloud database, and it is being provided as Database-as-a-Service (DBaaS).