ScaleDB delivers outstanding ingestion and query rates regardless of how large the data table grows. In fact, as the cluster scales and more database node resources become available, ingestion and query rates increase with it. The performance improvements are linear and quickly outpace other MySQL (and NoSQL) storage engines. Even the single-node configuration, ScaleDB ONE, outperforms InnoDB by a factor of 10, and the clustered versions of ScaleDB do far better still.

ScaleDB has run the following benchmarks for various cluster configurations on Amazon AWS and compared the results to each other and to InnoDB. The results and details are presented below. A benchmark of the previous version (15.10) on bare metal is available here.

 

Results

We benchmarked ScaleDB Cluster 16.04 in various configurations of up to 12 nodes. The largest configuration comprised 5 database nodes, 6 storage nodes and 1 cluster manager. The write workers are co-located on the database nodes; each worker is a simple Perl script running 56 threads. The EC2 instances are c3.8xlarge on a 10Gb network, running MariaDB 10.1. A cluster with 6 storage nodes sustains ~5.1M inserts/second on Amazon instances while concurrently running the queries detailed below. At 5M inserts/second, we reached the limit of the 10Gb network.

 

The Benchmark

We have identified a few “standard” time-series data benchmarks, including the Linear Road Benchmark and NexMark. We intend to have third parties test and publish the results of standard benchmarks soon.

 

Amazon EC2 Instance Description (c3.8xlarge)

Each Amazon instance has the following configuration:

 

Demo Specifications

We ran our tests with 5 configurations:

  • I1: 1 Node test with InnoDB – 1 Write Worker co-located with the database server, 1 Reader node
  • S1: 1 Node test with ScaleDB ONE – 1 Write Worker co-located with the Database, Storage and Cluster Manager node, 1 Reader node
  • S2: 2 Nodes test with ScaleDB Cluster – 2 Write Workers co-located with 2 Database nodes, 2 Storage nodes and 1 Cluster Manager node, 1 Reader node
  • S4: 4 Nodes test with ScaleDB Cluster – 4 Write Workers co-located with 4 Database nodes, 4 Storage nodes and 1 Cluster Manager node, 1 Reader node
  • S6: 6 Nodes test with ScaleDB Cluster – 5 Write Workers co-located with 5 Database nodes, 6 Storage nodes and 1 Cluster Manager node, 1 Reader node

 

Writes

We inject time-series data that simulates till records from online and retail stores:

  • The row size is 105 bytes; it includes the transaction and item IDs, datetime, store, account, product and amount. The MySQL schema has row-id and time indexes (see details below).
  • We run tests to load up to 13 billion rows using extended inserts of 13,000 rows each.
  • The scripts load data in parallel, with up to 24 threads injecting data concurrently on each Database node.
  • Read operations are executed during the loading (see details below).
  • Data is indexed by time.
  • During the test, the fact table grows from 0 up to 12 billion rows.
  • The storage files grow from 0 up to 2.5TB.
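For illustration, an extended insert batches many rows into a single statement. A truncated sketch against the payment schema below (the values shown are made up; the benchmark used 13,000 rows per statement) might look like:

```sql
-- sale_id is AUTO_INCREMENT, so it is omitted from the column list.
INSERT INTO test.payment
       (create_time, account, store, product, coupon, amount)
VALUES ('2016-06-01 10:00:00', 1001, 'Chicago1', 'widget1', 'save15', 19.99),
       ('2016-06-01 10:00:00', 1002, 'Chicago1', 'widget2', '', 5.49),
       -- ... 12,997 more rows per statement ...
       ('2016-06-01 10:00:01', 1003, 'Boston3', 'widget1', 'save15', 19.99);
```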

 

  • I1: 1 Node with InnoDB Inserts
  • S1: 1 Node with ScaleDB ONE Inserts
  • S2: 2 Storage Nodes Inserts
  • S4: 4 Storage Nodes Inserts
  • S6: 6 Storage Nodes Inserts

 

Reads

We execute these four queries:

Query 1. Count the rows inserted in a given time range (the range covers approximately 100M rows)
Query 2. Filter the approximately 15M most recently inserted rows down to approximately 15K rows
Query 3. Group and analyze the approximately 100M most recently inserted rows
Query 4. Group and analyze the approximately 100M most recently inserted rows, then join with a dimension table

 

Results: 12 Billion Rows Load

The following table and graphs show the performance of the queries executed while the system is continuously loading data. The figures are in seconds and represent the average time to execute the same query, sampled at 30-second intervals. The results show that read performance improves linearly as more nodes are added to the cluster. The second graph shows only the ScaleDB results, where the linearity is more evident without the InnoDB results included.
 
[Table: average query execution times in seconds]

[Graphs: time per query with InnoDB; time per query, ScaleDB only]

 

Database Schema


CREATE TABLE IF NOT EXISTS test.stores (
       name    char(25) NOT NULL DEFAULT '',
       street  char(25) DEFAULT NULL,
       city    char(25) DEFAULT NULL,
       state   char(2)  DEFAULT NULL,
       zipcode char(5)  DEFAULT NULL,
       phone   char(12) DEFAULT NULL,
       PRIMARY KEY (name) )
       ENGINE=InnoDB DEFAULT CHARSET=latin1;


-- ScaleDB Version
CREATE TABLE IF NOT EXISTS test.payment (
       sale_id     bigint(20) UNSIGNED NOT NULL
                              AUTO_INCREMENT,
       create_time timestamp  NOT NULL
                              DEFAULT '0000-00-00 00:00:00',
       account     bigint(20) UNSIGNED NOT NULL DEFAULT '0',
       store       char(25)   NOT NULL DEFAULT '',
       product     char(25)   NOT NULL DEFAULT '',
       coupon      char(18)   NOT NULL DEFAULT '',
       amount      decimal(8,2) NOT NULL,
       PRIMARY KEY (sale_id)     STREAMING_KEY = YES,
               KEY (create_time) RANGE_KEY     = SYSTEM )
       ENGINE=ScaleDB TABLE_TYPE = STREAMING;


-- InnoDB Version
CREATE TABLE IF NOT EXISTS test.payment (
      sale_id      bigint(20) UNSIGNED NOT NULL
                              AUTO_INCREMENT,
      create_time  timestamp  NOT NULL
                              DEFAULT '0000-00-00 00:00:00',
      account      bigint(20) UNSIGNED NOT NULL DEFAULT '0',
      store        char(25)   NOT NULL DEFAULT '',
      product      char(25)   NOT NULL DEFAULT '',
      coupon       char(18)   NOT NULL DEFAULT '',
      amount       decimal(8,2) NOT NULL,
      PRIMARY KEY (sale_id),
      KEY create_time (create_time) )
      ENGINE=InnoDB; 

 

INSERTs and SELECTs

 

INSERTs

  • Extended inserts are 13,000 rows each.
  • 24 concurrent write threads are executed by the Write Worker on each Database node.

The query timeframe is adjusted so that queries 1, 3 and 4 scan 90-95 million rows, and Query 2 scans 16-17 million rows.

 

SELECTs

NOTE: The differences in insert rates between InnoDB, ScaleDB ONE and ScaleDB Cluster must be accounted for when writing queries that address a fixed time range. Since InnoDB inserts approximately 180,000 rows per second while a ScaleDB cluster with 6 storage nodes inserts 5,140,000 rows per second, it would not be fair to compare a 1-second query that addresses only 180,000 rows in InnoDB against 5,140,000 rows in ScaleDB Cluster. Update the time-based queries below to normalize for insert rates.
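One way to normalize is to shrink the query window in inverse proportion to the insert rate, so the number of scanned rows stays comparable across engines. Using the rates above (the exact window widths here are illustrative): a 60-second window at InnoDB's ~180K rows/second covers ~10.8M rows, so at the 6-storage-node cluster's ~5.14M rows/second the window narrows to roughly 2 seconds:

```sql
-- Same ~10.8M scanned rows as a 60-second InnoDB window,
-- compressed into a ~2-second window at the cluster's insert rate.
SELECT count(*)
  FROM test.payment
 WHERE create_time > timestampadd(second, -62, now())
   AND create_time < timestampadd(second, -60, now());
```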
 

Query 1
SELECT count(*)
  FROM test.payment
 WHERE create_time > timestampadd(second,-120, now())
   AND create_time < timestampadd(second, -60, now());
 

 

Query 2
SELECT *
  FROM test.payment
 WHERE create_time > timestampadd(second, -70, now())
   AND create_time < timestampadd(second, -60, now())
   AND amount < 99.99
   AND store = 'Chicago1'
   AND coupon = 'save15';
 

 

Query 3
SELECT store, count(*),
       sum(amount), avg(amount),
       min(amount), max(amount)
  FROM test.payment
 WHERE create_time > timestampadd(second,-120, now())
   AND create_time < timestampadd(second, -60, now())
 GROUP BY store;

 

Query 4
SELECT t1.store, street, phone, count, sum, min, max, avg
  FROM ( SELECT store,
                count(*) AS count,
                sum(amount) AS sum,
                min(amount) AS min,
                max(amount) AS max,
                avg(amount) AS avg
           FROM test.payment
          WHERE create_time > timestampadd(second,-120,now())
            AND create_time < timestampadd(second, -60,now())
       GROUP BY store ) t1, test.stores t2
 WHERE t1.store = t2.name;