ScaleDB delivers outstanding ingestion and query performance regardless of how large the data table grows. In fact, as the cluster scales and more database-node resources become available, ingestion and query rates increase with it. The improvements are near-linear and quickly outpace other MySQL (and NoSQL) storage engines. Even the single-node configuration, ScaleDB ONE, outperforms InnoDB by roughly a factor of 10, and clustered ScaleDB configurations do far better than that.
ScaleDB has run the following benchmarks for various cluster configurations on Amazon AWS and compared the results to each other and to InnoDB. The results and details are presented below. A benchmark of the previous version (15.10) on bare metal is available here.
We benchmarked ScaleDB Cluster 16.04 in configurations of up to 12 nodes. The largest configuration consisted of 5 database nodes, 6 storage nodes, and 1 cluster manager. The write workers are co-located on the database nodes; each worker is a simple Perl script running 56 threads. The EC2 instances are c3.8xlarge on a 10Gb network, running MariaDB 10.1. A 6-storage-node cluster sustains ~5.1M inserts/second on Amazon instances while running the concurrent queries detailed below. At 5M inserts/second, we reached the limit of the 10Gb network.
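As a rough sanity check on that network ceiling, the raw row payload at the peak insert rate can be estimated from the figures above (a back-of-the-envelope sketch; protocol framing and storage-node replication traffic are not quantified in the source and would multiply the real wire usage):

```python
# Back-of-the-envelope estimate of network payload at the peak insert rate.
# The row size (105 bytes) and ~5.14M inserts/s come from the benchmark;
# everything on the wire beyond the raw rows (framing, replication) is extra.
ROW_BYTES = 105
INSERTS_PER_SEC = 5_140_000

payload_gbps = INSERTS_PER_SEC * ROW_BYTES * 8 / 1e9
print(f"raw row payload: {payload_gbps:.2f} Gb/s")  # ~4.32 Gb/s
```

The raw rows alone consume over 4 Gb/s of the 10Gb link, so once framing and storage-node traffic are added, saturating the network around 5M inserts/second is plausible.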
We have identified a few "standard" time-series data benchmarks, including the Linear Road Benchmark and NexMark. We intend to have third parties test and publish the results of these standard benchmarks soon.
Amazon EC2 Instance Description (c3.8xlarge)
Each Amazon instance has the following configuration:
- 32 vCPU
- 60GB memory
- 2x320GB SSD Storage
- 10Gb Network with Enhanced Networking Support
We ran our tests with 5 configurations:
- I1: 1 Node test with InnoDB – 1 Write Worker co-located with the database server, 1 Reader node
- S1: 1 Node test with ScaleDB ONE – 1 Write Worker co-located with the Database, Storage and Cluster Manager node, 1 Reader node
- S2: 2 Nodes test with ScaleDB Cluster – 2 Write Workers co-located with 2 Database nodes, 2 Storage nodes and 1 Cluster Manager node, 1 Reader node
- S4: 4 Nodes test with ScaleDB Cluster – 4 Write Workers co-located with 4 Database nodes, 4 Storage nodes and 1 Cluster Manager node, 1 Reader node
- S6: 6 Nodes test with ScaleDB Cluster – 5 Write Workers co-located with 5 Database nodes, 6 Storage nodes and 1 Cluster Manager node, 1 Reader node
We inject time-series data that simulates point-of-sale (till) records from online and retail stores:
- The row size is 105 bytes; each row includes the transaction and item IDs, datetime, store, account, product, and amount. The MySQL schema has row-id and time indexes. (See details below.)
- We run tests to load up to 13 billion rows using extended inserts of 13,000 rows each.
- The scripts load data in parallel. There are up to 24 parallel threads injecting data at the same time for each Database node.
- Read operations are executed during the loading (see details below).
- Data is indexed by time.
- The fact table during the test grows from 0 up to 12 Billion rows.
- The storage files grow from 0 up to 2.5TB.
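To illustrate the load pattern, a 13,000-row extended INSERT against the payment schema can be assembled like this (a minimal sketch; the actual Perl loader and its data distribution are not shown in the source, so the value generator here is a hypothetical placeholder):

```python
import random

# Build one MySQL extended INSERT covering `batch` rows of the payment
# table (schema from the benchmark). The generated values are random
# placeholders, not the benchmark's real data distribution.
def extended_insert(batch=13_000):
    rows = ", ".join(
        "(now(), {acct}, 'Chicago1', 'widget', 'save15', {amt:.2f})".format(
            acct=random.randrange(10**9), amt=random.uniform(1.0, 99.99)
        )
        for _ in range(batch)
    )
    return ("INSERT INTO test.payment "
            "(create_time, account, store, product, coupon, amount) VALUES " + rows)

stmt = extended_insert()
print(stmt[:120], "...")
```

Each write thread would send statements like this in a loop; batching 13,000 rows per statement amortizes the per-statement parsing and network round-trip costs.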
[Insert-rate chart, one series per configuration:]
- I1: 1 Node with InnoDB Inserts
- S1: 1 Node with ScaleDB ONE Inserts
- S2: 2 Storage Nodes Inserts
- S4: 4 Storage Nodes Inserts
- S6: 6 Storage Nodes Inserts
We execute these four queries:
Query 1. Count the rows inserted in a given time range (the range covers approximately 100M rows)
Query 2. Filter the approximately 15M most recently inserted rows down to roughly 15K rows
Query 3. Group and analyze the approximately 100M most recently inserted rows
Query 4. Group and analyze the approximately 100M most recently inserted rows, then join with a dimension table
Results: 12 Billion Rows Load
The following table and graphs show the performance of the queries executed while the system is constantly loading data. The figures are in seconds and represent the average time to execute the same query, repeated at 30-second intervals. You can see from the results that read performance improves linearly as more nodes are added to the cluster. The second graph shows only the ScaleDB results, as the linearity is more evident without the InnoDB results alongside.
CREATE TABLE IF NOT EXISTS test.stores (
  name char(25) NOT NULL DEFAULT '',
  street char(25) DEFAULT NULL,
  city char(25) DEFAULT NULL,
  state char(2) DEFAULT NULL,
  zipcode char(5) DEFAULT NULL,
  phone char(12) DEFAULT NULL,
  PRIMARY KEY (name)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- ScaleDB Version
CREATE TABLE IF NOT EXISTS test.payment (
  sale_id bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  create_time timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  account bigint(20) UNSIGNED NOT NULL DEFAULT '0',
  store char(25) NOT NULL DEFAULT '',
  product char(25) NOT NULL DEFAULT '',
  coupon char(18) NOT NULL DEFAULT '',
  amount decimal(8,2) NOT NULL,
  PRIMARY KEY (sale_id) STREAMING_KEY = YES,
  KEY (create_time) RANGE_KEY = SYSTEM
) ENGINE=ScaleDB TABLE_TYPE = STREAMING;

-- InnoDB Version
CREATE TABLE IF NOT EXISTS test.payment (
  sale_id bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  create_time timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  account bigint(20) UNSIGNED NOT NULL DEFAULT '0',
  store char(25) NOT NULL DEFAULT '',
  product char(25) NOT NULL DEFAULT '',
  coupon char(18) NOT NULL DEFAULT '',
  amount decimal(8,2) NOT NULL,
  PRIMARY KEY (sale_id),
  KEY create_time (create_time)
) ENGINE=InnoDB;
INSERTs and SELECTs
- Extended inserts are 13,000 rows each.
- 24 concurrent write threads are executed by the Write Worker on each Database node.
The timeframe is adjusted to reflect 90-95 million rows scanned for queries 1, 3, and 4; and 16-17M rows for Query 2.
NOTE: The differences in insert rates between InnoDB, ScaleDB ONE, and ScaleDB Cluster must be accounted for when creating queries that address a fixed time range. Since InnoDB inserts approximately 180,000 rows per second and the 6-storage-node ScaleDB Cluster approximately 5,140,000 rows per second, it would not be fair to compare a 1-second query that addresses only 180,000 rows in InnoDB against 5,140,000 rows in ScaleDB Cluster. Please adjust the time-based queries below to normalize for insert rates.
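The normalization described above is simple arithmetic: choose a window long enough that each engine's window covers roughly the same number of rows (the insert rates are the source's figures; the helper function name is ours):

```python
# Seconds of wall-clock time a query window must span so that each
# engine's window covers roughly the same number of rows.
RATES = {"InnoDB (I1)": 180_000, "ScaleDB 6-node (S6)": 5_140_000}  # rows/s

def window_seconds(target_rows, inserts_per_sec):
    return target_rows / inserts_per_sec

for name, rate in RATES.items():
    # InnoDB: ~556 s; S6: ~19 s for a 100M-row window
    print(f"{name}: ~{window_seconds(100_000_000, rate):.0f} s")
```

A window covering 100M rows therefore spans roughly nine minutes on InnoDB but under 20 seconds on the six-node cluster, which is why the fixed `-120`/`-60` second offsets in the queries below must be scaled per engine.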
SELECT count(*) FROM test.payment WHERE create_time > timestampadd(second,-120, now()) AND create_time < timestampadd(second, -60, now());
SELECT * FROM test.payment WHERE create_time > timestampadd(second, -70, now()) AND create_time < timestampadd(second, -60, now()) AND amount < 99.99 AND store = 'Chicago1' AND coupon = 'save15';
SELECT store, count(*), sum(amount), avg(amount), min(amount), max(amount) FROM test.payment WHERE create_time > timestampadd(second,-120, now()) AND create_time < timestampadd(second, -60, now()) GROUP BY store;
SELECT t1.store, street, phone, count, sum, min, max, avg FROM ( SELECT store, count(*) AS count, sum(amount) AS sum, min(amount) AS min, max(amount) AS max, avg(amount) AS avg FROM test.payment WHERE create_time > timestampadd(second,-120,now()) AND create_time < timestampadd(second, -60,now()) GROUP BY store ) t1, test.stores t2 WHERE t1.store = t2.name;