Internet of Things Database
- What is the Internet of Things (IoT)?
- Internet of Things: What it Enables
- Internet of Things: Communications
- Internet of Things: Data Management Requirements
- Data Architectures for the Internet of Things
- ScaleDB: The Database of Things
What is the Internet of Things (IoT)?
The Internet of Things is the next stage in the Internet’s evolution, in which traditionally unintelligent items (thermostats, air conditioners, watches and the like) begin communicating via the Internet, creating a seamless intelligence that enhances human existence. Imagine your home knowing that your car is approaching, so it opens the garage door, unlocks the door to the house, turns on music and warms the hot tub. Simple actions on your part can cause a cascade of actions perfectly tailored to your personal desires. This is just one example of what happens when everyday items are infused with intelligence and the ability to communicate with each other and with various disparate sources of information. How big will the Internet of Things become? Gartner predicts 26 billion devices connected to the Internet by 2020, ABI predicts 30 billion, and Oracle predicts 8 billion people and 50 billion devices.
Internet of Things: What it Enables
The Internet of Things, through coordinated learning, integration of disparate data sources and action, creates a personal orchestra of actions that seamlessly enrich your life. It also enriches your engagement with various things, people and the various companies that create the tapestry of your life. Think of it as things, people and companies working in harmony to tailor every experience and interaction to your personal tastes, all automatically.
Imagine you sit down at a slot machine and insert your affinity card. The slot machine welcomes you by name and, knowing that you were just watching the NASCAR race in your room, modifies its graphics to look and act like a NASCAR-themed slot machine. It knows your favorite drink and orders one for you. Knowing that your wife and kids are in the room, it may offer them a free movie so that they don’t disturb you. After an hour of play, it recognizes that you have lost $400, so it comps you with a $100 gift certificate to the local restaurant, also tailored to your personal dietary preferences. This is a typical day in the life of the Internet of Things.
From the user’s perspective, all of these actions simply happen without any involvement on the user’s part. In reality, they involve coordination between various databases, web services and devices. Certain actions or combinations of actions might trigger other actions. In the example above, the loss of $400 in a defined period of time triggered a lookup of the user’s dining preferences and the generation and delivery of a complimentary meal certificate.
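The trigger in the slot machine example can be sketched in a few lines of Python. This is only an illustration of a windowed threshold rule, not how any real casino system or product works: the `LossTrigger` class, the one-hour window, the `coupon_for` helper and the player record are all assumptions made for the sketch. Only the $400 threshold and the $100 certificate come from the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LossTrigger:
    """Fires when losses inside a sliding time window reach a threshold."""
    threshold: float = 400.0        # dollars, from the slot machine example
    window_seconds: float = 3600.0  # assumed "defined period of time": one hour
    events: deque = field(default_factory=deque)  # (timestamp, loss) pairs
    total: float = 0.0              # running sum of losses inside the window

    def record(self, timestamp: float, loss: float) -> bool:
        """Add a loss; return True if the windowed total reaches the threshold."""
        self.events.append((timestamp, loss))
        self.total += loss
        # Evict losses that have aged out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_loss = self.events.popleft()
            self.total -= old_loss
        return self.total >= self.threshold


def coupon_for(profile: dict) -> dict:
    """Hypothetical follow-up action: build a meal certificate from a profile."""
    return {"name": profile["name"], "value": 100,
            "cuisine": profile["dining_preference"]}


trigger = LossTrigger()
profile = {"name": "Pat", "dining_preference": "vegetarian"}  # made-up record
for minute in (0, 15, 30, 45):
    if trigger.record(minute * 60, 100.0):
        print(coupon_for(profile))  # reached on the fourth $100 loss
```

In a real deployment the rule would also need to remember that it has already fired, so that the player is not comped again on every subsequent loss.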
Internet of Things: Communications
The Internet of Things comprises an almost infinite array of devices and data sources that need to communicate. Devices may communicate with each other in a peer-to-peer (or device-to-device, D2D) manner, using technology such as proximal ad hoc connections in which devices discover one another and interact. This may lead to service federation or chaining. There may be a central controller or application, such as a home controller, that devices can connect to in an ad hoc fashion, either directly or via service chaining. These central applications or controllers may or may not be connected to the Internet. Devices may also discover various gateways (e.g. WiFi or mobile phones) that provide connections to some central hub. Some devices, such as automobiles, may have autonomous connections that let them communicate without a gateway such as a mobile phone. Other devices will leverage low-power radio or chirp networks enabling autonomous connectivity.
In case the communications options aren’t diverse enough, there is also the type of communication to consider. Some communications are 1-way, while others are 2-way. In the case of 1-way communications, the data can be pushed from the device (such as device status information) or to the device (such as updates to the device’s local information, firmware or software).
In short, device communications will create an increasing flood of data, by virtue of their sheer numbers and the amount of information they can create and consume. Most people, if asked, would identify Twitter as the leading creator of Big Fast Data, that is, large amounts of data flooding in at very high velocity. However, automobile data, even at this nascent stage, dwarfs Twitter’s data volume and velocity. A single car can generate data from all of its various systems (brakes, suspension, engine and so on) and from any number of onboard sensors on a second-by-second basis. This is just one example; many other things are beginning to generate a flood of data.
Internet of Things: Data Management Requirements
Just as the Internet of Things involves weaving together multiple connected devices, the data requirements are defined by weaving together disparate data sources. It requires the integration of customer data, billing information, device data, web services data (e.g. weather and traffic data), and more. The diversity of the data is just one of the challenges, but it is a significant one.
In addition to data diversity, data volume and velocity are considerable challenges. The Internet of Things generates a flood of data that dwarfs human-generated data such as that of Twitter and Google. Device data must be ingested, evaluated for trends and anomalies, and used to trigger various actions. Velocity puts a massive load on data management technologies, because the data may be pouring in at a rate of millions of elements a second. One might consider using in-memory systems to handle this velocity, but that runs into the twin challenge of volume: the data must be stored to provide a historical context for evaluating trends over time. With memory costing 465 times more than disk-based storage, in-memory is simply too expensive for managing this volume of data. Even high-powered compression cannot make in-memory systems affordable at this data volume.
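A back-of-the-envelope calculation makes the volume and cost argument concrete. The sketch below takes the low end of "millions of elements a second" and the 465-to-1 cost ratio from the text; the 100-byte element size and the $1,000 disk budget are assumptions chosen purely for illustration.

```python
ELEMENTS_PER_SECOND = 1_000_000   # low end of "millions of elements a second"
BYTES_PER_ELEMENT = 100           # assumption: size of one stored reading
MEMORY_TO_DISK_COST_RATIO = 465   # ratio quoted in the text
SECONDS_PER_DAY = 86_400


def daily_volume_tb(rate=ELEMENTS_PER_SECOND, size=BYTES_PER_ELEMENT):
    """Raw data written per day, in terabytes (10**12 bytes)."""
    return rate * size * SECONDS_PER_DAY / 1e12


def in_memory_cost(disk_cost):
    """Cost of holding the same data in memory, given its cost on disk."""
    return disk_cost * MEMORY_TO_DISK_COST_RATIO


print(daily_volume_tb())      # 8.64 (TB per day, before compression or mirroring)
print(in_memory_cost(1_000))  # 465000 ($1,000 of disk becomes $465,000 of memory)
```

Under these assumptions a single day of retained history is already several terabytes, which is why the historical data has to live on disk.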
The next challenge facing data management in the age of the Internet of Things is real-time data processing. Much of the value of the Internet of Things is the ability of everyday items to respond to your actions, more or less in real time. Since this response is often an orchestrated effort leveraging a variety of device inputs and other data, the ability to process the data and respond in real time puts an extra load on the data infrastructure. In these near-real-time scenarios, systems like Hadoop, which are based on a time-consuming batch process, simply do not work.
In addition to real-time or near-real-time demands, time predicates are an integral aspect of many Internet of Things data requirements. For example, information from your car’s braking system should be time-stamped so that the data can be evaluated to determine trends over time. You might see that the system is showing wear and braking response is declining, indicating that it is time to change your brake pads. Time predicates are an integral capability for much of the device data generated via the Internet of Things.
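As a minimal sketch of how time-stamped readings expose a trend, the Python below fits a least-squares line through (timestamp, value) pairs and flags a decline. The data is made up: timestamps are in days, and the values stand in for a hypothetical braking response score. The `needs_service` function and its threshold are assumptions, not part of any real vehicle system.

```python
def trend_slope(readings):
    """Least-squares slope of value over time for (timestamp, value) pairs."""
    n = len(readings)
    mean_t = sum(t for t, _ in readings) / n
    mean_v = sum(v for _, v in readings) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in readings)
    variance = sum((t - mean_t) ** 2 for t, _ in readings)
    if variance == 0:
        raise ValueError("need readings at two or more distinct timestamps")
    return covariance / variance


def needs_service(readings, max_decline_per_day=0.25):
    """Hypothetical rule: flag the brakes if the score falls too fast."""
    return trend_slope(readings) < -max_decline_per_day


# (day, braking response score); illustrative values only
readings = [(0, 10.0), (1, 9.5), (2, 9.0), (3, 8.5)]
print(trend_slope(readings))    # -0.5 score units per day
print(needs_service(readings))  # True
```

Without the timestamps the same four values would say nothing about how quickly the response is degrading.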
Much of the data generated by the Internet of Things demands transactional integrity and high availability. Device data often triggers financial actions such as billing for usage or usage tiers. For these reasons, a common requirement is transactional integrity: ensuring that each action is registered once and only once and that any failure in the process results in a roll-back and resubmission of the entire transaction. A capability that goes hand in hand with transactional systems and real-time demands is high availability. You don’t want the system to go down, and you cannot afford to lose data in a worst-case scenario.
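The "once and only once, or roll back" requirement can be illustrated with any transactional SQL store. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in; the table names, columns and the `record_usage` function are hypothetical. A unique event ID rejects duplicates, and a failure partway through undoes the whole transaction.

```python
import sqlite3


def open_db():
    """Create an in-memory database with hypothetical usage and billing tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage_events ("
                 "event_id TEXT PRIMARY KEY, device_id TEXT, units INTEGER)")
    conn.execute("CREATE TABLE billing ("
                 "device_id TEXT PRIMARY KEY, "
                 "total_units INTEGER NOT NULL CHECK (total_units >= 0))")
    conn.commit()
    return conn


def record_usage(conn, event_id, device_id, units):
    """Register an event and bill for it atomically; True if it was applied."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO usage_events VALUES (?, ?, ?)",
                         (event_id, device_id, units))
            cur = conn.execute(
                "UPDATE billing SET total_units = total_units + ? "
                "WHERE device_id = ?", (units, device_id))
            if cur.rowcount == 0:  # first event for this device
                conn.execute("INSERT INTO billing VALUES (?, ?)",
                             (device_id, units))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate event or invalid total; nothing was changed


conn = open_db()
print(record_usage(conn, "e1", "car-1", 10))   # True: applied
print(record_usage(conn, "e1", "car-1", 10))   # False: duplicate, ignored
print(record_usage(conn, "e2", "car-1", -50))  # False: billing failed, rolled back
```

After the third call the `usage_events` table still holds only `e1`: the event row inserted at the start of the failed transaction was rolled back along with the billing update. A caller that sees `False` for a transient failure can safely resubmit the same event.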
Another data management challenge is the multiple use cases of the data. Device data may be used operationally—triggering certain actions in real-time—and it may also be leveraged to glean analytic insight regarding trends and for predictive purposes. These demands typically require very different tools, and introduce a significant data integration challenge.
Data Architectures for the Internet of Things
Most data architects look at the diverse set of requirements above and begin assembling a collection of specialized data stores and a flow of data between these stores. The typical architecture involves the following components:
SQL Database: Low-volume and low-velocity data such as customer data, billing, inventory and the like are typically stored in a SQL database that provides transactional integrity and some level of fail-over or high-availability for the mission-critical data.
NoSQL Database: The NoSQL database is typically used to address the fast data ingest problem for device data. In some cases, there may be a stream processor (e.g. Storm, Samza or Kinesis) handling data filtering, routing and some lightweight processing, such as counts. The NoSQL database is typically chosen because, unlike most SQL databases, which top out at about 5,000 inserts/second, it can deliver up to 50,000 inserts/second. However, NoSQL databases are not designed to handle analytic processing of the data or joins, which are common requirements for Internet of Things applications. NoSQL effectively provides a real-time data ingest engine for data that is then moved to Hadoop using an extract, transform and load (ETL) process.
Hadoop: Hadoop provides the low-cost storage and analytic processing to handle the sheer volume of device data. However, Hadoop’s batch process can take 6-48 hours, knocking it out of the real-time analytics space. Hadoop vendors are working to minimize this impact with incremental batching and in-memory systems but these create other challenges (rigidity and cost). The bottom line is that Hadoop provides a low-cost high-volume data warehouse, but it doesn’t address high-velocity data or ad hoc analytic processing.
ScaleDB: The Database of Things
ScaleDB is a dual-purpose database, delivering both transactional processing and analytic processing. It leverages a storage tier that is similar to Hadoop HDFS but optimized for fine-grained ad hoc database processing, and that delivers parallel processing similar to MapReduce. This lets ScaleDB deliver best-of-breed analytic performance while also delivering more traditional SQL-based transactional functionality.
ScaleDB also delivers data ingest performance that even NoSQL cannot match, handling 1,500,000 inserts per second. Most striking is that this performance is achieved using low-cost SATA disks. In fact, a cluster based on hardware costing less than $10,000 delivers a sustained 1.5M inserts/second and queries 20M rows/second. ScaleDB, which acts as a clustering infrastructure for MySQL, supports all MySQL tools and applications, providing a built-in ecosystem.
ScaleDB is fully transactional and does not have a single point of failure (SPOF), making it highly available. All data is mirrored, to avoid data loss and deliver high-availability (HA).
ScaleDB is also a time-series database. It is highly optimized for the storage and querying of time-series data. This is critical for time-sensitive device data and for analytics used to find or predict time-based trends in the data.
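To show the kind of query a time-series workload involves, here is a generic Python sketch that groups time-stamped readings into fixed-width buckets and averages each one. It says nothing about how ScaleDB stores or queries data internally; the `bucket_averages` function, the 60-second bucket and the sample readings are all assumptions for illustration.

```python
from collections import defaultdict


def bucket_averages(readings, bucket_seconds=60):
    """Average (timestamp, value) readings per fixed-width time bucket."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
    for timestamp, value in readings:
        start = int(timestamp // bucket_seconds) * bucket_seconds
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count
            for start, (total, count) in sorted(sums.items())}


# (seconds since start, sensor value); illustrative values only
readings = [(0, 10.0), (30, 20.0), (60, 30.0), (119, 50.0)]
print(bucket_averages(readings))  # {0: 15.0, 60: 40.0}
```

Per-bucket aggregates like these are the usual input for spotting or predicting time-based trends in device data.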
Knitting together SQL, NoSQL and Hadoop tools is time-consuming, presents programming challenges, and results in a fragile system. ScaleDB instead delivers a single unified data management solution. It delivers the performance, scale and cost-efficiency needed to make it the database for the Internet of Things or, more appropriately, the Database of Things.
Note of thanks to Ivan Chong at Informatica for his market insights and IoT examples.