Tools MySQL Storage Engines Lead image: Lead Image © Maikloff, Fotolia.com
Lead Image © Maikloff, Fotolia.com
 

A storage engine for every use case

Data Machine

You can choose from among a number of open source database storage engines for both general and specialized data. By Federico Razzoli

MySQL knows very little about how to manage a table. Instead, it delegates such operations to a specific plugin type called the storage engine, which is the implementation of some important functions called by MySQL, including writing, deleting, and updating a row; reading rows; maintaining indexes; accessing indexes to find data more quickly; handling transactions (although many do not do this); and much more.

Today, the concept of a storage engine is implemented by several databases, such as MongoDB and Tarantool, the NoSQL database management system (DBMS) and Lua application server. However, MySQL probably invented this concept, because older competitors (e.g., Oracle, PostgreSQL) never implemented such a thing, knowing only one way to read or write a row.

One of the most famous contributors to MySQL and its fork MariaDB is Facebook. Last year, the social network spread the news of an important migration: Its MySQL instances, which relied on the default storage engine InnoDB, were migrated to their in-house, home-made storage engine, MyRocks, for their RocksDB high-performance database.

Not every company runs databases at the same scale as Facebook, so this piece of news gathered interest from database professionals, system administrators, and developers. Some people realized for the first time that MySQL supports storage engines, and those who already knew realized that MySQL storage engines are something to take extremely seriously. After all, it is difficult to doubt the quality of an important part of the Facebook infrastructure.

Why Storage Engines?

Why would MySQL implement storage engines? Maintaining multiple storage engines is expensive for a development team, even if this simply means fixing the most important bugs. Storage engines need to cooperate when different tables are used in a JOIN or when writes to different tables are performed in the same transaction, bringing a whole class of problems for developers to solve.

The MySQL architecture is complicated by the impossibility of making too many assumptions about what a storage engine can do. For this reason, MySQL has both transaction logs and a binary log. Although some storage engines write logs to implement transactions, or at least some form of crash-safeness, MySQL needs to maintain a centralized binary log to implement some features like replication.

Nevertheless, storage engine architecture is appreciated by many users because not all workloads are equal. In most cases, the default storage engine InnoDB is enough, although most users don't even realize they're using it. Sometimes, though, you need to optimize some aspects of your workload. Maybe you want to reduce the hardware requirements and grant your SSDs a longer life. Maybe you want extra reliability and consistency guarantees, or in another case, you might want to sacrifice reliability to gain more speed. The idea behind storage engines is that you should be able to make such choices without migrating to a different technology.

MySQL comes with some alternative storage engines that can be used in place of InnoDB. Its forks, MariaDB and Percona Server, come with some other storage engines maintained by their teams or by third parties.

Working with Storage Engines

MySQL expects storage engines to implement functions, accept certain parameters, and return a result that it is able to understand; otherwise, they have a high degree of freedom. For example, when MySQL asks the engine to write a row, it could maintain a cache and write data in a transactional way, write the table in-memory instead of to a disk, access a remote table located on another server, run a Perl script, or even do absolutely nothing. Although this might seem to be an unusual way of using database software, it actually opens a wide range of interesting possibilities. I'll mention more about this later when I talk about the least standard storage engines.

The list of available storage engines depends on the MySQL variant, its version, and the way it was installed. Typically, MySQL packages only contain some engines, and others can be installed with dedicated packages. The list can be retrieved with the SHOW ENGINES command. Listing 1 shows the results of this command in a MariaDB 10.3 container from the official MariaDB image.

Listing 1: MariaDB 10.3 Storage Engines

MySQL [(none)]> SHOW STORAGE ENGINES;
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                          | Transactions | XA   | Savepoints |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                            | NO           | NO   | NO         |
| CSV                | YES     | Stores tables as CSV files                                                       | NO           | NO   | NO         |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                        | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint          | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage                                           | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables | YES          | YES  | YES        |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                               | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                   | YES          | NO   | YES        |
+--------------------+---------+----------------------------------------------------------------------------------+--------------+------+------------+

When you create a table, you can specify a storage engine as follows:

CREATE TABLE employee (
id INT UNSIGNED AUTO_INCREMENT,
  email VARCHAR(100) NOT NULL,
  full_name VARCHAR(200) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = RocksDB;

To change the storage engine of an existing table, enter:

ALTER TABLE employee ENGINE = TokuDB;

Creating a table without specifying a storage engine uses the default, which is normally InnoDB:

MySQL [test]> SHOW GLOBAL VARIABLES LIKE 'default\_%storage_engine';
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| default_storage_engine     | InnoDB |
| default_tmp_storage_engine |        |
+----------------------------+--------+

To change the default, you can modify the default_storage_engine variable. Temporary tables also can use a different default engine with the default_tmp_storage_engine variable.

To use a different storage engine, you usually need to install a package, after which you will see it in the plugin directory. To find the plugin directory, enter:

MySQL [test]> SHOW GLOBAL VARIABLES LIKE 'plugin_dir';
+---------------+------------------------+
| Variable_name | Value                  |
+---------------+------------------------+
| plugin_dir    | /usr/lib/mysql/plugin/ |
+---------------+------------------------+

Storage engines are normally defined in files that start with the ha_ prefix and end with the .so extension (.dll on Windows). To install and uninstall the Sphinx engine, for example, enter:

INSTALL PLUGIN sphinx SONAME 'ha_sphinx.so';
UNINSTALL PLUGIN sphinx;

Before installing a storage engine, check its documentation, because it might require additional plugins.

Traditional MySQL Storage Engines

InnoDB is the default storage engine since MySQL 5.5, released in 2010. However, the first InnoDB version was released in 2001, and InnoDB was also the first part of MySQL acquired by Oracle, who made the acquisition in 2005.

From the beginning, InnoDB was important because it implemented foreign keys and implemented transactions more completely. Later, other MySQL engines implemented transactions, but none of them could beat InnoDB performance for the general case.

When InnoDB is configured for maximum reliability, it will not lose any data if a server or system crashes (unless, of course, the storage device is damaged). Its performance is very good for the most common use cases. InnoDB can (1) handle a high load of short transactions, and occasionally long ones, well, although it will behave a bit worse with a high load of long transactions; (2) support full-text searches and GIS data for geographical/geometrical queries; and (3) compress or encrypt tables.

In InnoDB, all rows are stored together with the primary key. Secondary indexes are ordered data structures that contain searchable columns and the corresponding primary key values. This structure has some important performance implications: Searches by primary key are very fast, whereas searches by secondary keys tend to be slower, because they usually imply a search on an index followed by a search on the primary key. Rows are physically sorted by the primary key. Insertions are only fast if the values are inserted in order (lower values first, higher values later). This is not a problem if the primary key is an AUTO_INCREMENT ID, but you should avoid, for example, the use of hashes or UUIDs as primary keys. Because primary key values are contained in secondary indexes, a big primary key implies big indexes. Again, an AUTO_INCREMENT ID is a good choice.

To make sure that InnoDB is fast, you need to make sure it has enough memory. Its buffer pool should contain all frequently accessed data to avoid frequent disk reads. Another aspect to care about is the size of the transaction logs. The official recommendation is to make sure they contain at least one hour of changes, although making it even a bit bigger is usually not a bad idea. If it is too small, InnoDB will need to flush changes to the table files often, which will slow it down.

In Percona Server, InnoDB is called XtraDB, which contains optimizations and bug fixes from Percona.

MyISAM is the MySQL historical default storage engine. It replaced the simpler ISAM in version 3.23, the version that made MySQL famous, and remained the default for some years. Before replacing MyISAM, InnoDB had to prove its stability and reliability. Even then, it took some time to develop full-text and GIS features that were a good alternative to MyISAM, which is why, even today, it is not too surprising to find old applications and old MySQL installations that still make use of many MyISAM tables.

By today's standards, MyISAM is quite primitive. It does not support transactions or foreign keys, and it is not even crash-safe. If MySQL crashes, some MyISAM tables will usually be corrupted. Although you can repair them with the REPAIR command or with the myisamchk utility, changes that were not flushed to disk before the crash are simply lost.

Despite all these problems, MyISAM still has some use cases. Indexes don't contain the primary key, so they are small compared with InnoDB, and MyISAM compresses tables at a very good compression ratio. Compressed MyISAM tables are read-only, but sometimes this is not a problem. The lack of crash safety should also not be a problem if the table is going to be read-only. Although some other storage engines reach very good compression ratios, they don't necessarily support secondary indexes, full text, or GIS.

The MERGE storage engine (aka Mrg_MyISAM) is described for historic reasons. Its purpose was to merge multiple MyISAM tables into a logical MERGE table that could be accessed with regular SQL queries.

The underlying MyISAM tables were also still writeable individually, which meant it worked around the system's maximum file size limit (although this should not be a problem today), and it distributed the workload between multiple MyISAM tables, with a MERGE table that acted as a larger data container. Nowadays you can achieve the same purpose with views and use any storage engine for the underlying tables, but in many cases, table partitioning is preferable.

The MEMORY storage engine creates in-memory tables. If MySQL is restarted, all data is lost. Transactions are not supported; it only supports hash indexes, which are only used to search an exact value (e.g., WHERE page = 'home'). Because of these characteristics, MEMORY is suitable as a way to cache frequently accessed data, but not much more.

In MySQL versions older than 8.0, as well as in MariaDB, internal temporary tables created to perform a certain operation are built with the MEMORY storage engine when possible – for example, if an ORDER BY cannot be performed by index. However, if the data to sort is too big or if it contains types not supported by MEMORY, a table must be written to disk. For this reason, MariaDB and Percona Server extended MEMORY to support TEXT and BLOB types.

BLACKHOLE is the most simple storage engine. When asked to write data, it does nothing and reports success. When it receives a SELECT, it returns an empty set. Basically, it is the SQL version of the /dev/null file.

Which problem does it solve, then? If you have a master and a slave and an InnoDB table, you can turn it into a BLACKHOLE table only on the slave. When you run INSERT statements, the master will write those rows, but the slave will not. The slave's users will have no way to access the data. Similarly, you could use BLACKHOLE on the master and not on the slave.

Although you might think this engine cannot have bugs because it does nothing (as stated during a MySQL conference by a member of the team), if you visit the MySQL bug page [1] and search for the string "blackhole," a few bugs do show up.

CSV, as the name says, allows you to use a CSV file as a table, which can be convenient when importing or exporting data and applying some transformation at the same time. However, you should note that the CSV engine is pretty limited: It does not support indexes or cache, so it is extremely slow with big datasets; NULL is not supported; and it does not allow transactions. In MariaDB, you can use the CONNECT engine (described below) instead.

FEDERATED was developed to allow the use of a table from a remote MySQL server as if it were on the local server. That is, if you connect to a MySQL instance and query a FEDERATED table, MySQL will forward the query to another server that contains the real data. The mechanism, transparent to the application, allows you to read and write data.

FederatedX is an evolution of FEDERATED that comes with MariaDB. Its main improvement is support for transactions.

ARCHIVE is a storage engine for compressed tables. Its compression rate is very good, but it comes with many limitations. For example, neither indexes nor transactions are supported. Also, this storage engine is append-only, with no way to delete or update rows.

Third-Party Storage Engines

A number of storage engines come with Percona Server and MariaDB. Each fork has a slightly modified version of these engines, mainly to support MariaDB-specific features, but you always should be able to recompile the versions in Percona Server for use in MySQL.

TokuDB was initially developed by a small company called TokuTek, later acquired by Percona. It is optimized for write-intensive workloads: Its indexes have a set of buffers (called a fractal tree) that reduce write operations, flushing a portion of the indexes to disk altogether; the compression rate is very good; and in a slave, the tables can skip the tests on unique indexes and trust the checks already made by the master.

Another interesting feature is clustering indexes. Whereas indexes usually speed up filtering, ordering, and grouping because they are ordered data structures, clustering indexes contain extra columns that don't affect the order of index entries. As a consequence, they don't slow down writes when index entries are added, deleted, or modified; however, when a query only reads columns from a clustering index, a single index search will find all relevant data, which saves time because it eliminates the additional, normally necessary operation of looking for some columns in the table data.

TokuDB supports transactions, and it is usable as a general-purpose engine. Its optimizations aim to reduce I/O operations – in particular, those that hit the disk.

RocksDB, which I mentioned previously as MyRocks , the storage engine from Facebook, is present in Percona Server and MariaDB. The terminology could be a bit confusing here. Facebook uses the name MyRocks, which is not used in MariaDB or Persona Server. Instead, the name RocksDB is used, which could lead to some confusion because its name is identical to the RocksDB NoSQL database. With this storage engine, part of the RocksDB technology is made available in MySQL, and usable via a SQL interface. RocksDB (from now on we will use this term for the storage engine) is optimized for flash storage, to MySQL.

According to various articles and conference speeches from the development team, MyRocks was created because Facebook was not happy with some flaws in RocksDB, in particular so-called write amplification, wherein each time something is written in an InnoDB table, several physical writes occur, which can shorten the life of SSD devices.

RocksDB has a very good compression ratio, which reduces the number of I/O operations and the need for storage space. In general, it is designed to serve most workloads, while keeping resource usage relatively low. CPU usage is generally low, and although both InnoDB and RocksDB performance benefits by adding more memory, it seems to be less critical with RocksDB. InnoDB write I/O operations per second (IOPS) is much more stable compared with InnoDB.

RocksDB does not necessarily outperform InnoDB in all cases, though. The InnoDB user base is much wider, which means it is more battle tested in a wide variety of use cases.

Storage Engines Distributed with MariaDB

Spider is design to implement transparent sharding (i.e., row-wise partitioning). The key to understanding Spider is to understand partitioning. The feature allows you to split a table into multiple physical chunks (partitions), which is generally useful with big tables, because each partition will be smaller than the whole table and therefore faster to read and write and easier to manage.

Each Spider partition can be linked to another table located on the same server or on a remote server. This idea is not much different from that behind the FEDERATED engine; however, instead of linking one table to a remote table, you can link each individual partition to different remote tables. In this way, data can be physically distributed on multiple servers.

Although this mechanism is transparent to the user, you need to remember that each query will result in one or more remote connections to other servers, and Spider doesn't cache data locally, which adds latency to each query.

The CONNECT storage engine defines tables by data type, which allows importing and exporting data stored in different formats (e.g., CSV, JSON, XML, HTML). It can access remote tables from other MySQL or MariaDB servers, or it can access any other DBMS through the Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC) standard.

CONNECT can define a template to parse logfiles in several formats (e.g., by defining the information stored, it can read Apache logfiles), and it has special read-only table types that retrieve information from the system, such as the contents of a directory. CONNECT is not meant to be superfast. Before performing complex operations, it would be a good idea to copy data from a CONNECT table to InnoDB. Also note that transactions are not supported. However, CONNECT is a convenient way to perform complex import/export operations.

Aria is an evolution of MyISAM and was originally meant to replace InnoDB in MariaDB, because InnoDB belongs to Oracle. However, plans to support transactions fully in Aria seem to have been abandoned long ago, and this engine is simply "crash-safe." After a crash, Aria will not lose data and is the main reason it is considered an evolution of MyISAM, which is not crash-safe. However, Aria's write performance does not shine, and it never managed to reach the quality of InnoDB.

Sequence provides virtual tables that return integer sequences. Instead of creating tables, you just query an ephemeral table, keeping in mind its name pattern. For example,

SELECT seq FROM seq_1_to_10_step_2;

returns all numbers from 1 to 10, incremented by 2 (pair numbers). Slightly more complex queries can return different sequences. For example,

SELECT CHAR(s1.seq), CHAR(s2.seq) FROM seq_97_to_122 s1 JOIN seq_97_to_122 s2;

returns all combinations of lowercase characters from a to z.

In general, Sequence is a convenient way to populate a table with initial data (e.g., a sequence of numbers, character strings, or dates).

OQGRAPH (Open Query GRAPH) is a computation engine created to store hierarchical data or graphs (e.g., a directory tree or links between social network users) in a relational database, which was always a problem until MariaDB implemented the WITH syntax, which allows you to retrieve this kind of data easily.

An OQGRAPH table is built on a regular table (e.g., InnoDB) with references to the rows in the underlying table and arcs between them. The arcs can have a weight that represents the strength of the relation. Running a SELECT against the OQGRAPH table with a particular syntax allows you to find the shortest path quickly between two nodes of a graph.

SphinxSE (Sphinx storage engine) allows MySQL and MariaDB to connect to the Sphinx open source search engine to query its indexes, which is very useful to MySQL and MariaDB because their full-text functionality is not very advanced compared with technologies like Sphinx. Although Sphinx is not a trendy technology anymore, it is stable, and its full-text capabilities are battle tested. The plugin is maintained by the Sphinx team and works equally well with its fork Manticore.

Cassandra is a well-known NoSQL database. Although it now uses a subset of the SQL language, when this storage engine was written, Cassandra used the Apache Thrift protocol [2]. Although never widely used in production environments, Cassandra was an interesting proof of concept, showing that a MySQL/MariaDB plugin can allow interaction with the NoSQL world.

Using Multiple Engines

General-purpose storage engines like InnoDB, RocksDB, and TokuDB have a cache for the data and indexes accessed most often and for buffering writes to disk. Often, these caches are given as much memory as possible and a good number of background threads, which means that using different general-purpose engines on the same servers can lead to serious resource problems.

Transactions that involve multiple engines are also a problem for performance, because every engine coordinates its own transactions with different mechanisms. To make transactions consistent across different engines, MySQL has to use the binary log as a coordinator, as well as a two-phase commit internally that is, of course, slow. The general suggestion is to avoid these kinds of situations.

However, there is no reason to avoid the use of different engines on different servers. Even in the case of a master-slave replication chain, it is generally possible to use a certain storage engine on the master and a different one on the slaves – or on some of them, especially when the slaves must deal with a different workload.

Final Words

The MySQL default storage engine InnoDB is fast, feature-rich, extremely advanced, and normally the best choice for most tables.

Companies like Facebook that use MySQL on a very large scale and with intensive workloads have developed storage engines that are more tailored to their needs (e.g., to optimize physical and system resource usage). The same optimizations can prove useful for much smaller companies if the code is released as open source.

When it comes to deciding what data to write and how, you have a great deal of freedom in the choice of storage engines, many of which have been designed for specific needs, such as a very high compression ratio or importing and exporting data to different technologies.

Storage engines make MySQL extremely flexible, and exploring them can open interesting possibilities.