Thursday, November 30, 2017

Store Everything With Elixir and Mnesia

In one of my previous articles I wrote about Erlang Term Storage tables (or simply ETS), which allow tuples of arbitrary data to be stored in memory. We also discussed disk-based ETS (DETS), which provides slightly more limited functionality but allows you to save your contents to a file.

Sometimes, however, you may require an even more powerful solution to store the data. Meet Mnesia—a real-time distributed database management system initially introduced in Erlang. Mnesia has a relational/object hybrid data model and has lots of nice features, including replication and fast data searches.

In this article, you will learn:

  • How to create a Mnesia schema and start the whole system.
  • What table types are available and how to create them.
  • How to perform CRUD operations and what the difference is between "dirty" and "transactional" functions.
  • How to modify tables and add secondary indexes.
  • How to use the Amnesia package to simplify working with databases and tables.

Let's get started, shall we?

Introduction to Mnesia

As mentioned above, Mnesia is a database management system with a hybrid relational/object data model, and it scales really well. It has a DBMS query language and supports atomic transactions, just like other popular solutions (Postgres or MySQL, for example). Mnesia's tables may be stored on disk and in memory, but programs may be written without knowing where the data actually lives. Moreover, you may replicate your data across multiple nodes. Also note that Mnesia runs in the same BEAM instance as all your other code.

Since Mnesia is an Erlang module, you should access it using an atom:
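Here is a minimal sketch of such a call. system_info/1 is used only as a harmless example; as far as I know, it can be called even before Mnesia is started:

```elixir
# Erlang modules are plain atoms in Elixir, hence the leading colon.
# The call reports whether Mnesia is running on this node
# (:yes, :no, :starting or :stopping).
status = :mnesia.system_info(:is_running)
IO.inspect(status)
```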

Though it is possible to create an alias like this:
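A short sketch of the alias; the name Mnesia is just my choice here, any valid module name would do:

```elixir
# Give the Erlang module an Elixir-style name for the rest of the file.
alias :mnesia, as: Mnesia

status = Mnesia.system_info(:is_running)
IO.inspect(status)
```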

Data in Mnesia is organized into tables that have their own names represented as atoms (which is very similar to ETS). The tables can have one of the following types:

  • :set is the default type. You can't have multiple rows with exactly the same primary key (we'll see in a moment how to define a primary key). The rows are not ordered in any particular way.
  • :ordered_set is the same as :set, but the data are ordered by the primary key. Later we will see that some read operations behave differently with :ordered_set tables.
  • :bag allows multiple rows to have the same key, but the rows still cannot be fully identical.

Tables have other properties that may be found in the official docs (we will discuss some of them in the next section). However, before starting to create tables, we need a schema, so let's proceed to the next section and add one.

Creating a Schema and Tables

To create a new schema, we will use a function with a quite unsurprising name: create_schema/1. Basically, it is going to create a new database for us on disk. It accepts a list of nodes as an argument:
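A minimal sketch, assuming a single local node and that Mnesia is not running yet. The call should return :ok the first time and an {:error, ...} tuple if a schema already exists for this node:

```elixir
# create_schema/1 expects a list of nodes; node() returns the current one.
# Mnesia must be stopped when this is called.
result = :mnesia.create_schema([node()])
IO.inspect(result)
```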

A node is a running Erlang VM that manages its own communication, memory, and other resources. Nodes may connect to each other, and they are not limited to one machine: you can connect to other nodes over the Internet as well.

After you run the above code, a new directory named Mnesia.nonode@nohost will be created that is going to contain your database. nonode@nohost is the node's name here. Before we can create any tables, however, Mnesia has to be started. This is as simple as calling the start/0 function:
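A sketch of the call. Note one assumption: this snippet does not create a schema first, in which case Mnesia starts with a schema kept in RAM only, which is enough for experimenting:

```elixir
# Returns :ok once the local Mnesia application has been started.
:ok = :mnesia.start()

IO.inspect(:mnesia.system_info(:is_running))
```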

Mnesia should be started on all participating nodes, each of which normally has a folder to which the files will be written (in our case, this folder is named Mnesia.nonode@nohost). All the nodes that compose the Mnesia system are written to the schema, and later you may add or remove individual nodes. Moreover, upon starting, nodes exchange schema information to make sure that everything is okay.

If Mnesia started successfully, an :ok atom will be returned as a result. You may later stop the system by calling stop/0:
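For illustration, a start/stop round trip might look like this (again without a disk schema, which is my simplification):

```elixir
:ok = :mnesia.start()

# stop/0 returns :stopped (not :ok) once Mnesia has shut down on this node.
result = :mnesia.stop()
IO.inspect(result)
```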

Now we can create a new table. At the very least, we should provide its name and a list of attributes for the records (think of them as columns):
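A minimal sketch of the call. No disk schema is created in this snippet, so the table lives in RAM only; that is an assumption made to keep the example short:

```elixir
:mnesia.start()

# The table name is an atom; the options are a keyword list.
# On success the call returns {:atomic, :ok}.
result = :mnesia.create_table(:user, attributes: [:id, :name, :surname])
IO.inspect(result)
```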

If the system is not running, the table won't be created and an {:aborted, {:node_not_running, :nonode@nohost}} error will be returned instead. Also, if the table already exists, you will get an {:aborted, {:already_exists, :user}} error.

So our new table is called :user, and it has three attributes: :id, :name, and :surname. Note that the first attribute in the list is always used as the primary key, and we can utilize it to quickly search for a record. Later in the article, we'll see how to write complex queries and add secondary indexes.

Also, remember that the default type for the table is :set, but this may be changed quite easily:
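For example, a :bag table could be created roughly like this (a RAM-only sketch, as no schema is created on disk here):

```elixir
:mnesia.start()

# :type may be :set (default), :ordered_set or :bag.
:mnesia.create_table(:user, attributes: [:id, :name, :surname], type: :bag)

IO.inspect(:mnesia.table_info(:user, :type))
```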

You may even make your table read-only by setting the :access_mode to :read_only:
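A sketch of a read-only table, under the same RAM-only assumption:

```elixir
:mnesia.start()

# :access_mode is :read_write by default.
:mnesia.create_table(:user,
  attributes: [:id, :name, :surname],
  access_mode: :read_only
)

IO.inspect(:mnesia.table_info(:user, :access_mode))
```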

After the schema and the table are created, the directory is going to have a schema.DAT file as well as some .log files. Let's now proceed to the next section and insert some data into our new table!

Write Operations

To store some data in a table, you need to use the write/1 function. For example, let's add a new user named John Doe:
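A sketch of the call. The try/catch wrapper is my addition so that the outcome can be inspected instead of crashing the shell; the table is RAM-only here:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# write/1 takes one tuple: the table name followed by all attribute values.
result =
  try do
    :mnesia.write({:user, 1, "John", "Doe"})
  catch
    :exit, reason -> {:exit, reason}
  end

IO.inspect(result)
```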

Note that we've specified both the table's name and all the user's attributes to store. Try running the code... and it fails miserably with an {:aborted, :no_transaction} error. Why is this happening? Well, this is because the write/1 function should be executed in a transaction. If, for some reason, you do not want to stick with a transaction, the write operation may be done in a "dirty way" using dirty_write/1:
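A minimal sketch of the dirty variant (RAM-only table, which is an assumption of this snippet):

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# No transaction needed; returns a bare :ok.
result = :mnesia.dirty_write({:user, 1, "John", "Doe"})
IO.inspect(result)
```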

This approach is usually not recommended, so instead let's build a simple transaction with the help of the transaction function:
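Roughly, such a transaction could look like this (a sketch with a RAM-only table):

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# The anonymous function's return value is wrapped in {:atomic, value}.
result =
  :mnesia.transaction(fn ->
    :mnesia.write({:user, 1, "John", "Doe"})
  end)

IO.inspect(result)
```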

transaction accepts an anonymous function that has one or more grouped operations. Note that in this case the result is {:atomic, :ok}, not just :ok as it was with the dirty_write function. The main benefit here is that if something goes wrong during the transaction, all operations are rolled back.

Actually, that's the atomicity principle, which says that either all operations occur or, in case of an error, none of them do. Suppose, for example, you are paying your employees their salaries, and suddenly something goes wrong. The operation stops, and you definitely do not want to end up in a situation where some employees got their salaries and others did not. That's when atomic transactions are really handy.

The transaction function may have as many write operations as needed: 
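A sketch with two writes grouped together; the variable name write_data is just illustrative:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# Both writes are committed together or rolled back together.
write_data = fn ->
  :mnesia.write({:user, 2, "Kate", "Brown"})
  :mnesia.write({:user, 3, "Will", "Smith"})
end

result = :mnesia.transaction(write_data)
IO.inspect(result)
```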

Interestingly, data can be updated using the write function as well. Just provide the same key and new values for the other attributes:
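A small sketch of such an update on a :set table (RAM-only, seeded with a dirty write to keep it short):

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})

# Same key (2), new surname: in a :set table this replaces the old record.
:mnesia.transaction(fn -> :mnesia.write({:user, 2, "Kate", "Smith"}) end)

records = :mnesia.dirty_read({:user, 2})
IO.inspect(records)
```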

Note, however, that this is not going to work for the tables of the :bag type. Because such tables allow multiple records to have the same key, you will simply end up with two records: [{:user, 2, "Kate", "Brown"}, {:user, 2, "Kate", "Smith"}]. Still, :bag tables do not allow fully identical records to exist.

Read Operations

All right, now that we have some data in our table, why don't we try to read them? Just as with write operations, you may perform read in either a "dirty" or "transactional" way. The "dirty way" is simpler of course (but that's the dark side of the Force, Luke!):
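A sketch of a dirty read; the key 42 is simply one that was never written:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})

# dirty_read/1 takes a {table, key} tuple and returns a list of records.
found = :mnesia.dirty_read({:user, 2})
missing = :mnesia.dirty_read({:user, 42})

IO.inspect(found)
IO.inspect(missing)
```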

So dirty_read returns a list of found records based on the provided key. If the table is a :set or an :ordered_set, the list will have only one element. For :bag tables, the list can, of course, have multiple elements. If no records were found, the list would be empty.

Now let's try to perform the same operation but using the transactional approach:
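The transactional version might look like this (same RAM-only setup as an assumption):

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})

# read/1 must run inside a transaction; the list comes back wrapped in :atomic.
result = :mnesia.transaction(fn -> :mnesia.read({:user, 2}) end)
IO.inspect(result)
```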

Great!

Are there any other useful functions for reading data? But of course! For example, you may grab the first or the last record of the table:
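A sketch on a default :set table holding two records. Which key comes back is up to the table's internal order, so the snippet does not assume a particular value:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Both functions return a key, not the whole record.
first = :mnesia.dirty_first(:user)
last = :mnesia.dirty_last(:user)

IO.inspect({first, last})
```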

Both dirty_first and dirty_last have their transactional counterparts, namely first and last, that should be wrapped in a transaction. All of these functions return the record's key, but note that in both cases we get 2 as a result even though we have two records with the keys 2 and 3. Why is this happening?

It appears that for the :set and :bag tables, the dirty_first and dirty_last (as well as first and last) functions are synonyms because the data are not sorted in any specific order. If, however, you have an :ordered_set table, the records will be sorted by their keys, and the result would be:
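Here is the same experiment sketched with an :ordered_set table:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname], type: :ordered_set)
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Keys are sorted, so first and last now differ.
first = :mnesia.dirty_first(:user)
last = :mnesia.dirty_last(:user)

IO.inspect({first, last})
```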

It is also possible to grab the next or the previous key by using dirty_next and dirty_prev (or next and prev):
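A sketch of both calls. I am using an :ordered_set table here so that the results are predictable; that choice is mine:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname], type: :ordered_set)
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Both functions take the table and the key to start from.
next_key = :mnesia.dirty_next(:user, 2)
prev_key = :mnesia.dirty_prev(:user, 3)
past_end = :mnesia.dirty_next(:user, 3)

IO.inspect({next_key, prev_key, past_end})
```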

If there are no more records, a special atom :"$end_of_table" is returned. Also, if the table is a :set or :bag, dirty_next and dirty_prev are synonyms.

Lastly, you may get all the keys from a table by using dirty_all_keys/1 or all_keys/1:
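A short sketch of both variants; for a :set table the order of the returned keys is not guaranteed:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

dirty_keys = :mnesia.dirty_all_keys(:user)
{:atomic, keys} = :mnesia.transaction(fn -> :mnesia.all_keys(:user) end)

IO.inspect(dirty_keys)
IO.inspect(keys)
```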

Delete Operations

In order to delete a record from a table, use dirty_delete or delete:
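A sketch showing both flavors on a RAM-only table:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Like the read functions, both take a {table, key} tuple.
dirty_result = :mnesia.dirty_delete({:user, 2})
result = :mnesia.transaction(fn -> :mnesia.delete({:user, 3}) end)

IO.inspect({dirty_result, result})
IO.inspect(:mnesia.dirty_all_keys(:user))
```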

This is going to remove all records with a given key.

Similarly, you can remove the whole table:
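A sketch of dropping the table and then trying to write to it:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# delete_table/1 is a schema operation and returns {:atomic, :ok} on success.
deleted = :mnesia.delete_table(:user)

# Any later write to the missing table is aborted.
after_delete =
  :mnesia.transaction(fn -> :mnesia.write({:user, 1, "John", "Doe"}) end)

IO.inspect(deleted)
IO.inspect(after_delete)
```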

There is no "dirty" counterpart for this function. Obviously, after a table is deleted, you cannot write anything to it, and an {:aborted, {:no_exists, :user}} error will be returned instead.

Lastly, if you are really in a deleting mood, the whole schema can be removed by using delete_schema/1:

This operation will return a {:error, {'Mnesia is not stopped everywhere', [:nonode@nohost]}} error if Mnesia is not stopped, so don't forget to do so:
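The whole sequence can be sketched like this. Be careful: it really deletes the schema stored in the current node's Mnesia directory, so try it only on throwaway data:

```elixir
:mnesia.create_schema([node()])
:mnesia.start()

# While Mnesia is running, the request is refused with an {:error, ...} tuple.
while_running = :mnesia.delete_schema([node()])

# Stop first, then delete.
:mnesia.stop()
after_stop = :mnesia.delete_schema([node()])

IO.inspect(while_running)
IO.inspect(after_stop)
```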

More Complex Read Operations

Now that we have seen the basics of working with Mnesia, let's dig a bit deeper and see how to write advanced queries. First, there are match_object and dirty_match_object functions that can be used to search for a record based on one of the provided attributes:
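A sketch of both variants; the pattern is a tuple shaped like a record, and the sample rows are only for illustration:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Match on name and surname, whatever the id is.
pattern = {:user, :_, "Kate", "Brown"}

dirty_found = :mnesia.dirty_match_object(pattern)
{:atomic, found} = :mnesia.transaction(fn -> :mnesia.match_object(pattern) end)

IO.inspect(dirty_found)
IO.inspect(found)
```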

The attributes that you do not care for are marked with the :_ atom. You may set only the surname, for example:
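For instance, a surname-only search could be sketched as:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Only the surname is fixed; id and name may be anything.
found = :mnesia.dirty_match_object({:user, :_, :_, "Smith"})
IO.inspect(found)
```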

You may also provide custom searching criteria using select and dirty_select. To see this in action, let's first populate the table with the following values:
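The exact rows are my assumption; they are sample values picked to fit the query discussed next (two users named Will with ids below 5, plus a few others):

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# Sample data, written in one transaction.
populate = fn ->
  :mnesia.write({:user, 2, "Kate", "Brown"})
  :mnesia.write({:user, 3, "Will", "Smith"})
  :mnesia.write({:user, 4, "Will", "Smoth"})
  :mnesia.write({:user, 5, "Will", "Smath"})
end

result = :mnesia.transaction(populate)
IO.inspect(result)
```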

Now what I want to do is find all the records that have Will as the name and whose keys are less than 5, meaning that the resulting list should contain only "Will Smith" and "Will Smoth". Here is the corresponding code:
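A sketch of such a query. The sample rows are my assumption, and the result is sorted at the end because a :set table does not guarantee any order:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

for row <- [
      {:user, 2, "Kate", "Brown"},
      {:user, 3, "Will", "Smith"},
      {:user, 4, "Will", "Smoth"},
      {:user, 5, "Will", "Smath"}
    ] do
  :mnesia.dirty_write(row)
end

# A match specification: {pattern, guards, result}.
match_spec = [
  {
    {:user, :"$1", :"$2", :"$3"},
    [{:<, :"$1", 5}, {:==, :"$2", "Will"}],
    [:"$$"]
  }
]

{:atomic, rows} = :mnesia.transaction(fn -> :mnesia.select(:user, match_spec) end)
IO.inspect(Enum.sort(rows))
```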

Things are a bit more complex here, so let's discuss this snippet step by step.

  • Firstly, we have the {:user, :"$1", :"$2", :"$3"} part. Here we are providing the table name and a list of positional parameters. They should be written in this strange-looking form so that we can utilize them later. $1 corresponds to the :id, $2 is the name, and $3 is the surname.
  • Next, there is a list of guard functions that should be applied to the given parameters. {:<, :"$1", 5} means that we'd like to select only the records whose attribute marked as $1 (that is, :id) is less than 5. {:==, :"$2", "Will"}, in turn, means that we are selecting the records with the :name set to "Will".
  • Lastly, [:"$$"] means that we'd like to include all the fields bound to positional parameters ($1, $2, and $3 here) in the result. You may say [:"$2"] to display only the name. Note, by the way, that the result contains a list of lists: [[3, "Will", "Smith"], [4, "Will", "Smoth"]].

You may also mark some attributes as the ones you do not care for using the :_ atom. For example, let's ignore the surname:
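A sketch of the same query with the surname ignored (sample rows are my assumption; dirty_select is used for brevity):

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

for row <- [
      {:user, 3, "Will", "Smith"},
      {:user, 4, "Will", "Smoth"},
      {:user, 5, "Will", "Smath"}
    ] do
  :mnesia.dirty_write(row)
end

# The surname position is :_, so it is neither bound nor returned by :"$$".
match_spec = [
  {
    {:user, :"$1", :"$2", :_},
    [{:<, :"$1", 5}, {:==, :"$2", "Will"}],
    [:"$$"]
  }
]

rows = :mnesia.dirty_select(:user, match_spec)
IO.inspect(Enum.sort(rows))
```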

In this case, however, the surname won't be included in the result.

Modifying the Tables

Performing Transformations

Suppose now that we would like to modify our table by adding a new field. This can be done by using the transform_table function, which accepts the table's name, a function to apply to all the records, and the list of new attributes:
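A sketch of such a transformation. The name add_salary and the 1..1000 salary range are my own illustrative choices:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

# Receives each old record and must return it in the new shape.
add_salary = fn {:user, id, name, surname} ->
  {:user, id, name, surname, :rand.uniform(1000)}
end

result =
  :mnesia.transform_table(:user, add_salary, [:id, :name, :surname, :salary])

IO.inspect(result)
IO.inspect(:mnesia.dirty_read({:user, 3}))
```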

In this example we are adding a new attribute named :salary (it is provided in the last argument). As for the transform function (the second argument), we are setting this new attribute to a random value. You may also modify any other attribute inside this transform function. This process of changing the data is known as a "migration", and this concept should be familiar to developers coming from the Rails world.

Now you may simply grab information about the table's attributes by using table_info:
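A sketch that repeats the transformation so it can run on its own, then asks for the attribute list:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

:mnesia.transform_table(
  :user,
  fn {:user, id, name, surname} -> {:user, id, name, surname, :rand.uniform(1000)} end,
  [:id, :name, :surname, :salary]
)

# The second argument selects which piece of table metadata to return.
attributes = :mnesia.table_info(:user, :attributes)
IO.inspect(attributes)
```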

The :salary attribute is there! And, of course, your data are also in place:

You can find a slightly more complex example of using both the create_table and transform_table functions at the ElixirSchool website.

Adding Indexes

Mnesia allows you to make any attribute indexed by using the add_table_index function. For example, let's make our :surname attribute indexed:
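A sketch of adding the index, and of what happens when it is added a second time:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])

# Takes the table and the attribute name to index.
first_try = :mnesia.add_table_index(:user, :surname)
second_try = :mnesia.add_table_index(:user, :surname)

IO.inspect(first_try)
IO.inspect(second_try)
```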

If the index already exists, you will get an error {:aborted, {:already_exists, :user, 4}}.

As the documentation for this function states, indexes do not come for free. Specifically, they occupy additional space (proportional to the table size) and make insert operations a bit slower. On the other hand, they allow you to search for the data faster, so that's a fair trade-off.

You may search by an indexed field using either the dirty_index_read or index_read function:
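A sketch of both lookups. Note the argument order: table, value to look for, and then the indexed attribute:

```elixir
:mnesia.start()
:mnesia.create_table(:user, attributes: [:id, :name, :surname])
:mnesia.add_table_index(:user, :surname)
:mnesia.dirty_write({:user, 2, "Kate", "Brown"})
:mnesia.dirty_write({:user, 3, "Will", "Smith"})

dirty_found = :mnesia.dirty_index_read(:user, "Smith", :surname)

{:atomic, found} =
  :mnesia.transaction(fn -> :mnesia.index_read(:user, "Smith", :surname) end)

IO.inspect(dirty_found)
IO.inspect(found)
```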

Here we are using the secondary index :surname to search for a user. 

Using Amnesia

It may be somewhat tedious to work with the Mnesia module directly, but luckily there is a third-party package called Amnesia (duh!) that allows you to perform trivial operations with greater ease.

For example, you may define your database and a table like this:
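A sketch based on my reading of the Amnesia README. It assumes the amnesia package is already listed in your project's dependencies, and the exact option syntax may differ between versions:

```elixir
# Brings in the defdatabase and deftable macros.
use Amnesia

defdatabase Demo do
  # Attributes first (id is autoincremented), then table options;
  # :email gets a secondary index.
  deftable User, [{:id, autoincrement}, :name, :surname, :email], index: [:email] do
  end
end
```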

This is going to define a database called Demo with a table User. The user is going to have a name, a surname, an e-mail (an indexed field), and an id (primary key set to autoincrement).

Next, you may easily create the schema using the built-in mix task:

In this case, the database will be disk-based, but there are some other available options that you may set. Also there is a drop task that will, obviously, destroy the database and all the data:

It is possible to destroy both the database and the schema:

Having the database and schema in place, it is possible to perform various operations against the table. For example, create a new record:

Or get a user by id:

Moreover, you may define a Message table while establishing a relation to the User table with a user_id as a foreign key:

The tables may have a bunch of helper functions inside, for example, to create a message or get all the messages:

You may now find the user, create a message for them, or list all their messages with ease:

Quite simple, isn't it? Some other usage examples may be found at the Amnesia official website.

Conclusion

In this article, we talked about the Mnesia database management system available for Erlang and Elixir. We have discussed the main concepts of this DBMS and have seen how to create a schema, database, and tables, as well as how to perform all major operations: create, read, update, and destroy. On top of that, you have learned how to work with indexes, how to transform tables, and how to use the Amnesia package to simplify working with databases.

I really hope this article was useful and you are eager to try Mnesia in action as well. As always, I thank you for staying with me, and until the next time!

