Wednesday, July 12, 2017

Working With the File System in Elixir

Working with the file system in Elixir does not really differ from doing so in other popular programming languages. Three modules cover this task: IO, File, and Path. They provide functions to open, create, modify, read, and delete files, expand paths, and so on. There are, however, some interesting gotchas that you should be aware of.

In this article we will talk about working with the file system in Elixir while taking a look at some code examples.

The Path Module

The Path module, as the name suggests, is used to work with file system paths. The functions of this module always return UTF-8 encoded strings.

For instance, you can expand a path and then generate an absolute path easily:
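A minimal sketch of this step, assuming a hypothetical relative file name (text.txt) that does not need to exist, since these functions only manipulate the path itself:

```elixir
# Hypothetical relative file name, used only for illustration.
relative = "./text.txt"

# Path.expand/1 resolves "." and ".." segments against the current directory.
expanded = Path.expand(relative)

# Path.absname/1 makes sure the path is absolute.
absolute = Path.absname(expanded)

IO.puts(absolute)
```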

Note, by the way, that in Windows, backslashes are replaced with forward slashes automatically. The resulting path can be passed to the functions of the File module, for example:
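A small sketch under stated assumptions: the file name path_demo.txt and the use of the system temp directory are illustrative choices, not something the File module requires.

```elixir
# Hypothetical file inside the system temp directory.
path = Path.join(System.tmp_dir!(), "path_demo.txt")

# Expand the path, then pass it straight to File.write/2.
result = path |> Path.expand() |> File.write("some content")

IO.inspect(result)
```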

Here we are constructing a full path to the file and then writing some contents to it.

All in all, working with the Path module is simple, and most of its functions do not interact with the file system. We will see some use cases for this module later in the article.

IO and File Modules

IO, as the name implies, is the module to work with input and output. For example, it provides such functions as puts and inspect. IO has a concept of devices, which can be either process identifiers (PID) or atoms. For instance, there are :stdio and :stderr generic devices (which are actually shortcuts). Devices in Elixir maintain their position, so subsequent read or write operations start from the place where the device was previously accessed.

The File module, in turn, allows us to access files as IO devices. Files are opened in binary mode by default; however, you can pass :utf8 as an option to have the contents treated as UTF-8 text. Also, when a filename is specified as a character list ('some_name.txt'), it is always treated as UTF-8.

Now let's see some examples of using the modules mentioned above.

Opening and Reading Files With IO

The most common task is, of course, opening and reading files. To open a file, use the open/2 function from the File module. It accepts a path to the file and an optional list of modes. For example, let's try to open a file for reading and writing:
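A minimal sketch, assuming a hypothetical file open_demo.txt that is created first so the example is self-contained:

```elixir
# Hypothetical demo file in the system temp directory.
path = Path.join(System.tmp_dir!(), "open_demo.txt")
File.write!(path, "first\nsecond\n")

# On success, File.open/2 returns {:ok, io_device}; the device is a PID.
{:ok, file} = File.open(path, [:read, :write])
IO.inspect(file)

# Close the device when it is no longer needed.
File.close(file)
```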

You may then read this file using the read/2 function from the IO module as well:
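A sketch of reading line by line, assuming a hypothetical two-line file; the :utf8 mode is passed so that IO.read/2 treats the contents as text:

```elixir
# Hypothetical demo file with two lines.
path = Path.join(System.tmp_dir!(), "io_read_demo.txt")
File.write!(path, "first\nsecond\n")

{:ok, file} = File.open(path, [:read, :utf8])

line1 = IO.read(file, :line)    # first line, newline included
line2 = IO.read(file, :line)    # second line
finished = IO.read(file, :line) # nothing left to read

IO.inspect([line1, line2, finished])
File.close(file)
```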

Here we are reading the file line by line. Note the :eof atom that means "end of file".

You can also pass :all instead of :line to read the whole file at once:
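A sketch with the same hypothetical two-line file. It uses :all exactly as described here; as far as I know, newer Elixir releases prefer the :eof option and mark :all as deprecated, so treat this as an illustration for the versions this article targets:

```elixir
# Hypothetical demo file with two lines.
path = Path.join(System.tmp_dir!(), "io_read_all_demo.txt")
File.write!(path, "first\nsecond\n")

{:ok, file} = File.open(path, [:read, :utf8])

# The first call reads everything from the current position.
whole = IO.read(file, :all)

# The device is now positioned at the end, so nothing is left.
again = IO.read(file, :all)

IO.inspect({whole, again})
File.close(file)
```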

In this case, :eof won't be returned; instead, we get an empty string. Why? Because, as we said earlier, devices maintain their position, so each read starts from the place where the previous one stopped. Once that place is the end of the file, there is nothing left to return.

There is also an open/3 function, which accepts a function as the third argument. After the passed function has finished its work, the file is closed automatically:
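A minimal sketch, again with a hypothetical demo file. File.open/3 wraps the value returned by the passed function in an {:ok, value} tuple:

```elixir
# Hypothetical demo file.
path = Path.join(System.tmp_dir!(), "open3_demo.txt")
File.write!(path, "first\nsecond\n")

# The file is closed automatically once the anonymous function returns.
result =
  File.open(path, [:read, :utf8], fn file ->
    IO.read(file, :line)
  end)

IO.inspect(result)
```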

Reading Files With File Module

In the previous section I've shown how to use IO.read in order to read files, but it appears that the File module actually has a function with the same name:
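A minimal sketch, assuming a hypothetical file whose contents are the string "test":

```elixir
# Hypothetical demo file containing "test".
path = Path.join(System.tmp_dir!(), "file_read_demo.txt")
File.write!(path, "test")

# File.read/1 opens the file, reads it completely and closes it again.
result = File.read(path)

IO.inspect(result)
```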

This function returns a tuple containing the result of the operation and a binary data object. In this example it contains "test", which is the contents of the file.

If the operation was unsuccessful, then the tuple will contain an :error atom and the error's reason:
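A sketch of the failing case, assuming a hypothetical file name that is removed first to guarantee it is absent:

```elixir
# Hypothetical path that should not exist.
missing = Path.join(System.tmp_dir!(), "no_such_file_demo.txt")

# Make sure the file really is absent; the result is ignored on purpose.
File.rm(missing)

result = File.read(missing)

IO.inspect(result)
```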

Here, :enoent means that the file does not exist. There are other reasons as well, such as :eacces (permission denied).

The returned tuple can be used in pattern matching to handle different outcomes:
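One possible sketch, where describe is a hypothetical helper introduced only for this illustration:

```elixir
# Hypothetical helper that turns the result of File.read/1 into a message.
describe = fn path ->
  case File.read(path) do
    {:ok, contents} -> "contents: " <> contents
    {:error, reason} -> "could not read file: " <> inspect(reason)
  end
end

# One existing and one missing hypothetical file.
existing = Path.join(System.tmp_dir!(), "case_demo.txt")
missing = Path.join(System.tmp_dir!(), "case_demo_missing.txt")
File.write!(existing, "test")
File.rm(missing)

IO.puts(describe.(existing))
IO.puts(describe.(missing))
```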

In this example, we either print out the file's contents or display an error reason.

Another function to read files is called read!/1. If you have come from the Ruby world, you've probably guessed what it does. Basically, this function opens a file and returns its contents in the form of a string (not a tuple!):

However, if something goes wrong and the file cannot be read, an error is raised instead:
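A sketch showing both outcomes with hypothetical file names. The raised exception is a File.Error, which is rescued here only to inspect its reason:

```elixir
# Hypothetical files: one that exists and one that does not.
existing = Path.join(System.tmp_dir!(), "read_bang_demo.txt")
missing = Path.join(System.tmp_dir!(), "read_bang_missing.txt")
File.write!(existing, "test")
File.rm(missing)

# On success, read!/1 returns the contents directly.
contents = File.read!(existing)

# On failure, it raises File.Error.
error =
  try do
    File.read!(missing)
  rescue
    e in File.Error -> e
  end

IO.inspect(contents)
IO.puts(Exception.message(error))
```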

So, to be on the safe side, you can, for example, employ the exists?/1 function to check whether a file actually exists: 
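A minimal sketch with a hypothetical missing file. Keep in mind that a file may still disappear between the check and the read, so matching on the result of File.read/1 is the more robust choice when that matters:

```elixir
# Hypothetical path that should not exist.
path = Path.join(System.tmp_dir!(), "exists_demo_missing.txt")
File.rm(path)

# Only call read!/1 when the file is present; :missing is an illustrative fallback.
contents =
  if File.exists?(path) do
    File.read!(path)
  else
    :missing
  end

IO.inspect(contents)
```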

Great, now we know how to read files. However, there is much more we can do, so let's proceed to the next section!

Writing to Files

To write something to a file, use the write/3 function. It accepts a path to a file, the contents, and an optional list of modes. If the file does not exist, it will be created automatically. If, however, it does exist, all its contents will be overwritten by default. To prevent this from happening, set the :append mode:
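A short sketch, assuming a hypothetical demo file that already holds one line:

```elixir
# Hypothetical demo file with initial contents.
path = Path.join(System.tmp_dir!(), "append_demo.txt")
File.write!(path, "one\n")

# With [:append], the new contents are added to the end of the file.
result = File.write(path, "two\n", [:append])

IO.inspect(result)
IO.inspect(File.read!(path))
```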

In this case, the contents will be appended to the file and :ok will be returned as a result. If something goes wrong, you'll get a tuple {:error, reason}, just like with the read function.

Also, there is a write! function that does pretty much the same, but raises an exception if the contents cannot be written. For example, we can write an Elixir program that creates a Ruby program that, in turn, prints "hello!":
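One way this could look, assuming a hypothetical output file hello.rb placed in the temp directory:

```elixir
# Hypothetical location of the generated Ruby program.
ruby_path = Path.join(System.tmp_dir!(), "hello.rb")

# write!/2 returns :ok or raises File.Error if the file cannot be written.
File.write!(ruby_path, ~s(puts "hello!"\n))

IO.puts(File.read!(ruby_path))
```

The generated file can then be run with `ruby hello.rb`, provided Ruby is installed.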

Streaming Files

The files can indeed be pretty large, and when using the read function you load all the contents into the memory. The good news is that files can be streamed quite easily:
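A minimal sketch using IO.stream/2 on an opened device, with a hypothetical two-line file:

```elixir
# Hypothetical demo file with two lines.
path = Path.join(System.tmp_dir!(), "io_stream_demo.txt")
File.write!(path, "first\nsecond\n")

{:ok, file} = File.open(path, [:read, :utf8])

# IO.stream/2 reads lazily; IO.inspect/1 prints each line and returns it.
lines =
  file
  |> IO.stream(:line)
  |> Enum.map(&IO.inspect/1)

File.close(file)
```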

In this example, we open a file, stream it line by line, and inspect each line. The result will look like this:

Note that the newline characters are not removed automatically, so you may want to get rid of them using the String.replace/4 function (String.trim_trailing/1 also does the job).

It is a bit tedious to stream a file line by line as shown in the previous example. Instead, you can rely on the stream!/3 function, which accepts a path to the file and two optional arguments: a list of modes and a value explaining how a file should be read (the default value is :line):
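A sketch that relies only on the defaults of File.stream!, so no optional arguments are passed; the file name is hypothetical:

```elixir
# Hypothetical demo file with two lines.
path = Path.join(System.tmp_dir!(), "file_stream_demo.txt")
File.write!(path, "first\nsecond\n")

# Stream the file line by line and strip the newline characters lazily.
lines =
  path
  |> File.stream!()
  |> Stream.map(&String.replace(&1, "\n", ""))
  |> Enum.to_list()

Enum.each(lines, &IO.puts/1)
```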

In this piece of code we are streaming a file while removing newline characters and then printing out each line. File.stream! is slower than File.read, but we don't need to wait until all lines are available—we can start processing the contents right away. This is especially useful when you need to read a file from a remote location.

Let's take a look at a slightly more complex example. I'd like to stream a file with my Elixir script, remove newline characters, and display each line with a line number next to it:
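A sketch of the idea. To keep it self-contained, it streams a hypothetical two-line demo file rather than the script itself, and it starts numbering at 1 by passing an offset to Stream.with_index/2:

```elixir
# Hypothetical demo file with two lines.
path = Path.join(System.tmp_dir!(), "numbered_demo.txt")
File.write!(path, "first\nsecond\n")

numbered =
  path
  |> File.stream!()
  |> Stream.map(&String.trim_trailing(&1, "\n"))
  # Pair each line with its number, starting from 1.
  |> Stream.with_index(1)
  |> Enum.map(fn {line, number} -> "#{number} #{line}" end)

Enum.each(numbered, &IO.puts/1)
```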

Stream.with_index/2 accepts an enumerable and returns a stream of tuples, where each tuple contains a value and its index. Next, we just iterate over this collection and print out the line number and the line itself. As a result, you'll see the same code with line numbers:

Moving and Removing Files

Now let's also briefly cover how to manipulate files—specifically, move and remove them. The functions we're interested in are rename/2 and rm/1. I won't bore you by describing all the arguments they accept as you can read the documentation yourself, and there is absolutely nothing complex about them. Instead, let's take a look at some examples.

First, I'd like to code a function that takes all files from the current directory based on a condition and then moves them to another directory. The function should be called like this:

So, here I want to grab all .txt files and move them to the texts directory. How can we solve this task? Well, firstly, let's define a module and a private function to prepare a destination directory:

mkdir!, as you've already guessed, tries to create a directory and raises an error if this operation fails.

Next, we need to grab all the files from the current directory. This can be done using the ls! function, which returns a list of file names:

Lastly, we need to filter the resulting list based on the provided function and rename each file, which effectively means moving it to another directory. Here is the final version of the program:
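A sketch of how such a program might be put together. The module name FileUtils and the private helper prepare_dir! are hypothetical, and directories are skipped so that the destination is never moved into itself:

```elixir
defmodule FileUtils do
  # Moves every regular file in the current directory for which
  # `condition` returns true into `dir`.
  def transfer_to(dir, condition) do
    prepare_dir!(dir)

    File.ls!()
    |> Enum.filter(fn name -> File.regular?(name) and condition.(name) end)
    |> Enum.each(fn name -> :ok = File.rename(name, Path.join(dir, name)) end)
  end

  # Creates the destination directory unless it is already there.
  defp prepare_dir!(dir) do
    if not File.dir?(dir), do: File.mkdir!(dir)
  end
end
```

With this in place, `FileUtils.transfer_to("texts", fn name -> Path.extname(name) == ".txt" end)` would move all .txt files from the current directory into texts.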

Now let's see rm in action by coding a similar function that removes all files matching a condition. The function will be called in the following way:

Here is the corresponding solution:
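A possible sketch, again using the hypothetical module name FileUtils and limiting itself to regular files in the current directory:

```elixir
defmodule FileUtils do
  # Removes every regular file in the current directory for which
  # `condition` returns true.
  def remove_if(condition) do
    File.ls!()
    |> Enum.filter(fn name -> File.regular?(name) and condition.(name) end)
    |> Enum.each(&File.rm!/1)
  end
end
```

A call such as `FileUtils.remove_if(fn name -> Path.extname(name) == ".html" end)` would then delete all .html files; the extension is only an example.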

rm!/1 will raise an error if the file cannot be removed. As always, it has an rm/1 counterpart that will return a tuple with the error's reason if something goes wrong.

You may note that the remove_if and transfer_to functions are very similar. So why don't we remove code duplication as an exercise? I'll add yet another private function that takes all the files, filters them based on the provided condition, and then applies an operation to them:

Now simply utilize this function:
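A sketch of the refactored module under the same assumptions as before. The names FileUtils, apply_to_matching, and prepare_dir! are hypothetical; only remove_if and transfer_to come from the text above:

```elixir
defmodule FileUtils do
  def transfer_to(dir, condition) do
    prepare_dir!(dir)

    apply_to_matching(condition, fn name ->
      :ok = File.rename(name, Path.join(dir, name))
    end)
  end

  def remove_if(condition) do
    apply_to_matching(condition, &File.rm!/1)
  end

  # Shared logic: list the current directory, keep the regular files
  # that satisfy the condition, and run the operation on each of them.
  defp apply_to_matching(condition, operation) do
    File.ls!()
    |> Enum.filter(fn name -> File.regular?(name) and condition.(name) end)
    |> Enum.each(operation)
  end

  defp prepare_dir!(dir) do
    if not File.dir?(dir), do: File.mkdir!(dir)
  end
end
```

Both public functions now differ only in the operation they hand to the shared helper.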

Third-Party Solutions

Elixir's community is growing, and fancy new libraries solving various tasks are emerging. The Awesome Elixir GitHub repo lists some popular solutions, and of course there is a section with libraries for working with files and directories. There are implementations for file uploading, monitoring, filename sanitization, and more.

For example, there is an interesting solution called Librex for converting your documents with the help of LibreOffice. To see it in action, you can create a new project:

Then add a new dependency to the mix.exs file:

After that, run:

Next, you can include the library and perform conversions:

In order for this to work, the LibreOffice executable (soffice.exe) must be present in the PATH. Otherwise, you'll need to provide a path to this file as a third argument:

Conclusion

That's all for today! In this article, we've seen the IO, File and Path modules in action and discussed some useful functions like open, read, write, and others. 

There are lots of other functions available for use, so be sure to browse Elixir's documentation. Also, there is an introductory tutorial on the official website of the language that can come in handy as well.

I hope you enjoyed this article and now feel a bit more confident about working with the file system in Elixir. Thank you for staying with me, and until next time!

