In my previous article we were talking about Open Telecom Platform (OTP) and, more specifically, the GenServer abstraction that makes it simpler to work with server processes. GenServer, as you probably remember, is a behaviour—to use it, you need to define a special callback module that satisfies the contract as dictated by this behaviour.
What we have not discussed, however, is error handling. Any system may eventually experience errors, and it is important to take care of them properly. You can refer to the How to Handle Exceptions in Elixir article to learn about the try/rescue block, raise, and some other generic solutions. These solutions are very similar to the ones found in other popular programming languages, like JavaScript or Ruby.
Still, there is more to this topic. After all, Elixir is designed to build concurrent and fault-tolerant systems, so it has other goodies to offer. In this article we will talk about supervisors, which allow us to monitor processes and restart them after they terminate. Supervisors are not that complex, but pretty powerful. They can be easily tweaked, set up with various strategies on how to perform restarts, and used in supervision trees.
So today we will see supervisors in action!
Preparations
For demonstration purposes, we are going to use some sample code from my previous article about GenServer. This module is called CalcServer, and it allows us to perform various calculations and persist the result.
All right, so firstly, create a new project using the mix new calc_server command. Next, define the module, include GenServer, and provide the start/1 shortcut:
    # lib/calc_server.ex
    defmodule CalcServer do
      use GenServer

      def start(initial_value) do
        GenServer.start(__MODULE__, initial_value, name: __MODULE__)
      end
    end
Next, provide the init/1 callback that will be run as soon as the server is started. It takes an initial value and uses a guard clause to check if it's a number. If not, the server terminates:
    def init(initial_value) when is_number(initial_value) do
      {:ok, initial_value}
    end

    def init(_) do
      # The guard above accepts any number, so the message says "number"
      {:stop, "The value must be a number!"}
    end
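To see how the two init/1 clauses behave, here is a minimal self-contained sketch. InitDemo is a hypothetical stand-in for CalcServer that keeps only the init/1 callback, and the error message wording is this sketch's own. It relies on the documented GenServer behaviour that returning {:stop, reason} from init/1 makes GenServer.start/3 return {:error, reason}.

```elixir
defmodule InitDemo do
  use GenServer

  # Accept any number as the initial state
  def init(initial_value) when is_number(initial_value), do: {:ok, initial_value}

  # Anything else stops the server before it enters its loop
  def init(_), do: {:stop, "The value must be a number!"}
end

# A numeric value starts the server normally
{:ok, pid} = GenServer.start(InitDemo, 6.1)

# A non-numeric value is rejected; the caller gets an error tuple back
rejected = GenServer.start(InitDemo, "six")
IO.inspect(rejected)
```

Because the server is started with start rather than start_link here, the failed start does not affect the calling process.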
Now code interface functions to perform addition, division, multiplication, calculation of square root, and fetching the result (of course, you can add more mathematical operations as needed):
    def sqrt do
      GenServer.cast(__MODULE__, :sqrt)
    end

    def add(number) do
      GenServer.cast(__MODULE__, {:add, number})
    end

    def multiply(number) do
      GenServer.cast(__MODULE__, {:multiply, number})
    end

    def div(number) do
      GenServer.cast(__MODULE__, {:div, number})
    end

    def result do
      GenServer.call(__MODULE__, :result)
    end
Most of these functions are handled asynchronously, meaning we are not waiting for them to complete. The last function, result, is synchronous because we actually want to wait for the result to arrive. Therefore, add handle_call and handle_cast callbacks:
    def handle_call(:result, _, state) do
      {:reply, state, state}
    end

    def handle_cast(operation, state) do
      case operation do
        :sqrt -> {:noreply, :math.sqrt(state)}
        {:multiply, multiplier} -> {:noreply, state * multiplier}
        {:div, number} -> {:noreply, state / number}
        {:add, number} -> {:noreply, state + number}
        _ -> {:stop, "Not implemented", state}
      end
    end
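The difference between the two kinds of requests can be shown in a stripped-down sketch. MiniCalc is a hypothetical module with a single cast and a single call; it is not part of the project. A cast returns :ok right away, while a call blocks until the server replies. Since messages sent from one process to another are handled in order, the call below already sees the effect of the earlier cast.

```elixir
defmodule MiniCalc do
  use GenServer

  def init(value), do: {:ok, value}

  # Asynchronous request: update the state, send nothing back
  def handle_cast({:add, number}, state), do: {:noreply, state + number}

  # Synchronous request: reply with the current state
  def handle_call(:result, _from, state), do: {:reply, state, state}
end

{:ok, pid} = GenServer.start(MiniCalc, 1)

# Returns immediately, without waiting for the addition to happen
cast_reply = GenServer.cast(pid, {:add, 2})

# Waits for the reply, which is computed after the cast was handled
result = GenServer.call(pid, :result)

IO.inspect({cast_reply, result}) # => {:ok, 3}
```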
Also, specify what to do if the server is terminated (we're playing Captain Obvious here):
    def terminate(_reason, _state) do
      IO.puts "The server terminated"
    end
The program can now be compiled using iex -S mix and used in the following way:
    CalcServer.start(6.1)
    CalcServer.sqrt
    CalcServer.multiply(2)
    CalcServer.result |> IO.puts # => 4.9396356140913875
The problem is that the server crashes when an error is raised. For example, try to divide by zero:
    CalcServer.start(6.1)
    CalcServer.div(0)
    # [error] GenServer CalcServer terminating
    # ** (ArithmeticError) bad argument in arithmetic expression
    #     (calc_server) lib/calc_server.ex:44: CalcServer.handle_cast/2
    #     (stdlib) gen_server.erl:601: :gen_server.try_dispatch/4
    #     (stdlib) gen_server.erl:667: :gen_server.handle_msg/5
    #     (stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
    # Last message: {:"$gen_cast", {:div, 0}}
    # State: 6.1

    CalcServer.result |> IO.puts
    # ** (exit) exited in: GenServer.call(CalcServer, :result, 5000)
    #     ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    #     (elixir) lib/gen_server.ex:729: GenServer.call/3
So the process is terminated and cannot be used anymore. This is indeed bad, but we are going to fix this really soon!
Let It Crash
Every programming language has its idioms, and so does Elixir. When dealing with supervisors, one common approach is to let a process crash and then do something about it—probably, restart and keep going.
Many programming languages use only try and catch (or similar constructs), which is a more defensive style of programming. We are basically trying to anticipate all the possible problems and provide a way to overcome them.
Things are very different with supervisors: if a process crashes, it crashes. But the supervisor, just like a brave battle medic, is there to help a fallen process recover. This may sound a bit strange, but in reality that is a very sane logic. What's more, you can even create supervision trees and this way isolate errors, preventing the whole application from crashing if one of its parts is experiencing problems.
Imagine driving a car: it is composed of various subsystems, and you cannot possibly check them every time. What you can do is fix a subsystem if it breaks (or, well, ask a car mechanic to do so) and continue your journey. Supervisors in Elixir do just that: they monitor your processes (referred to as child processes) and restart them as needed.
Creating a Supervisor
You can implement a supervisor using the corresponding behaviour module. It provides generic functions for error tracing and reporting.
First of all, your server needs to be linked to its supervisor. Linking is quite an important technique in its own right: when two processes are linked together and one of them terminates, the other receives an exit signal carrying the exit reason. If the linked process terminated abnormally (that is, crashed), its counterpart exits as well, unless it is trapping exits.
This can be demonstrated using the spawn/1 and spawn_link/1 functions:
    spawn(fn ->
      IO.puts "hi from parent!"

      spawn_link(fn ->
        IO.puts "hi from child!"
      end)
    end)
In this example, we are spawning two processes. The inner function is spawned and linked to the process running the outer function. Now, if you raise an error in one of them, the other will terminate as well:
    spawn(fn ->
      IO.puts "hi from parent!"

      spawn_link(fn ->
        IO.puts "hi from child!"
        raise("oops.")
      end)

      :timer.sleep(2000)
      IO.puts "unreachable!"
    end)
    # [error] Process #PID<0.83.0> raised an exception
    # ** (RuntimeError) oops.
    #     gen.ex:5: anonymous fn/0 in :elixir_compiler_0.__FILE__/1
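A linked process does not have to die together with its counterpart. If it traps exits, the exit signal is turned into an ordinary message, and this is the mechanism supervisors build on to learn that a child has crashed. Here is a minimal sketch of that idea; the :oops reason and the one-second timeout are arbitrary choices made for this illustration.

```elixir
# Convert incoming exit signals into {:EXIT, pid, reason} messages
Process.flag(:trap_exit, true)

# The child exits abnormally right away
child = spawn_link(fn -> exit(:oops) end)

# Instead of crashing, the current process is told what happened
message =
  receive do
    {:EXIT, ^child, reason} -> {:child_exited, reason}
  after
    1_000 -> :timeout
  end

IO.inspect(message) # => {:child_exited, :oops}
```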
So, to create a link when using GenServer, simply replace your start functions with start_link:
    defmodule CalcServer do
      use GenServer

      def start_link(initial_value) do
        GenServer.start_link(__MODULE__, initial_value, name: __MODULE__)
      end

      # ...
    end
It's All About Behaviour
Now, of course, a supervisor should be created. Add a new lib/calc_supervisor.ex file with the following contents:
    defmodule CalcSupervisor do
      use Supervisor

      def start_link do
        Supervisor.start_link(__MODULE__, nil)
      end

      def init(_) do
        supervise(
          [
            worker(CalcServer, [0])
          ],
          strategy: :one_for_one
        )
      end
    end
There is a lot going on here, so let's move at a slow pace.
start_link wraps Supervisor.start_link/2, the function that starts the actual supervisor. Note that the corresponding child process will be started as well, so you won't have to type CalcServer.start_link(5) anymore.
init/1 is a callback that must be present in order to employ the behaviour. The supervise function, basically, describes this supervisor. Inside you specify which child processes to supervise. We are, of course, specifying the CalcServer worker process. [0] here is the list of arguments passed to the child's start_link function, so the process starts with an initial state of 0. It is the same as saying CalcServer.start_link(0).
:one_for_one is the name of the process restart strategy (resembling a famous Musketeers motto). This strategy dictates that when a child process terminates, only that process is restarted. There are a handful of other strategies available:
- :one_for_all (even more Musketeer-style!): restart all the child processes if one terminates.
- :rest_for_one: the terminated process is restarted together with the child processes that were started after it.
- :simple_one_for_one: similar to :one_for_one but requires only one child process to be present in the specification. Used when the supervised process should be dynamically started and stopped.
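The difference between the strategies is easiest to see with two children. The sketch below is an illustration only: StrategyDemo and the :demo_first and :demo_second names are hypothetical, and plain Agent processes stand in for real workers. It starts both children under a given strategy, kills the first one, and reports whether the second one was replaced too. The children are passed straight to Supervisor.start_link/2 as child specification maps, which newer Elixir releases support.

```elixir
defmodule StrategyDemo do
  # Polls until `name` is registered to a pid other than `old_pid`
  def await_pid(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_pid(name, old_pid)
    end
  end

  # Kills the first child and tells whether its sibling was restarted as well
  def sibling_restarted?(strategy) do
    children = [
      %{id: :first, start: {Agent, :start_link, [fn -> 0 end, [name: :demo_first]]}},
      %{id: :second, start: {Agent, :start_link, [fn -> 0 end, [name: :demo_second]]}}
    ]

    {:ok, sup} = Supervisor.start_link(children, strategy: strategy)
    first = Process.whereis(:demo_first)
    second = Process.whereis(:demo_second)

    Process.exit(first, :kill)

    # Wait until the first child has been replaced...
    await_pid(:demo_first, first)
    # ...and then look at whichever process currently owns the second name
    current_second = await_pid(:demo_second, nil)

    Supervisor.stop(sup)
    current_second != second
  end
end

IO.inspect(StrategyDemo.sibling_restarted?(:one_for_one)) # => false
IO.inspect(StrategyDemo.sibling_restarted?(:one_for_all)) # => true
```

With :rest_for_one the result depends on the order: killing the first child also restarts the second, but not the other way round.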
So the overall idea is quite simple:
- Firstly, a supervisor process is started. The init callback must return a specification explaining what processes to monitor and how to handle crashes.
- The supervised child processes are started according to the specification.
- After a child process crashes, the information is sent to the supervisor thanks to the established link. The supervisor then follows the restart strategy and performs the necessary actions.
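The steps above can also be sketched with the child specification API available since Elixir 1.5, where Supervisor.init/2 and {Module, arg} tuples take over the role of supervise and worker. This is a hedged alternative, not a replacement for the code in this article: SketchServer and SketchSupervisor are hypothetical names, and the server is cut down to a single operation.

```elixir
defmodule SketchServer do
  use GenServer

  def start_link(initial_value) do
    GenServer.start_link(__MODULE__, initial_value, name: __MODULE__)
  end

  def add(number), do: GenServer.cast(__MODULE__, {:add, number})
  def result, do: GenServer.call(__MODULE__, :result)

  @impl true
  def init(value), do: {:ok, value}

  @impl true
  def handle_cast({:add, number}, state), do: {:noreply, state + number}

  @impl true
  def handle_call(:result, _from, state), do: {:reply, state, state}
end

defmodule SketchSupervisor do
  use Supervisor

  def start_link(initial_value) do
    Supervisor.start_link(__MODULE__, initial_value)
  end

  @impl true
  def init(initial_value) do
    # Step 1: return the specification (children plus restart strategy)
    children = [{SketchServer, initial_value}]
    Supervisor.init(children, strategy: :one_for_one)
  end
end

# Step 2: starting the supervisor starts the child as well
{:ok, sup} = SketchSupervisor.start_link(0)

SketchServer.add(10)
value = SketchServer.result()
IO.inspect(value) # => 10
```

The {SketchServer, initial_value} tuple works because use GenServer generates a child_spec/1 function that points at start_link/1.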
Now you can run your program again and try to divide by zero:
    CalcSupervisor.start_link
    CalcServer.add(10)
    CalcServer.result # => 10
    CalcServer.div(0) # => error!
    CalcServer.result # => 0
So the state is lost, but the process is running even though an error has happened, which means that our supervisor is working fine!
This child process is quite bulletproof, and you literally will have a hard time killing it:
    Process.whereis(CalcServer) |> Process.exit(:kill)
    CalcServer.result # => 0
    # HAHAHA, I am immortal!
Note, however, that technically the process is not restarted—rather, a new one is being started, so the process id won't be the same. It basically means that you should give your processes names when starting them.
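Here is a small sketch of that point. It is an illustration under assumptions: an Agent registered as :pid_demo stands in for CalcServer, and RestartWait is a hypothetical helper that polls until the name belongs to a new process. The pid changes after the kill and the state starts over, yet the registered name keeps working.

```elixir
defmodule RestartWait do
  # Polls until `name` is registered to a pid other than `old_pid`
  def await_new_pid(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid)
    end
  end
end

children = [
  %{id: :named_worker, start: {Agent, :start_link, [fn -> 0 end, [name: :pid_demo]]}}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(:pid_demo)
Agent.update(:pid_demo, &(&1 + 10))
value_before = Agent.get(:pid_demo, & &1)

Process.exit(old_pid, :kill)
new_pid = RestartWait.await_new_pid(:pid_demo, old_pid)

# Same name, different process, fresh state
value_after = Agent.get(:pid_demo, & &1)
IO.inspect({value_before, value_after, new_pid != old_pid}) # => {10, 0, true}
```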
The Application
You may find it somewhat tedious to start the supervisor manually every time. Luckily, it is quite easy to fix by using the Application module. In the simplest case, you will only need to make two changes.
Firstly, tweak the mix.exs file located in the root of your project:
    # ...
    def application do
      # Specify extra applications you'll use from Erlang/Elixir
      [
        extra_applications: [:logger],
        mod: {CalcServer, []} # <== add this line
      ]
    end
Next, include the Application module and provide the start/2 callback that will be run automatically when your app is started:
    defmodule CalcServer do
      use Application
      use GenServer

      def start(_type, _args) do
        CalcSupervisor.start_link
      end

      # ...
    end
Now after executing the iex -S mix command, your supervisor will be up and running right away!
Infinite Restarts?
You may wonder what is going to happen if the process constantly crashes and the corresponding supervisor restarts it again. Will this cycle run indefinitely? Well, actually, no. By default, only 3 restarts within 5 seconds are allowed. If more restarts happen, the supervisor gives up: it terminates all its child processes and then exits itself. Sounds horrifying, eh?
You can easily check it by quickly running the following line of code over and over again (or doing it in a loop):
    Process.whereis(CalcServer) |> Process.exit(:kill)
    # ...
    # ** (EXIT from #PID<0.117.0>) shutdown
There are two options that you can tweak in order to change this behaviour:
- :max_restarts: how many restarts are allowed within the timeframe
- :max_seconds: the length of the timeframe, in seconds
Both of these options should be passed to the supervise function inside the init callback:
    def init(_) do
      supervise(
        [
          worker(CalcServer, [0])
        ],
        max_restarts: 5,
        max_seconds: 6,
        strategy: :one_for_one
      )
    end
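To watch the limit kick in without hammering the console, you can lower it. The following sketch is an illustration under assumptions: it passes the options straight to Supervisor.start_link/2 (supported by newer Elixir releases), uses an Agent named :limit_demo as a stand-in worker, and relies on a hypothetical LimitWait helper. With max_restarts: 1, the first kill is forgiven and the second one makes the supervisor shut down. The calling process traps exits so that it receives the supervisor's exit as a message instead of going down with it.

```elixir
defmodule LimitWait do
  # Polls until `name` is registered to a pid other than `old_pid`
  def await_new_pid(name, old_pid) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_new_pid(name, old_pid)
    end
  end
end

# Receive the supervisor's exit as a message rather than crashing
Process.flag(:trap_exit, true)

children = [
  %{id: :worker, start: {Agent, :start_link, [fn -> 0 end, [name: :limit_demo]]}}
]

{:ok, sup} =
  Supervisor.start_link(children,
    strategy: :one_for_one,
    max_restarts: 1,
    max_seconds: 5
  )

# First crash: within the limit, so the child is restarted
first = Process.whereis(:limit_demo)
Process.exit(first, :kill)
second = LimitWait.await_new_pid(:limit_demo, first)

# Second crash within 5 seconds: the limit is exceeded
Process.exit(second, :kill)

outcome =
  receive do
    {:EXIT, ^sup, reason} -> reason
  after
    2_000 -> :still_running
  end

IO.inspect(outcome) # => :shutdown
```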
Conclusion
In this article, we've talked about Elixir supervisors, which allow us to monitor child processes and restart them as needed. We've seen how to create a supervisor, how to start it automatically with the application, and how to tweak various settings, including restart strategies and frequencies.
Hopefully, you found this article useful and interesting. I thank you for staying with me and until the next time!