Monday, August 21, 2017

Asynchronous I/O With Python 3

In this tutorial you'll go through a whirlwind tour of the asynchronous I/O facilities introduced in Python 3.4 and improved further in Python 3.5 and 3.6. 

Python previously had few great options for asynchronous programming. The new Async I/O support finally makes it a first-class feature, with both high-level APIs and a standard foundation that aims to unify the existing solutions (Twisted, Gevent, Tornado, asyncore, etc.).

It's important to understand that learning Python's async I/O is not trivial due to the rapid iteration, the scope, and the need to provide a migration path for existing async frameworks. I'll focus on the latest and greatest to simplify a little.

There are many moving parts that interact in interesting ways across thread boundaries, process boundaries, and remote machines. There are platform-specific differences and limitations. Let's jump right in. 

Pluggable Event Loops

The core concept of async IO is the event loop. In a program, there may be multiple event loops. Each thread will have at most one active event loop. The event loop provides the following facilities:

  • Registering, executing and cancelling delayed calls (with timeouts).
  • Creating client and server transports for various kinds of communication.
  • Launching subprocesses and the associated transports for communication with an external program.
  • Delegating costly function calls to a pool of threads. 
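As an illustration of the last facility in the list, here is a minimal sketch of handing a blocking call to the loop's default thread pool with run_in_executor(). The blocking_io() function and the short sleep duration are placeholders invented for this example.

```python
import asyncio
import time


def blocking_io(seconds):
    # Hypothetical blocking function; time.sleep() would stall the loop
    # if it were called directly inside a coroutine.
    time.sleep(seconds)
    return "slept"


async def main(loop):
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_io, 0.05)


loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(main(loop))
finally:
    loop.close()

print(result)
```

While the worker thread sleeps, the event loop stays free to run other callbacks and coroutines.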

Quick Example

Here is a little example that starts two coroutines and calls a function after a delay. It shows how to use an event loop to power your program:
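The original listing is not reproduced here, so what follows is a minimal sketch along the same lines. The names worker(), delayed_call() and events, as well as the delays, are assumptions made for illustration.

```python
import asyncio

events = []  # records the order in which things happen


async def worker(name, delay):
    # Sleeping with await hands control back to the event loop.
    await asyncio.sleep(delay)
    events.append(name)


def delayed_call(loop):
    # A plain function scheduled with call_later(); it also stops the loop.
    events.append("delayed_call")
    loop.stop()


loop = asyncio.new_event_loop()
try:
    # Schedule two coroutines and one delayed function call.
    loop.create_task(worker("first", 0.01))
    loop.create_task(worker("second", 0.02))
    loop.call_later(0.2, delayed_call, loop)
    loop.run_forever()  # runs until delayed_call() calls loop.stop()
finally:
    loop.close()

print(events)
```

Nothing runs until run_forever() is called; the loop then interleaves the two coroutines and fires the delayed call when its timer expires.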

The AbstractEventLoop class provides the basic contract for event loops. There are many things an event loop needs to support:

  • Scheduling functions and coroutines for execution
  • Creating futures and tasks
  • Managing TCP servers
  • Handling signals (on Unix)
  • Working with pipes and subprocesses

Here are the methods related to running and stopping the event loop as well as scheduling functions and coroutines:
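Rather than listing the method signatures, the sketch below exercises the most common ones: call_soon(), call_later(), call_at(), run_forever(), stop(), run_until_complete(), is_running() and close(). The calls list and the delays are only there to make the behavior visible.

```python
import asyncio

calls = []
loop = asyncio.new_event_loop()
try:
    # call_soon(): run on the next iteration of the loop.
    loop.call_soon(calls.append, "call_soon")
    # call_later(): run after a relative delay in seconds.
    loop.call_later(0.02, calls.append, "call_later")
    # call_at(): run at an absolute time on the loop's own clock.
    loop.call_at(loop.time() + 0.04, calls.append, "call_at")
    # stop() makes run_forever() return.
    loop.call_later(0.2, loop.stop)
    loop.run_forever()
    running_after = loop.is_running()
    # run_until_complete() runs the loop until one awaitable is done.
    result = loop.run_until_complete(asyncio.sleep(0, result="done"))
finally:
    loop.close()

print(calls, running_after, result)
```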

Plugging in a new Event Loop

Asyncio is designed to support multiple implementations of event loops that adhere to its API. The key is the EventLoopPolicy class, which configures asyncio and lets you control every aspect of the event loop. Here is an example of a custom event loop called uvloop, based on libuv, which is supposed to be much faster than the alternatives (I haven't benchmarked it myself):
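A minimal sketch of installing the uvloop policy follows. It assumes the third-party uvloop package is installed, and the install_uvloop() wrapper is a hypothetical helper that falls back to the default loop when the package is missing. Recent Python releases deprecate the event loop policy API, so treat this as a picture of the approach from the period this tutorial covers.

```python
import asyncio


def install_uvloop():
    """Switch asyncio to uvloop if it is available; return True on success."""
    try:
        import uvloop  # third-party package, not part of the standard library
    except ImportError:
        return False
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
    return True


using_uvloop = install_uvloop()

# Loops created from now on come from whichever policy is active.
loop = asyncio.new_event_loop()
try:
    value = loop.run_until_complete(asyncio.sleep(0, result=1))
finally:
    loop.close()

print(using_uvloop, value)
```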

That's it. Now, whenever you use any asyncio function, it's uvloop under the covers.

Coroutines, Futures, and Tasks

Coroutine is a loaded term. It refers both to a function that executes asynchronously and to the object that calling such a function returns, which needs to be scheduled. You define a coroutine function by adding the async keyword before def:

If you call such a function, it doesn't run. Instead, it returns a coroutine object, and if you don't schedule it for execution then you'll get a warning too:
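A small sketch of both points: defining a coroutine function and calling it without running it. The compute() function is a made-up example. The explicit close() is there only to suppress the "never awaited" warning.

```python
import inspect


async def compute():
    return 42


coro = compute()                 # nothing inside compute() has run yet
is_coro = inspect.iscoroutine(coro)
type_name = type(coro).__name__
print(type_name)                 # coroutine

# Closing the object explicitly avoids the
# "coroutine 'compute' was never awaited" RuntimeWarning.
coro.close()
```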

To actually execute the coroutine, we need an event loop:
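For instance, a coroutine can be handed straight to the loop's run_until_complete() method. The compute() coroutine below is a hypothetical example:

```python
import asyncio


async def compute():
    await asyncio.sleep(0.01)
    return 42


loop = asyncio.new_event_loop()
try:
    # Runs the loop until compute() finishes and returns its result.
    result = loop.run_until_complete(compute())
finally:
    loop.close()

print(result)  # 42
```

On Python 3.7 and later, asyncio.run(compute()) wraps the create, run and close steps in a single call.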

That's direct scheduling. You can also chain coroutines. Note that you have to use the await keyword when invoking one coroutine from another:
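A minimal sketch of chaining, with two hypothetical coroutines where the outer one awaits the inner one:

```python
import asyncio


async def fetch_number():
    await asyncio.sleep(0.01)
    return 20


async def double_it():
    # Without await, fetch_number() would only return a coroutine object.
    number = await fetch_number()
    return number * 2


loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(double_it())
finally:
    loop.close()

print(result)  # 40
```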

The asyncio Future class is similar to the concurrent.futures.Future class. It is not thread-safe and supports the following features:

  • adding and removing done callbacks
  • cancelling
  • setting results and exceptions

Here is how to use a future with the event loop. The take_your_time() coroutine accepts a future and sets its result after sleeping for a second.

The ensure_future() function schedules the coroutine, and run_until_complete() waits for the future to be done. Behind the curtain, it adds a done callback to the future.
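The flow described above can be sketched as follows. The sleep is shortened from one second to keep the example quick, and the seen list with its done callback is an addition for illustration.

```python
import asyncio

seen = []  # filled in by the done callback


async def take_your_time(future):
    await asyncio.sleep(0.05)       # the text sleeps for a full second
    future.set_result("done")


loop = asyncio.new_event_loop()
try:
    future = loop.create_future()
    future.add_done_callback(lambda f: seen.append(f.result()))
    # Schedule the coroutine, then wait on the future it will resolve.
    asyncio.ensure_future(take_your_time(future), loop=loop)
    loop.run_until_complete(future)
    result = future.result()
finally:
    loop.close()

print(result, seen)
```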

This is pretty cumbersome. Asyncio provides tasks to make working with futures and coroutines more pleasant. A Task is a subclass of Future that wraps a coroutine and that you can cancel. 

The coroutine doesn't have to accept an explicit future and set its result or exception. Here is how to perform the same operations with a task:
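A sketch of the task-based version, under the same assumptions as before (shortened sleep, illustrative names). It also cancels a second task to show that tasks support cancellation.

```python
import asyncio


async def take_your_time():
    await asyncio.sleep(0.05)
    return "done"               # the return value becomes the task's result


loop = asyncio.new_event_loop()
try:
    task = loop.create_task(take_your_time())
    is_future = isinstance(task, asyncio.Future)
    result = loop.run_until_complete(task)

    # Cancelling a task raises CancelledError in whoever waits on it.
    slow = loop.create_task(asyncio.sleep(10))
    slow.cancel()
    try:
        loop.run_until_complete(slow)
        cancelled = False
    except asyncio.CancelledError:
        cancelled = True
finally:
    loop.close()

print(result, is_future, cancelled)
```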

Transports, Protocols, and Streams

A transport is an abstraction of a communication channel. A transport always supports a particular protocol. Asyncio provides built-in implementations for TCP, UDP, SSL, and subprocess pipes.

If you're familiar with socket-based network programming then you'll feel right at home with transports and protocols. With Asyncio, you get asynchronous network programming in a standard way. Let's look at the infamous echo server and client (the "hello world" of networking). 

First, the echo client implements a class called EchoClient that is derived from asyncio.Protocol. It keeps its event loop and a message it will send to the server upon connection. 

In the connection_made() callback, it writes its message to the transport. In the data_received() method, it just prints the server's response, and in the connection_lost() method it stops the event loop. The loop's create_connection() method takes a factory that creates the EchoClient instance, and the result is a coroutine that the loop runs until it completes. 

The server is similar except that it runs forever, waiting for clients to connect. After it sends an echo response, it also closes the connection to the client and is ready for the next client to connect. 

A new instance of the EchoServer is created for each connection, so even if multiple clients connect at the same time, there will be no problem of conflicts with the transport attribute.
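The sketch below puts an echo client and server into one process so that it runs on its own. That is an assumption made for the example; normally they would be separate programs. It also differs from the description in two ways: the client signals completion through a future instead of stopping the loop, and it collects replies in a list instead of printing them. The loopback address and the ephemeral port (0) are likewise choices made for this example.

```python
import asyncio


class EchoClient(asyncio.Protocol):
    def __init__(self, message, done, replies):
        self.message = message
        self.done = done          # future resolved when the connection ends
        self.replies = replies

    def connection_made(self, transport):
        transport.write(self.message.encode())

    def data_received(self, data):
        self.replies.append(data.decode())

    def connection_lost(self, exc):
        if not self.done.done():
            self.done.set_result(True)


class EchoServer(asyncio.Protocol):
    def connection_made(self, transport):
        # One EchoServer instance, and one transport, per connection.
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)   # echo the data back
        self.transport.close()       # then close this client's connection


replies = []
loop = asyncio.new_event_loop()
try:
    server = loop.run_until_complete(
        loop.create_server(EchoServer, "127.0.0.1", 0))
    port = server.sockets[0].getsockname()[1]
    done = loop.create_future()
    # create_connection() takes a factory that builds the protocol.
    loop.run_until_complete(loop.create_connection(
        lambda: EchoClient("Hello, world!", done, replies),
        "127.0.0.1", port))
    loop.run_until_complete(done)
    server.close()
finally:
    loop.close()

print(replies)
```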

Here is the output after two clients connected:

Streams provide a high-level API that is based on coroutines and provides Reader and Writer abstractions. The protocols and the transports are hidden, there is no need to define your own classes, and there are no callbacks. You just await events like connection and data received. 

The client calls the open_connection() function, which returns reader and writer objects that you read from and write to directly. To close the connection, it closes the writer. 

The server is also much simplified.
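A sketch of the same echo exchange using streams, again with both sides in one process for the sake of a self-contained example. The newline-terminated message framing, the handler name handle_echo() and the ephemeral port are assumptions. The writer.wait_closed() call requires Python 3.7 or later.

```python
import asyncio


async def handle_echo(reader, writer):
    # Called by the server once per client connection.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()


async def echo_client(message, port):
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()      # Python 3.7+
    return reply.decode().rstrip("\n")


async def main():
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await echo_client("Hello, streams!", port)
    finally:
        server.close()


loop = asyncio.new_event_loop()
try:
    reply = loop.run_until_complete(main())
finally:
    loop.close()

print(reply)
```

There are no protocol classes and no callbacks here. Each side just awaits the next read or write.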

Working With Sub-Processes

Asyncio covers interactions with sub-processes too. The following program launches another Python process and executes the code "import this". It is one of Python's famous Easter eggs, and it prints the "Zen of Python". Check out the output below. 

The zen() coroutine launches the Python process using the create_subprocess_exec() function and binds its standard output to a pipe. Then it iterates over the standard output line by line, using await to give other coroutines a chance to execute if output is not ready yet. 
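A minimal sketch of such a zen() coroutine. It collects the lines in a list instead of printing them as they arrive, and it uses asyncio.run() (Python 3.7+) to create and close the event loop. Both are choices made for this example.

```python
import asyncio
import sys


async def zen():
    # Launch another Python interpreter and capture its standard output.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import this",
        stdout=asyncio.subprocess.PIPE)
    lines = []
    while True:
        # Awaiting readline() lets other coroutines run while waiting.
        line = await proc.stdout.readline()
        if not line:
            break
        lines.append(line.decode().rstrip())
    await proc.wait()
    return lines


lines = asyncio.run(zen())
print("\n".join(lines))
```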

Note that on Windows you have to set the event loop to the ProactorEventLoop because the standard SelectorEventLoop doesn't support pipes. (Since Python 3.8, ProactorEventLoop is the default event loop on Windows.) 

Conclusion

Python's asyncio is a comprehensive framework for asynchronous programming. It has a huge scope and supports both low-level and high-level APIs. It is still relatively young and not well understood by the community. 

I'm confident that over time best practices will emerge, and more examples will surface and make it easier to use this powerful library.

