```meta
created: 2018-06-13
updated: 2018-06-15
```

An Introduction to Asyncio
==========================

Index
-----

* [Background](#background)
* [Input / Output](#inputoutput)
* [Diving In](#divingin)
* [A Toy Example](#toyexample)
* [A Real Example](#example)
* [Extra Material](#extra)


Background
----------

After seeing some friends struggle with `asyncio` I decided that it could be a good idea to write a blog post using my own words to explain how I understand the world of asynchronous IO. I will focus on Python's `asyncio` module, but this post should apply easily to any other language.

So what is `asyncio` and what makes it good? Why don't we just use the good old, well-known threads to run several parts of the code concurrently?

The first reason is that `asyncio` makes your code easier to reason about than threads do, because the number of ways in which threaded code can run grows exponentially. Let's see that with an example. Imagine you have this code:

```python
def method():
	print('line 1')
	print('line 2')
	print('line 3')
	print('line 4')
	print('line 5')
```

And you start two threads to run the method at the same time. In which order do the lines of code get executed? The answer is that you can't know! The first thread can run the entire method before the second thread even starts, or it can be the other way around. Perhaps both run line 1, and then both run line 2. Maybe the first thread runs lines 1 and 2, and then the second thread runs only line 1 before the first thread finishes.

As you can see, any combination of the order in which the lines run is possible. If the lines modify some global shared state, that will get messy quickly.

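We can make this mess visible with a small (hypothetical) experiment: two threads increment a shared counter, with an artificial pause between reading and writing so the bad interleaving is easy to trigger:

```python
import threading
import time

counter = 0


def increment_many(n):
    global counter
    for _ in range(n):
        value = counter      # read the shared state...
        time.sleep(0.0001)   # ...let the other thread sneak in...
        counter = value + 1  # ...and write back, clobbering its update


threads = [threading.Thread(target=increment_many, args=(50,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# two threads times 50 increments "should" give 100, but lost
# updates will almost certainly leave us with a smaller number
print(counter)
```

Run it a few times: the result changes from run to run, which is exactly the kind of nondeterminism we want to avoid.
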
Second, in Python, threads *won't* make your code faster. They will only increase the concurrency of your program, allowing you to run several things at the same time, so using threads for speed isn't a real advantage. Indeed, your code will probably run slower under the most common Python implementation, CPython, which uses a Global Interpreter Lock (GIL) that only lets one thread run at a time.


Input / Output
--------------

Before we go any further, let's first stop to talk about input and output, commonly known as "IO". There are two main ways to perform IO operations, such as reading or writing from a file or a network socket.

The first one is known as "blocking IO". What this means is that, when you try performing IO, the current application thread is going to *block* until the Operating System can tell you it's done. Normally this is not a problem, since disks are pretty fast anyway, but it can soon become a performance bottleneck. And network IO will be much slower than disk IO!

```python
# "open" will block until the OS creates a new file on the disk.
#        this can be really slow if the disk is under heavy load!
with open('hello.txt', 'w') as fd:
    fd.write('hello!\n')

    # "flush" will block until the OS has written all data to disk*
    fd.flush()

# * the reality is a bit more complicated: since writes to disk are
#   quite expensive, the OS will normally keep the data in RAM until
#   it has more stuff to write, and then it will `sync` everything
#   to disk after a few seconds
```

Blocking IO offers timeouts, so that you can get control back in your code if the operation doesn't finish in time. Imagine that the remote host doesn't want to reply: your code would otherwise be stuck for as long as the connection remains alive!

But wait, what if we make the timeout small? Very, very small? If we do that, we will never block waiting for an answer. That's how asynchronous IO works, and it's the opposite of blocking IO (you can also call it non-blocking IO if you want to).

How does non-blocking IO work if the IO device needs a while to answer with the data? In that case, the operating system responds with "not ready", and your application gets control back so it can do other stuff while the IO device completes your request. It works a bit like this:

```
<app> Hey, I would like to read 16 bytes from this file
<OS> Okay, but the disk hasn't sent me the data yet
<app> Alright, I will do something else then
(a lot of computer time passes)
<app> Do you have my 16 bytes now?
<OS> Yes, here they are! "Hello, world !!\n"
```

In reality, you can tell the OS to notify you when the data is ready, as opposed to polling (constantly asking the OS whether the data is ready yet or not), which is more efficient.

Either way, that's the difference between blocking and non-blocking IO. With non-blocking IO your application never sits idle waiting for data to arrive: if the data isn't there yet when you ask, your app can do more things in the meantime.


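We can see that "not ready" answer with a real non-blocking socket. Here is a hypothetical sketch using a local socket pair; the `selectors` module then lets the OS notify us when the data is ready, instead of us polling:

```python
import selectors
import socket

# a connected pair of sockets, like a tiny local "network"
reader, writer = socket.socketpair()
reader.setblocking(False)

try:
    reader.recv(16)  # nothing has been sent yet...
except BlockingIOError:
    print('<OS> Not ready yet!')

# instead of polling, ask the OS to tell us when "reader" has data
selector = selectors.DefaultSelector()
selector.register(reader, selectors.EVENT_READ)

writer.sendall(b'Hello, world !!\n')

# this blocks only until the OS reports the socket is readable
selector.select()
data = reader.recv(16)
print('<OS> Yes, here they are!', data)

selector.close()
reader.close()
writer.close()
```
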
Diving In
---------

Now we've seen what blocking and non-blocking IO are, and how threads make your code harder to reason about while giving you concurrency (yet not more speed). Is there any other way to achieve this concurrency that doesn't involve threads? Yes! The answer is `asyncio`.

So how does `asyncio` help? Before we can dive any deeper, we first need to understand a crucial concept: the *event loop*. What is it, and why do we need it?

You can think of the event loop as a *loop* that is responsible for calling your `async` functions:

![The Event Loop](eventloop.svg)

That's silly, you may think. Now not only do we run our own code, but we also have to run some "event loop". It doesn't sound beneficial at all. What are these events? Well, they are the IO events we talked about before!

`asyncio`'s event loop is responsible for handling those IO events, such as "the file is ready", "the data has arrived", or "flushing is done". As we saw before, we can make these events non-blocking by setting their timeout to 0.

Let's say you want to read from 10 files at the same time. You will ask the OS to read data from 10 files, and at first none of the reads will be ready. But the event loop will be constantly asking the OS which ones are done, and when they are done, you will get your data.

This has some nice advantages. It means that, instead of blocking while waiting for a network request to send you a response, or for a file read to complete, the event loop can decide to run other code in the meantime. Whenever the contents are ready, they can be read, and your code can continue. Waiting for the contents to be received is done with the `await` keyword, which tells the loop that it can run other code meanwhile:

![Step 1, await keyword](awaitkwd1.svg)

![Step 2, await keyword](awaitkwd2.svg)

Start reading the code of the event loop and follow the arrows. You can see that, in the beginning, there are no events yet, so the loop calls one of your functions. The code runs until it has to `await` for some IO operation to complete, such as sending a request over the network. The method is "paused" until an event occurs (for example, an "event" occurs when the request has been sent completely).

While the first method is busy, the event loop can enter the second method and run its code until the first `await`. But it can happen that the event for the second method occurs before the one for the first method, so the event loop can re-enter the second method, because it has already sent its query while the first method isn't done sending its request yet.

Then the second method `await`'s for an answer, and an event occurs telling the event loop that the request from the first method was sent. Its code can be resumed again, until it has to `await` for a response, and so on.

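We can watch this dance with a tiny, hypothetical script, where `asyncio.sleep` stands in for the slow network operations (the names `first` and `second` are made up for the example, and `asyncio.gather` simply runs several awaitables at once):

```python
import asyncio

order = []


async def first():
    order.append('first: sending request')
    await asyncio.sleep(0.2)  # pretend this is slow network IO
    order.append('first: got response')


async def second():
    order.append('second: sending query')
    await asyncio.sleep(0.1)  # this "IO" completes sooner
    order.append('second: got answer')


async def main():
    # run both methods concurrently on the same loop
    await asyncio.gather(first(), second())


loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()

print(order)
```

Even though `first` starts running before `second`, the loop resumes `second` first, because its "IO" completes earlier.
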
There are some important things to note here. The first is that we only need one thread to be running! The event loop decides when, and which, methods should run. The second is that we know where it may run other methods: at the `await` keywords! Whenever there is one of those, we know that the loop is able to run other things until the resource (again, like the network) becomes ready.

So far, we already have two advantages. We are only using a single thread, so the cost of switching between methods is low, and we can easily reason about where our program may interleave operations.

Another advantage is that, with the event loop, you can easily schedule when a piece of code should run, such as with the method [`loop.call_at`](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.call_at), without the need to spawn another thread at all.

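For instance, here is a small sketch using [`loop.call_later`](https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.call_later), the relative-time sibling of `loop.call_at`:

```python
import asyncio

fired = []

loop = asyncio.new_event_loop()

# run the callback 0.05 seconds from now; no extra thread involved
loop.call_later(0.05, fired.append, 'scheduled!')

# keep the loop alive long enough for the timer to fire
loop.run_until_complete(asyncio.sleep(0.1))
loop.close()

print(fired)
```
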
To tell `asyncio` to run the two methods shown above, we can use [`asyncio.ensure_future`](https://docs.python.org/3/library/asyncio-future.html#asyncio.ensure_future), which is a way of saying "I want the future of my method to be ensured". That is, you want to run your method in the future, whenever the loop is free to do so. This method returns a `Future` object, so if your method returns a value, you can `await` this future to retrieve its result.

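As a sketch (the coroutine `add` is made up for the example):

```python
import asyncio


async def add(a, b):
    await asyncio.sleep(0.01)  # pretend we had to wait for some IO
    return a + b


async def main():
    # ask the loop to run "add" whenever it's free to do so
    future = asyncio.ensure_future(add(1, 2))

    # awaiting the future retrieves the method's result
    result = await future
    print(result)
    return result


loop = asyncio.new_event_loop()
result = loop.run_until_complete(main())
loop.close()
```
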
What is a `Future`? This object represents the value of something that will be there in the future, but might not be there yet. Just like you can `await` your own `async def` functions, you can `await` these `Future`'s.

The `async def` functions are also called "coroutines", and Python does some magic behind the scenes to turn them into such. The coroutines can be `await`'ed, and this is what you normally do.


A Toy Example
-------------

That's all about `asyncio`! Let's wrap up with some example code. We will create a server that replies with the text a client sends, but reversed. First, we will show what you could write with normal synchronous code, and then we will port it.

Here is the **synchronous version**:

```python
# server.py
import socket


def server_method():
	# create a new server socket to listen for connections
	server = socket.socket()

	# bind to localhost:6789 for new connections
	server.bind(('localhost', 6789))

	# we will listen for one client at most
	server.listen(1)

	# *block* waiting for a new client
	client, _ = server.accept()

	# *block* waiting for some data
	data = client.recv(1024)

	# reverse the data
	data = data[::-1]

	# *block* sending the data
	client.sendall(data)

	# close client and server
	client.close()
	server.close()


if __name__ == '__main__':
	# block running the server
	server_method()
```

```python
# client.py
import socket


def client_method():
	message = b'Hello Server!\n'
	client = socket.socket()

	# *block* trying to establish a connection
	client.connect(('localhost', 6789))

	# *block* trying to send the message
	print('Sending', message)
	client.sendall(message)

	# *block* until we receive a response
	response = client.recv(1024)
	print('Server replied', response)

	client.close()


if __name__ == '__main__':
	client_method()
```

From what we've seen, this code will block on all the lines with a comment above them saying so. This means that to run more than one client or server, or both in the same file, you would need threads. But we can do better: we can rewrite it with `asyncio`!

The first step is to mark all your `def`initions that may block with `async`. This marks them as coroutines, which can be `await`ed on.

Second, since we're using low-level sockets, we need to use the methods that `asyncio` provides directly. If this was a third-party library, this would be just like using their `async def`initions.

Here is the **asynchronous version**:

```python
# server.py
import asyncio
import socket

# get the default "event loop" that we will run
loop = asyncio.get_event_loop()


# notice our new "async" before the definition
async def server_method():
	server = socket.socket()
	server.bind(('localhost', 6789))
	server.listen(1)

	# the loop's sock_* methods require a non-blocking socket
	server.setblocking(False)

	# await for a new client
	# the event loop can run other code while we wait here!
	client, _ = await loop.sock_accept(server)

	# await for some data
	data = await loop.sock_recv(client, 1024)
	data = data[::-1]

	# await for sending the data
	await loop.sock_sendall(client, data)

	client.close()
	server.close()


if __name__ == '__main__':
	# run the loop until "server method" is complete
	loop.run_until_complete(server_method())
```

```python
# client.py
import asyncio
import socket

loop = asyncio.get_event_loop()


async def client_method():
	message = b'Hello Server!\n'
	client = socket.socket()

	# the loop's sock_* methods require a non-blocking socket
	client.setblocking(False)

	# await to establish a connection
	await loop.sock_connect(client, ('localhost', 6789))

	# await to send the message
	print('Sending', message)
	await loop.sock_sendall(client, message)

	# await to receive a response
	response = await loop.sock_recv(client, 1024)
	print('Server replied', response)

	client.close()


if __name__ == '__main__':
	loop.run_until_complete(client_method())
```

That's it! You can place these two files separately and run them: first the server, then the client. You should see output in the client.

The big difference here is that you can easily modify the code to run more than one server or client at the same time. Whenever you `await`, the event loop will run other parts of your code. It seems to "block" on the `await` parts, but remember that it's actually jumping to run more code, and the event loop will get back to you whenever it can.

In short, you need an `async def` to `await` things, and you run them with the event loop instead of calling them directly. So this…

```python
def main():
	...  # some code


if __name__ == '__main__':
	main()
```

…becomes this:

```python
import asyncio


async def main():
	...  # some code


if __name__ == '__main__':
	asyncio.get_event_loop().run_until_complete(main())
```

This is pretty much how most of your `async` scripts will start, running the main method until its completion.


A Real Example
--------------

Let's have some fun with a real library. We'll be using [Telethon](https://github.com/LonamiWebs/Telethon) to broadcast a message to our three best friends, all at the same time, thanks to the magic of `asyncio`. We'll dive right into the code, and then I'll explain our new friend `asyncio.wait(...)`:

```python
# broadcast.py
import asyncio
import sys

from telethon import TelegramClient

# (you need your own values here, check Telethon's documentation)
api_id = 123
api_hash = '123abc'
friends = [
	'@friend1__username',
	'@friend2__username',
	'@bestie__username'
]


# we will have to await things, so we need an async def
async def main(message):
	# start is a coroutine, so we need to await it to run it
	client = await TelegramClient('me', api_id, api_hash).start()

	# wait for all three client.send_message to complete
	await asyncio.wait([
		client.send_message(friend, message)
		for friend in friends
	])

	# and close our client
	await client.disconnect()


if __name__ == '__main__':
	if len(sys.argv) != 2:
		print('You must pass the message to broadcast!')
		quit()

	message = sys.argv[1]
	asyncio.get_event_loop().run_until_complete(main(message))
```

Wait… how did that send a message to all three of my friends? The magic is done here:

```python
[
	client.send_message(friend, message)
	for friend in friends
]
```

This list comprehension creates another list with three coroutines, the three `client.send_message(...)`. Then we just pass that list to `asyncio.wait`:

```python
await asyncio.wait([...])
```

This method, by default, waits for the list of coroutines until they've all finished. You can read more in the Python [documentation](https://docs.python.org/3/library/asyncio-task.html#asyncio.wait). Truly a good function to know about!

Now, whenever you have some important news for your friends, you can simply run `python3 broadcast.py 'I bought a car!'` to tell all of them about your new car! All you need to remember is to `await` on coroutines, and you will be good. `asyncio` will warn you when you forget to do so.


Extra Material
--------------

If you want to understand how `asyncio` works under the hood, I recommend watching the hour-long talk [Get to grips with asyncio in Python 3](https://youtu.be/M-UcUs7IMIM) by Robert Smallshire. In the video, they explain the differences between concurrency and parallelism, along with other concepts, and how to implement your own `asyncio` "scheduler" from scratch.