
Understanding the Node.js Event Loop

If you're building Node.js applications, one of the most important concepts to understand is Node.js' concurrency model for handling I/O operations and its use of callback functions. Node.js is an event-driven, non-blocking JavaScript runtime environment built on Chrome's V8 engine, which allows it to execute JavaScript code on the server side. It is often described as single-threaded because your JavaScript code runs on a single main thread.

In Node's architecture, this event-driven model is provided by a mechanism called the event loop. The event loop is what allows Node.js to handle high-throughput I/O, and it is implemented by libuv, a cross-platform C library.

When talking about I/O operations in Node.js, the term usually refers to accessing external resources such as disks or the network. These are the slowest operations a program performs: a disk read or a network round trip takes far longer to complete than work that only involves the CPU and memory.

Node.js' event loop was designed around the idea that waiting for I/O operations to complete is the most wasteful thing a program can do. With Node's asynchronous, or "event-driven", model for handling I/O, a single process can handle hundreds or thousands of connections efficiently: hundreds of connections mean only hundreds of socket or state objects, not hundreds of processes or threads as in some other server architectures. Asynchronicity is not an afterthought; it is baked right into the design of Node.js.
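A minimal sketch of this non-blocking behavior, using Node's built-in `fs` module: `fs.readFile()` returns immediately, and its callback runs later, once the read has finished. The temporary file and its contents are assumptions made only so the example runs on its own.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sample file, created here only so the example is self-contained.
const file = path.join(os.tmpdir(), `event-loop-demo-${process.pid}.txt`);
fs.writeFileSync(file, 'hello from the disk\n');

const order = [];

// Start an asynchronous read. The call returns immediately; the callback
// runs later, once the read has finished.
fs.readFile(file, 'utf8', (err, contents) => {
  if (err) throw err;
  order.push('callback ran');
  console.log(`Read ${contents.length} characters`);
  fs.unlinkSync(file); // clean up the sample file
});

// This runs first: readFile() did not block while the disk was accessed.
order.push('readFile() returned');
console.log('readFile() returned, the main thread is free for other work');
```

Running it prints the `readFile() returned` line before the callback's line, even though the read was started first.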

Handling I/O Basics

There are three common ways for a server to deal with slow I/O operations. The simplest is to process requests synchronously, that is, to handle one request at a time. This is a bad approach because each request holds up the others: no other request is processed until the current one has completed.
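The cost of the synchronous approach is easy to reproduce in Node.js itself, since all JavaScript callbacks share one thread. In this sketch, a timer that asks to run after 10 ms cannot fire until a 200 ms synchronous loop, standing in for one slow request handled to completion, has finished. The `blockFor()` helper and the delay values are hypothetical, chosen for illustration.

```javascript
// Hypothetical helper: busy-waits for `ms` milliseconds, holding the thread
// the way a synchronous, one-request-at-a-time handler would.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else on this thread can run meanwhile
  }
}

const scheduledAt = Date.now();
let actualDelay;

// Ask for a callback after 10 ms...
setTimeout(() => {
  actualDelay = Date.now() - scheduledAt;
  console.log(`Timer asked for 10 ms but ran after ${actualDelay} ms`);
}, 10);

// ...then hold the thread for 200 ms. The timer has to wait its turn.
blockFor(200);
```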

Another method is to fork a new process to handle each new request. While this is an improvement over the first approach, it does not scale well to hundreds or thousands of requests. Forking is memory-expensive, since each new process is allocated its own resources, even while it is idle.
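As a rough sketch of the process-per-request model, the code below uses Node's `child_process.execFile()` to start a separate Node.js process for each of three simulated requests. Each child reports its own process ID, showing that every request gets a process of its own, with its own runtime and memory. The request count and the child's one-line program are arbitrary choices for illustration.

```javascript
const { execFile } = require('node:child_process');

const REQUESTS = 3; // arbitrary number of simulated requests
const childPids = [];

for (let i = 0; i < REQUESTS; i++) {
  // Start a brand-new Node.js process for this "request". Each one loads its
  // own runtime and heap, which is where the memory cost comes from.
  execFile(process.execPath, ['-e', 'console.log(process.pid)'], (err, stdout) => {
    if (err) throw err;
    childPids.push(Number(stdout.trim()));
    if (childPids.length === REQUESTS) {
      console.log(`parent ${process.pid} handed requests to processes`, childPids);
    }
  });
}
```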

The most popular method for handling I/O requests is to use threads: a new thread is created to handle each new request. This uses less memory and fewer resources than forking processes, but things can get complicated quickly, because threads share memory and access to shared resources has to be coordinated (for example, with locks).
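Node.js can also run JavaScript on additional threads through its `worker_threads` module, which makes it a convenient place to see the shared-resource problem. In this sketch, four worker threads increment one counter stored in a `SharedArrayBuffer`. A plain `counter[0]++` from several threads could lose updates, because the read and the write can interleave; `Atomics.add()` performs the increment as one indivisible step. The thread and iteration counts are arbitrary.

```javascript
const { Worker } = require('node:worker_threads');

const THREADS = 4;         // arbitrary
const INCREMENTS = 100000; // per thread, arbitrary

// One 32-bit integer in memory that every thread can see.
const counter = new Int32Array(new SharedArrayBuffer(4));

// Code each worker runs. Atomics.add() is a read-modify-write that other
// threads cannot interleave with, so no increments are lost.
const workerSource = `
  const { workerData } = require('node:worker_threads');
  const counter = new Int32Array(workerData);
  for (let i = 0; i < ${INCREMENTS}; i++) Atomics.add(counter, 0, 1);
`;

let finished = 0;
for (let t = 0; t < THREADS; t++) {
  // Passing a SharedArrayBuffer as workerData shares the memory, not a copy.
  const worker = new Worker(workerSource, { eval: true, workerData: counter.buffer });
  worker.on('error', (err) => { throw err; });
  worker.on('exit', () => {
    if (++finished === THREADS) {
      console.log(`counter = ${Atomics.load(counter, 0)}`);
    }
  });
}
```

This coordination is the kind of work a thread-per-request server has to get right for every shared resource, which is where the extra complexity comes from.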

An example of the multithreaded approach is the Apache web server, which, in its threaded configurations, uses a separate thread for each request it handles. If you have used Apache before, you may have seen its memory consumption grow as the number of concurrent connections increases and more threads are created to handle new requests. Nginx, a popular alternative to Apache, does not create a thread per request, and its memory usage stays low even when it handles a large number of concurrent requests.

Like Node.js, Nginx is event-driven: each Nginx worker process is single-threaded and uses an event loop to handle many requests in that single thread. The event loop lets it wait on slow I/O operations without blocking the main thread of execution.
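The same event-driven idea can be sketched in Node.js with the built-in `http` module: one thread and one event loop, many requests in flight. The server below delays each response with a 200 ms timer, a stand-in for slow I/O, and five clients request it at once. The delay, client count, and paths are arbitrary assumptions.

```javascript
const http = require('node:http');

const DELAY_MS = 200; // simulated slow I/O per request (arbitrary)
const CLIENTS = 5;    // number of concurrent requests (arbitrary)

// The handler does not block: it schedules its response and returns, so the
// event loop can accept and start the other requests on the same thread.
const server = http.createServer((req, res) => {
  setTimeout(() => res.end('done\n'), DELAY_MS);
});

let elapsedMs;

// Port 0 asks the OS for any free port.
server.listen(0, '127.0.0.1', () => {
  const { port } = server.address();
  const start = Date.now();
  let completed = 0;

  for (let i = 0; i < CLIENTS; i++) {
    // agent: false gives each request its own connection, closed afterwards.
    http.get({ host: '127.0.0.1', port, path: `/${i}`, agent: false }, (res) => {
      res.resume(); // discard the body so 'end' is emitted
      res.on('end', () => {
        if (++completed === CLIENTS) {
          elapsedMs = Date.now() - start;
          console.log(`${CLIENTS} requests took ${elapsedMs} ms in total`);
          server.close(); // stop accepting connections so the process can exit
        }
      });
    });
  }
});
```

Because waiting on the timer does not block the thread, the total time is close to one delay rather than five delays added together.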

The Event Loop

The event loop is the mechanism that allows Node.js to perform non-blocking I/O operations by offloading them to the operating system (OS) kernel whenever possible. Although your JavaScript code runs on a single thread, most modern kernels are multithreaded and can handle multiple operations in the background, and Node.js takes advantage of this.

When Node.js starts, it initializes the event loop, runs your script, and then begins processing the event loop; this is what enables the asynchronous style of programming in Node.js. In simple terms, the event loop is a semi-infinite loop that polls the kernel, blocking when there is nothing else to do, until one of the pending operations completes. The kernel notifies Node.js when an operation has completed so that the corresponding callback can be added to the poll queue for eventual execution. Node.js exits when it no longer has any events to process.
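Node.js exposes the moment when the loop has nothing left to process as the `beforeExit` event on `process`. In this sketch, a pending timer keeps the process alive until it fires, while a timer marked with `unref()` does not keep the loop alive by itself, so the process exits without running it. The timer delays and labels are arbitrary.

```javascript
const log = [];

// A pending timer is an event the loop must wait for, so it keeps the
// process alive until it fires.
setTimeout(() => log.push('timer fired'), 50);

// unref() tells the loop not to wait for this timer; on its own it will not
// keep the process running, so this callback never runs here.
setTimeout(() => log.push('unreferenced timer fired'), 1000).unref();

// Emitted once the loop has no more events to process.
process.once('beforeExit', () => {
  log.push('loop is empty');
  console.log(log); // prints [ 'timer fired', 'loop is empty' ]
});
```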

Contrary to many event loop diagrams found on the web, the Node.js event loop does not simply run through and process a stack. Instead, it goes through a set of phases, each with its own specific tasks, in a round-robin fashion.

Another common misconception is that asynchronous operations in Node are always offloaded to the thread pool provided by libuv. In fact, Node uses libuv's thread pool for asynchronous I/O only when there is no other way. Wherever possible, the event loop first uses the asynchronous interfaces provided by the operating system (for network sockets, for example) and falls back to libuv's thread pool only when no such interface is available (as is typically the case for file system operations).

By default, libuv's thread pool has four threads, and the event loop uses them only when it has no other option. You can override this default by setting the UV_THREADPOOL_SIZE environment variable to a higher value, which can improve performance when many tasks are waiting for a free thread in the pool.
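`crypto.pbkdf2()` is one of the asynchronous APIs that Node.js runs on libuv's thread pool, so it offers a simple way to watch the pool's size at work. This sketch starts eight CPU-heavy jobs at once and logs when each finishes. With the default pool of four threads, and at least four CPU cores, the jobs tend to complete in two waves of about four; exact timings vary by machine. The password, salt, iteration count, and the `pool.js` file name in the comment are placeholders.

```javascript
const crypto = require('node:crypto');

// To try a larger pool, start Node with the variable set, for example
// `UV_THREADPOOL_SIZE=8 node pool.js` (pool.js is a placeholder name).
// libuv reads it once, when the pool is first created.
const JOBS = 8;            // twice the default pool size of four
const ITERATIONS = 100000; // arbitrary; raise it if jobs finish too quickly to compare

const start = Date.now();
const finishedAt = [];

for (let i = 0; i < JOBS; i++) {
  // Each call is queued to the libuv thread pool; at most UV_THREADPOOL_SIZE
  // of these jobs run at the same time, and the rest wait for a free thread.
  crypto.pbkdf2('password', 'salt', ITERATIONS, 64, 'sha512', (err, key) => {
    if (err) throw err;
    const ms = Date.now() - start;
    finishedAt.push(ms);
    console.log(`job ${i} finished after ${ms} ms (${key.length}-byte key)`);
  });
}
```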

Phases Overview

Understanding what happens in each phase of the event loop is the key to fully understanding it. Each iteration of the event loop goes through the following phases:

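  • Timers - callbacks scheduled by setTimeout() and setInterval() are processed in this phase, once their delay has elapsed
  • I/O callbacks (called "pending callbacks" in current Node.js documentation) - I/O callbacks deferred to the next loop iteration, such as some TCP errors, are processed here
  • Poll - new I/O events are retrieved and most I/O callbacks are executed; Node.js will also block here when appropriate
  • Set Immediate (the "check" phase) - where setImmediate() callbacks are processed
  • Close - where close callbacks, such as socket.on('close', ...), are handled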
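One observable consequence of this phase order is described in the Node.js documentation: when `setTimeout(..., 0)` and `setImmediate()` are both scheduled from inside an I/O callback, the immediate callback always runs first. The I/O callback runs in the poll phase, the loop then proceeds to the Set Immediate (check) phase, and it reaches the timers phase again only on its next iteration. Calling `fs.stat()` on the OS temp directory is just a convenient I/O operation for this sketch.

```javascript
const fs = require('node:fs');
const os = require('node:os');

const order = [];

// fs.stat() completes on the thread pool; its callback then runs in the
// poll phase of the event loop.
fs.stat(os.tmpdir(), (err) => {
  if (err) throw err;

  // Timers phase: only reached on the loop's next iteration.
  setTimeout(() => order.push('setTimeout'), 0);

  // Set Immediate (check) phase: comes right after poll in this iteration.
  setImmediate(() => order.push('setImmediate'));
});

process.once('beforeExit', () => console.log(order)); // prints [ 'setImmediate', 'setTimeout' ]
```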

A more detailed explanation of what happens in each phase can be found on the Node.js website.

It's also important to know that each phase has a FIFO (first in, first out) queue of callbacks to execute. When the event loop enters a phase, it performs any operations specific to that phase, then executes callbacks from that phase's queue until the queue has been exhausted or the maximum number of callbacks has been executed. At that point, the event loop moves on to the next phase, and so on.
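The per-phase FIFO queues can be observed with timers and immediates. In this sketch, three timers with the same delay run in the order they were scheduled, and so do the immediates. The Node.js documentation also notes that an immediate queued while the Set Immediate phase is running waits for the next loop iteration. Whether the timer group or the immediate group runs first is not fixed here, so only the order within each group is meaningful; the labels and the 10 ms delay are arbitrary.

```javascript
const order = [];

// Three timers with the same delay are appended to the timers queue in
// this order and run in this order.
for (const label of ['timer 1', 'timer 2', 'timer 3']) {
  setTimeout(() => order.push(label), 10);
}

// Immediates are appended to the check-phase queue in the same way.
setImmediate(() => {
  order.push('immediate A');
  // Queued while the check phase is running: per the Node.js docs it is not
  // run in this pass but on the next loop iteration, so it lands after B.
  setImmediate(() => order.push('immediate C'));
});
setImmediate(() => order.push('immediate B'));

process.once('beforeExit', () => console.log(order));
```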

Summary

This is just a short overview of Node.js' event-driven concurrency model and how the event loop works. Although your JavaScript runs on a single thread, Node.js can handle I/O operations very efficiently because each connection costs only a small amount of resources. This is also why Node.js scales well for I/O-heavy applications.

Understanding how the event loop works will help you improve the performance of your Node.js applications and write efficient code that handles asynchronous calls properly.
