The Resolving Algorithm

Learn about the resolving algorithm, how it prevents dependency hell, and how CommonJS deals with circular dependencies.


The term dependency hell describes a situation in which two or more dependencies of a program in turn depend on a shared dependency but require different, incompatible versions. Node.js solves this problem elegantly by loading a different version of a module depending on where the module is loaded from. All the merits of this feature go to the way Node.js package managers (such as npm or yarn) organize the dependencies of the application, and to the resolving algorithm used in the require() function.

Let’s now get a quick overview of this algorithm. As we saw, the resolve() function takes a module name (which we will call moduleName) as input and it returns the full path of the module. This path is then used to load its code and also to identify the module uniquely. The resolving algorithm can be divided into the following three major branches:

  • File modules: If moduleName starts with /, it’s already considered an absolute path to the module and is returned as it is. If it starts with ./ (or ../), moduleName is considered a relative path, which is resolved starting from the directory of the requiring module.

  • Core modules: If moduleName is not prefixed with / or ./, the algorithm first tries to search within the core Node.js modules.

  • Package modules: If no core module matches moduleName, the search continues by looking for a matching module in the first node_modules directory found by navigating up the directory structure, starting from the requiring module. The algorithm keeps searching in the next node_modules directory up the directory tree until it reaches the root of the filesystem.

For file and package modules, both files and directories can match moduleName. In particular, the algorithm tries to match the following:

  • <moduleName>.js

  • <moduleName>/index.js

  • The directory/file specified in the main property of <moduleName>/package.json

The node_modules directory is actually where the package managers install the dependencies of each package. This means that, based on the algorithm we just described, each package can have its own private dependencies. For example, consider the following directory structure:

myApp
├── foo.js
└── node_modules
    ├── depA
    │   └── index.js
    ├── depB
    │   ├── bar.js
    │   └── node_modules
    │       └── depA
    │           └── index.js
    └── depC
        ├── foobar.js
        └── node_modules
            └── depA
                └── index.js

In the previous example, myApp, depB, and depC all depend on depA. However, they all have their own private version of the dependency! Following the rules of the resolving algorithm, using require('depA') loads a different file depending on the module that requires it, for example:

• Calling require('depA') from /myApp/foo.js will load /myApp/node_modules/depA/index.js

• Calling require('depA') from /myApp/node_modules/depB/bar.js will load /myApp/node_modules/depB/node_modules/depA/index.js

• Calling require('depA') from /myApp/node_modules/depC/foobar.js will load /myApp/node_modules/depC/node_modules/depA/index.js

The resolving algorithm is the core part behind the robustness of the Node.js dependency management, and it makes it possible to have hundreds or even thousands of packages in an application without having collisions or problems of version compatibility.

Note: The resolving algorithm is applied transparently for us when we invoke the require() function. However, if needed, it can still be used directly by any module by simply invoking require.resolve().

The module cache

Each module is only loaded and evaluated the first time it is required because any subsequent call to require() will simply return the cached version. This should be clear by looking at the code of our custom require() function. Caching is crucial for performance, but it also has some important functional implications:

• It makes it possible to have cycles within module dependencies.

• It guarantees, to some extent, that the same instance is always returned when requiring the same module from within a given package.

The module cache is exposed via the require.cache variable, so it’s possible to directly access it if needed. A common use case is to invalidate any cached module by deleting the relative key in the require.cache variable, a practice that can be useful during testing but very dangerous if applied in normal circumstances.

Circular dependencies

Many consider circular dependencies an intrinsic design issue, but it’s something that might actually happen in a real project, so it’s useful for us to know at least how this works with CommonJS. If we look again at our custom require() function, we immediately get a glimpse of how this might work and what its caveats are.

But let’s walk through an example to see how CommonJS behaves when dealing with circular dependencies. Suppose we have the following scenario:

A module called main.js requires a.js and b.js. In turn, a.js requires b.js, but b.js relies on a.js as well, which gives us a circular dependency. Let’s have a look at the code of these two modules:

  • The a.js module:

exports.loaded = false;
const b = require('./b');
module.exports = {
  b,
  loaded: true // overrides the previous export
};
  • The b.js module:

exports.loaded = false;
const a = require('./a');
module.exports = {
  a,
  loaded: true
};

Now, let’s see how these modules are required by the main.js module:

const a = require('./a')
const b = require('./b')
console.log('a ->', JSON.stringify(a, null, 2))
console.log('b ->', JSON.stringify(b, null, 2))

The result reveals the caveats of circular dependencies with CommonJS: different parts of our application will have a different view of what is exported by the a.js module and the b.js module, depending on the order in which those dependencies are loaded. While both modules are completely initialized as soon as they’re required from the main.js module, the a.js module will be incomplete when it is loaded from the b.js module. In particular, its state will be the one it had reached at the moment it required b.js.

To better understand what happens behind the scenes, let’s analyze step by step how the different modules are interpreted and how their local scope changes along the way:

  1. The processing starts in the main.js module, which immediately requires the a.js module.

  2. The first thing that the a.js module does is set an exported value called loaded to false.

  3. At this point, the a.js module requires the b.js module.

  4. Like the a.js module, the first thing that the b.js module does is set an exported value called loaded to false.

  5. Now, the b.js module requires a.js (cycle).

  6. Since a.js has already been traversed, its currently exported value is immediately copied into the scope of the b.js module.

  7. The b.js module finally changes the loaded value to true.

  8. Now that the b.js module has been fully executed, the control returns to the a.js module, which now holds a copy of the current state of the b.js module in its own scope.

  9. The last step of the a.js module is to set its loaded value to true.

  10. The a.js module is now completely executed, and the control returns to the main.js module, which now has a copy of the current state of the a.js module in its internal scope.

  11. The main.js module requires the b.js module, which is immediately loaded from cache.

  12. The current state of the b.js module is copied into the scope of the main.js module where we can finally see the complete picture of what the state of every module is.

As described earlier, the issue here is that the b.js module has a partial view of the a.js module, and this partial view gets propagated over when the b.js module is required in the main.js module. This behavior should spark an intuition that can be confirmed if we swap the order in which the two modules are required in the main.js module. If we actually try this, we’ll see that this time it’ll be the a.js module that will receive an incomplete version of the b.js module.

We understand that this can become quite a fuzzy business if we lose control of which module is loaded first, something that can happen quite easily if the project is big enough. But don’t worry; we’ll go through everything in detail.