The journey to fast production asset builds with Webpack
We’re proud to say that our Webpack-powered build system, responsible for over 13,200 assets and their source maps, finishes in four minutes on average. This fast build time is the result of countless hours of optimizing. What follows is our journey to achieving such speed, and what we’ve discovered along the way.
At Etsy, our frontend consists of over 12,000 modules that eventually get bundled into over 1,200 static assets. Each asset needs to be localized and minified, both of which are time-consuming tasks. Furthermore, our production asset builds were limited to using 32 CPU cores and 64GB of RAM. Etsy had not yet moved to the cloud when we started migrating to Webpack, and these specs were of the beefiest on-premise hosts available. This meant we couldn’t just add more CPU/RAM to achieve faster builds.
So, to recap:
- Our frontend consists of over 1,200 assets made up of over 12,000 modules.
- Each asset needs to be localized and minified as part of productionization.
- We are limited to 32 CPU cores and 64GB of RAM.
- Production asset builds need to finish in less than five minutes on average.
We got this.
General caching solutions help reduce build times, idempotent of localization’s multiplicative factor. After we solved the essential problems of resolving our module dependencies and loading our custom code with Webpack, we incorporated community solutions like cache-loader and babel-loader’s caching options. These solutions cache intermediary artifacts of the build process, which can be time-consuming to calculate. As a result, asset builds after the initial one finish much faster. Still though, we needed more than caching to build localized assets quickly.
One of the first search results for Webpack localization was the now-deprecated i18n-webpack-plugin. It expects a separate Webpack configuration for each locale, leading to a separate production asset build per locale. Even though Webpack supports multiple configurations via its MultiCompiler mode, the documentation crucially points out that “each configuration is only processed after the previous one has finished processing.” At this stage in our process, we measured that a single production asset build without minification was taking ~3.75 minutes with no change to modules and a hot cache (a no-op build). It would take us ~3.75 × 11 = ~41.25 minutes to process all localized configurations for a no-op build.
We also ruled out using this plugin with a common solution like parallel-webpack to process configurations in parallel. Each parallel production asset build requires additional CPU and RAM, and the sum far exceeded the 32 CPU cores and 64GB of RAM available. Even when we limited the parallelism to stay under our resource limits, we were met with overall build times of ~15 minutes for a no-op build. It was clear we need to approach localization differently.
For a different locale, the message catalog contains analogous localization strings for the locale. We programmatically handle generating analogous message catalogs with a custom Webpack loader that applies whenever Webpack encounters an import for localizations. If we wanted to build Spanish assets, for example, the loader would look something like this:
Second, once we build the localized code and output localized assets, the only differing lines between copies of the same asset from different locales are the lines with localization strings; the rest are identical. When we build the above example with English and Spanish localizations, the diff of the resulting assets confirms this:
Even when caching intermediary artifacts, our Webpack configuration would spend over 50% of the overall build time constructing the bundled code of an asset. If we provided separate Webpack configurations for each locale, we would repeat this expensive asset construction process eleven times.
We could never finish this amount of work within our build-time constraints, and as we saw before, the resulting localized variants of each asset would be identical except for the few lines with localizations. What if, rather than locking ourselves into loading a specific locale’s localization and repeating an asset build for each locale, we returned a placeholder where the localizations should go?
We call this approach “localization inlining”, and it was actually how Builda localized its assets too. Although our production asset builds write these sentinel assets to disk, we do not serve them to users. They are only used to derive the localized assets.
With localization inlining, we were able to generate all of our localized assets from one production asset build. This allowed us to stay within our resource limits; most of Webpack’s CPU and RAM usage is tied to calculating and generating assets from the modules it has loaded. Adding additional files to be written to disk does not increase resource utilization as much as running an additional production asset build does.
Now that a single production asset build was responsible for over 13,200 assets, though, we noticed that simply writing this many assets to disk substantially increased build times. It turns out, Webpack only uses a single thread to write a build’s assets to disk. To address this bottleneck, we included logic to write a new localized asset only if the localizations or the sentinel asset have changed — if neither have changed, then the localized asset hasn’t changed either. This optimization greatly reduced the amount of disk writing after the initial production asset build, allowing subsequent builds with a hot cache to finish up to 1.35 minutes faster. A no-op build without minification consistently finished in ~2.4 minutes. With a comprehensive solution for localization in place, we then focused on adding minification.
Out of the box, Webpack includes the terser-webpack-plugin for asset minification. Initially, this plugin seemed to perfectly address our needs. It offered the ability to parallelize minification, cache minified results to speed up subsequent builds, and even extract license comments into a separate file.
When we added this plugin to our Webpack configuration, though, our initial asset build suddenly took over 40 minutes and used up to 57GB of RAM at its peak. We expected the initial build to take longer than subsequent builds and that minification would be costly, but this was alarming. Enabling any form of production source maps also dramatically increased the initial build time by a significant amount. Without the terser-webpack-plugin, the initial production asset build with localizations would finish in ~6 minutes. It seemed like the plugin was adding an unknown bottleneck to our builds, and ad hoc monitoring with htop during the initial production asset build seemed to confirmed our suspicions:
At some points during the minification phase, we appeared to only use a single CPU core. This was surprising to us because we had enabled parallelization in terser-webpack-plugin’s options. To get a better understanding of what was happening, we tried running strace on the main thread to profile the minification phase:
At the start of minification, the main thread spent a lot of time making memory syscalls (mmap and munmap). Upon closer inspection of terser-webpack-plugin’s source code, we found that the main thread needed to load the contents of every asset to generate parallelizable minification jobs for its worker threads. If source maps were enabled, the main thread also needed to calculate each asset’s corresponding source map. These lines explained the flood of memory syscalls we noticed at the start.
Further into minification, the main thread started making recvmsg and write syscalls to communicate between threads. We corroborated these syscalls when we found that the main thread needed to serialize the contents of each asset (and source maps if they were enabled) to send it to a worker thread to be minified. After receiving and deserializing a minification result received from a worker thread, the main thread was also solely responsible for caching the result to disk. This explained the stat, open, and other write syscalls we observed because the Node.js code promises to write the contents. The underlying epoll_wait syscalls then poll to check when the writing finishes so that the promise can be resolved.
The main thread can become a bottleneck when it has to perform these tasks for a lot of assets, and considering our production asset build could produce over 13,200 assets, it was no wonder we hit this bottleneck. To minify our assets, we would have to think of a different way.
We opted to minify our production assets outside of Webpack, in what we call “post-processing”. We split our production asset build into two stages, a Webpack stage and a post-processing stage. The former is responsible for generating and writing localized assets to disk, and the latter is responsible for performing additional processing on these assets, like minification:
For minification, we use the same terser library the terser-webpack-plugin uses. We also baked parallelization and caching into the post-processing stage, albeit in a different way than the plugin. Where Webpack’s plugin reads the file contents on the main thread and sends the whole contents to the worker threads, our parallel-processing jobs send just the file path to the workers. A worker is then responsible for reading the file, minifying it, and writing it to disk. This reduces memory usage and facilitates more efficient parallel-processing. To implement caching, the Webpack stage passes along the list of assets written by the current build to tell the post-processing stage which files are new. Sentinel assets are also excluded from post-processing because they aren’t served to users.
Splitting our production asset builds into two stages does have a potential downside: our Webpack configuration is now expected to output un-minified text for assets. Consequently, we need to audit any third-party plugins to ensure they do not transform the outputted assets in a format that breaks post-processing. Nevertheless, post-processing is well worth it because it allows us to achieve the fast build times we expect for production asset builds.
Bonus: source maps
We don’t just generate assets in under five minutes on average — we also generate corresponding source maps for all of our assets too. Source maps allow engineers to reconstruct the original source code that went into an asset. They do so by maintaining a mapping of the output lines of a transformation, like minification or Webpack bundling, to the input lines. Maintaining this mapping during the transformation process, though, inherently adds time.
Coincidentally, the same localization characteristics that enable localization inlining also enable faster source map generation. As we saw earlier, the only differences between localized assets are the lines containing localization strings. Subsequently, these lines with localization strings are the only differences between the source maps for these localized assets. For the rest of the lines, the source map for one localized asset is equivalently accurate for another because each line is at the same line number between localized assets.
If we were to generate source maps for each localized asset, we would end up repeating resource-intensive work only to result in nearly identical source maps across locales. Instead, we only generate source maps for the sentinel assets the localized assets are derived from. We then use the sentinel asset’s source map for each localized asset derived from it, and accept that the mapping for the lines with localization strings will be incorrect. This greatly speeds up source map generation because we are able to reuse a single source map that applies to many assets.
For the minification transformation that occurs during post-processing, terser accepts a source map alongside the input to be minified. This input source map allows terser to account for prior transformations when generating source mappings for its minification. As a result, the source map for its minified results still maps back to the original source code before Webpack bundling. In our case, we pass terser the sentinel asset’s source map for each localized asset derived from it. This is only possible because we aren’t using terser-webpack-plugin, which (understandably) doesn’t allow mismatched asset/source map pairings.
Through these source map optimizations, we are able to maintain source maps for all assets while only adding ~1.7 minutes to our build time average. Our unique approach can result in up to a 70% speedup in source map generation compared to out-of-the-box options offered by Webpack, a dramatic reduction in the time.
Our journey to achieving fast production builds can be summed up into three principles: reduce, reuse, recycle.
Reduce the workload on Webpack’s single thread. This goes beyond applying parallelization plugins and implementing better caching. Investigating our builds led us to discover single-threaded bottlenecks like minification, and after implementing our own parallelized post-processing we observed significantly faster build times.
The more existing work our production build can reuse, the less it has to do. Thanks to the convenient circumstances of our production setup, we are able to reuse source maps and apply them to more than one asset each. This avoids a significant amount of unnecessary work when generating source maps, a time-intensive process.
When we can’t reuse existing work, figuring out how to recycle it is equally valuable. Deriving localized assets from sentinel assets allows us to recycle the expensive work of producing an asset from an entrypoint, further speeding up builds.
While some implementation details may become obsolete as Webpack and the frontend evolve, these principles will continue to guide us towards faster production builds.
Posted by Allie Jones on 04 Nov, 2021