Why asset pre-processing?
While installing Sitecore 9 in a Docker environment, I soon realised that moving some steps out of the container/image build process could save a lot of time during development.
On each test build, all assets (files) that are added to the container must be transferred from the development machine to the Docker daemon.
The Docker cache, which is in charge of speeding up the build process, only comes into play once the files have been transferred to the daemon’s build context, so it cannot avoid the step of copying the assets. Even when the container’s image has already been built, the local files must be copied every time the build process is run.
A Sitecore package such as “Sitecore 9.0.2 rev. 180604 (WDP XM1 packages).zip” weighs close to 500 MB. Copying it takes time and makes the container grow a lot!
Best practices suggest that only the assets that are strictly needed in the container should be copied in. Extracting the required Sitecore packages before copying them to the Docker build context helps to adhere to these practices.
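To get a rough feel for how much data is sent to the daemon on each build, the files under the build context directory can simply be added up. The following Python sketch is only an approximation: the function name `context_size_bytes` is made up for this example, and it ignores any `.dockerignore` rules, so it gives an upper bound on what is actually transferred.

```python
import os


def context_size_bytes(context_dir):
    """Sum the sizes of all regular files under context_dir.

    Approximation only: .dockerignore exclusions are not taken into account.
    """
    total = 0
    for root, _dirs, files in os.walk(context_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

Running it before and after pre-processing on the same context folder shows how much the transfer shrinks once only the extracted, required files are left in place.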
For a more detailed discussion about the alternatives to asset pre-processing, please see section “Discarded Alternatives”, below.
Asset Pre-processing Implementation
In this context, it makes sense to separate common processing logic from the details of each specific package installation.
With this idea in mind, I created the Bendev.Assets.Management module, which encapsulates the common logic required to decompress nested zip files and also to transform some of the assets such as the JSON configuration files for the Sitecore Installation Framework (SIF).
The Bendev.Assets.Management module takes a configuration that contains the details of the Sitecore package to be processed and follows its declarative instructions, saving the results in the local container context (in the development environment). This way, when the Docker build command is executed, only the required files are transferred to the Docker daemon context.
Avoiding the copy of unneeded assets to the container is only part of the time saving. The module also detects whether the assets have already been decompressed and skips that step when it is not necessary. This avoids extracting the zip files over and over during image development, which is a significant time saving.
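The idea of extracting nested zip files once and skipping the work on later runs can be sketched in a few lines. This is a minimal Python illustration, not the code of Bendev.Assets.Management: the function name `extract_nested`, the `.extracted` marker file and the explicit list of inner package names are all assumptions made for the example.

```python
import io
import zipfile
from pathlib import Path


def extract_nested(outer_zip, inner_names, target_dir):
    """Extract the selected inner zips of outer_zip into target_dir.

    Returns True if extraction ran, or False if it was skipped because a
    marker file from a previous run exists (hypothetical convention).
    """
    target = Path(target_dir)
    marker = target / ".extracted"
    if marker.exists():
        return False  # assets already pre-processed, nothing to do

    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(outer_zip) as outer:
        for name in inner_names:
            # Load the inner zip into memory (fine for a sketch; large
            # packages may be better written to a temporary file first).
            inner_bytes = io.BytesIO(outer.read(name))
            with zipfile.ZipFile(inner_bytes) as inner:
                # Each inner package gets its own folder, named after the zip.
                inner.extractall(target / Path(name).stem)
    marker.touch()  # remember that this package has been processed
    return True
```

A marker file is only one possible way to detect earlier runs; comparing timestamps or checksums of the source package would be more robust when the package itself changes.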
Discarded Alternatives
There are four ways to get assets into containers without pre-processing:
- COPY instruction in the Dockerfile
- ADD instruction in the Dockerfile
- Download the asset with PowerShell within the same RUN instruction, in the Dockerfile, that deletes it after use.
- Multi-staging: with this technique, assets are added to a temporary container, from which they are copied again to the final container. This is a kind of pre-processing that runs within a temporary container.
The first two alternatives create a layer within the Docker image, let’s call it Li, that contains the included assets. The rest of the installation steps, defined with RUN instructions in the Dockerfile, are stored in a new layer, let’s call it Li+1, on top of the previous one (Li), without altering it. Therefore, even if the package is deleted with PowerShell immediately after it has been added, the overall size of the Docker image does not decrease, because the deletion is a logical operation recorded in one layer (Li+1) while the original package is still present in the previous layer (Li).
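The effect of layering on image size can be illustrated with a deliberately simplified model. In the Python sketch below, each layer is a dictionary of the bytes it adds and a deletion is recorded as a zero-size "whiteout" entry. The helper names, the `wh:` prefix and the 300 MB figure for the installed files are assumptions for illustration only; real image layers are stored differently.

```python
def image_size(layers):
    """Total image size: the sum of what every layer stores.

    A deletion is a 0-byte whiteout entry and never shrinks earlier layers.
    """
    return sum(sum(layer.values()) for layer in layers)


def visible_files(layers):
    """Files visible in the final filesystem after applying all layers."""
    files = {}
    for layer in layers:
        for path, size in layer.items():
            if path.startswith("wh:"):   # whiteout: hide a lower-layer file
                files.pop(path[3:], None)
            else:
                files[path] = size
    return files


PACKAGE_MB = 500    # approximate package size mentioned in the text
INSTALLED_MB = 300  # hypothetical size of the installed files

# COPY/ADD in layer Li, then install and delete in layer Li+1.
copy_then_delete = [
    {"package.zip": PACKAGE_MB},
    {"site/": INSTALLED_MB, "wh:package.zip": 0},
]

# Download, install and delete inside a single RUN instruction (one layer).
single_run = [
    {"site/": INSTALLED_MB},
]

print(image_size(copy_then_delete))             # 800
print(image_size(single_run))                   # 300
print(sorted(visible_files(copy_then_delete)))  # ['site/']
```

In both cases the final filesystem contains only the installed files, but in the first case the package still counts towards the image size because it remains stored in layer Li.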
Although we will not go into the details of why images should be as small as possible, it is worth mentioning that keeping them as thin as possible is a best practice.
The third alternative above can help to reduce the size of the image because the asset is downloaded and deleted within the same RUN instruction (that is, within the same layer), so the file is never stored in the image. However, it presents the following limitations when working with Sitecore:
- The official Sitecore packages require the credentials of a certified Sitecore developer. Automating the build process by providing personal credentials is not ideal.
- The need for personal credentials in an automated build process can be avoided by manually placing the Sitecore packages in a private location where they are accessible to the Docker daemon during the build process. However, this would require custom infrastructure and would tightly couple the build process to the infrastructure where it runs, which best practices advise against.
- The issue with the time required to transfer the assets into the container is still not solved with this method.
The fourth method above, multi-staging, can solve the issue with the container size. However, it increases the time needed to build the image because the assets are transferred twice: first from the development environment to the staging container, and then from the staging container to the final image.