4.14. Procedural Image Optimization (PIO)

Problem description

Images uploaded to a Magento shop are often high-resolution files that are only lightly compressed, in the jpeg or png formats.

Replacing these images with modern formats such as webp or avif is not an easy task.

Even though webp support is extensive, not all browsers on the market support that format yet.

Similar challenges occur with avif, which can represent images of similar quality with an even smaller file size.

Another factor is that image codecs work in very different ways. For specific images this can lead to situations where, e.g., a png image encoded with lossless compression is smaller than a lossy avif.

We also need to keep in mind that jpeg does not support transparency, so it cannot be used as a target format for images that rely on it.

Usually, avif produces the smallest files, followed by webp, but the optimization process requires large amounts of CPU power.

Solution

Browser support

To make sure that we do not send unsupported image formats to browsers, we check the Accept header of incoming requests. It contains a list of MIME types that the browser recognizes, so we can confirm whether webp and avif are supported and whether we can serve those formats. If neither webp nor avif is detected as supported, we assume that jpeg and png are available as legacy formats.
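The check above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name `supported_formats` is hypothetical, and the sketch assumes that wildcard entries such as `image/*` or `*/*` do not count as explicit support for webp or avif.

```python
def supported_formats(accept_header: str) -> set[str]:
    """Return the image formats we may serve for a given Accept header."""
    # jpeg and png are assumed to be available everywhere as the legacy fallback.
    formats = {"jpeg", "png"}
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        media_type = media_type.strip().lower()
        # A q-value of 0 means the type is explicitly not acceptable.
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        if q <= 0:
            continue
        if media_type == "image/webp":
            formats.add("webp")
        elif media_type == "image/avif":
            formats.add("avif")
    return formats
```

Only explicit `image/webp` and `image/avif` entries enable the modern formats here, so a client that sends just `*/*` falls back to jpeg and png.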

Image quality

All lossy codecs expose a quality parameter that controls how closely the encoder tries to reproduce the original image. However, the same parameter value yields different visual quality in different codecs. We therefore use the SSIM visual similarity metric to measure how much an encoded image differs from the original, and use bisection to find the quality parameter that results in a visual similarity close to the specified target. This lets us use much lower quality parameters for avif, which is usually able to reproduce images very well even at low quality settings, and so decreases image size further.
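A minimal sketch of such a bisection is shown below. It assumes that similarity does not decrease as the quality parameter grows, and it takes the encoder and the SSIM measurement as injected callables (`encode` and `similarity` are hypothetical names; a real pipeline would plug in its codec and SSIM implementation here). The quality range 1 to 100 is also an assumption, since codecs use different scales.

```python
from typing import Callable


def find_quality(
    encode: Callable[[int], bytes],
    similarity: Callable[[bytes], float],
    target: float,
    lo: int = 1,
    hi: int = 100,
) -> int:
    """Lowest quality in [lo, hi] whose similarity reaches `target`.

    Returns the original `hi` if no quality in the range reaches the target.
    """
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if similarity(encode(mid)) >= target:
            best = mid      # good enough: try an even lower quality
            hi = mid - 1
        else:
            lo = mid + 1    # too lossy: raise the quality
    return best
```

With a range of 100 quality levels this needs at most 7 encode-and-compare rounds per format, instead of trying every level.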

Smallest format

In most cases, file sizes are ordered by format as follows: png > jpeg > webp > avif. This isn't always true, however, so we always optimize images in all supported target formats and choose the smallest result. We also optimize some formats in lossless variants, as those sometimes use different algorithms that might result in smaller file sizes than the lossy variants.
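As an illustration only, the selection step could look like this. The `Candidate` type and `pick_smallest` function are hypothetical names, and the sketch assumes every candidate has already passed the visual quality target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    fmt: str        # "png", "jpeg", "webp" or "avif"
    lossless: bool  # True for a lossless variant of the format
    data: bytes     # encoded image


def pick_smallest(candidates: list[Candidate], supported: set[str]) -> Candidate:
    """Return the smallest candidate whose format the requester supports."""
    usable = [c for c in candidates if c.fmt in supported]
    if not usable:
        raise ValueError("no candidate in a supported format")
    return min(usable, key=lambda c: len(c.data))
```

Because lossy and lossless variants compete in the same list, a lossless png can win over a lossy avif whenever it happens to be smaller.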

Resize

For thumbnails we do not need the full resolution, as the details would not be visible anyway. For that reason, we resize images to several different resolutions for use in different places on a website. Changing resolution is a lossy process, because multiple pixels have to be replaced with a single one. There are multiple resize algorithms, which vary in resulting image quality and in speed. We use the lanczos3 algorithm, which provides the best results but is also the most CPU-expensive.
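To show why lanczos3 is expensive, here is a small pure-Python sketch of the Lanczos kernel with three lobes and a one-dimensional resample of a single row of pixel values. It is illustrative only: a real pipeline would use an optimized image library, and the `resize_row` helper is a hypothetical name. A 2D resize applies the same operation to rows and then to columns.

```python
import math


def lanczos3(x: float) -> float:
    """Lanczos kernel with a = 3: sinc(x) * sinc(x / 3) for |x| < 3, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 3.0:
        return 0.0
    px = math.pi * x
    return 3.0 * math.sin(px) * math.sin(px / 3.0) / (px * px)


def resize_row(row: list[float], new_width: int) -> list[float]:
    """Resample one row of pixel values to `new_width` samples."""
    scale = len(row) / new_width
    stretch = max(scale, 1.0)      # widen the kernel when downscaling
    support = 3.0 * stretch
    out = []
    for i in range(new_width):
        center = (i + 0.5) * scale  # position of the output pixel in source space
        left = max(0, math.floor(center - support))
        right = min(len(row) - 1, math.ceil(center + support))
        total = 0.0
        weight_sum = 0.0
        for j in range(left, right + 1):
            w = lanczos3((j + 0.5 - center) / stretch)
            total += w * row[j]
            weight_sum += w
        out.append(total / weight_sum)  # normalize so flat areas stay flat
    return out
```

Each output pixel reads a window of roughly six source pixels per unit of downscale factor, which is where the CPU cost comes from compared with nearest-neighbour or bilinear resizing.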

Transparency

Because we try to optimize images in all possible formats, we have to consider the availability of transparency. When we detect that the source image contains transparency, we exclude formats that do not support that feature (jpeg). Some thumbnail URLs might request images without transparency and in those cases we will still allow the use of jpeg.
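The format filter described above might be sketched as follows. The names `ALPHA_CAPABLE` and `target_formats` are hypothetical, and `keep_alpha` stands in for whatever signal a thumbnail URL uses to request an image without transparency.

```python
# Formats that can store an alpha channel; jpeg cannot.
ALPHA_CAPABLE = {"png", "webp", "avif"}


def target_formats(
    supported: set[str],
    source_has_alpha: bool,
    keep_alpha: bool = True,
) -> set[str]:
    """Formats worth encoding for this image and request."""
    if source_has_alpha and keep_alpha:
        # Transparency must survive, so drop formats without alpha support.
        return supported & ALPHA_CAPABLE
    # Opaque source, or the request asked for no transparency: jpeg is allowed.
    return set(supported)
```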

Latency

Because the optimization process can take several minutes per image, and just a few images can saturate all available CPU cores, it may take a long time before fully optimized images are available. Leaving images missing on the website in the meantime would be very undesirable, so we use a 2-stage optimization. When an optimized image is not available, we schedule its optimization and look for other versions that are already available. E.g. if the requester supports jpeg, png, webp and avif and we already have an optimized jpeg or png version, we send that version in the meantime.

If no version is available, we perform a quick conversion: we use a less expensive resize algorithm that still provides good enough quality, and we target only a single file format, using only an initial guess for the quality parameter. We also use the fastest encoder preset, which produces a much larger file but generates it very quickly; for thumbnails the result is usually still smaller than the full-size original file. Quick conversions are requested through separate queues, so they do not compete with full optimization jobs.
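The decision flow can be sketched like this. Everything here is a simplified assumption for illustration: the `ImageStore` class, its in-memory dictionaries and lists, and the three status labels are hypothetical, and real queues and storage would be external services.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageStore:
    # (image_id, format) -> encoded bytes of versions that are already optimized
    optimized: dict[tuple[str, str], bytes] = field(default_factory=dict)
    # image ids whose full optimization has finished
    complete: set[str] = field(default_factory=set)
    full_queue: list[str] = field(default_factory=list)   # slow, thorough jobs
    quick_queue: list[str] = field(default_factory=list)  # fast fallback jobs

    def plan(self, image_id: str, supported: set[str]) -> tuple[str, Optional[str]]:
        """Decide what to serve: returns (status, format or None)."""
        available = {
            fmt: data
            for (img, fmt), data in self.optimized.items()
            if img == image_id and fmt in supported
        }
        smallest = min(available, key=lambda f: len(available[f]), default=None)
        if image_id in self.complete and smallest is not None:
            return "final", smallest
        # Stage 1: make sure the full optimization is scheduled exactly once.
        if image_id not in self.full_queue:
            self.full_queue.append(image_id)
        if smallest is not None:
            return "interim", smallest   # serve an existing optimized version
        # Stage 2: nothing usable yet, request a quick conversion separately.
        if image_id not in self.quick_queue:
            self.quick_queue.append(image_id)
        return "quick", None
```

Keeping `quick_queue` apart from `full_queue` mirrors the separate queues described above: a burst of quick conversions never has to wait behind multi-minute optimization jobs.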

To make sure that this temporary image isn't cached for too long on CDN servers, we set a cache time of only 15 minutes in the response. This way we can provide good enough results in the meantime.
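In HTTP terms this could be expressed with a `Cache-Control` header, sketched below. The `public` directive and the `final_max_age` parameter are assumptions for illustration; the text only specifies the 15-minute lifetime for temporary images.

```python
TEMPORARY_MAX_AGE = 15 * 60  # 15 minutes, in seconds


def cache_control(is_temporary: bool, final_max_age: int) -> str:
    """Build a Cache-Control value; `final_max_age` is a hypothetical setting."""
    max_age = TEMPORARY_MAX_AGE if is_temporary else final_max_age
    return f"public, max-age={max_age}"
```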

Prioritization

We track how actively images are requested. Because we use a 15-minute cache time for CDNs, a frequently requested image should generate a request roughly every 15 minutes per CDN server. We use that fact to prioritize frequently viewed images when generating fully optimized versions. This lets us reduce overall traffic sooner, especially during initial optimization, when there might be millions of files to generate.
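One simple way to turn those repeated requests into a priority order is to count them per image and always optimize the most requested pending images first. The sketch below assumes an in-memory counter; the `OptimizationQueue` class and its method names are hypothetical.

```python
import heapq
from collections import Counter


class OptimizationQueue:
    """Orders pending images by how often they have been requested."""

    def __init__(self) -> None:
        self.hits: Counter[str] = Counter()
        self.pending: set[str] = set()

    def record_request(self, image_id: str) -> None:
        # Called for each request that had to be served a temporary image.
        self.hits[image_id] += 1
        self.pending.add(image_id)

    def next_batch(self, n: int) -> list[str]:
        # Most requested first; ties are broken by id to keep the order stable.
        batch = heapq.nlargest(n, self.pending, key=lambda i: (self.hits[i], i))
        self.pending.difference_update(batch)
        return batch
```

Images that nobody requests never enter `pending`, so CPU time goes to the files that actually generate traffic.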

CMS images / media

We are not limited to product thumbnails: we can also optimize static images used on CMS pages and in headers and footers. Those are always generated at full resolution, but otherwise the process is identical.