4.14. Procedural Image Optimization (PIO)
Problem description
Images uploaded to a Magento shop are often high-resolution, lightly compressed images in JPEG or PNG format.
Replacing them with modern formats such as WebP or AVIF is not an easy task either.
Although WebP support is very broad, not all browsers on the market support that format yet.
Support is even worse for AVIF, even though it can represent images of similar quality in even fewer bytes.
Another factor is that image codecs work in very different ways, so for specific images a losslessly compressed PNG can, for example, be smaller than a lossy AVIF.
We also need to keep in mind that JPEG does not support transparency, so it cannot be used as the target format for some images.
AVIF usually produces the smallest files, followed by WebP, but the optimization process requires very large amounts of CPU power.
Solution
Browser support
To make sure that we do not send an unsupported image format to a browser, we check the Accept header of the incoming request. It contains the list of MIME types that the browser recognizes, so we can confirm whether WebP and AVIF are supported and can be served. If neither WebP nor AVIF is detected as supported, we assume that only JPEG and PNG are available as legacy formats.
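The check can be sketched roughly as follows. This is a minimal illustration and not the actual implementation: the function name `supported_formats` is hypothetical, and the sketch assumes that only an explicit `image/webp` or `image/avif` entry counts as support, so wildcards such as `*/*` are ignored.

```python
def supported_formats(accept_header: str) -> list:
    """Return the image formats we may serve, most preferred first."""
    accepted = set()
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        # An explicit q=0 means the client does not accept this type.
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if q > 0:
            accepted.add(media_type)

    formats = []
    if "image/avif" in accepted:
        formats.append("avif")
    if "image/webp" in accepted:
        formats.append("webp")
    # Assumption from the text: JPEG and PNG are always available as legacy formats.
    formats += ["jpeg", "png"]
    return formats
```

For example, a header of `image/avif,image/webp,*/*;q=0.8` yields all four formats, while a bare `*/*` yields only the legacy ones.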
Image quality
All lossy codecs have a quality parameter that controls how hard the encoder tries to reproduce the original image. However, the same parameter value can yield different visual quality in different codecs. We therefore use the SSIM visual similarity metric to determine how much an encoded image differs from the original, and use bisection to find the quality parameter that yields the visual similarity closest to a specified target. This lets us use much lower quality parameters for AVIF, which usually reproduces images very well even at low quality settings, and so further decreases image size.
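The bisection step might look like the sketch below. It assumes that SSIM does not decrease as the quality parameter increases, and it picks the lowest quality that reaches the target, which is one way to read "closest to the target". The callback `ssim_at` is hypothetical: it stands for "encode at this quality and return the SSIM against the original".

```python
from typing import Callable


def find_quality(ssim_at: Callable[[int], float], target: float,
                 lo: int = 1, hi: int = 100) -> int:
    """Return the lowest quality in [lo, hi] whose SSIM reaches `target`.

    If even `hi` does not reach the target, `hi` is returned.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if ssim_at(mid) >= target:
            hi = mid        # mid is good enough, try a lower quality
        else:
            lo = mid + 1    # mid is too lossy, raise the lower bound
    return lo
```

Each call to `ssim_at` involves a full encode, so the benefit of bisection is that a 1 to 100 quality range needs at most seven encodes instead of one hundred.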
Smallest format
In most cases file sizes depend on the format in the following way: PNG > JPEG > WebP > AVIF. This is not always true, so we always try to optimize the image in every format the target supports and choose the smallest result. We also optimize some formats in their lossless variant, because those sometimes use a different algorithm that may produce a smaller file than the lossy variant.
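As a rough sketch, the selection amounts to taking the minimum by size over all encoded candidates in formats the requester accepts. The function name `pick_smallest` and the `(format, data)` tuple layout are assumptions made for illustration. A format may appear more than once, for example as a lossy and a lossless variant.

```python
def pick_smallest(candidates, supported):
    """Pick the smallest encoded candidate among the supported formats.

    candidates: list of (format, encoded_bytes) tuples.
    supported:  formats the requester accepts.
    """
    usable = [(fmt, data) for fmt, data in candidates if fmt in supported]
    if not usable:
        raise ValueError("no candidate in a supported format")
    return min(usable, key=lambda candidate: len(candidate[1]))
```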
Resize
Thumbnails do not need the full resolution, as the details are not visible either way. For that reason we resize each image to several different resolutions for use in different places on the website. Reducing the resolution is a lossy process, because multiple pixels have to be replaced with a single one, and the available algorithms vary in resulting image quality and in speed. We use the Lanczos3 algorithm, which provides the best results but is also the most CPU-expensive.
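To illustrate where the cost comes from, here is the standard Lanczos kernel with a = 3, together with a simplified one-dimensional downscaler. This is only a sketch, not the production resizer: `downscale_row` is a hypothetical helper that works on a single row of grayscale values. A real resizer applies the same idea along both axes and to every channel.

```python
import math


def lanczos(x: float, a: int = 3) -> float:
    """Lanczos kernel: sinc(x) * sinc(x / a) for |x| < a, otherwise 0."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def downscale_row(row, new_width, a=3):
    """Resample one row of pixel values to `new_width` samples."""
    scale = len(row) / new_width
    stretch = max(scale, 1.0)   # widen the kernel when shrinking
    support = a * stretch
    out = []
    for i in range(new_width):
        center = (i + 0.5) * scale - 0.5   # output pixel centre in source coordinates
        total = weight_sum = 0.0
        for j in range(math.floor(center - support), math.ceil(center + support) + 1):
            w = lanczos((j - center) / stretch, a)
            src = row[min(max(j, 0), len(row) - 1)]   # clamp at the edges
            total += w * src
            weight_sum += w
        out.append(total / weight_sum)     # normalise so flat areas stay flat
    return out
```

Each output sample reads roughly 2 * a * scale source samples per axis, which is why a wide kernel such as Lanczos3 is noticeably more expensive than nearest-neighbour or bilinear resizing.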
Transparency
Because we try to optimize the image in all possible formats, we have to consider whether each format supports transparency. When we detect that the source image contains transparency, we exclude formats that do not support it (JPEG). Some thumbnail URLs may request images without transparency, and in those cases we still allow the use of JPEG.
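A minimal sketch of this filter, under stated assumptions: pixels are given as RGBA tuples with 8-bit channels, and `keep_alpha` is a hypothetical flag derived from the thumbnail URL. Both function names are illustrative.

```python
def has_transparency(pixels) -> bool:
    """True if any RGBA pixel is not fully opaque."""
    return any(alpha < 255 for _r, _g, _b, alpha in pixels)


def target_formats(supported, has_alpha: bool, keep_alpha: bool = True):
    """Drop JPEG when the result has to preserve transparency."""
    if has_alpha and keep_alpha:
        return [fmt for fmt in supported if fmt != "jpeg"]
    return list(supported)
```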
Latency
Because the optimization process can take up to a few minutes per image, and very few images are enough to saturate all available CPU cores, it may take a long time until fully optimized images become available. This could result in missing images on the website, which would be very undesirable. To resolve this we use two-stage optimization. When an optimized image is not available, we schedule the optimization and look for other available versions. For example, if the requester supports JPEG, PNG, WebP and AVIF and we already have an optimized JPEG or PNG version, we send that version in the meantime.
If no version is available at all, we perform a quick conversion: we use a less expensive resize algorithm that still provides good enough quality, and we target only a single file format, using only the initial guess for the quality parameter. We also use the quickest encoder preset, which results in a much larger file but generates the image very quickly; for thumbnails the result is still usually smaller than the full-size original file. Quick conversions go through a separate queue, so they do not compete with the full optimization jobs in the regular queues.
To make sure that this temporary image is not cached for too long on CDN servers, we set a cache time of only 15 minutes in the response. This way we can provide good enough results in the meantime.
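The decision flow described above can be sketched as follows. Several details are assumptions made for illustration: the names `Plan` and `plan_response`, the one-year lifetime for final images, the default `quick_fmt`, and the choice to give the short cache time to the fallback version as well as to the quick conversion.

```python
from dataclasses import dataclass
from typing import Optional

TEMPORARY_MAX_AGE = 15 * 60          # 15 minutes, as stated in the text
FINAL_MAX_AGE = 365 * 24 * 3600      # hypothetical lifetime for final images


@dataclass
class Plan:
    source: str                  # "optimized", "fallback" or "quick"
    fmt: str
    cache_control: str
    schedule_optimization: bool


def plan_response(supported, final_fmt: Optional[str], ready, quick_fmt: str = "jpeg") -> Plan:
    """Decide what to serve for one request.

    supported: formats the requester accepts, most preferred first.
    final_fmt: format of the fully optimized result, or None if not ready yet.
    ready:     formats for which some optimized version already exists.
    """
    if final_fmt is not None:
        return Plan("optimized", final_fmt, f"public, max-age={FINAL_MAX_AGE}", False)
    temporary = f"public, max-age={TEMPORARY_MAX_AGE}"
    for fmt in supported:
        if fmt in ready:
            # Stage 1: serve another optimized version in the meantime.
            return Plan("fallback", fmt, temporary, True)
    # Stage 2: nothing usable exists yet, so fall back to a quick conversion.
    return Plan("quick", quick_fmt, temporary, True)
```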
Prioritization
We track how actively images are requested. Because we use a 15-minute cache time for CDNs, a frequently requested image should generate a request every 15 minutes per CDN server. We use that fact to prioritize frequently viewed images when generating fully optimized versions. This reduces overall traffic more quickly, especially during the initial optimization, when there might be millions of files to generate.
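One simple way to model this is to count requests per pending image and always optimize the most requested one next. The class below is only a sketch with hypothetical names. With millions of pending files a real system would more likely use a proper priority queue and some form of decay for old counts.

```python
from collections import Counter
from typing import Optional


class OptimizationBacklog:
    """Tracks request counts for images that still await full optimization."""

    def __init__(self) -> None:
        self._hits = Counter()

    def record_request(self, image_id: str) -> None:
        """Called whenever a temporary version of the image is served."""
        self._hits[image_id] += 1

    def next_image(self) -> Optional[str]:
        """Remove and return the most requested pending image, if any."""
        if not self._hits:
            return None
        image_id, _count = self._hits.most_common(1)[0]
        del self._hits[image_id]
        return image_id
```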
CMS images / media
We are not limited to product thumbnails: we can also optimize static images used on CMS pages and in headers and footers. Those will always be generated at full resolution, but otherwise the process is identical.