Image recognition: finding the right image among thousands

Neil Cooper

October 23, 2019

10 mins

Ecommerce, Product updates, Guides

Good product imagery is key to ensuring conversion on your e-commerce site. According to one study, 56% of users interact with product imagery before any other element of a product page. While many of these images are produced by professional studios, an increasing amount of imagery is generated by your users, content that’s been found to increase conversion by as much as 24%. Combine all your professional images with the ever-expanding amount of user-generated content and you have a problem: how do you find the right image among thousands, or even millions, of assets?

Cataloguing or tagging assets within a Digital Asset Management system is one solution, but it is often a labour-intensive task. While some of this can be automated, these solutions usually require a lot of analysis, custom integrations and infrastructure to track assets from purchasing, through photography and retouching, and eventually to your site. But there is another way to help manage and classify your assets: image recognition.

In this blog we’ll explain how image recognition can be used to take the pain out of managing thousands of product images.

What is Image Recognition?

Image Recognition is a process where software is used to analyse an image to identify features such as objects, faces and text. There are several image recognition services available, most of which can recognise a subset of everyday objects (people, clothes and vehicles, for example), scenes (such as a beach, city or room), dominant colours, faces, pornographic or “unsafe” images and text. All of these services make use of machine learning technology.

Face detection is one of the most interesting features of this set, and the level of detail is impressive. One provider detects almost 30 landmarks on a face.

These landmarks, or the absence of them, are used to determine a number of interesting pieces of information:

Whether the face matches an existing face in your collection, or a predefined collection of celebrities
Emotion
Pose
Smile / No smile
Mouth open / closed
Eyes open / closed

The face landmarks can also be used to detect:

Beard / No Beard
Moustache / No moustache
Age (albeit with a fairly wide margin for error)
Glasses, sunglasses

If you don’t want to use one of the existing solutions, there are also a number of machine learning technologies out there to help you build your own models. These require a large data set to train and developers with experience in machine learning to build something that would be a viable alternative.

If you’re a fashion retailer, there are a couple of data sets that are of interest: DeepFashion and DeepFashion2. These both provide information about fashion that could be used to train a model to identify specific styles and features specifically relevant to clothing.

Why is Image Recognition Useful?

So how does image recognition help us to solve the core problem in asset management: organising and finding assets among thousands, or even tens of thousands? And how can it help automate the labour-intensive manual task of classifying and tagging our assets?

Image recognition solutions are built into a number of Digital Asset Management (DAM) platforms. When assets are ingested they are automatically tagged based on features that the image recognition service discovers. This process doesn’t require any human intervention to tag assets and allows users to quickly find assets tagged as containing certain features.
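To make the idea concrete, here is a minimal sketch of tagging on ingest. It is not how any particular DAM implements it: the `(label, confidence)` tuples, the 0.8 threshold and the `TagIndex` class are all assumptions made for illustration.

```python
from collections import defaultdict


def tags_from_detections(detections, min_confidence=0.8):
    """Keep labels at or above the threshold, lower-cased and de-duplicated."""
    return sorted({label.lower() for label, confidence in detections
                   if confidence >= min_confidence})


class TagIndex:
    """Hypothetical index mapping each tag to the assets that carry it."""

    def __init__(self):
        self._assets_by_tag = defaultdict(set)

    def ingest(self, asset_name, detections, min_confidence=0.8):
        """Tag an asset from its detections and record it in the index."""
        tags = tags_from_detections(detections, min_confidence)
        for tag in tags:
            self._assets_by_tag[tag].add(asset_name)
        return tags

    def search(self, *tags):
        """Return the assets that carry every one of the given tags."""
        if not tags:
            return set()
        matches = [self._assets_by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches)


index = TagIndex()
# Detections are assumed to arrive as (label, confidence) pairs.
index.ingest("beach-shoot-01.jpg", [("Beach", 0.97), ("Bag", 0.91), ("Car", 0.35)])
index.ingest("studio-07.jpg", [("Bag", 0.88), ("Sunglasses", 0.93)])
print(index.search("bag", "beach"))
```

Dropping low-confidence labels at ingest keeps the index small, at the cost of missing the occasional correct but uncertain detection.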

You can also make use of the co-ordinates detected for objects and faces. So if your DAM or Content Management System (CMS) provides image manipulation functionality, you can crop images on the fly using the co-ordinates of, for example, someone’s face. This means that you can reuse the same image in multiple sizes and aspect ratios and be sure that a model’s face is always the focus of an image.
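The arithmetic behind a face-centred crop is simple enough to sketch. The example below assumes the bounding box is given in pixels, with the origin at the top left of the image. The `Box` type and `crop_around_face` function are hypothetical names, not part of any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    width: float
    height: float


def crop_around_face(image_w, image_h, face, aspect_w, aspect_h):
    """Return the largest crop of the requested aspect ratio,
    centred on the face as far as the image edges allow."""
    target = aspect_w / aspect_h
    if image_w / image_h > target:
        # Image is wider than the target ratio: keep full height, trim the width.
        crop_h = image_h
        crop_w = image_h * target
    else:
        # Image is taller than the target ratio: keep full width, trim the height.
        crop_w = image_w
        crop_h = image_w / target
    centre_x = face.left + face.width / 2
    centre_y = face.top + face.height / 2
    # Centre on the face, then clamp so the crop stays inside the image.
    left = min(max(centre_x - crop_w / 2, 0), image_w - crop_w)
    top = min(max(centre_y - crop_h / 2, 0), image_h - crop_h)
    return Box(left, top, crop_w, crop_h)


face = Box(left=3000, top=500, width=400, height=400)
print(crop_around_face(4000, 3000, face, 1, 1))
```

For a 4000 x 3000 image with the face near the right-hand edge, the square crop is pushed against that edge rather than centred, so the face stays in frame.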

Viewing what image recognition detects

We combine all these features in our Content Hub and Dynamic Media products. When the image recognition service is enabled, assets are ingested and then passed to the service which captures text, faces, objects and unsafe content labels, as well as bounding box information for those labels. It also includes all the extra metadata that’s associated with faces that we showed earlier.

In Content Hub you can choose to search and filter images using the labels that have been added to the metadata of each image processed with the image recognition service. This makes it easier to find the images you need by searching using information about the image itself, rather than just its name. You could choose to display images showing models aged between 20 and 30 and carrying a bag, or men with beards wearing sunglasses, or filter using whatever combination of detected features you need to find the right image.
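As a rough illustration of this kind of combined filter, the sketch below runs the two example searches over a small in-memory list. The `Face` and `Asset` structures and their field names are assumptions for the example; they do not reflect the actual Content Hub metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class Face:
    age_low: int            # lower bound of the estimated age range
    age_high: int           # upper bound of the estimated age range
    beard: bool = False
    sunglasses: bool = False


@dataclass
class Asset:
    name: str
    labels: set = field(default_factory=set)
    faces: list = field(default_factory=list)


def age_overlaps(face, youngest, oldest):
    """True if the estimated age range overlaps the requested one."""
    return face.age_low <= oldest and face.age_high >= youngest


def find_assets(assets, labels=(), face_matches=None):
    """Names of assets that carry every label and, if a predicate
    is given, contain at least one face that satisfies it."""
    wanted = {label.lower() for label in labels}
    found = []
    for asset in assets:
        if not wanted <= {label.lower() for label in asset.labels}:
            continue
        if face_matches is not None and not any(face_matches(f) for f in asset.faces):
            continue
        found.append(asset.name)
    return found


ASSETS = [
    Asset("street-01.jpg", {"Bag", "Street"}, [Face(22, 28)]),
    Asset("beach-04.jpg", {"Beach"}, [Face(30, 42, beard=True, sunglasses=True)]),
    Asset("studio-09.jpg", {"Bag"}, [Face(45, 60)]),
]

young_with_bag = find_assets(ASSETS, ["bag"], lambda f: age_overlaps(f, 20, 30))
bearded_in_shades = find_assets(ASSETS, face_matches=lambda f: f.beard and f.sunglasses)
print(young_with_bag)       # ['street-01.jpg']
print(bearded_in_shades)    # ['beach-04.jpg']
```

Because age is estimated with a wide margin, the sketch matches on overlapping ranges rather than on a single value.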

Dynamically cropping an image using a detected face

You can see image recognition data displayed in Content Hub in the image below. We've highlighted the bounding box for the model's face.

We can use this information with picture tags to create a responsive template that utilises Dynamic Media. Dynamic Media uses a simple URL based API that allows you to manipulate images on the fly. You can store a commonly used set of parameters in a transformation template to reuse parameters and make your URLs easier to read.

In this case we created a transformation template called $facePoi$ that includes the following:

poi={$this.metadata.detectedFaces.data.boundingBox.left},{$this.metadata.detectedFaces.data.boundingBox.top},{$this.metadata.detectedFaces.data.boundingBox.width},{$this.metadata.detectedFaces.data.boundingBox.height}&scaleFit=poi

This uses the bounding box data from the detected face as the point of interest of the image. When the image size changes it is scaled and cropped so that, even if the orientation is changed, the focal point is always the model's face.
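To show what the template resolves to once the metadata placeholders are filled in, here is a small sketch that builds the same query string by hand. The substitution is normally done by the platform. The `face_poi_query` helper and the sample bounding box values are hypothetical, and we assume the values are passed through unchanged.

```python
def face_poi_query(bounding_box):
    """Build the poi parameters from a face bounding box.

    Assumes a dict with the keys left, top, width and height,
    mirroring the metadata fields named in the template.
    """
    values = (bounding_box[key] for key in ("left", "top", "width", "height"))
    # The "g" format drops trailing zeros, so 0.50 is written as 0.5.
    poi = ",".join(f"{value:g}" for value in values)
    return f"poi={poi}&scaleFit=poi"


# Sample values only; real ones come from the detected face metadata.
box = {"left": 0.42, "top": 0.1, "width": 0.2, "height": 0.3}
print(face_poi_query(box))  # poi=0.42,0.1,0.2,0.3&scaleFit=poi
```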

<picture>
    <source srcset="https://i1.adis.ws/i/ampproduct/woman-sitting-towerblock?$facePoi$&w=1024&sm=aspect&aspect=3:1" media="(min-width: 1024px)">
    <source srcset="https://i1.adis.ws/i/ampproduct/woman-sitting-towerblock?$facePoi$&w=760&sm=aspect&aspect=2:1" media="(min-width: 760px)">
    <source srcset="https://i1.adis.ws/i/ampproduct/woman-sitting-towerblock?$facePoi$&w=580&sm=aspect&aspect=16:9" media="(min-width: 580px)">
    <source srcset="https://i1.adis.ws/i/ampproduct/woman-sitting-towerblock?$facePoi$&w=320&sm=aspect&aspect=1:1" media="(min-width: 320px)">
    <img class="d-block img-fluid" src="https://i1.adis.ws/i/ampproduct/woman-sitting-towerblock?$facePoi$&w=1024&sm=aspect&aspect=3:1" width="100%" border=0 alt="" title=""/>
</picture>

This enables us to render our images responsively without having to expend any effort maintaining two templates, or manually setting points of interest.
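If you have many images to render this way, the markup can be generated from a list of breakpoints instead of being written by hand. The sketch below is one way to do that. The URL parameters are the ones used in the markup above, while the `image_url` and `picture_markup` helpers and the simplified img attributes are our own assumptions.

```python
BASE_URL = "https://i1.adis.ws/i/ampproduct/woman-sitting-towerblock"

# (minimum viewport width in px, aspect ratio), widest first.
BREAKPOINTS = [(1024, "3:1"), (760, "2:1"), (580, "16:9"), (320, "1:1")]


def image_url(base_url, template, width, aspect):
    """Apply a transformation template plus a width and aspect ratio."""
    return f"{base_url}?${template}$&w={width}&sm=aspect&aspect={aspect}"


def picture_markup(base_url, template, breakpoints):
    """Return a <picture> element with one <source> per breakpoint."""
    lines = ["<picture>"]
    for width, aspect in breakpoints:
        url = image_url(base_url, template, width, aspect)
        lines.append(f'    <source srcset="{url}" media="(min-width: {width}px)">')
    # The widest rendition doubles as the fallback image.
    fallback_width, fallback_aspect = breakpoints[0]
    fallback = image_url(base_url, template, fallback_width, fallback_aspect)
    lines.append(f'    <img src="{fallback}" width="100%" alt="">')
    lines.append("</picture>")
    return "\n".join(lines)


print(picture_markup(BASE_URL, "facePoi", BREAKPOINTS))
```

Keeping the breakpoints in one list means a new aspect ratio only has to be added in one place.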

For both these images, Dynamic Media uses the co-ordinates of the face detected to ensure it remains in the crop, while taking the required aspect ratio and the asset aspect ratio into account to decide how to crop the asset. No extra metadata needs to be added manually by the user.

Some Final Thoughts…

Image recognition lets you automate the tagging of media with useful information, providing a good-enough replacement for the labour-intensive job of identifying and classifying image content by hand. That information makes media easy to search, helping you find that one image in a thousand.

The richer positional data can be used to manipulate media dynamically and enable a level of personalisation by using tags to drive image choices. You can think of it as an enabling technology for Agile content production and content personalisation.

You can find more about how Content Hub and Dynamic Media make use of image recognition on our docs site.