
Image Recognition

Release date: 5th February 2019

In this release we've introduced an image recognition service that uses machine learning to identify objects, faces and text within images uploaded to Content Hub. When the service is enabled on your account, each uploaded image is enriched with metadata that can be used to search and filter your assets.

On this page we'll provide an overview of the image recognition service, show what kinds of information are detected, and explain how to use the metadata to search for images in Content Hub and Dynamic Content.

Overview

Images are usually uploaded to Content Hub using bulk upload scripts and will often have filenames consisting of just a product code, for example, making them difficult to search for. Image recognition makes it easier to search for and organise images using the metadata about the image itself, rather than just its name.

You can use the image recognition metadata to search for images containing particular objects such as bags and shoes, or models within a particular age range. You can even filter out unsuitable images, something that is especially useful when used together with our User Generated Content service. You can also search for images in the Dynamic Content media browser using the image recognition metadata.

Image recognition enables all kinds of possibilities when used with features such as point of interest, allowing you to detect and focus an image around a particular product.

In order to use the image recognition service, it must be enabled on your Content Hub account. Contact your Customer Success Manager to request that the service be set up. The service can be enabled on one or more of your asset stores.

Viewing image recognition metadata in Content Hub

When an image is uploaded to Content Hub, a request is sent to the image recognition service to analyse the image, and the information returned is stored in the image metadata. The service detects four types of data: objects, faces, text and unsafe content, corresponding to the "Detected objects", "Detected text", "Detected Faces" and "Unsafe Content" sections displayed when viewing the metadata in Content Hub.

For each section there are two fields: Data and Labels. "Data" contains all the information returned by the image recognition service, while "Labels" contains the text that can be used for searching and filtering. In the woman with red beret image shown below, the service has detected objects classified as apparel, skirt, bag and handbag, and we can search and filter on these labels.

Viewing the image recognition metadata

The data returned includes a confidence level for each detection, for example how likely it is that someone is wearing sunglasses or that a detected object is a bag. Labels are only added for those objects and faces detected with a high confidence level.
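As a sketch of how confidence-based filtering might work, here's a short example using object names and confidence values taken from the full metadata example later on this page. The 80% threshold is purely illustrative; the service's actual cut-off is not documented here.

```python
# Illustrative confidence-based filtering of detected objects.
# The names and confidences are from the example metadata on this page;
# CONFIDENCE_THRESHOLD is a hypothetical value, not the service's real cut-off.
detected_objects = [
    {"name": "Handbag", "confidence": 85.07383},
    {"name": "Purse", "confidence": 72.7458},
    {"name": "Chain Mail", "confidence": 75.28065},
]

CONFIDENCE_THRESHOLD = 80.0  # hypothetical cut-off

labels = [
    obj["name"].lower()
    for obj in detected_objects
    if obj["confidence"] >= CONFIDENCE_THRESHOLD
]
print(labels)  # ['handbag']
```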

You can also retrieve the metadata using the metadata API as explained in the Retrieving the full metadata section.

Detected faces

If there are faces detected in the image, this information will be added to the metadata, with an entry for each face detected.

In the image below, two faces have been detected, so there are two "Detected Faces" sections. Labels are added for each face, including information such as the estimated age range, facial expression and whether the person is wearing sunglasses.

Image recognition metadata for detected faces

Filtering and searching using recognition metadata

In Content Hub you can choose to filter using detected objects, faces, text and unsuitable images. In this example we've chosen Detected Objects as a filter. The menu is populated with the labels that have been added to the metadata of each image that has been processed with the image recognition service.

From the list of labels we choose "purse".

Choosing to filter from the set of labels for detected objects

With the "purse" filter applied, only those images that contain a purse are displayed.

Filtering on those images containing a purse

We can also search within our filtered content. If we only want images containing bags with a tartan pattern, we can type "tartan" in the search box and only the images containing this label will be displayed. In this case it's a single image.

Narrowing our selection to an image containing a tartan handbag

Searching image recognition metadata in Dynamic Content

You can also search image recognition metadata from the media browser in Dynamic Content. This is a great way to find images to add to your content.

In the example shown below, we have chosen to add an image to a content item. In the media browser we type "tartan" in the search box and the image with the tartan bag is displayed. We can then select this image and add it to our content.

To make use of image recognition metadata with Dynamic Content, you will need to ensure that the Content Hub account linked to your Dynamic Content account has the feature enabled.

Searching for an image in Dynamic Content using image recognition metadata

Detecting text in an image

The image recognition service also detects text that is part of an image. In this example the image shows a woman walking with an umbrella that has some text emblazoned on it: "#rain". The service detects the text and adds it as a label to the "Detected Text" section of the image metadata. We can then search for this text.

Detected text in an image

Retrieving the full metadata

You can retrieve the full information returned by the image recognition service by using the metadata API: add .json?metadata=true to the end of the image URL. Note that the image recognition metadata schema must be made publishable in order to retrieve the metadata using this approach. Your Customer Success Manager will be able to arrange this with our provisioning team.
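As a minimal sketch of constructing the request in Python (the host and image path below are hypothetical placeholders, not a real account):

```python
# Build the metadata URL by appending ".json?metadata=true" to the image URL,
# as described above.
def metadata_url(image_url: str) -> str:
    return image_url + ".json?metadata=true"

# Hypothetical image URL, for illustration only.
url = metadata_url("https://images.example.com/i/myaccount/red-beret")
print(url)

# With the third-party "requests" library installed, the metadata could
# then be fetched and parsed as JSON:
#   import requests
#   metadata = requests.get(url).json()
```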

The full JSON is also shown in the "Data" field of the metadata pane in Content Hub.

Here's an example of the full metadata returned for the woman in a red beret image. Retrieving the full metadata is useful if you want to know the position of an object, to use it with point of interest, for example.

{
"isImage":true,
"alpha":false,
"width":3065,
"height":4598,
"format":"JPEG",
"metadata":{
"detectedText":{
"data":null,
"textLines":null,
"id":"5d13acdb-2b52-4d32-b7d7-2b84b26aa319"
},
"detectedFaces":{
"data":{
"ageRange":{
"high":36,
"low":19
},
"boundingBox":{
"top":0.092249945,
"left":0.41700655,
"width":0.12484362,
"height":0.13042621
},
"sunglasses":{
"confidence":100,
"value":false
},
"mouthOpen":{
"confidence":59.774826,
"value":false
},
"emotions":[
{
"confidence":0.5497097,
"type":"ANGRY"
},
{
"confidence":1.3447995,
"type":"HAPPY"
},
{
"confidence":52.429367,
"type":"SURPRISED"
},
{
"confidence":10.905305,
"type":"SAD"
},
{
"confidence":0.16092487,
"type":"DISGUSTED"
},
{
"confidence":0,
"type":"CONFUSED"
},
{
"confidence":26.975567,
"type":"CALM"
}
],
"gender":{
"confidence":97.6458,
"value":"Female"
},
"beard":{
"confidence":99.86307,
"value":false
},
"pose":{
"roll":11.668715,
"pitch":6.941703,
"yaw":20.281464
},
"confidence":99.999985,
"landmarks":[
{
"x":0.46199894,
"y":0.14075498,
"type":"eyeLeft"
},
{
"x":0.521149,
"y":0.14828268,
"type":"eyeRight"
},
{
"x":0.45666224,
"y":0.18921822,
"type":"mouthLeft"
},
{
"x":0.5050401,
"y":0.19534743,
"type":"mouthRight"
},
{
"x":0.4942966,
"y":0.16859493,
"type":"nose"
},
{
"x":0.43792748,
"y":0.1277409,
"type":"leftEyeBrowLeft"
},
{
"x":0.48055947,
"y":0.12784877,
"type":"leftEyeBrowRight"
},
{
"x":0.461382,
"y":0.12387163,
"type":"leftEyeBrowUp"
},
{
"x":0.5155438,
"y":0.13239746,
"type":"rightEyeBrowLeft"
},
{
"x":0.5442879,
"y":0.1409053,
"type":"rightEyeBrowRight"
},
{
"x":0.53228,
"y":0.13247447,
"type":"rightEyeBrowUp"
},
{
"x":0.4498693,
"y":0.13957764,
"type":"leftEyeLeft"
},
{
"x":0.47307774,
"y":0.14277685,
"type":"leftEyeRight"
},
{
"x":0.46232894,
"y":0.138623,
"type":"leftEyeUp"
},
{
"x":0.46148655,
"y":0.1429718,
"type":"leftEyeDown"
},
{
"x":0.50867873,
"y":0.14721212,
"type":"rightEyeLeft"
},
{
"x":0.52974313,
"y":0.14937758,
"type":"rightEyeRight"
},
{
"x":0.5215662,
"y":0.14590569,
"type":"rightEyeUp"
},
{
"x":0.519888,
"y":0.15011361,
"type":"rightEyeDown"
},
{
"x":0.47700205,
"y":0.17342311,
"type":"noseLeft"
},
{
"x":0.49826175,
"y":0.17563091,
"type":"noseRight"
},
{
"x":0.4856338,
"y":0.18528166,
"type":"mouthUp"
},
{
"x":0.48096976,
"y":0.19928539,
"type":"mouthDown"
},
{
"x":0.46199894,
"y":0.14075498,
"type":"leftPupil"
},
{
"x":0.521149,
"y":0.14828268,
"type":"rightPupil"
},
{
"x":0.40742865,
"y":0.13887839,
"type":"upperJawlineLeft"
},
{
"x":0.41126552,
"y":0.19017029,
"type":"midJawlineLeft"
},
{
"x":0.47100845,
"y":0.2234472,
"type":"chinBottom"
},
{
"x":0.519395,
"y":0.20340215,
"type":"midJawlineRight"
},
{
"x":0.5419624,
"y":0.1553815,
"type":"upperJawlineRight"
}
],
"mustache":{
"confidence":99.99686,
"value":false
},
"smile":{
"confidence":97.04133,
"value":false
},
"quality":{
"brightness":70.10285,
"sharpness":86.86019
},
"eyesOpen":{
"confidence":97.59302,
"value":false
},
"eyeglasses":{
"confidence":99.99999,
"value":false
}
},
"faceId":"5d13acdb-2b52-4d32-b7d7-2b84b26aa319_0",
"labels":[
"20-40",
"female",
"eyes-closed",
"mouth-closed"
]
},
"detectedObjects":{
"data":[
{
"instances":null,
"confidence":99.79362,
"name":"Apparel",
"parents":null
},
{
"instances":[
{
"boundingBox":{
"top":0.4801911,
"left":0.28257713,
"width":0.49396697,
"height":0.42884958
},
"confidence":99.79362
}
],
"confidence":99.79362,
"name":"Skirt",
"parents":[
{
"name":"Clothing"
}
]
},
{
"instances":null,
"confidence":99.79362,
"name":"Clothing",
"parents":null
},
{
"instances":null,
"confidence":92.96846,
"name":"Accessories",
"parents":null
},
{
"instances":null,
"confidence":92.96846,
"name":"Accessory",
"parents":null
},
{
"instances":null,
"confidence":91.75725,
"name":"Human",
"parents":null
},
{
"instances":[
{
"boundingBox":{
"top":0.044115435,
"left":0.24598205,
"width":0.5202878,
"height":0.9390389
},
"confidence":91.75725
}
],
"confidence":91.75725,
"name":"Person",
"parents":null
},
{
"instances":[
{
"boundingBox":{
"top":0.041444357,
"left":0.3652041,
"width":0.25975224,
"height":0.12184165
},
"confidence":86.01483
}
],
"confidence":86.01483,
"name":"Hat",
"parents":[
{
"name":"Clothing"
}
]
},
{
"instances":null,
"confidence":85.07383,
"name":"Handbag",
"parents":[
{
"name":"Bag"
},
{
"name":"Accessories"
}
]
},
{
"instances":null,
"confidence":85.07383,
"name":"Bag",
"parents":null
},
{
"instances":null,
"confidence":75.28065,
"name":"Armor",
"parents":null
},
{
"instances":null,
"confidence":75.28065,
"name":"Chain Mail",
"parents":[
{
"name":"Armor"
}
]
},
{
"instances":null,
"confidence":72.7458,
"name":"Purse",
"parents":[
{
"name":"Handbag"
},
{
"name":"Bag"
},
{
"name":"Accessories"
}
]
},
{
"instances":null,
"confidence":70.483055,
"name":"Sleeve",
"parents":[
{
"name":"Clothing"
}
]
},
{
"instances":null,
"confidence":64.96331,
"name":"Coat",
"parents":[
{
"name":"Clothing"
}
]
},
{
"instances":null,
"confidence":64.96331,
"name":"Overcoat",
"parents":[
{
"name":"Coat"
},
{
"name":"Clothing"
}
]
}
],
"id":"5d13acdb-2b52-4d32-b7d7-2b84b26aa319",
"labels":[
"apparel",
"skirt",
"clothing",
"accessories",
"accessory",
"human",
"person",
"hat",
"handbag",
"bag",
"armor",
"chain mail",
"purse",
"sleeve",
"coat",
"overcoat"
]
},
"detectedUnsafeContent":{
"data":null,
"id":"5d13acdb-2b52-4d32-b7d7-2b84b26aa319",
"labels":null
}
},
"status":"ok"
}
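As a sketch of how the full metadata might feed a point of interest workflow, here's a hypothetical helper that pulls out the bounding box of a named object. The trimmed metadata dict mirrors the "detectedObjects" structure in the example above; the `bounding_box` function is an illustration, not part of any API.

```python
# Trimmed-down metadata, mirroring the "detectedObjects" structure shown
# in the full example above (values rounded for brevity).
metadata = {
    "detectedObjects": {
        "data": [
            {"name": "Clothing", "instances": None},
            {"name": "Person",
             "instances": [{"boundingBox": {"top": 0.044, "left": 0.246,
                                            "width": 0.520, "height": 0.939},
                            "confidence": 91.757}]},
        ]
    }
}

def bounding_box(metadata, name):
    """Return the bounding box of the first instance of a named object, or None.

    Hypothetical helper: objects with "instances" set to None carry no
    position data and are skipped.
    """
    for obj in metadata["detectedObjects"]["data"]:
        if obj["name"] == name and obj.get("instances"):
            return obj["instances"][0]["boundingBox"]
    return None

box = bounding_box(metadata, "Person")
# Centre of the box, as fractions of the image width and height; this is the
# kind of value a point of interest focal point could be set from.
centre = (box["left"] + box["width"] / 2, box["top"] + box["height"] / 2)
print(centre)
```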