Zero Shot Learning

The ability of an algorithm to correctly classify images from classes it has never seen during training.

Zero-shot learning refers to the ability of an algorithm to classify (i.e., label) an image of something that it has not seen during training. For example, say the algorithm is classifying images of fruits. During training it might have seen images labelled as bananas and apples, but it has never seen an image labelled as an orange. Yet, when the algorithm is tested, it is able to correctly classify bananas, apples and oranges. This might seem like the algorithm is very good at guessing. However, this ability actually relies on the algorithm being capable of ‘inferring’ (i.e., reasoning) from what is called ‘auxiliary information.’

Auxiliary information is additional information that describes an image without actually labelling it. For example, during training the algorithm might see an image of a fruit basket containing 6 apples, 2 bananas, and 5 oranges. The description of the image might only say “this is a basket containing a number of different fruits, including 5 oranges.” The algorithm has never been explicitly told which objects are the oranges, but it can ‘infer’ that the object appearing five times is an orange. This means that the next time it sees an image containing oranges, for example an orange tree, it is able to classify them correctly.
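To make this concrete, below is a minimal sketch of one common way auxiliary information is used in practice: every class, including unseen ones, is described by a small vector of attributes, and an image is assigned to the class whose description best matches the attributes predicted for it. The attribute values and the stand-in predict_attributes function are purely illustrative assumptions, not taken from any particular system.

```python
import numpy as np

# Auxiliary information: each class is described by attributes
# (here: colour hue and elongation), not just a label.
# These toy values are purely illustrative.
class_attributes = {
    "banana": np.array([0.15, 0.9]),   # yellow hue, elongated
    "apple":  np.array([0.0,  0.1]),   # red hue, round
    "orange": np.array([0.08, 0.1]),   # orange hue, round - never seen in training
}

def predict_attributes(image):
    """Stand-in for a model trained only on the 'seen' classes (banana, apple)
    to predict attribute values rather than class labels."""
    # In a real system this would be a neural network; here we simply
    # treat the input as an already-predicted attribute vector.
    return np.asarray(image, dtype=float)

def zero_shot_classify(image):
    """Assign the class whose attribute description is closest to the
    attributes predicted for the image, including unseen classes."""
    predicted = predict_attributes(image)
    distances = {
        name: np.linalg.norm(predicted - attrs)
        for name, attrs in class_attributes.items()
    }
    return min(distances, key=distances.get)

# An image whose predicted attributes look 'orange-ish and round'
# is labelled as an orange even though no orange was seen in training.
print(zero_shot_classify([0.07, 0.12]))  # -> "orange"
```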

The same approach can be used to train algorithms to correctly classify medical images, for example images of lung cancer, without having to provide the algorithm with a labelled training dataset that includes an example of every possible lung cancer tumour. This reduces the need to provide algorithms with very large, accurately labelled datasets, which can be extremely difficult and costly to produce, and so makes algorithm training more efficient.

Zero-shot learning is a type of transfer learning that has become extremely important in the fields of natural language processing and generative AI because it enables a model to take on a new task that it was not explicitly trained to do. This capability, which has traditionally been used for image classification, reduces the need to provide algorithms with large, perfectly labelled datasets during training; reduces the need for one-algorithm-per-task set-ups; and makes the process of developing new and innovative algorithms more efficient.

To give an example, using zero-shot learning, an algorithm trained to classify chest X-ray images of patients with pneumonia and asthma could be adapted to also classify chest X-ray images of patients with COVID-19 without having to re-train the algorithm, which would have been expensive, time-consuming, and potentially even impossible at the beginning of the pandemic, when there were not sufficient labelled X-ray images in existence.

First, the algorithm would be pre-trained on a labelled dataset of ‘seen’ classes, i.e., a series of chest X-rays of patients with pneumonia or asthma. Next, the algorithm would be provided with auxiliary information, for example a text prompt describing a new class of data, i.e., a description of chest X-rays of COVID-19 patients and how these images compare to those of patients with pneumonia or asthma.

Finally, the algorithm would be presented with a series of COVID-19 chest X-rays and would use the auxiliary information provided in the prompt to infer the COVID-19 classification. This process could then be repeated for other types of chest X-ray images, including those showing lung cancer at different stages.
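As an illustrative sketch of these mechanics only, the snippet below uses a general-purpose CLIP-style model from the Hugging Face transformers library, which embeds images and text descriptions in a shared space and scores each class description against the image. The model name, the file name, and the prompt wording are assumptions made for the example; a real clinical application would use an encoder pre-trained on chest X-rays rather than a general-purpose model.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# A general-purpose CLIP model stands in for an image/text encoder here;
# a real clinical system would use an encoder pre-trained on chest X-rays.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Auxiliary information: short text descriptions of each class, including
# the 'unseen' class (COVID-19). The prompt wording is purely illustrative.
class_descriptions = [
    "a chest X-ray of a patient with pneumonia",
    "a chest X-ray of a patient with asthma",
    "a chest X-ray of a patient with COVID-19",
]

image = Image.open("chest_xray.png")  # hypothetical input image

# Embed the image and the text descriptions in the same space and
# score each description against the image.
inputs = processor(text=class_descriptions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)[0]

for description, p in zip(class_descriptions, probs):
    print(f"{p.item():.2f}  {description}")
```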

In essence, zero-shot learning is a method for making algorithms capable of ‘multitasking.’

An Owkin example

Further reading