Training

The process of teaching an algorithm how to do its job.

The word ‘training’ refers to the process of teaching an algorithm how to do its job, for example how to classify images (i.e., tell the difference between a normal scan and an abnormal scan), or how to predict whether a person is at risk of developing a particular disease. It is largely a process of repetitive trial and error.

Think, for example, of trying to teach a child to recognise colours. You might give the child a stack of cards, each a different primary colour. Every time a blue card comes up you point to it and say ‘blue’; each time a red card comes up you say ‘red’, and so forth. After a while, the child will be able to point to each of the cards and correctly ‘label’ them on their own, without your help. If the child gets a label wrong, you can correct them. Eventually, the child will be able to label other objects, such as a mug, with the correct colour, even when the cards are not present.

The process of training an algorithm is very similar. For example, an algorithm may be given a dataset containing many different chest X-rays (images of lungs) and told to label each X-ray as either healthy or unhealthy. If it is a supervised algorithm, it is given some help at the start: the dataset already carries some labels (such as ‘pneumonia’ or ‘fracture’), and the algorithm looks at the images many times over until it works out what distinguishes the healthy images from the unhealthy ones. If it is unsupervised, it is simply told to go and look for the differences itself.
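To make the supervised/unsupervised distinction concrete, here is a minimal Python sketch using scikit-learn. The feature matrix is synthetic and stands in for features extracted from X-ray images; the labels, model choices and variable names are illustrative assumptions, not a real diagnostic pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))   # 200 scans, 16 stand-in features per scan
y = (X[:, 0] > 0).astype(int)    # toy labels: 0 = healthy, 1 = unhealthy

# Supervised: learns from the labels it is given.
clf = LogisticRegression().fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised: told only to look for structure; no labels supplied.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```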

The algorithm will make lots of guesses, and when it gets one wrong the AI scientist will correct it (in the same way you would correct the child above) until the algorithm can correctly label all of the images in the training dataset without any input.
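This guess-and-correct loop can be shown in a few lines of code. The sketch below is a toy perceptron-style learner on synthetic data, assumed purely for illustration: the ‘correction’ step nudges the model’s weights only when its guess is wrong, echoing how the scientist (or the parent) corrects mistakes.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(int)  # toy ground truth

w = np.zeros(4)                          # the model starts knowing nothing
for epoch in range(50):                  # repeated passes over the data
    for xi, yi in zip(X, y):
        guess = int(xi @ w > 0)          # the algorithm's current guess
        w += (yi - guess) * xi           # 'correction' applied only when wrong

preds = (X @ w > 0).astype(int)
print("training accuracy after the correction loop:", (preds == y).mean())
```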

There are multiple stages in this learning process:

  1. Error analysis
    Patterns such as consistently misclassified examples (e.g. labelling ‘fracture’ on an X-ray where there is none) are investigated to understand why the algorithm is struggling. You can think of this as a child needing a different teaching method to grasp a challenging new concept. 
  2. Experimentation and fine-tuning
    Once the underlying cause is diagnosed, scientists modify the data or the parameters used for training the algorithm, just as a teacher might introduce new approaches or resources to address a child’s learning difficulties. 
  3. Regularisation techniques and data augmentation
    After repeated training, overfitting issues may arise as the model becomes too complex and too specific to the training data. Regularisation techniques reduce these errors by encouraging the model to prefer simpler solutions, which generalise more readily. Data augmentation techniques, such as flipping, rotating or changing the resolution of images, help the algorithm work out which factors matter most, increasing the likelihood that it can correctly label ‘fracture’ on an X-ray (see the sketch after this list). This is similar to how more diverse experiences and interactions can further enrich a child’s understanding of the learning material. 
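As promised above, here is a hedged sketch of step 3, assuming PyTorch and torchvision are available. The augmentation pipeline shows the flipping, rotation and resolution changes mentioned; weight decay is used as one simple example of a regularisation technique (many others exist, such as dropout or early stopping).

```python
import torch
from torchvision import transforms

# Data augmentation: each epoch, the model sees a slightly different
# variant of every image, discouraging it from memorising exact pixels.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),       # flipping
    transforms.RandomRotation(degrees=10),   # rotating
    transforms.Resize((224, 224)),           # changing the resolution
    transforms.ToTensor(),
])

# Regularisation: weight decay (L2) penalises large weights, nudging the
# optimiser towards simpler, more generalisable solutions.
model = torch.nn.Linear(224 * 224, 2)        # stand-in for a real network
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,
    weight_decay=1e-4,
)
```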

Finally, the algorithm is given a different set of images (like a child being given different objects) and tested to see if it can still label the new images correctly. This ‘testing’ process is known as validation. Learning is a dynamic process that involves a continuous cycle of monitoring, analysis, and experimentation to improve performance over time.
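A minimal sketch of holding out a validation set, again using scikit-learn on synthetic stand-in data: the model never sees the held-out images during training, so its score on them indicates whether it has genuinely learned or merely memorised.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))             # stand-in image features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # toy healthy/unhealthy labels

# Hold back 20% of the data; the model never sees it during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
clf = LogisticRegression().fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```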

In the context of AI, and more specifically machine learning, training is the process of ‘teaching’ an algorithm the correct answer to a specific clinical problem, such as: “which of these chest X-ray images show evidence of cancer” or “which of these patients are most at risk of developing type II diabetes.” 

This process is data-led in the sense that it involves providing the algorithm with a training dataset from which it can learn the correct answer (i.e., the target attribute or output). It is crucially important that the training dataset is of sufficient quantity and quality, otherwise the success (or the accuracy) of the algorithm will be undermined by the ‘rubbish in, rubbish out’ problem. 

The exact way in which the algorithm learns from the training dataset depends on what type of algorithm it is, i.e., whether it is a supervised learning algorithm, an unsupervised learning algorithm, an algorithm trained using zero-shot learning, or an algorithm trained using reinforcement learning (or trial and error). 

Once the training phase is ‘complete’, the algorithm must then be validated, i.e., shown a held-out subset of the data that it did not see during training, to check that it can still complete its task accurately and to look for problems such as overfitting or bias. If the algorithm ‘passes’ validation, it must be evaluated in a real-world setting. Training data and real-world data are very rarely of the same quality, particularly in clinical settings. For example, consider an algorithm trained to recognise cancerous moles via an app that consumers can download at home. 

This algorithm will likely have been trained using professional-quality photos taken by a dermatologist in a well-lit setting, not photos taken on somebody’s old phone in a poorly lit living room. It’s important, therefore, to test how these differences in data quality change the overall performance of the algorithm. Likewise, the algorithm must be tested for generalisability, otherwise it may not work when taken out of the lab and used in the real world on a broader population. 
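One way to probe this is to degrade the test data deliberately and compare scores. The sketch below uses additive noise on synthetic features as a crude stand-in for ‘old phone, poorly lit living room’ conditions; a real robustness check would simulate realistic image corruptions instead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))
y = (X[:, :4].sum(axis=1) > 0).astype(int)   # toy labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

# Simulated real-world degradation: noisier, lower-quality inputs.
X_te_degraded = X_te + rng.normal(scale=1.5, size=X_te.shape)

print("clean test accuracy:   ", clf.score(X_te, y_te))
print("degraded test accuracy:", clf.score(X_te_degraded, y_te))
```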

Often this train, validate, test process is presented as linear, which can give the impression that each of these ‘tasks’ is completed only once, as a ‘tick-box’ exercise. In reality, the process of training, validating, and testing an algorithm is recursive, and there may need to be multiple rounds of training before the target accuracy level is reached. Additionally, algorithms may need to be re-trained after they have been deployed if there are changes in the demographic make-up of the population on which the algorithm is used (population drift), or if the performance of the algorithm declines over time (model drift).
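Post-deployment monitoring for model drift can be as simple as tracking accuracy on recently reviewed cases and flagging the model when it falls below an agreed threshold. The threshold, function and data below are hypothetical illustrations, not a production monitoring system.

```python
RETRAIN_THRESHOLD = 0.90  # hypothetical agreed minimum accuracy

def check_for_drift(recent_predictions, recent_labels):
    """Return True if the deployed model should be queued for retraining."""
    correct = sum(p == t for p, t in zip(recent_predictions, recent_labels))
    accuracy = correct / len(recent_labels)
    return accuracy < RETRAIN_THRESHOLD

# Hypothetical monitoring call on the latest batch of clinician-reviewed cases.
if check_for_drift([1, 0, 1, 1, 0, 1, 1, 0], [1, 0, 1, 0, 0, 1, 1, 1]):
    print("Accuracy below threshold: schedule retraining and investigate drift")
```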

An Owkin example

Computational pathology is revolutionising the field of pathology by integrating advanced computer vision and machine learning technologies into diagnostic workflows. It offers unprecedented opportunities for more efficient treatment decisions by allowing pathologists to achieve greater precision and objectivity in disease classification, tumour microenvironment description and identification of new biomarkers. However, the potential of computational pathology in personalised medicine comes with significant challenges, particularly the need for humans to annotate whole slide images (WSIs), which is time-consuming, costly and subject to inter-observer variability. To address these challenges, self-supervised learning (SSL) has emerged as a promising way for algorithms to learn representations from histology patches, leveraging large volumes of unlabelled WSIs without requiring human intervention. 

Owkin scientists explored self-supervised learning for histopathology: our 86-million-parameter model, Phikon, was trained on 40 million images spanning 16 different cancer types. Among other resources, we publicly released histology features from in-house self-supervised learning models for 60 million images from The Cancer Genome Atlas (TCGA). The code, models and features are publicly available on GitHub and Hugging Face.
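For readers who want to try the released model, the sketch below follows the pattern shown on public Hugging Face model cards for extracting patch-level features with Phikon. The `owkin/phikon` identifier, the ViT classes and the `patch.png` path are assumptions based on that card; consult the card itself for the exact, current usage.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

# Model identifier assumed from the public Hugging Face model card.
processor = AutoImageProcessor.from_pretrained("owkin/phikon")
model = ViTModel.from_pretrained("owkin/phikon", add_pooling_layer=False)

# Placeholder path: a 224x224 histology patch extracted from a WSI.
image = Image.open("patch.png")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One feature vector per patch, taken from the [CLS] token embedding.
features = outputs.last_hidden_state[:, 0]
print(features.shape)
```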

Further reading
  • Awaysheh, Abdullah et al. 2019. ‘Review of Medical Decision Support and Machine-Learning Methods’. Veterinary Pathology 56(4): 512–25.
  • Yoon, Chang Ho, Robert Torrance, and Naomi Scheinerman. 2022. ‘Machine Learning in Medicine: Should the Pursuit of Enhanced Interpretability Be Abandoned?’ Journal of Medical Ethics 48(9): 581.
  • Chen, Po-Hsuan Cameron, Yun Liu, and Lily Peng. 2019. ‘How to Develop Machine Learning Models for Healthcare’. Nature Materials 18(5): 410–14.
  • Deo, Rahul C. 2015. ‘Machine Learning in Medicine’. Circulation 132(20): 1920–30.
  • Eckhardt, Christina M. et al. 2023. ‘Unsupervised Machine Learning Methods and Emerging Applications in Healthcare’. Knee Surgery, Sports Traumatology, Arthroscopy 31(2): 376–81.
  • Javaid, Mohd et al. 2022. ‘Significance of Machine Learning in Healthcare: Features, Pillars and Applications’. International Journal of Intelligent Networks 3: 58–73.
  • Kocak, Burak, Ece Ates Kus, and Ozgur Kilickesmez. 2021. ‘How to Read and Review Papers on Machine Learning and Artificial Intelligence in Radiology: A Survival Guide to Key Methodological Concepts’. European Radiology 31(4): 1819–30.
  • Matheny, Michael E., Lucila Ohno-Machado, Sharon E. Davis, and Shamim Nemati. 2023. ‘Chapter 7 - Data-Driven Approaches to Generating Knowledge: Machine Learning, Artificial Intelligence, and Predictive Modeling’. In Clinical Decision Support and Beyond (Third Edition), eds. Robert A. Greenes and Guilherme Del Fiol. Oxford: Academic Press, 217–55. https://www.sciencedirect.com/science/article/pii/B9780323912006000310.
  • Ngiam, Kee Yuan, and Ing Wei Khor. 2019. ‘Big Data and Machine Learning Algorithms for Health-Care Delivery’. The Lancet Oncology 20(5): e262–73.
  • Nwanosike, Ezekwesiri Michael, Barbara R. Conway, Hamid A. Merchant, and Syed Shahzad Hasan. 2022. ‘Potential Applications and Performance of Machine Learning Techniques and Algorithms in Clinical Practice: A Systematic Review’. International Journal of Medical Informatics 159: 104679.
  • Obermeyer, Ziad, and Ezekiel J. Emanuel. 2016. ‘Predicting the Future — Big Data, Machine Learning, and Clinical Medicine’. New England Journal of Medicine 375(13): 1216–19.
  • Sidey-Gibbons, Jenni A. M., and Chris J. Sidey-Gibbons. 2019. ‘Machine Learning in Medicine: A Practical Introduction’. BMC Medical Research Methodology 19(1): 64.