Transfer Learning using EfficientNet
Summary
In a recent assignment, one of the tasks was to build an image classification model to accurately classify images belonging to one of 72 possible categories.
This short article links to code showing how one can use Transfer Learning to rapidly build a model with high accuracy on both the training and test sets. Specifically, it highlights a two-stage approach that can be used as a template for future tasks.
The first pass quickly trains new weights on top of the available pre-trained weights, which are kept frozen (ImageNet weights in this case, since the distributions are very similar). The second pass then unfreezes some blocks and lets the model fine-tune those weights on the training images.
The diagram below shows how the algorithm reaches around 85% accuracy in fewer than 10 epochs and then plateaus at that level. After unfreezing the final block (EfficientNets require the entire block to be retrained because of the skip connections) and retraining the weights with a lowered learning rate, the model achieved a significant bump in accuracy.
The purpose of this short article is to share the lessons learnt from doing this assignment, so I will not be going into the details of the code. I referred to a codebase from a TensorFlow tutorial that used MobileNetV2 but adapted it to EfficientNetB5 in my own work.
Key takeaways
It is relatively easy nowadays to develop your own models using readily available models and pre-trained weights to achieve SOTA accuracy in lab environments. Model zoos such as TensorFlow Hub, together with hosted platforms like Google Colab, Kaggle Notebooks, or AWS SageMaker, make it easy to start without having to set up your own environment if that is not your interest.
In doing this assignment, there were other key challenges worth learning from. Admittedly, once those questions had been properly answered, training the model was the easiest part of the assignment, as it simply did its own work in the background.
Ensuring the distribution of training data reflects the expected use case
It is important to have access to a reasonably-sized data set that has a similar distribution to your use case. In our case, it was a pure academic assignment, and the different groups were tasked to collect images for each of the 72 classes that we were supposed to build the model on.
However, in real-world deployments, one of the key tasks is to understand what your final use case is. It is important that you train your model on data consistent with what you expect to see at inference time. In our example, we saw images across the different groups with varying resolutions and orientations, and some were even downloaded from the Internet.
Planning this properly is important in real-world deployments so that the model continues to perform as expected.
One thing obviously missing from the academic assignment is how the model should handle images that are completely out of distribution (i.e. not in any of the 72 classes). Such exception handling should be carefully considered based on the use case; a crude but common guard is sketched below.
Build a data pipeline for rapid prototyping and experimentation
Admittedly, the easiest part of the assignment was training the algorithm. The code to build, train, save, and run inference with the model is fairly standardised. I recall that most of the frustration came before the point where the training progress bar finally started to move. Once that happens, the rest is easy.
The tough part was hence getting the data ready for that stage.
The chart here by Cognilytica quite accurately reflects the challenge: making sure that you have a consistent data pipeline. Achieving this lets you operationalise the steps from raw data to the format the model expects, so developers can repeat the work consistently and iterate faster on prototyping, experimentation, and hyperparameter tuning.
The referenced Jupyter Notebook at the end has some basic components of data ingestion, pre-splitting, resizing, data augmentation, etc., which are built into a pipeline; a simplified sketch of such a pipeline follows.
Think ahead about how and where you will deploy the trained models
Obviously, the referenced Notebook was for an academic assignment submission. In real-world deployments, however, one has to think ahead about how and where the models will be deployed.
There will be trade-offs between model accuracy, inference speed, and model size. These need to be evaluated before the other activities begin, as they heavily influence how you build the data pipeline, what data you collect, and your evaluation criteria. One illustrative example follows.
Conclusion
This short sharing highlights how easy it is to kick-start the journey with readily available model zoos and online platforms such as Google Colab, Kaggle Notebooks, or AWS SageMaker.
The harder challenges lie in the data pipeline and in the mindset needed to operationalise the work. More awareness, training, and practice in those areas are needed to equip engineers with the necessary practical skills.
Attaching my work here (sans the actual data used) - feedback welcome!