Transfer Learning using EfficientNet
Summary
In a recent assignment, one of the tasks was to build an image classification model to accurately classify images belonging to one of 72 possible categories.
This short article links to code showing how one can use Transfer Learning to rapidly build a model with high accuracy on both the training and test sets. Specifically, it highlights a two-stage approach that can be used as a template for future tasks.
The first pass quickly trains new weights on top of the available pre-trained weights, which are kept frozen (ImageNet weights in this case, since the distributions are very similar). The second pass then unfreezes some blocks and lets the model fine-tune those weights on the training images.
The diagram below shows how the algorithm reaches around 85% accuracy in fewer than 10 epochs and then plateaus at that level. After unfreezing the final block (EfficientNets require the entire block to be retrained because of the skip connections) and retraining the weights with a lowered learning rate, the model achieved a significant bump in accuracy.
The purpose of this short article is to share the lessons learnt from doing this assignment, so I will not be going into the details of the code. I referred to a codebase from a TensorFlow tutorial that used MobileNetV2 but adapted it to EfficientNetB5 in my own work.
Key takeaways
It is relatively easy nowadays to develop your own models using readily available models and pre-trained weights to achieve SOTA accuracy in lab environments. Model zoos such as TensorFlow Hub, together with hosted platforms like Google Colab, Kaggle Notebooks, or AWS SageMaker, make it easy to start without having to set up your own environment if that is not your interest.
In doing this assignment, there were other key challenges worth learning from. Admittedly, once those questions had been properly answered, training the model was the easiest part of the assignment, as it simply did its own work in the background.
Ensuring the distribution of training data reflects the expected use case
It is important to have access to a reasonably-sized data set that has a similar distribution to your use case. In our case, it was a pure academic assignment, and the different groups were tasked to collect images for each of the 72 classes that we were supposed to build the model on.
However, in real-world deployments, one of the key tasks is to understand what your final use case is. It is important that you train your model on data consistent with what you expect to see at inference time. In our example, we saw images across the different groups with varying resolutions and orientations, and some were even downloaded from the Internet.
Planning this properly is important in real-world deployments so that the model continues to perform as expected.
One thing obviously missing from the academic assignment is how the model should handle images that are completely out of distribution (i.e. not in any of the 72 classes). Such exception handling should be carefully considered based on the use case; a crude but common guard is sketched below.
Build a data pipeline for rapid prototyping and experimentation
Admittedly, the easiest part of the assignment was training the algorithm. The code to build, train, save, and run inference with the model is fairly standardised. I recall that most of the frustration came before the point where the training progress bar finally started to move. Once that happens, the rest is easy.
The tough part was hence getting the data ready for that stage.
The chart here by Cognilytica quite accurately reflects the challenge: making sure that you have a consistent data pipeline. Achieving this lets you operationalise the steps from raw data to the format the model expects, so developers can repeat the work consistently and iterate faster on prototyping, experimentation, and hyperparameter tuning.
The referenced Jupyter Notebook at the end has some basic components of data ingestion, pre-splitting, resizing, data augmentation, etc., which are built into a pipeline; a simplified sketch of such a pipeline follows.
Think ahead about how and where you will deploy the trained models
Obviously, the referenced Notebook was for an academic assignment submission. In real-world deployments, however, one has to think ahead about how and where the models will be deployed.
There will be trade-offs between model accuracy, inference speed, and model size. These need to be evaluated before the other activities begin, as they heavily influence how you build the data pipeline, what data you collect, and your evaluation criteria. One illustrative example follows.
Conclusion
This short sharing highlights how easy it is to kick-start the journey with readily available model zoos and online platforms such as Google Colab, Kaggle Notebooks, or AWS SageMaker.
The harder challenges lie in the data pipeline and in the mindset needed to operationalise the work. More awareness, training, and practice in those areas are needed to equip engineers with the necessary practical skills.
Attaching my work here (sans the actual data used) - feedback welcome!