
Opening the eyes of the machine: Computer vision with AutoML (Part 2)

By Gavita Regunath and Dan Lantos

Do you want to get started with computer vision but don't quite know where to begin? Or would you like to quickly get an idea of which state-of-the-art (SoTA) computer vision model is best for your task? If so, this article will give you a good starting point to jumpstart your computer vision journey.

In our last article, we briefly introduced AutoML and computer vision, and demonstrated how computer vision is revolutionising businesses and industries. Developing a computer vision solution typically requires a team of experts with in-depth knowledge to build effective models. If you do not have the technical know-how, or simply want to increase your team's productivity, leveraging an AutoML solution for computer vision is a great starting point.

What is Azure Machine Learning?

Azure Machine Learning, often abbreviated as Azure ML, is a cloud service with features that make it easier to prepare and create datasets, train models, and deploy them as web services. Microsoft has released an AutoML for computer vision feature (Public Preview) within Azure Machine Learning. With AutoML for computer vision tasks, Azure ML customers can easily build models trained on image data with minimal code. Note that AutoML for computer vision tasks is currently supported only via the Azure Machine Learning Python SDK, although the resulting experimentation runs, models, and outputs are all accessible from the Azure Machine Learning studio UI.

In this article, we will be looking at how we can leverage this feature to train an object detection model to identify if workers are wearing the necessary safety equipment.

What is the dataset about?

The public dataset was acquired from Roboflow and contains 7,041 images of workers in a workplace setting. The dataset has already been split 75/25 into training and test sets. All images are in JPEG format, with a COCO annotations file: a specific JSON structure dictating how labels and metadata are saved for an image dataset. Examples of the images we will be working with are shown in the figure below.

Examples of images showing workers wearing safety hard hats


A high-level summary of the workflow consists of the following five steps:

  • Workspace setup

  • Data preparation

  • Registering the dataset

  • Configuring and submitting an AutoML run

  • Using the best trained model for inference

Ok, so let's get started!

Workspace setup - Azure Machine Learning:


1. Now it's time to start working with Azure ML. The first thing you need to do is create an Azure ML Workspace. If you don't have one, this can easily be done using the Azure Portal.
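If you prefer to work from code, connecting a notebook to the workspace is a one-liner. The sketch below assumes you have downloaded the workspace's config.json from the Azure Portal into your working directory.

```python
from azureml.core import Workspace

# Loads workspace details from config.json (downloadable from the Azure Portal);
# interactive authentication will prompt for a login on first use.
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\n')
```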

2. We will be using the notebooks within Azure ML, so we will need to provision a compute instance to run the notebook. If little heavy lifting is done in the notebook itself, a lightweight compute instance will suffice; we used a 'STANDARD_DS11_V2'.

Creating a compute instance in Azure ML
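The instance can also be provisioned from the SDK rather than the Studio UI. A minimal sketch, reusing the workspace handle from above (the instance name is ours and must be unique within the region):

```python
from azureml.core.compute import ComputeInstance, ComputeTarget

# Provision a lightweight DS11_v2 compute instance for notebook work.
# 'hardhat-ci' is an illustrative name.
ci_config = ComputeInstance.provisioning_configuration(vm_size='STANDARD_DS11_V2')
instance = ComputeTarget.create(ws, 'hardhat-ci', ci_config)
instance.wait_for_completion(show_output=True)
```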


Data Preparation:


3. Next, upload the dataset to Azure Blob Storage (or your container service of choice). Azure Blob Storage containers can be attached directly as datastores within the Azure ML workspace through the UI. A datastore is an abstraction over Azure Storage that allows you to upload/download data and interact with it from your remote compute targets.
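A minimal sketch of uploading to the workspace's default blob datastore from the SDK (the local folder and target path are ours):

```python
from azureml.core import Workspace

ws = Workspace.from_config()

# Upload the local images and annotations to the default blob datastore.
datastore = ws.get_default_datastore()
datastore.upload(src_dir='./hardhat_data', target_path='hardhat_data',
                 overwrite=True, show_progress=True)
```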

4. The hard hat dataset has already been labelled, but the annotations need to be converted to the JSONL file format (a JSON file created in the JSON Lines format, with one record per line). To do this, we used a script provided by Microsoft to convert files from COCO to JSONL. This file can be readily obtained from here.

(Note, however, that if you are working with a dataset whose images still need to be labelled, you can use Azure ML's data labeling option to label them.)
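To make the target format concrete, here is an illustrative, simplified sketch of the COCO-to-JSONL mapping; the file names and datastore path are ours, and Microsoft's script handles the conversion more robustly:

```python
import json

# Illustrative only -- Microsoft's conversion script is the canonical version.
# AutoML for Images expects one JSON object per line, with box coordinates
# normalised to [0, 1] relative to the image dimensions.
with open('train_coco.json') as f:
    coco = json.load(f)

images = {img['id']: img for img in coco['images']}
categories = {c['id']: c['name'] for c in coco['categories']}
labels = {img_id: [] for img_id in images}

for ann in coco['annotations']:
    img = images[ann['image_id']]
    x, y, w, h = ann['bbox']  # COCO boxes are absolute [x, y, width, height]
    labels[ann['image_id']].append({
        'label': categories[ann['category_id']],
        'topX': x / img['width'],
        'topY': y / img['height'],
        'bottomX': (x + w) / img['width'],
        'bottomY': (y + h) / img['height'],
        'isCrowd': int(ann.get('iscrowd', 0)),
    })

with open('train_annotations.jsonl', 'w') as f:
    for img_id, img in images.items():
        f.write(json.dumps({
            'image_url': f"AmlDatastore://workspaceblobstore/hardhat_data/{img['file_name']}",
            'image_details': {'format': 'jpg', 'width': img['width'], 'height': img['height']},
            'label': labels[img_id],
        }) + '\n')
```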


Register Dataset in Datastore:

5. Once we have the annotations in the right format (i.e., JSONL), we're ready to register the datasets in the datastore. Our JSONL file acts as a definition for our image dataset, describing the locations of the images alongside the labels used for object detection.
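In code, the registration looks roughly like the sketch below, following the documented pattern of typing the image_url column as a stream (the dataset names and JSONL paths are ours):

```python
from azureml.core import Dataset, Workspace
from azureml.data import DataType

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Build Tabular datasets from the JSONL annotations; typing image_url as a
# stream column lets remote compute targets mount/download the images.
training_dataset = Dataset.Tabular.from_json_lines_files(
    path=datastore.path('hardhat_data/train_annotations.jsonl'),
    set_column_types={'image_url': DataType.to_stream(datastore.workspace)})
training_dataset = training_dataset.register(workspace=ws, name='hardhat-train')

validation_dataset = Dataset.Tabular.from_json_lines_files(
    path=datastore.path('hardhat_data/test_annotations.jsonl'),
    set_column_types={'image_url': DataType.to_stream(datastore.workspace)})
validation_dataset = validation_dataset.register(workspace=ws, name='hardhat-test')
```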

Configuring and submitting an AutoML run for object detection task:

The following SoTA open-source algorithms are available for object detection tasks: YOLOv5, Faster R-CNN and RetinaNet.

6. We need to create a suitable GPU cluster for the run; Microsoft recommends using the NCsv3-series (with V100 GPUs) for faster computer vision training. AutoML models for image tasks require GPU SKUs and support the NC and ND families. We used a cluster of 10 'STANDARD_NC6's for this hard hat object detection task.
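A sketch of provisioning that cluster (the cluster name is ours, and ws is the workspace handle from earlier):

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = 'gpu-cluster'  # illustrative name
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
except ComputeTargetException:
    # Up to 10 Standard_NC6 nodes (one K80 GPU each), scaling to zero when idle.
    config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_NC6', min_nodes=0, max_nodes=10)
    compute_target = ComputeTarget.create(ws, cluster_name, config)
    compute_target.wait_for_completion(show_output=True)
```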


7. We are now ready to submit an AutoML run for object detection. For this hard hat object detection task, we selected YOLOv5 and Faster R-CNN as models, both pre-trained on COCO, a large-scale object detection, segmentation, and captioning dataset containing over 200K labelled images across over 80 label categories. We used the config provided in Microsoft's documentation.
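As a sketch, submitting a single YOLOv5 run follows the documented pattern below; the experiment name is ours, and compute_target and the datasets come from the earlier steps.

```python
from azureml.core import Experiment
from azureml.train.automl import AutoMLImageConfig
from azureml.automl.core.shared.constants import ImageTask
from azureml.train.hyperdrive import GridParameterSampling, choice

# Train a COCO-pre-trained YOLOv5 with default hyperparameters;
# swap in 'fasterrcnn_resnet50_fpn' for the Faster R-CNN run.
image_config = AutoMLImageConfig(
    task=ImageTask.IMAGE_OBJECT_DETECTION,
    compute_target=compute_target,
    training_data=training_dataset,
    validation_data=validation_dataset,
    hyperparameter_sampling=GridParameterSampling({'model_name': choice('yolov5')}),
    iterations=1)

experiment = Experiment(ws, 'hardhat-object-detection')
automl_run = experiment.submit(image_config)
automl_run.wait_for_completion(wait_post_processing=True)
```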


8. We also decided to perform hyperparameter tuning, sweeping over a defined parameter space to find an optimal model. The results of the hyperparameter sweep can be observed in the Experiments section of Azure ML Studio.
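Continuing from the snippet above, a sweep might look like the following sketch; the parameter ranges here are illustrative rather than the exact ones we used.

```python
from azureml.train.hyperdrive import (BanditPolicy, RandomParameterSampling,
                                      choice, uniform)

# Illustrative search space spanning both model families.
parameter_space = {
    'model_name': choice('yolov5', 'fasterrcnn_resnet50_fpn'),
    'learning_rate': uniform(0.0001, 0.01),
    'optimizer': choice('sgd', 'adam', 'adamw'),
}

sweep_config = AutoMLImageConfig(
    task=ImageTask.IMAGE_OBJECT_DETECTION,
    compute_target=compute_target,
    training_data=training_dataset,
    validation_data=validation_dataset,
    hyperparameter_sampling=RandomParameterSampling(parameter_space),
    iterations=10,
    max_concurrent_iterations=2,
    # Terminate poorly performing configurations early to save GPU time.
    early_termination_policy=BanditPolicy(evaluation_interval=2,
                                          slack_factor=0.2, delay_evaluation=6))

automl_run = experiment.submit(sweep_config)
automl_run.wait_for_completion(wait_post_processing=True)
```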

Results of the AutoML for the hard hat object detection run

And that's it – we have successfully gone through all the steps needed to train a computer vision model to perform object detection!

What does inference look like? Can the model trained using AutoML detect hard hats in images? Let’s have a look!


Register the optimal vision model from the AutoML run:

9. Once the run completes, we register the model created by the best run.
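In the SDK this is a couple of lines, operating on the run handle from the submission above:

```python
# Fetch the best-scoring child run from the AutoML sweep and register its model.
best_child_run = automl_run.get_best_child()
model_name = best_child_run.properties['model_name']
model = best_child_run.register_model(model_name=model_name,
                                      model_path='outputs/model.pt')
```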


10. We used the ONNX model format to generate predictions without deploying our model. Once models are in the ONNX format, they can be run on a variety of platforms and devices. For this step, we followed the guidelines provided here to make predictions with ONNX on computer vision models from AutoML.
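A sketch of scoring locally with ONNX Runtime, using the best child run from the previous step; note that real preprocessing (resizing, normalisation) must match what the chosen model expects, as per the linked guidance, and the input shape below is illustrative.

```python
import numpy as np
import onnxruntime

# The best run saves an ONNX export of the model under outputs/.
best_child_run.download_file('outputs/model.onnx', output_file_path='./model.onnx')

# Score a dummy batch locally; real images need the model-specific preprocessing.
session = onnxruntime.InferenceSession('./model.onnx')
input_name = session.get_inputs()[0].name
dummy_batch = np.random.rand(1, 3, 640, 640).astype(np.float32)  # illustrative shape
outputs = session.run(None, {input_name: dummy_batch})
```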

11. We then visualised the predictions for the images. Below, on the left, is the prediction of hard hats (in red bounding boxes) that the model produced, and on the right is the ground truth of the same image (hard hats in yellow bounding boxes).

Predictions of hard hats from the model

Ground truth of the same hard hat image
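A sketch of how such a visualisation can be drawn with matplotlib, assuming predictions in the same normalised [topX, topY, bottomX, bottomY] form as the JSONL schema (the image path and box values are illustrative):

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image

img = Image.open('worker.jpg')  # illustrative path
width, height = img.size

fig, ax = plt.subplots()
ax.imshow(img)

# Illustrative prediction in the normalised JSONL coordinate convention.
predictions = [{'label': 'helmet', 'topX': 0.31, 'topY': 0.12,
                'bottomX': 0.47, 'bottomY': 0.34}]
for box in predictions:
    # Convert normalised coordinates back to pixels before drawing.
    rect = patches.Rectangle(
        (box['topX'] * width, box['topY'] * height),
        (box['bottomX'] - box['topX']) * width,
        (box['bottomY'] - box['topY']) * height,
        linewidth=2, edgecolor='red', facecolor='none')
    ax.add_patch(rect)
    ax.text(box['topX'] * width, box['topY'] * height, box['label'], color='red')
plt.show()
```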

What did we do?

We used Azure AutoML for computer vision to perform object detection. As mentioned previously, leveraging AutoML can increase the productivity of data scientists by allowing them to explore multiple SoTA models for computer vision tasks.

The next step is to deploy this model as a web service. There are two options available: (1) deploy your trained model as a web service on Azure Container Instances (ACI), or (2) on Azure Kubernetes Service (AKS). ACI is the perfect option for testing deployments, while AKS is better suited for high-scale production usage. This is something we have not done here but will be looking to implement next!