Serving ML Models with TorchServe | by Andrey Golovin | Mar, 2023

[ad_1]

A complete end-to-end example of serving an ML model for image classification task

This post will walk you through a process of serving your deep learning Torch model with the TorchServe framework.

There are quite a bit of articles about this topic. However, typically they are focused either on deploying TorchServe itself or on writing custom handlers and getting the end results. That was a motivation for me to write this post. It covers both parts and gives end-to-end example.
The image classification challenge was taken as an example. At the end of the day you will be able to deploy TorchServe server, serve a model, send any random picture of a clothes and finally get the predicted label of a clothes class. I believe this is what people may expect from an ML model served as API endpoint for classification.

Say, your data science team designed a wonderful DL model. It’s a great accomplishment with no doubts. However, to make a value out of it the model needs to be somehow exposed to the outside world (if it’s not a Kaggle competition). This is called model serving. In this post I’ll not touch serving patterns for batch operations as well as streaming patterns purely based on streaming frameworks. I’ll focus on one option of serving a model as API (never mind if this API is called by a streaming framework or by any custom service). More precisely, this option is the TorchServe framework.
So, when you decide to serve your model as API you have at least the following options:

web frameworks such as Flask, Django, FastAPI etc
cloud services like AWS Sagemaker endpoints
dedicated serving frameworks like Tensorflow Serving, Nvidia Triton and TorchServe

All have its pros and cons and the choice might be not always straightforward. Let’s practically explore the TorchServe option.

The first part will briefly describe how a model was trained. It’s not important for the TorchServe however I believe it helps to follow the end-to-end process. Then a custom handler will be explained.
The second part will focus on deployment of the TorchServe framework.
Source code for this post is located here: git repo

For this toy example I selected the image classification task based on FashionMNIST dataset. In case you’re not familiar with the dataset it’s 70k of grayscale 28×28 images of different clothes. There are 10 classes of the clothes. So, a DL classification model will return 10 logit values. For the sake of simplicity a model is based on the TinyVGG architecture (in case you want to visualize it with CNN explainer): simply few convolution and max pooling layers with RELU activation. The notebook model_creation_notebook in the repo shows all the process of training and saving the model.
In brief the notebook just downloads the data, defines the model architecture, trains the model and saves state dict with torch save. There are two artifacts relevant to TorchServe: a class with definition of the model architecture and the saved model (.pth file).

Two modules need to be prepared: model file and custom handler.

Model file
As per documentation “A model file should contain the model architecture. This file is mandatory in case of eager mode models.
This file should contain a single class that inherits from torch.nn.Module.”
So, let’s just copy the class definition from the model training notebook and save it as model.py (any name you prefer):

[ad_2]

Source link