The process for deploying multiple models with Azure Batch for inference involves the following steps:

  1. Prepare your models: Ensure your models are trained and ready for deployment. This involves creating the model architecture, training the model with relevant data, and evaluating the model's performance. Save the models in a format suitable for inference, such as ONNX or a TensorFlow SavedModel.

  2. Create an Azure Batch pool: Create an Azure Batch pool with the compute resources required for inference, such as GPU or CPU VM sizes, depending on the models and workload (see the pool and upload sketch after this list).

  3. Upload models to Azure Storage: Upload the saved models to Azure Storage. This can be done using Azure Blob Storage or Azure Data Lake Storage.

  4. Create a Batch job: Create a Batch job that references the uploaded models and defines the inference tasks (see the job and task sketch after this list).

  5. Define the inference tasks: Each task specifies which model to use, the input data for inference, and the output location for the results.

  6. Submit the Batch job: Submit the Batch job to the created pool for execution.

  7. Monitor the job progress: Monitor the job and its tasks, and check for any errors or failures that arise (see the monitoring sketch after this list).

  8. Retrieve the inference results: Retrieve the inference results from the output location specified in the inference task.

  9. Manage the pool: Manage the Batch pool and its resources as required, including scaling up or down and deleting unneeded resources (see the cleanup sketch after this list).

  10. Repeat for additional models: Repeat the process for additional models, creating separate jobs for each model or grouping related models in the same job.
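
To make steps 2 and 3 concrete, here is a minimal Python sketch using the `azure-batch` and `azure-storage-blob` SDKs. The account names, keys, pool ID, VM size, VM image, container name, and model file names are all placeholders you would replace with your own values; in particular, the image reference must be one that is available to your Batch account.

```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels
from azure.storage.blob import BlobServiceClient

# Authenticate against the Batch account (account name, key, and URL are placeholders).
credentials = SharedKeyCredentials("mybatchaccount", "<batch-account-key>")
batch_client = BatchServiceClient(
    credentials, batch_url="https://mybatchaccount.eastus.batch.azure.com")

# Step 2: create a pool sized for the workload (use a GPU VM size such as
# Standard_NC6s_v3 if the models need it; Standard_D4s_v3 here is CPU-only).
pool = batchmodels.PoolAddParameter(
    id="inference-pool",
    vm_size="Standard_D4s_v3",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical",
            offer="0001-com-ubuntu-server-focal",
            sku="20_04-lts",
            version="latest",
        ),
        node_agent_sku_id="batch.node.ubuntu 20.04",
    ),
    target_dedicated_nodes=2,
)
batch_client.pool.add(pool)

# Step 3: upload the saved models to Blob Storage so tasks can download them.
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("models")
container.create_container()  # skip this call if the container already exists
for model_file in ["model_a.onnx", "model_b.onnx"]:
    with open(model_file, "rb") as data:
        container.upload_blob(name=model_file, data=data, overwrite=True)
```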
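
For steps 4 to 6 (and step 10), a sketch of one job with one task per model. The script name `run_inference.py`, the SAS URLs, and the file names are assumptions; each task downloads its model, the scoring script, and the input data as resource files, runs the script, and uploads its result file to an output container on success.

```python
# Step 4: create a job bound to the pool created above.
job = batchmodels.JobAddParameter(
    id="inference-job",
    pool_info=batchmodels.PoolInformation(pool_id="inference-pool"),
)
batch_client.job.add(job)

# Steps 5, 6, and 10: one task per model. The SAS URLs for the model blobs,
# the scoring script, the input data, and the output container are assumed
# to have been generated beforehand.
model_sas_urls = {
    "model_a.onnx": "<sas-url-for-model_a>",
    "model_b.onnx": "<sas-url-for-model_b>",
}
script_sas_url = "<sas-url-for-run_inference.py>"
input_sas_url = "<sas-url-for-input.csv>"
output_container_sas_url = "<sas-url-for-output-container>"

tasks = []
for i, model_name in enumerate(model_sas_urls):
    tasks.append(batchmodels.TaskAddParameter(
        id=f"infer-{i}",
        # run_inference.py is a placeholder for whatever scoring script loads
        # the model, reads input.csv, and writes a results file.
        command_line=f"python3 run_inference.py --model {model_name} "
                     f"--input input.csv --output results_{i}.csv",
        resource_files=[
            batchmodels.ResourceFile(http_url=model_sas_urls[model_name], file_path=model_name),
            batchmodels.ResourceFile(http_url=script_sas_url, file_path="run_inference.py"),
            batchmodels.ResourceFile(http_url=input_sas_url, file_path="input.csv"),
        ],
        output_files=[batchmodels.OutputFile(
            file_pattern=f"results_{i}.csv",
            destination=batchmodels.OutputFileDestination(
                container=batchmodels.OutputFileBlobContainerDestination(
                    container_url=output_container_sas_url)),
            upload_options=batchmodels.OutputFileUploadOptions(
                upload_condition=batchmodels.OutputFileUploadCondition.task_success),
        )],
    ))
batch_client.task.add_collection("inference-job", tasks)
```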
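
For steps 7 and 8, a sketch that polls until every task completes, reports failures, and downloads the result blobs. It reuses the placeholder job ID and output-container SAS URL from the previous sketches.

```python
import time
from azure.storage.blob import ContainerClient

# Step 7: poll until every task in the job reaches the completed state.
while True:
    job_tasks = list(batch_client.task.list("inference-job"))
    if all(t.state == batchmodels.TaskState.completed for t in job_tasks):
        break
    time.sleep(30)

# Report any tasks that completed with a non-zero exit code.
for t in job_tasks:
    if t.execution_info is not None and t.execution_info.exit_code != 0:
        print(f"Task {t.id} failed with exit code {t.execution_info.exit_code}")

# Step 8: download the result files from the output container.
results = ContainerClient.from_container_url("<sas-url-for-output-container>")
for blob in results.list_blobs():
    with open(blob.name, "wb") as f:
        f.write(results.download_blob(blob.name).readall())
```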
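
Finally, for step 9, a short cleanup sketch: scale the pool to zero nodes when no jobs are running, or delete the job and pool entirely once the results have been retrieved. The IDs are the same placeholders used above.

```python
# Scale the pool down to zero dedicated nodes to stop compute charges.
batch_client.pool.resize(
    "inference-pool",
    batchmodels.PoolResizeParameter(target_dedicated_nodes=0))

# Or remove the job and pool entirely once the results are safely in storage.
batch_client.job.delete("inference-job")
batch_client.pool.delete("inference-pool")
```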