The process for deploying multiple models with Azure Batch for inference involves the following steps:

  1. Prepare your models: Ensure your models are trained and ready for deployment. This involves creating the model architecture, training the model with relevant data, and evaluating the model's performance. Save the models in a format suitable for inference, such as ONNX or a TensorFlow SavedModel.

  2. Create an Azure Batch pool: Create an Azure Batch pool with the compute resources required for inference, such as GPU or CPU VM sizes, depending on the models and workload (see the pool and upload sketch after this list).

  3. Upload models to Azure Storage: Upload the saved models to Azure Storage. This can be done using Azure Blob Storage or Azure Data Lake Storage.

  4. Create a Batch job: Create a Batch job that references the uploaded models and defines the inference tasks (see the job and task sketch after this list).

  5. Define the inference tasks: Each task specifies which model to use, the input data for inference, and the output location for the results.

  6. Submit the Batch job: Submit the Batch job to the created pool for execution.

  7. Monitor the job progress: Monitor the job and its tasks, and check for any errors or failures that arise (see the monitoring sketch after this list).

  8. Retrieve the inference results: Retrieve the inference results from the output location specified in the inference task.

  9. Manage the pool: Manage the Batch pool and its resources as required, including scaling up or down and deleting unneeded resources (see the cleanup sketch after this list).

  10. Repeat for additional models: Repeat the process for additional models, creating separate jobs for each model or grouping related models in the same job.
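
To make steps 2 and 3 concrete, here is a minimal Python sketch using the `azure-batch` and `azure-storage-blob` SDKs. The account names, keys, pool ID, VM size, VM image, container name, and model file names are all placeholders you would replace with your own values; in particular, the image reference must be one that is available to your Batch account.

```python
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels
from azure.storage.blob import BlobServiceClient

# Authenticate against the Batch account (account name, key, and URL are placeholders).
credentials = SharedKeyCredentials("mybatchaccount", "<batch-account-key>")
batch_client = BatchServiceClient(
    credentials, batch_url="https://mybatchaccount.eastus.batch.azure.com")

# Step 2: create a pool sized for the workload (use a GPU VM size such as
# Standard_NC6s_v3 if the models need it; Standard_D4s_v3 here is CPU-only).
pool = batchmodels.PoolAddParameter(
    id="inference-pool",
    vm_size="Standard_D4s_v3",
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical",
            offer="0001-com-ubuntu-server-focal",
            sku="20_04-lts",
            version="latest",
        ),
        node_agent_sku_id="batch.node.ubuntu 20.04",
    ),
    target_dedicated_nodes=2,
)
batch_client.pool.add(pool)

# Step 3: upload the saved models to Blob Storage so tasks can download them.
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("models")
container.create_container()  # skip this call if the container already exists
for model_file in ["model_a.onnx", "model_b.onnx"]:
    with open(model_file, "rb") as data:
        container.upload_blob(name=model_file, data=data, overwrite=True)
```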
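
For steps 4 to 6 (and step 10), a sketch of one job with one task per model. The script name `run_inference.py`, the SAS URLs, and the file names are assumptions; each task downloads its model, the scoring script, and the input data as resource files, runs the script, and uploads its result file to an output container on success.

```python
# Step 4: create a job bound to the pool created above.
job = batchmodels.JobAddParameter(
    id="inference-job",
    pool_info=batchmodels.PoolInformation(pool_id="inference-pool"),
)
batch_client.job.add(job)

# Steps 5, 6, and 10: one task per model. The SAS URLs for the model blobs,
# the scoring script, the input data, and the output container are assumed
# to have been generated beforehand.
model_sas_urls = {
    "model_a.onnx": "<sas-url-for-model_a>",
    "model_b.onnx": "<sas-url-for-model_b>",
}
script_sas_url = "<sas-url-for-run_inference.py>"
input_sas_url = "<sas-url-for-input.csv>"
output_container_sas_url = "<sas-url-for-output-container>"

tasks = []
for i, model_name in enumerate(model_sas_urls):
    tasks.append(batchmodels.TaskAddParameter(
        id=f"infer-{i}",
        # run_inference.py is a placeholder for whatever scoring script loads
        # the model, reads input.csv, and writes a results file.
        command_line=f"python3 run_inference.py --model {model_name} "
                     f"--input input.csv --output results_{i}.csv",
        resource_files=[
            batchmodels.ResourceFile(http_url=model_sas_urls[model_name], file_path=model_name),
            batchmodels.ResourceFile(http_url=script_sas_url, file_path="run_inference.py"),
            batchmodels.ResourceFile(http_url=input_sas_url, file_path="input.csv"),
        ],
        output_files=[batchmodels.OutputFile(
            file_pattern=f"results_{i}.csv",
            destination=batchmodels.OutputFileDestination(
                container=batchmodels.OutputFileBlobContainerDestination(
                    container_url=output_container_sas_url)),
            upload_options=batchmodels.OutputFileUploadOptions(
                upload_condition=batchmodels.OutputFileUploadCondition.task_success),
        )],
    ))
batch_client.task.add_collection("inference-job", tasks)
```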
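
For steps 7 and 8, a sketch that polls until every task completes, reports failures, and downloads the result blobs. It reuses the placeholder job ID and output-container SAS URL from the previous sketches.

```python
import time
from azure.storage.blob import ContainerClient

# Step 7: poll until every task in the job reaches the completed state.
while True:
    job_tasks = list(batch_client.task.list("inference-job"))
    if all(t.state == batchmodels.TaskState.completed for t in job_tasks):
        break
    time.sleep(30)

# Report any tasks that completed with a non-zero exit code.
for t in job_tasks:
    if t.execution_info is not None and t.execution_info.exit_code != 0:
        print(f"Task {t.id} failed with exit code {t.execution_info.exit_code}")

# Step 8: download the result files from the output container.
results = ContainerClient.from_container_url("<sas-url-for-output-container>")
for blob in results.list_blobs():
    with open(blob.name, "wb") as f:
        f.write(results.download_blob(blob.name).readall())
```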
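
Finally, for step 9, a short cleanup sketch: scale the pool to zero nodes when no jobs are running, or delete the job and pool entirely once the results have been retrieved. The IDs are the same placeholders used above.

```python
# Scale the pool down to zero dedicated nodes to stop compute charges.
batch_client.pool.resize(
    "inference-pool",
    batchmodels.PoolResizeParameter(target_dedicated_nodes=0))

# Or remove the job and pool entirely once the results are safely in storage.
batch_client.job.delete("inference-job")
batch_client.pool.delete("inference-pool")
```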