My Azure Virtual Machine Won’t Start
To benefit from elasticity on Microsoft Azure, we regularly use Azure Automation to shut down and de-allocate Virtual Machines when they are not needed.
This strategy is excellent and usually works well. However, due to Azure’s highly dynamic nature, allocating large Virtual Machines can sometimes fail with errors such as “The server encountered an internal error. Please retry the request.” or “Failed to produce an allocation for the service.”
These error messages cause a lot of frustration because we don’t understand their nature. A blog post written by Lichun Li, a Program Manager in the Microsoft Azure product group, helps us understand Allocation Failures and how to fix them.
The scenarios described in her post teach us how things work under the hood. They also tell us that creating new resources is often easier than restarting a stopped (de-allocated) resource. By using automation to provision our Virtual Machines, we can take a page out of the Azure Batch and Azure Data Factory playbooks. Both of these services create brand new environments for the duration of our compute jobs. Once the jobs are done, the compute environments are destroyed and the resources are released for everyone’s benefit.
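The create-use-destroy pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Azure Batch or Azure Data Factory are implemented. The name ephemeral_environment and the provision and destroy callables are hypothetical placeholders for whatever automation you use to create and remove a Virtual Machine.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ephemeral_environment(
    provision: Callable[[], str],
    destroy: Callable[[str], None],
) -> Iterator[str]:
    """Create a fresh environment, hand it to the caller, then always destroy it.

    `provision` and `destroy` are hypothetical hooks (assumptions, not Azure APIs):
    `provision` returns an identifier for the new environment and
    `destroy` releases the environment with that identifier.
    """
    env_id = provision()
    try:
        yield env_id
    finally:
        # Runs even if the job raised, so resources are never left allocated.
        destroy(env_id)
```

A job would then run inside a with ephemeral_environment(...) block. The try/finally is the important design choice here: the environment is released whether the job succeeds or fails.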
On Azure, there are many ways to provision Virtual Machines. Let’s walk through a few of our options.
- Using PowerShell to provision a new Windows Virtual Machine and Remote PowerShell to bring it to our desired state.
- Using Azure CLI to provision a new Linux Virtual Machine and SSH to bring it to our desired state.
- Using Azure Resource Manager to provision a new Windows Virtual Machine and Desired State Configuration (DSC) to complete the configuration.
- Using Azure Resource Manager to provision a new Linux Virtual Machine and SSH to complete the configuration.
- Using Chef to provision and configure a new Virtual Machine.
- Using Puppet to provision and configure a new Virtual Machine.
- Using Salt Stack to provision and configure a new Virtual Machine.
- Using Docker to provision your workload through the magic of containers.
- Using Azure Container Service to provision your workload. (Learn more about Virtual Machines and Containers)
Now, I understand that leveraging automation based on Configuration as Code requires thought, planning and effort. And we’re not all in a position to drop what we are doing to automate everything. So here’s a quick fix for your current situation.
Quick Fix
To start a Virtual Machine that continuously fails to start, try resizing it. Be mindful of the number of Data Disks and make sure that you scale to a size that supports enough attached disks. This configuration change will force the Azure Fabric to seek new physical resources for your workload.
One way to deal with the maximum number of attached disks is to scale the Virtual Machine to an equivalent configuration in another tier. For example, if my Virtual Machine is a Standard A4, I can scale it to a Standard D4 and then back to its original configuration.
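As a rough sketch, the resize-and-back sequence could be planned like this in Python. The script only builds Azure CLI commands (az vm resize and az vm start) and does not run them. The function names, the resource group and Virtual Machine names, and especially the disk limits in the table are placeholders made up for illustration, so check the real limits for your sizes before relying on them.

```python
from typing import Dict, List

# PLACEHOLDER limits for illustration only (assumption, not authoritative);
# look up the real values in "Sizes for virtual machines".
EXAMPLE_MAX_DATA_DISKS: Dict[str, int] = {
    "Standard_A4": 16,
    "Standard_D4": 32,
}


def pick_temporary_size(current_size: str, attached_disks: int,
                        limits: Dict[str, int]) -> str:
    """Return another size that supports at least `attached_disks` data disks."""
    candidates = [
        size for size, max_disks in limits.items()
        if size != current_size and max_disks >= attached_disks
    ]
    if not candidates:
        raise ValueError("no alternative size supports that many data disks")
    # Prefer the candidate with the smallest sufficient limit; ties break by name.
    return min(candidates, key=lambda size: (limits[size], size))


def build_resize_plan(resource_group: str, vm_name: str,
                      original_size: str, temporary_size: str) -> List[List[str]]:
    """Build the CLI commands: resize away, resize back, then start."""
    target = ["--resource-group", resource_group, "--name", vm_name]
    return [
        ["az", "vm", "resize", *target, "--size", temporary_size],
        ["az", "vm", "resize", *target, "--size", original_size],
        ["az", "vm", "start", *target],
    ]
```

For a Standard A4 with 10 data disks, pick_temporary_size("Standard_A4", 10, EXAMPLE_MAX_DATA_DISKS) selects Standard_D4 under these placeholder limits, and build_resize_plan("my-rg", "my-vm", "Standard_A4", "Standard_D4") returns the three commands in order. Keeping the plan as data makes it easy to review before handing each command to subprocess.run or an automation runbook.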
For more details about virtual machine capacities, see Sizes for virtual machines.