Azure IaaS costs mainly comprise networking, compute, storage, and analytics.
Below are some of the ways to bring down Azure IaaS compute costs:
- Resizing of VMs by identifying idle and underutilized resources
- Reserved Instances for VMs
- Reserved capacity for Storage
- Shutdown VMs during idle time
- Azure Dev/Test Pricing plan for Non-Production VMs
- Azure Automation
In this blog, we mainly discuss the fourth option: shutting down VMs during idle time and bringing them back up when required, using a custom serverless solution.
Ideally, we don’t recommend shutting down production VMs. However, if production servers are only used for a limited time every day, week, or month, and the business owner agrees, there is no harm in shutting down an expensive VM while it is not required.
Using Azure Automation, we can automate the start and shutdown of VMs. However, this approach can be hard to implement when VMs must be shut down or started in a specific order, when VMs are shared by multiple teams, or when VMs depend on other VMs.
The next option is to grant someone the necessary permissions to turn virtual machines on and off manually, through the Azure Portal or with scripts. This is not a recommended practice, because any process that relies on manual action is error-prone: the person needs to be available to perform the VM action at the right time, needs to coordinate with the VM owners to know whether they need the VM at a specific time, and may simply forget.
The third option is a custom solution in which people can easily schedule the VMs themselves. That is, they can plan a weekly schedule for when the VMs must be up and running and when they must be shut down. Ideally, they can also override the schedule when they need to work on a VM urgently, and extend it when they have to continue working beyond the scheduled hours.
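The scheduling rule described above (a weekly schedule plus an on-demand extension) can be sketched in a few lines of Python. This is only an illustrative sketch: the `DayWindow` type, the `WEEKLY_SCHEDULE` layout, and the `should_be_running` function are hypothetical names, and all times are assumed to be naive local times.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, Optional


@dataclass(frozen=True)
class DayWindow:
    """Hours during which a VM (or VM group) should be running on one weekday."""
    start: time
    stop: time


# Hypothetical weekly schedule: weekday number (Monday = 0) -> running window.
# A weekday without an entry means the VMs stay shut down all day.
WEEKLY_SCHEDULE: Dict[int, DayWindow] = {
    day: DayWindow(time(8, 0), time(18, 0)) for day in range(5)  # Mon to Fri
}


def should_be_running(
    schedule: Dict[int, DayWindow],
    now: datetime,
    extended_until: Optional[datetime] = None,
) -> bool:
    """Return True if the VM should be up at `now`."""
    # An active extension overrides the fixed schedule.
    if extended_until is not None and now < extended_until:
        return True
    window = schedule.get(now.weekday())
    if window is None:
        return False
    return window.start <= now.time() < window.stop
```

With this rule, a developer who requests an extension until 21:00 on a Monday keeps the VM running past the scheduled 18:00 stop, while a Saturday with no schedule entry leaves the VM shut down unless an extension is active.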
A fixed schedule alone may not achieve the optimum cost, due to factors such as:
- Developers may not have access to the Azure Portal for security reasons.
- The VM may not be required on certain days (when the project is idle).
- A developer may want to use the VM outside scheduled hours or on weekends (for an urgent fix).
- Developers have to raise support tickets with their service provider to extend or override the fixed schedule.
- The service provider's support team may not be able to respond quickly in critical scenarios or on weekends.
The main components used for the solution are:
- Azure Resource Group
- Azure Tags
- Azure Event Grid
- Azure Logic App
- Azure Table Storage
- Azure Durable Function
- Azure App Service
- Azure AD
- Azure Management Libraries (.NET)
The following illustration provides an overview of the solution:
The four main layers in the solution design diagram are:
- The Azure Resource Groups section is where the VMs are hosted in a subscription.
- The Monitoring Azure VMs section is where serverless components such as Event Grid and the Logic App are provisioned.
- The Automatic Schedule section is where serverless components such as the storage table, the Function App, and the Durable Functions are provisioned.
- The Web Application section is where the web application is hosted in Azure App Service.
This section describes the flow of activities in the design diagram.
- The Development virtual machines are hosted in Azure Resource Groups.
- Azure Event Grid captures VM details, newly provisioned VMs, changes in VM power state, and so on.
- The Logic App is triggered when a new event arrives in Event Grid, and passes the details to Table storage.
- The Logic App also sends an email to the VM owner about the action performed on the VM.
- Table storage holds all the details inserted by the Logic App, as well as the VM grouping, VM sequencing, and VM schedule data from the web application.
- VM owners access the web application from a web browser and are authenticated using Azure AD.
- The application authorizes users based on the roles defined in the application.
- The application itself uses a service principal with the least privilege needed to start/stop/restart VMs.
- The web application provides the following functionality:
  - VM owners can view the list of VMs they are accountable for, along with their status. The data comes from the storage table, which holds up-to-date VM data.
  - VM owners can start, restart, and shut down VMs. Using the Azure Management Libraries, these actions are triggered directly from the application.
  - VM owners can group and sequence VMs. The data is stored in the storage table.
  - VM owners can schedule the automatic start and shutdown of VMs. The data is stored in the storage table.
  - VM owners can extend the automatic shutdown (an on-demand override of the schedule). The data is stored in the storage table.
- The Durable starter function is invoked by a timer trigger.
- The starter function invokes the Durable orchestrator function, whose activity functions manage the automatic shutdown and start of VMs based on the schedule data in the storage table. VMs that have an active shutdown extension request are excluded.
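The decision taken on each timer tick can be illustrated as a comparison between the desired state and the actual power state of each VM. The sketch below is plain Python, not Durable Functions code; the row fields (`name`, `power_state`, `want_running`, `extended_until`) and the simplified power-state values are assumptions made for illustration, not the schema of the actual solution.

```python
from datetime import datetime
from typing import List, Tuple


def plan_actions(vm_rows: List[dict], now: datetime) -> List[Tuple[str, str]]:
    """Return (vm_name, action) pairs needed to reach the scheduled state."""
    actions = []
    for row in vm_rows:
        is_running = row["power_state"] == "running"
        if row["want_running"] and not is_running:
            actions.append((row["name"], "start"))
        elif not row["want_running"] and is_running:
            extended_until = row.get("extended_until")
            # Skip VMs whose owner asked to keep them up beyond the schedule.
            if extended_until is not None and now < extended_until:
                continue
            actions.append((row["name"], "shutdown"))
    return actions
```

VMs that are already in the desired state produce no action, so running the function on every tick does not issue redundant start or shutdown calls.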
- Azure Resource Group: A resource group is a logical container into which Azure resources like virtual machines, web apps, databases, and storage accounts are deployed and managed.
- Azure Event Grid: Event Grid is a fully managed event service that enables us to manage events across many different Azure services and applications.
- In the proposed solution, an Event Grid event subscription monitors changes to the VMs in the resource groups of the Azure subscriptions. Any change in VM power status, provisioning of new VMs, decommissioning of existing VMs, and so on produces events that flow to Event Grid.
- Azure Logic App: Azure Logic Apps is a cloud service that automates the execution of business processes.
- In the proposed solution, the Logic App subscribes to Event Grid. When an event occurs, the Logic App fires and inserts the VM status and any change details into Azure Table storage.
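The Logic App itself is built in the designer rather than in code, but the mapping it performs can be sketched in Python. The sketch assumes the Event Grid schema for resource group and subscription events, in which `data.operationName` carries the Azure Resource Manager operation and `data.resourceUri` the resource ID. The status labels, the `event_to_row` function, and the column names are hypothetical.

```python
from typing import Optional

# Resource Manager operation -> simplified status label (labels are illustrative).
OPERATION_TO_STATUS = {
    "Microsoft.Compute/virtualMachines/start/action": "running",
    "Microsoft.Compute/virtualMachines/restart/action": "running",
    "Microsoft.Compute/virtualMachines/deallocate/action": "deallocated",
    "Microsoft.Compute/virtualMachines/powerOff/action": "stopped",
    "Microsoft.Compute/virtualMachines/write": "created_or_updated",
    "Microsoft.Compute/virtualMachines/delete": "deleted",
}


def event_to_row(event: dict) -> Optional[dict]:
    """Turn one Event Grid event into a table row, or None if it is not relevant."""
    # Ignore failed or cancelled operations (event types ending in Failure/Cancel).
    if not event.get("eventType", "").endswith("Success"):
        return None
    data = event.get("data", {})
    status = OPERATION_TO_STATUS.get(data.get("operationName", ""))
    if status is None:
        return None
    # resourceUri: /subscriptions/<id>/resourceGroups/<rg>/providers/
    #              Microsoft.Compute/virtualMachines/<vm-name>
    parts = data.get("resourceUri", "").strip("/").split("/")
    lowered = [part.lower() for part in parts]
    if "resourcegroups" not in lowered:
        return None
    rg_index = lowered.index("resourcegroups") + 1
    if rg_index >= len(parts) - 1:
        return None  # no resource group name or no VM name in the URI
    return {
        "PartitionKey": parts[rg_index],  # resource group
        "RowKey": parts[-1],              # VM name
        "PowerStatus": status,
        "EventTime": event.get("eventTime"),
    }
```

Note that the `write` operation fires for any create or update of a VM, not only for newly provisioned VMs, so a real implementation would need an extra check to tell the two apart.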
- Azure Storage Account: An Azure storage account contains all Azure Storage data objects: blobs, files, queues, tables, and disks. The storage account provides a unique namespace for Azure Storage data that is accessible over HTTPS.
- App Service Plan: An App Service plan defines the set of compute resources on which App Service apps run. A single App Service plan can host multiple web apps.
- Azure App Service: Azure App Service provides a platform to build and host an app in Azure without having to deploy, configure, and maintain Azure VMs.
- VM owners access the application hosted in App Service. Based on the VM owner information in Azure Table storage, the application lists only the VMs they own, so that they can take the necessary actions: on-demand start, shutdown, and restart; scheduling; extending the shutdown (overriding the schedule); grouping VMs; and sequencing VMs for shutdown and start actions.
- Sample Application Pages and their description:
|Page Name|Page Description|
|---|---|
|Login Page|VM owners enter their AD user account and password on the login page to sign in to the application.|
|VM List Page|VM details are shown on this page, where the VM owner can take the necessary actions (On Demand Start / On Demand Shutdown / Restart).|
|Shutdown Extension Page|To extend the shutdown of a VM (override the schedule), the VM owner can specify the number of hours by which to extend it.|
|VM Grouping Page|VMs that depend on each other should be grouped. This page is used to group VMs.|
|VM Sequencing Page|Most VMs need to be started, stopped, and restarted in sequence, based on the nature of the application they host and the role they play. The start and shutdown sequence for VMs can be specified on this page.|
|Audit Report Page|Any action by the user or the system (in the case of auto start / shutdown) is reported on the Audit Report page.|
|VM Scheduling Page|The weekly schedule for VMs (or groups of VMs) can be configured on this page by the VM owners.|
Azure Management Libraries for .NET are used for the on-demand start, shutdown, and restart of VMs in the Azure resource groups.
The App Service communicates with Table storage to display, insert, and update the data for VM grouping, sequencing, and scheduling.
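The on-demand actions can be illustrated with a short sketch. The solution itself uses the .NET management libraries; the sketch below swaps in the equivalent Python packages (`azure-identity` and `azure-mgmt-compute`) purely for illustration. The `apply_action` and `make_vm_operations` helpers and the action names are hypothetical.

```python
def make_vm_operations(subscription_id: str):
    """Build the VM operations client (requires azure-identity and azure-mgmt-compute)."""
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
    return client.virtual_machines


def apply_action(vm_operations, resource_group: str, vm_name: str, action: str) -> None:
    """Run one on-demand action and wait for it to complete."""
    if action == "start":
        poller = vm_operations.begin_start(resource_group, vm_name)
    elif action == "shutdown":
        # Deallocate instead of only powering off: a VM that is stopped but
        # still allocated continues to incur compute charges.
        poller = vm_operations.begin_deallocate(resource_group, vm_name)
    elif action == "restart":
        poller = vm_operations.begin_restart(resource_group, vm_name)
    else:
        raise ValueError(f"Unsupported action: {action}")
    poller.result()  # block until the long-running operation finishes
```

A call such as `apply_action(make_vm_operations(subscription_id), "dev-rg", "dev-vm-01", "shutdown")` would deallocate the VM, assuming the identity in use has been granted the matching permissions; the resource group and VM names here are placeholders.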
- Azure Table Storage: Azure Table storage is a service that stores structured NoSQL data in the cloud, providing a key/attribute store with a schema-less design.
- The Tables will store the following details:
|Table Description|Field Description|
|---|---|
|VM Details|VM Name, VM Resource Group, VM Owner, VM Power Status|
|VM Grouping|Group Name, VM Name|
|VM Sequencing|Group Name, Action (Start/Shutdown/Restart), Sequence|
|VM Schedule|Group Name, Schedule, Shutdown Extension (if any)|
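Every Table storage entity needs a `PartitionKey` and a `RowKey`, so the fields above have to be mapped onto keys. The following is one possible mapping, shown as a sketch: the key choices, property names, and helper functions are assumptions, not the layout used by the actual solution.

```python
from typing import Optional

# Characters that Table storage does not allow in PartitionKey or RowKey values
# (control characters are also disallowed; they are not checked here).
FORBIDDEN_KEY_CHARS = set("/\\#?")


def check_key(value: str) -> str:
    """Reject key values containing characters Table storage does not accept."""
    if any(ch in FORBIDDEN_KEY_CHARS for ch in value):
        raise ValueError(f"Invalid character in key: {value!r}")
    return value


def vm_details_entity(resource_group: str, vm_name: str, owner: str, power_status: str) -> dict:
    # Partitioning by resource group keeps the VMs of one group together.
    return {
        "PartitionKey": check_key(resource_group),
        "RowKey": check_key(vm_name),
        "Owner": owner,
        "PowerStatus": power_status,
    }


def vm_schedule_entity(group_name: str, weekday: str, start: str, stop: str,
                       extended_until: Optional[str] = None) -> dict:
    # One row per VM group and weekday; the extension is optional.
    return {
        "PartitionKey": check_key(group_name),
        "RowKey": check_key(weekday),
        "Start": start,
        "Stop": stop,
        "ExtendedUntil": extended_until or "",
    }
```

With this layout, all schedule rows of one VM group share a partition and can be read with a single partition query.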
- Azure AD: The Azure Active Directory (Azure AD) enterprise identity service provides single sign-on and multi-factor authentication.
- Azure AD is used to authenticate users (mainly VM owners) when they log in to the application.
- Multi-Factor Authentication (MFA): An authentication method in which a user is granted access only after successfully presenting two or more pieces of evidence (factors) to an authentication mechanism.
- Azure AD App Registration: Registering an app in Azure AD establishes a trust relationship between the app and the Microsoft identity platform.
- This way the Microsoft identity platform can provide authentication and authorization services for the application and its users.
- Azure Tags: Tags let us organize Azure resources and resource groups by assigning them name/value pairs.
- The VM Owner tag value identifies the owner of each VM. It is also stored in Table storage, where the application compares the owner name with the authenticated username to list only the VMs designated to that user.
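The owner comparison can be sketched as a simple filter. The sketch assumes that the tag value holds the same identifier the user signs in with (for example, a user principal name) and that the stored rows expose it in an `Owner` property; both are assumptions, as is the `vms_for_user` name.

```python
from typing import List


def vms_for_user(vm_rows: List[dict], username: str) -> List[dict]:
    """Return only the rows whose owner tag matches the authenticated user."""
    wanted = username.strip().casefold()
    # Compare case-insensitively, since tag values are typed in by hand.
    return [
        row for row in vm_rows
        if row.get("Owner", "").strip().casefold() == wanted
    ]
```

Rows without an owner tag never match, so untagged VMs stay hidden from every user until a tag is assigned.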
- Azure Function: Azure Functions is a serverless compute service that runs event-triggered code without having to explicitly provision or manage infrastructure.
- An Azure function is invoked by a timer trigger and uses the schedule data from Table storage to start and shut down VMs automatically.
- Azure Durable Function: Durable Functions is an extension of Azure Functions that allows stateful functions in a serverless compute environment. The extension permits stateful workflows by using orchestrator functions and activity functions.
- For this solution, we use the function chaining pattern together with sub-orchestrations, because we have to shut down and start multiple VMs in a specific order, and some of them at the same time.
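The ordering idea can be sketched without the Durable Functions runtime. In the sketch below, VMs that share a sequence number form one stage; stages run one after another (the chaining part), while the VMs inside a stage are handled in parallel. This is plain Python with a thread pool standing in for orchestrations, and the row fields (`VmName`, `Action`, `Sequence`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from typing import Callable, List


def build_stages(sequence_rows: List[dict], action: str) -> List[List[str]]:
    """Group VM names into ordered stages for one action."""
    rows = sorted(
        (row for row in sequence_rows if row["Action"] == action),
        key=lambda row: row["Sequence"],
    )
    return [
        [row["VmName"] for row in group]
        for _, group in groupby(rows, key=lambda row: row["Sequence"])
    ]


def run_stages(stages: List[List[str]], act: Callable[[str], None]) -> None:
    """Run stages in order; VMs within a stage are processed in parallel."""
    for stage in stages:
        with ThreadPoolExecutor(max_workers=len(stage)) as pool:
            # list() waits for the whole stage and re-raises any failure,
            # so a later stage never starts after a failed one.
            list(pool.map(act, stage))
```

For a typical three-tier application, the database VM would get the lowest start sequence and the web servers the highest, with the shutdown sequence usually defined in the opposite order.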
- Azure Management Libraries (.NET): The Azure Management Libraries for .NET is a higher-level, object-oriented API for managing Azure resources.
- Using the Azure Management Libraries, the web application performs the start and shutdown of VMs.
- Log Analytics Data Collector API: Log Analytics Data Collector API enables us to send custom log data to a Log Analytics workspace in Azure Monitor from any client that can call a REST API.
- The web application sends custom logs to Log Analytics for actions such as automatic or on-demand start, restart, and shutdown of VMs. This lets us query and search user and system actions, and quickly aggregate the data and build reports with the built-in features of Log Analytics dashboards.
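A request to the Data Collector API is authorized with an HMAC-SHA256 signature computed from the workspace shared key. The sketch below only builds the URL, headers, and body, and leaves sending to whichever HTTP client the application uses. The workspace ID, key, log type `VmActions`, and record fields are placeholders, and the helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import List, Optional, Tuple

RESOURCE = "/api/logs"
CONTENT_TYPE = "application/json"


def build_signature(workspace_id: str, shared_key: str, date: str, content_length: int) -> str:
    """Build the SharedKey Authorization header value."""
    string_to_sign = f"POST\n{content_length}\n{CONTENT_TYPE}\nx-ms-date:{date}\n{RESOURCE}"
    digest = hmac.new(
        base64.b64decode(shared_key),  # the workspace key is base64-encoded
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode('ascii')}"


def build_request(workspace_id: str, shared_key: str, log_type: str,
                  records: List[dict], now: Optional[datetime] = None) -> Tuple[str, dict, bytes]:
    """Return (url, headers, body) for one POST to the Data Collector API."""
    body = json.dumps(records).encode("utf-8")
    now = now or datetime.now(timezone.utc)
    date = format_datetime(now, usegmt=True)  # RFC 1123 date in GMT
    headers = {
        "Content-Type": CONTENT_TYPE,
        "Authorization": build_signature(workspace_id, shared_key, date, len(body)),
        "Log-Type": log_type,  # name of the custom log table
        "x-ms-date": date,
    }
    url = f"https://{workspace_id}.ods.opinsights.azure.com{RESOURCE}?api-version=2016-04-01"
    return url, headers, body
```

A record such as `{"Vm": "dev-vm-01", "Action": "Shutdown", "TriggeredBy": "schedule"}` would then be searchable in the workspace under the chosen log type.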
The screenshot below shows the sample interface for the On Demand Start and Shutdown of VMs.
The screenshot below shows the sample interface for scheduling the VMs.
- All the components in the solution use Azure AD authentication.
- RBAC is implemented inside the application with custom VMAdmin and VMReader roles.
- To start/stop/restart VMs, the application uses an Azure AD service principal.
- A custom role is created for the service principal with the least privilege needed to start/stop/restart VMs.
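A least-privilege custom role for the service principal could look like the sketch below, which builds the role definition as a Python dictionary and prints it as JSON. The role name, description, and scope are assumptions, and the exact set of actions should be checked against what the application really calls.

```python
import json


def vm_operator_role(subscription_id: str, resource_group: str) -> dict:
    """Build a custom role definition limited to reading and power-cycling VMs."""
    return {
        "Name": "VM Start Stop Operator",  # hypothetical role name
        "IsCustom": True,
        "Description": "Can read, start, restart, and deallocate virtual machines.",
        "Actions": [
            "Microsoft.Compute/virtualMachines/read",
            "Microsoft.Compute/virtualMachines/instanceView/read",
            "Microsoft.Compute/virtualMachines/start/action",
            "Microsoft.Compute/virtualMachines/restart/action",
            "Microsoft.Compute/virtualMachines/deallocate/action",
        ],
        "NotActions": [],
        # Scope the role to one resource group instead of the whole subscription.
        "AssignableScopes": [
            f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        ],
    }


if __name__ == "__main__":
    # Placeholder IDs; replace them with real values before creating the role.
    print(json.dumps(vm_operator_role("<subscription-id>", "<resource-group>"), indent=2))
```

The definition deliberately contains no wildcard actions and no write or delete permissions, so a leaked credential could power VMs on and off but could not create, resize, or remove them.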