Preventing Jobs From Running Simultaneously on Multiple Role Instances

September 10, 2013 — 11 Comments

There are times when we need to schedule maintenance jobs for our Azure Cloud Services. Usually, this requires us to design our systems with an extra Role whose only purpose is to host these jobs.

Adding this extra Role (Extra Small VM) costs about $178.56 a year!

But don’t worry! There’s another way to schedule jobs. You can use Blob leasing to control the number of concurrent executions of each job. You can also use Blob leasing to help distribute the jobs over multiple Role instances.

To limit the number of concurrent executions of a job, you need to acquire a Blob lease before you start to execute the job. If the acquisition fails, the job is already running elsewhere and the Role instance must wait for the lease to be released before trying to run the job.
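
The acquire-or-wait rule described above can be sketched without touching Azure at all. The FakeLeaseStore class below is a hypothetical in-memory stand-in for Blob leases (it is not the Azure Storage API and not part of my service); it only illustrates that a second acquisition of the same job fails until the holder releases it.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory stand-in for Blob leases (not the Azure Storage API).
public sealed class FakeLeaseStore
{
    // Maps a job name to the instance that currently holds its lease.
    private readonly Dictionary<string, string> holders = new Dictionary<string, string>();
    private readonly object gate = new object();

    // Succeeds only when nobody currently holds the lease for this job.
    public bool TryAcquire(string jobName, string instanceId)
    {
        lock (gate)
        {
            if (holders.ContainsKey(jobName))
                return false; // the job is already running elsewhere

            holders[jobName] = instanceId;
            return true;
        }
    }

    // Only the current holder may release the lease.
    public bool Release(string jobName, string instanceId)
    {
        lock (gate)
        {
            string holder;
            if (!holders.TryGetValue(jobName, out holder) || holder != instanceId)
                return false;

            holders.Remove(jobName);
            return true;
        }
    }
}
```

An instance that gets false back from TryAcquire simply waits and tries again later; once the holder calls Release, the next attempt succeeds.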

To distribute jobs evenly over multiple Role instances, you can execute a single job per Role instance and have each instance determine which job to execute based on the Blob lease that it successfully acquires.
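
To make the outcome of this rule concrete, here is a small simulation, assuming the instances start one after the other. JobDistribution and its Assign method are hypothetical names used for illustration only, and the HashSet stands in for the set of Blobs that are currently leased.

```csharp
using System.Collections.Generic;

public static class JobDistribution
{
    // Each instance walks the job list and keeps the first job it manages to reserve.
    // The HashSet is a hypothetical stand-in for the set of currently leased Blobs.
    public static Dictionary<string, string> Assign(IEnumerable<string> instanceIds,
                                                    IList<string> jobNames)
    {
        var leased = new HashSet<string>();
        var assignments = new Dictionary<string, string>();

        foreach (var instanceId in instanceIds)
        {
            string selected = null;

            foreach (var jobName in jobNames)
            {
                // HashSet.Add returns false when the job is already leased.
                if (!leased.Add(jobName)) continue;

                selected = jobName;
                break;
            }

            // A null value means every job was taken and the instance stays idle.
            assignments[instanceId] = selected;
        }

        return assignments;
    }
}
```

With three instances and two jobs, the first two instances each end up with one job and the third stays idle until a lease becomes available.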

In a previous post I mentioned that it’s better to acquire a short lease and to keep renewing it than to acquire an indefinite lease. This also applies to job scheduling, because if a Role is taken down for maintenance, other roles must be able to execute the job.
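
The sketch below shows why a short, renewed lease behaves better than an indefinite one. RenewableLease is a hypothetical in-memory model (not the Azure Storage API and not my BlobLeaseManager), and the current time is passed in explicitly so the behavior is easy to follow. It is not thread-safe; it only illustrates the timing.

```csharp
using System;

// Hypothetical in-memory model of a single renewable lease (not the Azure Storage API).
public sealed class RenewableLease
{
    private readonly TimeSpan duration;
    private string owner;
    private DateTime expiresAt = DateTime.MinValue;

    public RenewableLease(TimeSpan duration)
    {
        this.duration = duration;
    }

    // Succeeds when the lease is free or its previous holder let it expire.
    public bool TryAcquire(string candidate, DateTime now)
    {
        if (owner != null && expiresAt > now)
            return false;

        owner = candidate;
        expiresAt = now + duration;
        return true;
    }

    // The holder must renew before expiry in order to keep the job.
    public bool TryRenew(string candidate, DateTime now)
    {
        if (owner != candidate || expiresAt <= now)
            return false;

        expiresAt = now + duration;
        return true;
    }
}
```

As long as the holder keeps calling TryRenew, other instances are refused. If the holder is taken down and stops renewing, the lease expires on its own and the next TryAcquire from another instance succeeds, which is exactly what an indefinite lease would prevent.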

To help me manage this complexity, I created a Job Reservation Service that uses a Blob Lease Manager in order to regulate how jobs are executed throughout my Cloud Services. The rest of this post will demonstrate how to use the Job Reservation Service.

 

DEMO Code: Distributing Jobs over Multiple Worker Role Instances

The worker role below tries to distribute 2 jobs over an unknown number of instances. I have assumed that there are always at least 2 instances.

In order to limit concurrent executions of these jobs, I start by creating a JobReservationService instance that I preserve throughout the Role’s lifetime. Then, when the Role starts, I find and reserve an available job. Once the job is reserved, the Role is able to execute it safely. Finally, when the Role is stopped, I release the job reservation so that other Roles may start executing the job.

Code:

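using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.ServiceRuntime;

public class WorkerRole : RoleEntryPoint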
{
    private readonly JobReservationService jobReservationService;

    readonly Dictionary<string, Func<IJob>> jobs = new Dictionary<string, Func<IJob>>
    {
        { MonitorCloudServiceJob.Name, () => new MonitorCloudServiceJob() },
        { CleanDatabaseJob.Name, () => new CleanDatabaseJob() },
    };

    public WorkerRole()
    {
        var reserverName = RoleEnvironment.CurrentRoleInstance.Id;
        jobReservationService = new JobReservationService("Storage", "jobs", reserverName);
    }

    string job;

    public override void Run()
    {
        Trace.TraceInformation("Worker entry point called", "Information");

        if (!string.IsNullOrWhiteSpace(job))
            Task.Run(() => jobs[job].Invoke().Execute());

        while (true)
        {
            Thread.Sleep(10000);

            TraceJobExecution();
        }
    }

    private void TraceJobExecution()
    {
        if (string.IsNullOrWhiteSpace(job))
            Trace.TraceInformation("Not Working on a Job", "Information");
        else
            Trace.TraceInformation("Working on Job |> " + job, "Information");
    }

    public override bool OnStart()
    {
        ServicePointManager.DefaultConnectionLimit = 12;

        job = SelectJob();

        return base.OnStart();
    }

    private string SelectJob()
    {
        foreach (var j in jobs)
        {
            if (!jobReservationService.TryReserveJob(j.Key, 60)) continue;
            return j.Key;
        }
        return string.Empty;
    }

    public override void OnStop()
    {
        if (!string.IsNullOrWhiteSpace(job))
            jobReservationService.CancelReservation(job);

        base.OnStop();
    }
}

Sample Jobs

These sample jobs are empty shells used to demonstrate how to distribute job execution over multiple roles.

internal class MonitorCloudServiceJob : IJob
{
    public void Execute()
    {

    }

    public static string Name
    {
        get
        {
            return "monitor-cloud-service.job";
        }
    }
}

internal class CleanDatabaseJob : IJob
{
    public void Execute()
    {

    }

    public static string Name
    {
        get
        {
            return "clean-database.job";
        }
    }
}

internal interface IJob
{
    void Execute();
}

The Job Reservation Service

I created the JobReservationService to encapsulate Blob leasing and job reservation logging.

The service creates a Blob for each job reservation and uses the Blob’s content to keep track of the last 10 reservations.

  • The connectionString parameter is the name of the setting that contains the actual connection string.
  • The containerName parameter is the name of the container that is used to track jobs.
  • The reserverName is the name of the Role instance which makes reservations using the JobReservationService instance.

Please notify me if something isn’t clear. As a last note, I would like to point out that the code for this service is Open Source and can be found on my GitHub project.

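using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;
using Newtonsoft.Json;

public class JobReservationService : IDisposable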
{
    private readonly string reserverName;

    private readonly BlobLeaseManager manager;
       
    private CloudBlobContainer container;

    public JobReservationService(string connectionString, 
                                    string containerName, 
                                    string reserverName)
    {
        this.reserverName = reserverName;
        Init(connectionString, containerName);

        manager =  new BlobLeaseManager();
    }

    private void Init(string connectionString, string containerName)
    {
        var cs = CloudConfigurationManager.GetSetting(connectionString);

        var account = CloudStorageAccount.Parse(cs);

        var client = account.CreateCloudBlobClient();

        var deltaBackoff = new TimeSpan(0, 0, 0, 2);
        client.RetryPolicy = new ExponentialRetry(deltaBackoff, 10);

        container = client.GetContainerReference(containerName);
        container.CreateIfNotExists();
    }

    public bool TryReserveJob(string jobName, double jobReservationInSeconds)
    {
        var blobReference = GetJobReservationBlob(jobName);
            
        if(!blobReference.Exists())
            InitializeLeaseBlob(blobReference);

        var acquireLease = manager.TryAcquireLease(blobReference, jobReservationInSeconds);
            
        if(acquireLease)
            UpdateReservationLog(jobName);

        return acquireLease;
    }

    public bool HasReservation(string jobName)
    {
        return manager.HasLease(GetJobReservationBlob(jobName));
    }

    public void CancelReservation(string jobName)
    {
        manager.ReleaseLease(GetJobReservationBlob(jobName));
    }

    private CloudBlockBlob GetJobReservationBlob(string jobName)
    {
        var blobReference = container.GetBlockBlobReference(jobName);
        return blobReference;
    }

    private void InitializeLeaseBlob(CloudBlockBlob blobReference)
    {
        var log = new JobReservationLog();
        UpdateBlobContent(log, blobReference);
    }

    private void UpdateBlobContent(JobReservationLog jobReservationLog, 
                                    CloudBlockBlob jobReservationBlob)
    {
        jobReservationLog.Add(MakeJobReservation());

        string leaseId = manager.GetLeaseId(jobReservationBlob);

        AccessCondition accessCondition = string.IsNullOrWhiteSpace(leaseId)
            ? null
            : new AccessCondition
            {
                LeaseId = leaseId
            };

        jobReservationBlob.UploadText(jobReservationLog.ToJson(), 
                                        null, 
                                        accessCondition);
    }

    private void UpdateReservationLog(string jobName)
    {
        CloudBlockBlob blobReference = GetJobReservationBlob(jobName);

        JobReservationLog jobReservationLog = JobReservationLog.Make(blobReference.DownloadText());
        JobReservation lastReservation = jobReservationLog.LastReservation;

        if (lastReservation.Reserver == reserverName) 
            return;
            
        UpdateBlobContent(jobReservationLog, blobReference);
    }

    private JobReservation MakeJobReservation()
    {
        return new JobReservation
        {
            Obtained = DateTime.UtcNow,
            Reserver = reserverName,
        };
    }
        
    public void Dispose()
    {
        manager.Dispose();
    }

    public struct JobReservation
    {
        public string Reserver { get; set; }
        public DateTime Obtained { get; set; }
    }

    private class JobReservationLog
    {
        private readonly List<JobReservation> reservations = new List<JobReservation>();

        public JobReservationLog()
        {
        }

        private JobReservationLog(List<JobReservation> lockReservations)
        {
            reservations = lockReservations;
        }

        internal JobReservation LastReservation
        {
            get { return reservations.FirstOrDefault(); }
        }

        internal static JobReservationLog Make(string json)
        {
            var list = JsonConvert.DeserializeObject<List<JobReservation>>(json);
            return new JobReservationLog(list);
        }

        internal void Add(JobReservation jobReservation)
        {
            reservations.Insert(0, jobReservation);

            // Keep only the 10 most recent reservations (newest first).
            if (reservations.Count > 10)
                reservations.RemoveAt(reservations.Count - 1);
        }

        internal string ToJson()
        {
            return JsonConvert.SerializeObject(reservations);
        }
    }
}

Summary

This job reservation service can be quite handy in helping to reduce maintenance costs by redistributing maintenance jobs over many Role instances. Furthermore, distributing these jobs over many Role instances will make them more reliable.

11 responses to Preventing Jobs From Running Simultaneously on Multiple Role Instances

  1. 

    I think you can accomplish the same task with a Queue. Each message can only be processed by one worker role.

    • 

      Interesting approach, but you will need some kind of singleton process to queue the initial tasks on a queue. Based on the fact that messages would need to live forever, you would probably use the Windows Azure Service Bus Queues, but you would still need to regularly reset the visibility of each message.

      It would be quite interesting to explore Queues in order to see whether they are the better solution for this problem.

      For informational purposes, the word Job in this post refers to a recurring process that is either continuously running or scheduled to run at a specific moment in time.

      • 

        Queues are good. We use them too for that scenario. One Worker Role enqueues the jobs as commands, and n Worker Roles on the other side dequeue and execute them. Scales like hell.

        • 

          Agreed, I do the same in my projects, but I don’t think I expressed myself well enough in this post… The Jobs I’m talking about aren’t of the same nature.

          I’m talking about controlling instances of the Job that queues all the messages. You don’t want to have duplicate messages going through your system.

          I will try to find a way to clarify this.

      • 
        Reinhard Brongers July 24, 2014 at 7:31 AM

        The singleton process to queue the tasks can now be accomplished using Azure Scheduler. The Scheduler has the ability to post a message to a queue on a recurring basis. Still I like the centralized approach of the Job Reservation, but indeed it would be interesting to explore the Queue option… The benefit would be that you can also trigger a job at will (do maintenance job NOW), by just enqueueing a message.

        • 

          Great feedback. Queuing a task using the scheduler is perfect for some types of jobs. Other jobs, like continuous archival of table storage or continuous processing of blobs from a specific location, may require something different. The Leader Election pattern used here ensures that a continuous Job can run as a singleton and lie dormant on other nodes at the same time. When the primary node is decommissioned, a dormant node kicks in to continue the work from where it was left off.

          • 
            Reinhard Brongers July 24, 2014 at 8:38 AM

            It’s always nice to have options and pick the proper solution/pattern for the problem at hand! It just struck me that your Leader Election pattern implementation originated from the need to avoid having an extra cost because of the extra worker role(s) involved. I just checked the cost of the Scheduler. If you need more than the free version, it will cost you as much as a worker (if I interpret the pricing table correctly)! So it seems the Blob-lease construct is the cheaper option…

  2. 

    Really very good solution. I think others have not understood the use.

  3. 

    Hello

    My name is Jean-Robert Lord and I attended your talk yesterday. I would like to ask you about internationalization and localization for a website: how could I carry it all out in a professional way? I own a book that covers the topic at length ("Internationalization and Localization Using Microsoft .NET"), but it does not suggest a way to switch from one language (say, English) to another (French or some other). Would you have a book to recommend?

    Thank you!
