
AWS Batch was introduced last December and has quickly become one of the most widely adopted ways to run batch computing in the cloud. Because it lets you run batch jobs without installing any batch software, it is the quickest way to get started. Its strong job scheduling and queuing capabilities, combined with its ability to handle complex job interdependencies, also make it the easiest way to run them.

To take it one step further, AWS Batch also offers managed compute environments, which take care of all the provisioning and scheduling work for you and even let you run on Spot to save money. And with the recently announced per-second billing, managed environments save you even more by shutting your resources down the moment they’re idle.

Basically, AWS Batch is amazing. But as great as it is, many want more control over their compute environment, choosing to run unmanaged compute environments instead. Moreover, while AWS Batch’s managed compute environment can run on Spot, this capability is pretty limited.

And that’s where Elastigroups can help – by ensuring your compute environment is as cost-efficient as possible.

 

 

Running AWS Batches For Minimum Cost And Maximum Flexibility

Running AWS Batch with Elastigroup is the best way to take control of your Batch cost while minimizing interruptions. AWS-managed compute environments running on Spot are already cost-efficient, but they can be even more so when Elastigroup takes the lead. Instead of running an AWS-managed compute environment, let Elastigroup manage the cluster to ensure your Batches proactively run on Spot.

Running your AWS Batch on Elastigroup gives you the flexibility to run your Batches in the way you best see fit. If you’re looking to minimize interruptions and get quicker results on your Batch job (while also minimizing cost), Elastigroup’s Spot predictions will ensure you’re not only on the cheapest Spot Instance but also on one whose price is not predicted to rise, increasing both cost savings and speed.

And if you’re just looking to save as much as possible, Elastigroup empowers you with the flexibility to pause your batch whenever Spot Instances are too expensive, automatically restarting the Batch once the Spot Market calms down again.

How To Run Your AWS Batch On Spot Today

When you create an AWS Batch unmanaged compute environment, AWS creates an ECS cluster behind the scenes. Elastigroup takes full ownership of this cluster, then starts working its magic to proactively utilize Spot Instances whenever you run a new batch job. In practice, you just need to do the following:

  1. Run a batch job on AWS Batch
  2. Select “unmanaged compute environment”. This will create an ECS cluster behind the scenes.
  3. Locate the ECS cluster that was created by AWS Batch
  4. Now head over to the Spotinst console and create an Elastigroup
  5. Connect the Elastigroup to the ECS cluster
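
Step 3 can also be done programmatically. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. It relies on the ecsClusterArn field that the AWS Batch DescribeComputeEnvironments call returns for each compute environment. The function names and the compute environment name my-unmanaged-env are hypothetical and only serve as illustration.

```python
def find_batch_ecs_cluster(batch_client, compute_env_name):
    """Return the ARN of the ECS cluster backing an AWS Batch compute environment."""
    response = batch_client.describe_compute_environments(
        computeEnvironments=[compute_env_name]
    )
    environments = response.get("computeEnvironments", [])
    if not environments:
        raise LookupError(f"Compute environment not found: {compute_env_name}")
    return environments[0]["ecsClusterArn"]


def cluster_name_from_arn(cluster_arn):
    """Extract the cluster name (the part after 'cluster/') from an ECS cluster ARN."""
    return cluster_arn.split("/", 1)[1]


# Usage against a real account (hypothetical environment name):
#   import boto3
#   arn = find_batch_ecs_cluster(boto3.client("batch"), "my-unmanaged-env")
#   print(cluster_name_from_arn(arn))
```

The cluster name printed by this sketch is the one to look for when connecting the Elastigroup in step 5.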

If you want your Batches to run only on Spot (the absolute lowest cost, but job completion could be delayed), be sure to deselect “fallback to On-Demand”. If you’re looking to emphasize speed, keep it selected. Your Batch will then run on the lowest-cost, longest-lasting Spot Instances (when available), ensuring the speediest results possible.

 

 

Automatically Scale Your Cluster

Spotinst Elastigroup automatically increases or decreases the number of instances in your cluster to meet your application demand based on various metrics. In AWS Batch, your ECS cluster can scale up and down according to the number and status of your jobs in the Batch queue. This scaling solution is based on Spotinst Functions: the Scaling Batch function constantly monitors the state of your jobs and triggers scaling according to the required resources.

A Runnable job is a job that is ready to be scheduled to a host. Jobs in this state are started as soon as sufficient resources are available in one of the compute environments that are mapped to the job’s queue. When the Spotinst function detects jobs in this state, it will automatically scale up the cluster to meet the jobs’ resource needs.

A Running job is running as a container job on an Amazon ECS container instance within a compute environment.
Once there are no ‘runnable’ or ‘running’ jobs in the cluster, the function will automatically trigger a scale-down action.
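
The decision logic described above can be sketched roughly as follows. This is not the actual Spotinst function, only an illustration under stated assumptions: it takes a boto3 Batch client, uses its list_jobs call (with the jobQueue, jobStatus and nextToken parameters) to count jobs, and the names count_jobs and scaling_action are hypothetical. The "hold" outcome for a queue that only has running jobs is an assumption inferred from the text.

```python
def count_jobs(batch_client, queue, status):
    """Count jobs in `queue` with the given status, following pagination."""
    total = 0
    kwargs = {"jobQueue": queue, "jobStatus": status}
    while True:
        page = batch_client.list_jobs(**kwargs)
        total += len(page.get("jobSummaryList", []))
        token = page.get("nextToken")
        if not token:
            return total
        kwargs["nextToken"] = token


def scaling_action(runnable, running):
    """Map job counts to an action, following the rules in the text."""
    if runnable > 0:
        return "scale_up"    # jobs are waiting for resources
    if running == 0:
        return "scale_down"  # nothing waiting and nothing running
    return "hold"            # assumption: keep capacity while jobs still run


# Usage against a real queue (hypothetical queue name):
#   import boto3
#   batch = boto3.client("batch")
#   action = scaling_action(count_jobs(batch, "my-queue", "RUNNABLE"),
#                           count_jobs(batch, "my-queue", "RUNNING"))
```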

The Dynamic Scaling function will scale your cluster based on the resources required by the job. For example, if your job requires the resources of 5 m3.medium instances, they will be launched automatically by the function. Please note that we will never scale the cluster above the maximum or below the minimum.
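
To make the arithmetic concrete, here is a minimal sketch of how a target size could be derived from job requirements and then clamped to the group's minimum and maximum. It assumes sizing by total vCPUs and memory, which may differ from what the actual function does, and it ignores how individual jobs pack onto instances, so the result is a lower bound. The function name and the job dictionary keys are hypothetical. An m3.medium provides 1 vCPU and 3.75 GiB (3840 MiB) of memory.

```python
import math


def instances_needed(jobs, vcpus_per_instance, mem_mib_per_instance,
                     min_size, max_size):
    """Estimate the instance count for `jobs`, clamped to [min_size, max_size]."""
    total_vcpus = sum(job["vcpus"] for job in jobs)
    total_mem = sum(job["memory_mib"] for job in jobs)
    by_cpu = math.ceil(total_vcpus / vcpus_per_instance)
    by_mem = math.ceil(total_mem / mem_mib_per_instance)
    target = max(by_cpu, by_mem)  # the scarcer resource decides
    return max(min_size, min(max_size, target))


# Five jobs of 1 vCPU / 2048 MiB each on m3.medium (1 vCPU, 3840 MiB):
jobs = [{"vcpus": 1, "memory_mib": 2048}] * 5
print(instances_needed(jobs, 1, 3840, min_size=0, max_size=10))  # 5
print(instances_needed(jobs, 1, 3840, min_size=0, max_size=3))   # 3, capped at max
```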

Download Batch ZIP function and Get Started Using Spotinst Functions
Or,
Download Dynamic Scaling ZIP function

Required parameters for the functions:

Serverless.yml file:

environment: Modify this parameter according to your Serverless environment ID

handler.py file:

# Spotinst Credentials
group: Your Spotinst group ID (e.g. sig-1234567)
account: Your Spotinst account ID (e.g. act-123456), found under ‘settings’ -> Account in the Spotinst console
token: Your Spotinst API token, found under ‘settings’ -> API -> Token

 

# AWS Credentials
Create a new user for AWS Batch Scaling purposes, and assign it the following IAM policy: AWSBatchFullAccess

aws_account: AWS Account ID of the new Batch user
aws_secret: AWS Account Secret of the new Batch user
region: AWS region
queue: AWS Batch queue to use for scaling
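
Before deploying, it can help to check that every parameter listed above has been filled in. The following is a hypothetical helper, not part of the downloadable function: the ScalingConfig class and validate function are illustrative names, and the sig- and act- prefix checks are inferred only from the example IDs shown above.

```python
from dataclasses import dataclass, fields


@dataclass
class ScalingConfig:
    # Spotinst credentials
    group: str
    account: str
    token: str
    # AWS credentials and target queue
    aws_account: str
    aws_secret: str
    region: str
    queue: str


def validate(config):
    """Return a list of problems; an empty list means the config looks complete."""
    problems = [f"{f.name} is empty" for f in fields(config)
                if not getattr(config, f.name)]
    if config.group and not config.group.startswith("sig-"):
        problems.append("group should look like sig-1234567")
    if config.account and not config.account.startswith("act-"):
        problems.append("account should look like act-123456")
    return problems
```

Calling validate on a config with a missing token, for example, returns a list containing "token is empty", which makes a forgotten value easy to spot before the function runs.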