Serverless Infrastructure Autoscaling
Large-scale compute clusters are expensive, so it is important to use them well. Utilization and efficiency can be increased by running a mix of workloads on the same machines: CPU- and memory-intensive jobs, small and large ones, and a mix of offline and low-latency jobs – ones that serve end-user requests or provide infrastructure services such as storage, naming or locking.
The challenge of scaling containers
The ECS topology is built on Clusters, where each cluster has Services (which can be referred to as applications), and services run Tasks. Each Task has a Task definition which tells the scheduler how much resources the Task requires.
For example, if a cluster runs 10 machines with 2 vCPUs and 3.8 GiB of RAM each, and 10 machines with 4 vCPUs and 7.5 GiB of RAM each, the cluster totals 60 vCPUs (61,440 CPU units, at 1,024 units per vCPU) and 113 GiB of RAM.
The issue here is that if a single Task requires more RAM than any individual instance offers, it can’t be scheduled. In the example above, a Task requesting 16 GiB of RAM won’t start, despite 113 GiB of RAM being available across the cluster. MCS’s Autoscaler matches the Task with the appropriate Instance Type and Size, with zero overhead or management.
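The distinction between cluster totals and per-instance capacity can be made concrete with a short sketch. The numbers match the example above; the function name is illustrative, not part of any API:

```python
# Why per-instance capacity, not the cluster total, decides schedulability.
CPU_UNITS_PER_VCPU = 1024

# (vCPUs, RAM in GiB) for each instance in the example cluster
instances = [(2, 3.8)] * 10 + [(4, 7.5)] * 10

total_cpu_units = sum(vcpus for vcpus, _ in instances) * CPU_UNITS_PER_VCPU
total_ram_gib = sum(ram for _, ram in instances)

def can_schedule(task_cpu_units, task_ram_gib):
    """A task fits only if some single instance can hold all of it."""
    return any(
        vcpus * CPU_UNITS_PER_VCPU >= task_cpu_units and ram >= task_ram_gib
        for vcpus, ram in instances
    )

print(total_cpu_units)          # 61440
print(round(total_ram_gib, 1))  # 113.0
print(can_schedule(1024, 16))   # False: no single instance has 16 GiB
```

Even though the cluster has 113 GiB in aggregate, the largest single instance has only 7.5 GiB, so the 16 GiB task has nowhere to land.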
The ECS Autoscaler dynamically scales the cluster up and down to ensure there are always sufficient resources to run all Tasks, while maximizing resource efficiency in the cluster. It does this by optimizing Task placement across the cluster, in a process we call Tetris Scaling, and by automatically managing Headroom – a buffer of spare capacity (in terms of both memory and CPU) that ensures that when you need to scale out more containers quickly, you don’t have to wait for new VMs (Instances) to be provisioned.
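To illustrate the general idea behind Tetris-style placement (not Spot’s actual algorithm, which is proprietary), here is a minimal first-fit-decreasing bin-packing sketch: tasks are sorted largest-first and packed onto the fewest instances that fit:

```python
# Illustrative sketch of bin-packing-style task placement.
# Not the real Tetris Scaling implementation; names are hypothetical.

def place_tasks(tasks, capacity):
    """tasks: list of (cpu_units, ram_gib); capacity: per-instance (cpu, ram).
    Returns the number of instances needed under first-fit-decreasing."""
    bins = []  # remaining (cpu, ram) per instance
    for cpu, ram in sorted(tasks, reverse=True):
        for i, (free_cpu, free_ram) in enumerate(bins):
            if free_cpu >= cpu and free_ram >= ram:
                bins[i] = (free_cpu - cpu, free_ram - ram)
                break
        else:
            # No existing instance fits: provision a new one
            bins.append((capacity[0] - cpu, capacity[1] - ram))
    return len(bins)

tasks = [(512, 1.0), (1024, 2.0), (256, 0.5), (2048, 4.0), (512, 1.0)]
print(place_tasks(tasks, (4096, 7.5)))  # 2 instances instead of 5
```

Packing tasks densely in this way is what lets the Autoscaler terminate instances that would otherwise sit half-empty.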
Scale Down Behavior
Elastigroup monitors the Cluster for idle instances. An instance is considered idle if both its CPU and memory utilization are below 40%. When an instance has been idle for the specified number of consecutive periods, Elastigroup drains it: the running containers are rescheduled onto other instances, and the idle instance is then terminated.
Scale down uses the Evaluation Period, defined as the number of consecutive minutes an instance must be underutilized before it is determined to be idle.
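The idle check described above can be sketched as follows. The 40% threshold and the consecutive-period logic mirror the text; the function and variable names are illustrative:

```python
# Hypothetical sketch of the scale-down idle check.
IDLE_THRESHOLD = 0.40  # both CPU and memory must be below this

def is_idle(cpu_samples, mem_samples, evaluation_periods):
    """True if the last `evaluation_periods` consecutive per-minute samples
    show both CPU and memory utilization under the threshold."""
    if len(cpu_samples) < evaluation_periods:
        return False
    recent = zip(cpu_samples[-evaluation_periods:],
                 mem_samples[-evaluation_periods:])
    return all(cpu < IDLE_THRESHOLD and mem < IDLE_THRESHOLD
               for cpu, mem in recent)

cpu = [0.10, 0.15, 0.20, 0.12, 0.18]
mem = [0.30, 0.25, 0.35, 0.20, 0.22]
print(is_idle(cpu, mem, evaluation_periods=5))  # True: all samples under 40%
```

A single spike above 40% in either metric within the evaluation window resets the decision, so short bursts of load do not trigger a drain.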
Labels & Constraints
Elastigroup supports built-in and custom Task placement constraints within the scaling logic. Task placement constraints give you the ability to control where tasks are scheduled, such as in a specific Availability Zone or on instances of a specific type. You can utilize the built-in ECS container attributes or create your own custom key-value attribute and add a constraint to place your tasks based on the desired attribute. To get started, see the ECS task placement constraints tutorial.
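As a sketch of what such constraints look like, ECS expresses attribute-based placement with the `memberOf` constraint type and a cluster query expression. The attribute values below are examples only:

```python
# Illustrative ECS placement constraints (attribute values are examples).
# `memberOf` with a cluster query expression is ECS's built-in mechanism
# for attribute-based task placement.
placement_constraints = [
    {
        "type": "memberOf",
        "expression": "attribute:ecs.availability-zone == us-east-1a",
    },
    {
        "type": "memberOf",
        "expression": "attribute:ecs.instance-type == m5.large",
    },
]

print(len(placement_constraints))  # 2
```

A structure like this would typically be supplied as `placementConstraints` when creating a service or running a task; the Autoscaler then provisions instances that can actually satisfy the expressions.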
Daemon tasks run on each instance, or on a selected set of instances, in an Amazon ECS cluster and can be used to provide common functionality such as logging and monitoring. MCS’s Autoscaler automatically identifies and accounts for daemon tasks when optimizing capacity allocation, making sure that launched instances have enough capacity for both the daemon services and the pending tasks. It also monitors for new container instances in the cluster and adds the daemon tasks to them. The Autoscaler supports and considers Daemon services and tasks in both scale-down and scale-up behavior.
Scale down: A Daemon task that was running on a scaled-down instance will not trigger the launch of a new instance, and will not be rescheduled onto a different container instance.
Scale up: When one of the cluster’s services is configured with the Daemon scheduling strategy, the ECS Autoscaler ensures that every newly launched instance has enough spare capacity to run the Daemon task properly in addition to the other pending tasks.
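The scale-up rule above can be sketched as follows: when sizing a new instance, the daemon tasks’ resources are reserved first, and only the remainder counts toward pending tasks. The numbers and names are illustrative:

```python
# Hypothetical sketch: reserving daemon-task resources on a new instance
# before counting capacity available for pending tasks.

def usable_capacity(instance_cpu, instance_ram, daemon_tasks):
    """Capacity left for pending tasks after reserving daemon resources.
    daemon_tasks: list of (cpu_units, ram_gib)."""
    daemon_cpu = sum(cpu for cpu, _ in daemon_tasks)
    daemon_ram = sum(ram for _, ram in daemon_tasks)
    return instance_cpu - daemon_cpu, instance_ram - daemon_ram

# A 4-vCPU (4096 CPU units) / 7.5 GiB instance running a logging daemon
# and a monitoring daemon
daemons = [(128, 0.25), (256, 0.5)]
cpu_left, ram_left = usable_capacity(4096, 7.5, daemons)
print(cpu_left, ram_left)  # 3712 6.75
```

Sizing against this reduced capacity is what prevents a new instance from being launched that can host the pending tasks but not its mandatory daemons.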