MCS’s Health Checks for ECS

MCS’s health check for ECS is a process that detects the status of your ECS instance and marks it as healthy (ready to run tasks) or unhealthy (unable to process tasks as needed).

The ECS health check is based on two parameters:
1. Instance status (active/draining/inactive)
2. Spotinst ECS agent connectivity status (true/false)

Based on these two parameters, the health check status is evaluated and returns one of the following:

  • HEALTHY: The ECS Agent is connected and active, and the container instance status is either draining or active.
  • UNKNOWN: The ECS Describe Container Instances API call failed or returned an error for the cluster.
  • UNHEALTHY: The unhealthy status can result from any of the following situations:
    • The instance was not registered to the ECS cluster properly.
    • The ECS Agent isn’t connected.
    • The instance status is neither draining nor active.

We rely on ECS Describe Container Instances and specifically the AgentConnected and status keys to verify a container instance’s health.
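
The evaluation rules above can be sketched in Python as a small classifier. This is only an illustrative sketch, not the service's actual implementation: the function name evaluate_health is hypothetical, and the response keys (containerInstances, containerInstanceArn, status, agentConnected) are assumed to follow the shape returned by the AWS SDK for Describe Container Instances. Treating an instance that is missing from the response as UNHEALTHY (not registered properly) is likewise an assumption.

```python
from typing import Any, Dict, Optional

HEALTHY = "HEALTHY"
UNHEALTHY = "UNHEALTHY"
UNKNOWN = "UNKNOWN"


def evaluate_health(response: Optional[Dict[str, Any]], instance_arn: str) -> str:
    """Classify one container instance from a Describe Container Instances result.

    `response` is the parsed API response, or None if the call failed
    (hypothetical convention chosen for this sketch).
    """
    if response is None:
        # The API call failed or returned an error for the cluster.
        return UNKNOWN

    for instance in response.get("containerInstances", []):
        if instance.get("containerInstanceArn") != instance_arn:
            continue
        status = str(instance.get("status", "")).upper()
        agent_connected = bool(instance.get("agentConnected", False))
        # Healthy only when the agent is connected AND the status is
        # active or draining.
        if agent_connected and status in ("ACTIVE", "DRAINING"):
            return HEALTHY
        return UNHEALTHY

    # Assumption: an instance absent from the response was not registered
    # to the cluster properly.
    return UNHEALTHY
```

For example, an instance reported with status DRAINING and agentConnected set to false would be classified as UNHEALTHY, because both conditions must hold for a HEALTHY result.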

Configuring ECS Auto Healing

ECS Auto Healing is a process that initiates an instance replacement when your ECS instance is marked as unhealthy or unknown for a specified time. If an instance fails the health check, it is automatically replaced with a new instance, and the unhealthy instance is removed from the Elastigroup. This prevents idle instances that cannot process cluster tasks from remaining in your cluster. Auto Healing is enabled by default for newly created ECS Elastigroups. You can edit the configuration of existing Elastigroups with the following steps.

Step 1: Open Auto Healing Configurations 

For existing Elastigroups integrated with ECS, click Actions in the upper right-hand corner of the Elastigroup view and select Edit Configuration. In the Elastigroup’s Management View, select Switch to full edit wizard in the bottom left of the view.

Auto Healing is configured in the Compute view of the Creation Wizard, under Load Balancers. Select ECS under Auto Healing.

Step 2: Set Health Check Grace Period and Unhealthy Duration

  • Health Check Grace Period: Specify the time (in seconds) to allow an instance to boot and its applications to fully start before the first health check. If the instance fails the health check after this delay, it is terminated and replaced with a new instance.
  • Unhealthy Duration: Specify how long (in seconds) to keep an existing instance that is deemed unhealthy before it is terminated and replaced with a new one.
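
The way these two settings interact can be sketched as a small Python function. This is a hypothetical illustration rather than the product's actual logic: the name should_replace and its parameters are invented for the example, timestamps are plain seconds, and it assumes that the unhealthy-duration clock starts no earlier than the end of the grace period.

```python
from typing import Optional


def should_replace(
    now: float,
    launched_at: float,
    unhealthy_since: Optional[float],
    grace_period: int,
    unhealthy_duration: int,
) -> bool:
    """Decide whether an instance is due for replacement.

    All times are in seconds. `unhealthy_since` is when the instance was
    first seen as unhealthy, or None if it is currently healthy.
    """
    grace_ends_at = launched_at + grace_period
    if now < grace_ends_at:
        # Still booting: health check results are not acted on yet.
        return False
    if unhealthy_since is None:
        return False
    # Assumption: time spent unhealthy during the grace period does not count.
    counting_from = max(unhealthy_since, grace_ends_at)
    return now - counting_from >= unhealthy_duration
```

With a grace period of 300 seconds and an unhealthy duration of 120 seconds, an instance that first fails a check 310 seconds after launch would, under these assumptions, become due for replacement at 430 seconds.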

What’s Next

Learn more about the MCS Elastigroup integration with ECS clusters.