A deeper look into a critical component of the Cloud
This is another entry in our series looking Inside the Cloud. In this blog post, we look a little deeper into load balancers – what they are, what they do and how they work.
Load balancers play one of the most critical roles in a system – they are the components that provide the automated scalability and self-healing features of a solution. Load balancers broadly come in two types, named after the layers of the OSI model at which they operate: Layer 7 (Application Load Balancers (ALB)) and Layer 4 (Network Load Balancers (NLB)).
Application and Network Load Balancers
ALBs use the information in the HTTP headers to determine where to route traffic. ALBs are thus context aware and can differentiate between requests for multiple applications by examining the application layer data available in a request. This means they can decide where a request should be forwarded using a combination of variables such as content type, cookie data, custom headers, user location, or application behaviour. Additionally, they can monitor the health of an application too.
NLBs, on the other hand, use only IP addresses and destination ports to decide where to route traffic, with no awareness of the application at all.
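The contrast between the two can be sketched in a few lines of code. This is a simplified illustration, not any vendor's implementation – the routing rules, pool names and header values are all hypothetical:

```python
def alb_route(headers, path):
    """Layer 7 decision: application data (path, cookies, headers) is visible."""
    if path.startswith("/api/"):
        return "api-server-pool"
    if "session=premium" in headers.get("Cookie", ""):
        return "premium-pool"
    return "web-server-pool"

def nlb_route(dst_ip, dst_port):
    """Layer 4 decision: only the IP address and port are visible."""
    if dst_port == 443:
        return "https-pool"
    return "default-pool"
```

The ALB can send `/api/` traffic and premium users to dedicated pools; the NLB can only distinguish traffic by address and port.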
Layer 4 load balancers are configured to
- Know the addresses of all the servers in the environment it is load balancing
- Keep a regular check on the health, usage and availability of all the servers it is load balancing
Each type of load balancer has its benefits, and which type is used depends on the use case and business requirements.
For the purposes of examples on this website, we will mainly use a Layer 4 NLB, as they resolve requests quicker than ALBs and tend to be used more often in generic data centre use cases.
Load Balancing Algorithms and Server Pools
The load balancer is configured to keep track of previous requests and, using its algorithms, determine which individual server is most likely to provide a response in the most efficient (usually quickest) way. A common example of a load balancing algorithm is round robin.
In the round robin approach, the load balancer keeps a record of the previous request and which server it was forwarded to. When a new request comes in, it forwards it to the next server in the server pool that has not yet received a request. The server that has just handled a request is then returned to the end of the server pool queue, so that the servers that have not yet had a request are given new requests first.
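The round robin queue described above can be sketched as follows. This is a minimal illustration (the server names are made up); real load balancers track per-connection state as well:

```python
from collections import deque

class RoundRobinBalancer:
    """Minimal round-robin sketch: servers cycle through a queue."""

    def __init__(self, servers):
        self.pool = deque(servers)

    def next_server(self):
        server = self.pool.popleft()  # take the server at the front of the queue
        self.pool.append(server)      # re-queue it at the back
        return server

lb = RoundRobinBalancer(["srv-a", "srv-b", "srv-c"])
assignments = [lb.next_server() for _ in range(4)]
# each server is used once before any server is used again
```

After the first three requests have each gone to a different server, the fourth request wraps around to the first server again.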
Health, Usage and Availability Checks
This is a critical component of a scalable and self-healing solution. In fact, this is one of the major reasons why load balancers play such a critical role in modern cloud data centres.
Load balancers are configured to continuously and regularly monitor the health, usage and availability of servers. Additionally, they are configured to react and implement fixes based on the monitoring data.
The load balancer will check to see if a server is reachable and responsive. It does this by checking to see if it can access a file (called the ‘health file’) in a certain location on the server. If it can, then the load balancer keeps that server in the pool of available servers to forward requests to. If, however, the server is not responsive or reachable, the load balancer will begin to heal the overall system (which it does without the need for manual or human intervention).
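A health check loop along these lines can be sketched as below. The fetch function, path and status code are illustrative assumptions, not a real product's API – here a fake fetcher stands in for the HTTP request so the sketch is self-contained:

```python
def check_health(fetch, server, path="/health.html"):
    """Return True if the server's 'health file' is reachable.

    `fetch(server, path)` is expected to return an HTTP-style status
    code, or raise OSError if the server is unreachable.
    """
    try:
        return fetch(server, path) == 200
    except OSError:
        return False

def fake_fetch(server, path):
    # stand-in for a real HTTP request: srv-b is pretended to be down
    if server == "srv-b":
        raise OSError("unreachable")
    return 200

servers = ["srv-a", "srv-b"]
healthy_pool = [s for s in servers if check_health(fake_fetch, s)]
```

Only servers whose health file responds stay in the pool; the unreachable one is dropped and, as described next, replaced.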
Server templates and the creation of additional servers
In modern solutions, server templates are created. Thus, when a server is created, it is actually created using the parameters and configuration of that server template. These templates enable clusters of servers to be made which are exact copies of each other – they have the same data, the same configurations, and so on.
Thus, when the load balancer becomes aware of an unreachable or unresponsive server, it will create a new, additional server based on the template of the server that is down, add it to the pool of available servers, and remove the downed server from the pool's queue, so that overall system availability and performance are maintained.
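The self-healing step can be sketched as follows. The template fields and server names are hypothetical, chosen only to show the idea of replacing a downed server with an exact copy made from the template:

```python
import copy

# hypothetical server template: every server launched from it is identical
SERVER_TEMPLATE = {"image": "web-v1", "cpu": 2, "ram_gb": 4}

def heal_pool(pool, downed_name, template, next_id):
    """Remove the unresponsive server and launch a replacement from the
    template, so overall capacity is maintained without manual intervention."""
    pool = [s for s in pool if s["name"] != downed_name]
    replacement = copy.deepcopy(template)
    replacement["name"] = f"srv-{next_id}"
    pool.append(replacement)
    return pool

pool = [dict(SERVER_TEMPLATE, name="srv-1"), dict(SERVER_TEMPLATE, name="srv-2")]
pool = heal_pool(pool, downed_name="srv-1", template=SERVER_TEMPLATE, next_id=3)
```

The pool ends up the same size as before: the downed server is gone and its template-built replacement has taken its place.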
Scalability – Performance and Usage
The load balancer also continuously monitors the usage load and performance of servers within its server pool. The resources it monitors, such as CPU utilisation, and the thresholds at which load balancer actions are triggered are configurable. When a server is becoming overloaded and there is a risk of the system underperforming (e.g. high CPU utilisation due to many performance-intensive requests), the load balancer will create another server using the server template, add it to the server pool and begin directing traffic to the new server to even out the load amongst all the servers.
Similarly, when individual server utilisation falls below a threshold, the load balancer shuts a server down and removes it from its server pool to save on resource costs.
The load balancer also has a configurable minimum and maximum number of servers that it will always maintain in its server pool, to balance cost against performance: it will never shut down so many servers that the number in the pool falls below the defined minimum and degrades system performance, and it will never create more servers than the defined maximum and incur unforeseen costs.
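The scaling logic above can be condensed into a single decision function. The threshold values and parameter names here are illustrative defaults, not taken from any particular product:

```python
def scaling_decision(avg_cpu, pool_size, *,
                     scale_up_at=75, scale_down_at=25,
                     min_servers=2, max_servers=10):
    """Decide whether to add or remove a server, honouring the configured
    minimum and maximum pool sizes (all thresholds are illustrative)."""
    if avg_cpu > scale_up_at and pool_size < max_servers:
        return "scale_up"    # overloaded: launch a server from the template
    if avg_cpu < scale_down_at and pool_size > min_servers:
        return "scale_down"  # under-used: shut a server down to save cost
    return "no_change"
```

Note how the minimum and maximum bounds override the utilisation thresholds: an overloaded pool already at its maximum size, or an idle pool already at its minimum, is left alone.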