Difference between revisions of "Dynamic Feedback Load Balancing Scheduling"

From LVSKB
Jump to: navigation, search
m
m (Examples)
 
(22 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
Dynamic feedback load balancing scheduling algorithm considers the real-time loading and response time of each backend server, and adjusts the percentage of forwarded requests among the servers, in order to avoid some servers that may be overloaded still receive a lot of requests. This algorithm can help improve throughput of the whole system.
 
Dynamic feedback load balancing scheduling algorithm considers the real-time loading and response time of each backend server, and adjusts the percentage of forwarded requests among the servers, in order to avoid some servers that may be overloaded still receive a lot of requests. This algorithm can help improve throughput of the whole system.
  
The working environment of this algorithm is illustrated in the following figure, there is a Monitor Daemon running on the [[load balancer]], which is to monitor the availability and load information of each server. The Monitor Daemon can compute an aggregated load value based on all the load information for each server, and calculate a new server weight according to the aggregated load value. If the difference of new server weight and old server weight is more than the threshold, the Monitor Daemon adjusts the server weight with the new one in the IPVS inside the kernel. Usually, [[Weighted Round-Robin Scheduling|weighted round-robin scheduling]] or [[Weighted Least-Connection Scheduling|weighted least-connection scheduling]] is used for connection scheduling, because server weight can be adjusted in the weighted algorithms.
+
The working environment of this algorithm is illustrated in the following figure, there is a Monitor Daemon running on the [[load balancer]], which is to monitor the availability and load information of each server. The Monitor Daemon can compute an aggregated load value based on all the load information for each server, and calculate a new server weight according to the aggregated load value. If the difference of new server weight and old server weight is more than the threshold, the Monitor Daemon adjusts the server weight with the new one in the [[IPVS]] inside the kernel. Usually, [[Weighted Round-Robin Scheduling|weighted round-robin scheduling]] or [[Weighted Least-Connection Scheduling|weighted least-connection scheduling]] is used for connection scheduling, because server weight can be adjusted in the weighted algorithms.
  
 
[[Image:Load-manager.jpg|center]]
 
[[Image:Load-manager.jpg|center]]
Line 9: Line 9:
 
== Connection Scheduling ==
 
== Connection Scheduling ==
  
 +
When clients access network services, service time and computing resource consumption at servers for each request may vary a lot. It depends on many things, such as the type of service request, current network situation, and current resource consuption at server. Usually, heavy load requests may do computing-intensive query, database access, and serve long response data stream; light load requests may just access static HTML files or small image files.
 +
 +
The variance of request service time may lead to skew of server utilization, i.e. load imbalance among servers. For example, there are four web pages, A, B, C and D, in which D is time-consuming dynamic web page. When many people access those web pages, it is possible that all the requests for web page D are sent to the same server, which would cause this server overloaded, in the mean while the other servers are idle. Therefore, the overloaded servers may still receive new requests and put them into waiting queue, this would increase service time and lower the quality of service.
  
 
== Dynamic Feedback Mechanism ==
 
== Dynamic Feedback Mechanism ==
  
 +
Network services usually have a lot of small transcations and some long transactions, in which those long transaction may occupy high percentage of the whole work load. Therefore, we need to design a load balancing algorithm to avoid that long transactions might be always assigned to one or a small set of servers, and try to shape access burst into even distribution among servers.
 +
 +
We can use dynamic feedback mechanism to control new connection dispatching, to make load distribution more even among servers. For example, [[Weighted Round-Robin Scheduling]] is used to dispatch new connections inside the kernel in [[IPVS]] [[load balancer]]; there is a program called "Monitor Daemon" running in the user-space of [[load balancer]], which is to monitor the availability and load information of each server periodically. The Monitor Daemon can compute the aggregated load value based on all the load information for each server and compute a new server weight according to the aggregated load value. If the aggregated load value is too high, the new weight is adjusted to a smaller value than the current one, so less new connections will be assigned to this server. If the aggregated load value is small, the new weight is adjusted to a higher value than the current one, so more new connections will be sent to this server. The Monitor Daemon only adjusts the server weight inside the kernel in [[IPVS]] when the server weight difference is larger than a threshold. The Monitor Daemon collects the load information of each server and adjusts server weight periodically. In short, this is a negative feedback mechanism for better utitlization of servers.
 +
 +
In the weighted scheduling algorithms implemented inside the kernel for IPVS, no new connections will be sent to a server when its weight is set zero, but the established connections still get served by the server. System administrator can use this feature to  make a server quiescent. When all the established connections to this server have finished, he/she can switch this server out of the cluster for system maintenance. Therefore, the dynamic feedback load balancing algorithm should protect this feature, when the server weight is zero, the weight of this server will not be adjusted.
  
 
== Aggregate Load ==
 
== Aggregate Load ==
  
 +
There are two major types of load information for computing aggregate load: input metric and server metric. The input metric can be collected at the load balancer, and the server metric is all kinds of load information at servers. We want to use aggregate load value to indicate the real load information for servers, but for different application, there might be different load information, therefore we can introduce coefficient for different load information, to indicate the weight of different load information in aggregate load computing. System administrator can adjust those cofficients according to the requirement of different application, and can also configure the time interval to collect load information.
  
 +
The input metric is the ratio of connections that a server has received to the average connections received at a time unit. The load balancer can count all connections for each server. For server ''Si'', there are counters <math>C_{i1}</math> and <math>C_{i2}</math> for time ''T1'' and ''T2'', server ''Si'' has received the number of connections (''Ni = Ci2 - Ci1'') during the time interval (''T2 - T1''). The ''INPUTi'' can be calculated as follows:
 +
<center>
 +
<math>INPUT_i = \frac{N_i}{\sum_{m=1}^n N_m / n}</math>
 +
</center>
 +
 +
The server metric records all kinds of server load information, such as current server CPU usage '''LOADi''', current disk usage '''DISKi''', current memory usage '''MEMORYi''' and current process number '''PROCESSi'''. There are two ways to get those information. One is that all the servers run SNMP(Simple Network Management Protocol) service and the Monitor Daemon running on the load balancer use SNMP to query those information at servers, the other is to run agent at servers to collect load information, and the agents report the information to the Monitor Daemon on the load balancer.
 +
 +
The reponse metric is the response time of server processing request, which can reflect the length of request waiting queue and the processing time of requests. The Monitor Daemon on the load balancer can work as client to access servie and measure response time. For example, the Monitor Daemon can send "GET /" HTTP request to web server, and record the response time '''RESPONSEi''' of server processing this request. If there is no response from server in the specified time interval, the Monitor Daemon can consider that server is temporarily unavailable now, and set the weight of this server zero.
 +
 +
Before computing the aggregate load, we need to normalize LOADi, DISKi, MEMORYi, PROCESSi and RESPONSEi in the interval [0, ∞), in which 1 means right load, the value larger than 1 means overloaded, and the value of less than 1 means less loaded.
 +
 +
We can also introduce a set of coefficient '''Ri''' for different load metrics to indicate the importance of different load metrics, in which ΣRi = 1. The aggregate load can be calculated in the following formula:
 +
<center>
 +
<math>AGGREGATELOAD_i = R_1 * INPUT_i + R_2 * LOAD_i + R_3 * DISK_i + R_4 * MEMORY_i + R_5 * PROCESS_i + R_6 * RESPONSE_i</math>
 +
</center>
 +
 +
For example, we can use coefficient {0.1, 0.3, 0.1, 0.1, 0.1, 0.3} for web cluster, in which CPU load and response time are considered more important. If the coefficient Ri cannot reflect the load of application well, system administrator can adjust them, until the right coefficients is found for current application.
 +
 +
As for time interval to query server loads and adjust weight, although a short interval can reflect server load well, too frequent queries will bring more overhead in both server and load balancer, which may also affect the server load metrics. Therefore, it's a tradeoff, we usually suggest to use time interval between 5 and 20 seconds.
  
 
== Weight Computation ==
 
== Weight Computation ==
  
 +
When a server is added into cluster system, system administrator can assign the server an initial weight '''DEFAULT_WEIGHTi''', which is an indicator of the server processing capacity. The Monitor Daemon can use DEFAULT_WEIGHTi as reference in adjusting weight over time. In order to prevent server weight from turning into a huge number, we can set weight adjustion range (0, DEFAULT_WEIGHTi * SCALE], in which SCALE can be tuned, its initial value can be 10.
 +
 +
The Monitor Daemon does the following things periodially, if the server DEFAULT_WEIGHTi isn't zero, queries all server load metrics, and calculates the aggregate load of this server '''AGGREGATELOADi'''. The following formular is used to calculate the aggregate load:
 +
<center>
 +
<math>w_i = \begin{cases} w_i + A * \sqrt[3]{1 - AGGREGATELOAD_i} & AGGREGATELOAD_i \ne 1 \\
 +
w_i & AGGREGATELOAD_i = 1 \end{cases}
 +
</math>
 +
</center>
 +
 +
in which,1 is the system utilization that we want to achieve, A is an tunable coefficient (its default value is 5). When the aggregate load is 1, the server weight will remain the same; when the aggregate load is larger than 1, the server weight be adjusted to a smaller value; when the aggregate load is less than 1, the server weight be adjusted to a larger value. If the new server weight is out of the range (0, DEFAULT_WEIGHTi * SCALE], weight will not be adjusted.
 +
 +
In real application system, if each server weight is less than its DEFAULT_WEIGHTi, it means that the whole cluster system is overloaded, there is a need to add new servers to take over some load; if each server weight is near  DEFAULT_WEIGHTi * SCALE, it means that the whole system is under light load.
  
 
== Examples ==
 
== Examples ==
  
 +
In Red Hat cluster management tool [[Piranha]], a simple version of dynamic feedback load balancing algorithm is used. In the aggregate load calculation, it only considers the value of server load average, its formula is as follows:
 +
<center>
 +
<math>w_i = \begin{cases} w_i + A * \sqrt[3]{1 - LOAD_i} & LOAD_i \ne 1 \\
 +
w_i & LOAD_i = 1 \end{cases}
 +
</math>
 +
</center>
 +
 +
The server weight adjustion range is [DEFAULT_WEIGHTi, DEFAULT_WEIGHTi * 10], A is DEFAULT_WEIGHTi /2, the threshold of weight adjustion is DEFAULT_WEIGHTi /4. [[Piranha]] processes query server load, compute and adjust server weight every 20 seconds.
 +
 +
The other real example used in a production environment on JANET web cache cluster is at http://archive.linuxvirtualserver.org/html/lvs-users/2000-02/msg00064.html
  
 
== Conclusion ==
 
== Conclusion ==
  
 +
Since service time of each request varies a lot, the connection scheduling algorithm inside the kernel may cause skew among the servers. Therefore, an dynamic feedback load balancing algorithm is introduced, it combines the weighted connection scheduling algorithms inside the kernel, takes advantage of dynamic feedback load information from servers, and adjusts server weight to control the percentage of request received among servers. The dynamic feedback load balancing algorithm can help avoid load imbalance among servers, improve system utilization, and increase system throughput.
  
 
[[Category:Job Scheduling Algorithms]]
 
[[Category:Job Scheduling Algorithms]]

Latest revision as of 23:32, 23 August 2006

Introduction

Dynamic feedback load balancing scheduling algorithm considers the real-time loading and response time of each backend server, and adjusts the percentage of forwarded requests among the servers, in order to avoid some servers that may be overloaded still receive a lot of requests. This algorithm can help improve throughput of the whole system.

The working environment of this algorithm is illustrated in the following figure, there is a Monitor Daemon running on the load balancer, which is to monitor the availability and load information of each server. The Monitor Daemon can compute an aggregated load value based on all the load information for each server, and calculate a new server weight according to the aggregated load value. If the difference of new server weight and old server weight is more than the threshold, the Monitor Daemon adjusts the server weight with the new one in the IPVS inside the kernel. Usually, weighted round-robin scheduling or weighted least-connection scheduling is used for connection scheduling, because server weight can be adjusted in the weighted algorithms.

Load-manager.jpg

Connection Scheduling

When clients access network services, service time and computing resource consumption at servers for each request may vary a lot. It depends on many things, such as the type of service request, current network situation, and current resource consuption at server. Usually, heavy load requests may do computing-intensive query, database access, and serve long response data stream; light load requests may just access static HTML files or small image files.

The variance of request service time may lead to skew of server utilization, i.e. load imbalance among servers. For example, there are four web pages, A, B, C and D, in which D is time-consuming dynamic web page. When many people access those web pages, it is possible that all the requests for web page D are sent to the same server, which would cause this server overloaded, in the mean while the other servers are idle. Therefore, the overloaded servers may still receive new requests and put them into waiting queue, this would increase service time and lower the quality of service.

Dynamic Feedback Mechanism

Network services usually have a lot of small transcations and some long transactions, in which those long transaction may occupy high percentage of the whole work load. Therefore, we need to design a load balancing algorithm to avoid that long transactions might be always assigned to one or a small set of servers, and try to shape access burst into even distribution among servers.

We can use dynamic feedback mechanism to control new connection dispatching, to make load distribution more even among servers. For example, Weighted Round-Robin Scheduling is used to dispatch new connections inside the kernel in IPVS load balancer; there is a program called "Monitor Daemon" running in the user-space of load balancer, which is to monitor the availability and load information of each server periodically. The Monitor Daemon can compute the aggregated load value based on all the load information for each server and compute a new server weight according to the aggregated load value. If the aggregated load value is too high, the new weight is adjusted to a smaller value than the current one, so less new connections will be assigned to this server. If the aggregated load value is small, the new weight is adjusted to a higher value than the current one, so more new connections will be sent to this server. The Monitor Daemon only adjusts the server weight inside the kernel in IPVS when the server weight difference is larger than a threshold. The Monitor Daemon collects the load information of each server and adjusts server weight periodically. In short, this is a negative feedback mechanism for better utitlization of servers.

In the weighted scheduling algorithms implemented inside the kernel for IPVS, no new connections will be sent to a server when its weight is set zero, but the established connections still get served by the server. System administrator can use this feature to make a server quiescent. When all the established connections to this server have finished, he/she can switch this server out of the cluster for system maintenance. Therefore, the dynamic feedback load balancing algorithm should protect this feature, when the server weight is zero, the weight of this server will not be adjusted.

Aggregate Load

There are two major types of load information for computing aggregate load: input metric and server metric. The input metric can be collected at the load balancer, and the server metric is all kinds of load information at servers. We want to use aggregate load value to indicate the real load information for servers, but for different application, there might be different load information, therefore we can introduce coefficient for different load information, to indicate the weight of different load information in aggregate load computing. System administrator can adjust those cofficients according to the requirement of different application, and can also configure the time interval to collect load information.

The input metric is the ratio of connections that a server has received to the average connections received at a time unit. The load balancer can count all connections for each server. For server Si, there are counters <math>C_{i1}</math> and <math>C_{i2}</math> for time T1 and T2, server Si has received the number of connections (Ni = Ci2 - Ci1) during the time interval (T2 - T1). The INPUTi can be calculated as follows:

<math>INPUT_i = \frac{N_i}{\sum_{m=1}^n N_m / n}</math>

The server metric records all kinds of server load information, such as current server CPU usage LOADi, current disk usage DISKi, current memory usage MEMORYi and current process number PROCESSi. There are two ways to get those information. One is that all the servers run SNMP(Simple Network Management Protocol) service and the Monitor Daemon running on the load balancer use SNMP to query those information at servers, the other is to run agent at servers to collect load information, and the agents report the information to the Monitor Daemon on the load balancer.

The reponse metric is the response time of server processing request, which can reflect the length of request waiting queue and the processing time of requests. The Monitor Daemon on the load balancer can work as client to access servie and measure response time. For example, the Monitor Daemon can send "GET /" HTTP request to web server, and record the response time RESPONSEi of server processing this request. If there is no response from server in the specified time interval, the Monitor Daemon can consider that server is temporarily unavailable now, and set the weight of this server zero.

Before computing the aggregate load, we need to normalize LOADi, DISKi, MEMORYi, PROCESSi and RESPONSEi in the interval [0, ∞), in which 1 means right load, the value larger than 1 means overloaded, and the value of less than 1 means less loaded.

We can also introduce a set of coefficient Ri for different load metrics to indicate the importance of different load metrics, in which ΣRi = 1. The aggregate load can be calculated in the following formula:

<math>AGGREGATELOAD_i = R_1 * INPUT_i + R_2 * LOAD_i + R_3 * DISK_i + R_4 * MEMORY_i + R_5 * PROCESS_i + R_6 * RESPONSE_i</math>

For example, we can use coefficient {0.1, 0.3, 0.1, 0.1, 0.1, 0.3} for web cluster, in which CPU load and response time are considered more important. If the coefficient Ri cannot reflect the load of application well, system administrator can adjust them, until the right coefficients is found for current application.

As for time interval to query server loads and adjust weight, although a short interval can reflect server load well, too frequent queries will bring more overhead in both server and load balancer, which may also affect the server load metrics. Therefore, it's a tradeoff, we usually suggest to use time interval between 5 and 20 seconds.

Weight Computation

When a server is added into cluster system, system administrator can assign the server an initial weight DEFAULT_WEIGHTi, which is an indicator of the server processing capacity. The Monitor Daemon can use DEFAULT_WEIGHTi as reference in adjusting weight over time. In order to prevent server weight from turning into a huge number, we can set weight adjustion range (0, DEFAULT_WEIGHTi * SCALE], in which SCALE can be tuned, its initial value can be 10.

The Monitor Daemon does the following things periodially, if the server DEFAULT_WEIGHTi isn't zero, queries all server load metrics, and calculates the aggregate load of this server AGGREGATELOADi. The following formular is used to calculate the aggregate load:

<math>w_i = \begin{cases} w_i + A * \sqrt[3]{1 - AGGREGATELOAD_i} & AGGREGATELOAD_i \ne 1 \\ w_i & AGGREGATELOAD_i = 1 \end{cases} </math>

in which,1 is the system utilization that we want to achieve, A is an tunable coefficient (its default value is 5). When the aggregate load is 1, the server weight will remain the same; when the aggregate load is larger than 1, the server weight be adjusted to a smaller value; when the aggregate load is less than 1, the server weight be adjusted to a larger value. If the new server weight is out of the range (0, DEFAULT_WEIGHTi * SCALE], weight will not be adjusted.

In real application system, if each server weight is less than its DEFAULT_WEIGHTi, it means that the whole cluster system is overloaded, there is a need to add new servers to take over some load; if each server weight is near DEFAULT_WEIGHTi * SCALE, it means that the whole system is under light load.

Examples

In Red Hat cluster management tool Piranha, a simple version of dynamic feedback load balancing algorithm is used. In the aggregate load calculation, it only considers the value of server load average, its formula is as follows:

<math>w_i = \begin{cases} w_i + A * \sqrt[3]{1 - LOAD_i} & LOAD_i \ne 1 \\ w_i & LOAD_i = 1 \end{cases} </math>

The server weight adjustion range is [DEFAULT_WEIGHTi, DEFAULT_WEIGHTi * 10], A is DEFAULT_WEIGHTi /2, the threshold of weight adjustion is DEFAULT_WEIGHTi /4. Piranha processes query server load, compute and adjust server weight every 20 seconds.

The other real example used in a production environment on JANET web cache cluster is at http://archive.linuxvirtualserver.org/html/lvs-users/2000-02/msg00064.html

Conclusion

Since service time of each request varies a lot, the connection scheduling algorithm inside the kernel may cause skew among the servers. Therefore, an dynamic feedback load balancing algorithm is introduced, it combines the weighted connection scheduling algorithms inside the kernel, takes advantage of dynamic feedback load information from servers, and adjusts server weight to control the percentage of request received among servers. The dynamic feedback load balancing algorithm can help avoid load imbalance among servers, improve system utilization, and increase system throughput.