Building an LDAP cluster using LVS and NetWare real servers


Requirement

Improve the resilience of the LDAP service on an existing network by introducing a load balancer with fail over/fail back capability for LDAP and LDAPS requests to NetWare 6.5 eDirectory servers. The fail over/fail back checks must determine whether the service is truly available (i.e. that a valid request returns a valid response). The network has a flat structure and this cannot change, so the solution needs to work in a flat network where all computers can communicate directly with each other.

Solution

A separate load balancer is required, and there are two possible solutions for how communications will work:

1. NAT based. All requests are addressed to the load balancer, which forwards them to the real servers. In addition, the source address must be rewritten on all forwarded requests so that responses come back through the load balancer; otherwise the response packets would go direct to the original requester and be rejected (this is a consequence of the flat network; normally when NAT is used like this, the load balancer is the real servers' default gateway). Drawbacks: higher processing overhead, and the real servers cannot see where requests originally came from.

2. Direct routing. All requests come to the load balancer and are forwarded to the real servers, which have the same IP address (the VIP) bound locally and respond directly to the original requestor. Because this response appears to come from the IP address that the original request was sent to, it is accepted. This solution works fine in a flat network (as well as a routed one), but requires the VIP to be bound onto the same interface as the real IP address on the real servers without the servers advertising this address through ARP (see ARP Issues in LVS/DR and LVS/TUN Clusters).

Direct routing was chosen because of the limitations of NAT. Linux Virtual Server was selected as the load balancer since it is free, easy to set up, and easy to configure and monitor.

Architecture

One LVS director, two real servers – all on the same subnet as the clients.


Installation and initial configuration

Load balancer:

  • Install Debian Stable (Sarge)
  • Get ip_vs loading in the kernel by adding it to /etc/modules and rebooting
  • Check that the module has loaded with lsmod
  • Following the UltraMonkey installation, run /usr/bin/enc2xs -C (required for the LDAP negotiate check; these steps are sketched as commands below)
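
A minimal sketch of those steps as shell commands on the director, assuming the stock Debian Sarge kernel provides ip_vs as a module and the UltraMonkey packages are already installed:

  # Load IPVS now and on every boot, then confirm it is present
  modprobe ip_vs
  echo ip_vs >> /etc/modules
  lsmod | grep ip_vs

  # Required for ldirectord's LDAP negotiate check (see note above)
  /usr/bin/enc2xs -C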


For the test configuration, heartbeat was not used; ldirectord was.


The load balancer is given a primary IP on eth0 of 10.0.0.32 and a service VIP of 10.0.4.1 on eth0:1. The real servers have primary IPs of 10.0.3.1 (real server 1) and 10.0.3.2 (real server 2). Both have the 10.0.4.1 VIP bound in as a secondary on the same interface as the primary IP of the server, and set not to advertise this address through ARP.
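
A sketch of how the director-side addressing above might be brought up by hand, assuming a /16 netmask for the flat 10.0.x.x network (the actual netmask is not given here) and the alias-interface style described above:

  # Primary address on the director
  ifconfig eth0 10.0.0.32 netmask 255.255.0.0 up
  # Service VIP as an alias on the same interface; this address is ARP-advertised
  # by the director, which is what direct routing requires
  ifconfig eth0:1 10.0.4.1 netmask 255.255.255.255 broadcast 10.0.4.1 up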

In the test setup the real servers are NetWare servers running eDirectory. Non-ARPing secondaries are installed as follows:

  • Start INETCFG from the Console
  • Select BINDINGS
  • Select the card with the primary address bound to it
  • Select Secondary IP Address Support
  • Enable it and add the VIP as a secondary IP address, with ARPABLE set to No.


I've done this on NetWare 6.5 SP2, but haven't tried it on earlier versions.
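
NetWare reportedly also accepts secondary addresses from the system console; something along these lines should be equivalent to the INETCFG steps above, but only the INETCFG route was tested here and the exact syntax may differ:

  add secondary ipaddress 10.0.4.1 noarp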


Ldirectord configuration

The following stanzas are a full ldirectord configuration for two virtual servers (one LDAP, one LDAPS) with two real servers. Remember that every line except the ones starting "virtual" needs to start with a <TAB>.

virtual=10.0.4.1:636
	real=10.0.3.1:636 gate
	real=10.0.3.2:636 gate
	service=ldap
	checktype=negotiate
	checkport=389
	negotiatetimeout=10
	request="o=users"
	receive="o=USERS"
	scheduler=rr
	protocol=tcp
	quiescent=yes
	checktimeout=10
	checkinterval=10

virtual=10.0.4.1:389
	real=10.0.3.1:389 gate
	real=10.0.3.2:389 gate
	service=ldap
	checktype=negotiate
	negotiatetimeout=10
	request="o=users"
	receive="o=USERS"
	scheduler=rr
	protocol=tcp
	quiescent=yes
	checktimeout=10
	checkinterval=10
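
Once the stanzas are in place (typically /etc/ha.d/ldirectord.cf on an UltraMonkey install, though the path may vary), a quick way to confirm they have taken effect is to start ldirectord and inspect the IPVS table it builds:

  /etc/init.d/ldirectord start
  # Both virtual services should appear, each with two real servers
  # using the Route (direct routing) forwarding method
  ipvsadm -L -n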

Negotiate checks are used to monitor the availability of the real LDAP servers. The alternative is simply to monitor that there is a process listening on the service port on the real server. That does not allow for the service having become degraded through high load, causing it to be slow to respond, or for a fault causing it to cease responding to requests even though there is still a process listening on the port. Note that the LDAPS service still uses LDAP port 389 to monitor the real servers, since ldirectord cannot do LDAPS.

To monitor the availability of a real LDAP server, the ldirectord daemon performs a simple request for a naming context; in the example above this context is o=users. Ldirectord performs an anonymous bind and searches at this context with a search scope of base. As long as anonymous access is allowed, the context itself is returned. Please note that the "receive" parameter must match exactly what is returned, including case (unless you want to get into using regular expressions).
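
To see roughly what this check amounts to, the equivalent query can be run by hand with the OpenLDAP client tools (an illustration only, not what ldirectord itself executes; substitute the appropriate real server address):

  # Anonymous bind, base-scope search at the monitored context
  ldapsearch -x -h 10.0.3.1 -p 389 -b "o=users" -s base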

The result of all this is that the availability of the real servers is checked every 10 seconds with a simple anonymous bind and request. If a valid response does not come back within 10 seconds, the server is marked quiescent and no further new connections are routed to it.

Quiescent or not? In LVS, a quiescent real server has no new connections directed to it, but existing connections are maintained. This is useful if the request that caused the real server to be slow to respond was a large or complex query, since the results would still be returned; that would not be the case if the server were non-quiescent and taken out of the available server list completely, with existing connections terminated. It is less useful if you use persistent connections for connection pooling (for example), where persistent connections are used to avoid the overhead of constantly creating new connections. In this case it would be best to have the real server non-quiescent, since a connection pool should then remove the dead connection from the pool (and create a new one in its place).
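
If the connection-pooling situation applies, the only change needed is the quiescent setting in each virtual stanza of the configuration above (tab-indented as before):

	quiescent=no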