In an environment hosting a large number of web applications on the traditional architecture, with a web server farm in the DMZ and the web applications in the walled garden zone, static routing of requests using Apache mod_proxy or similar variants requires restarting the web servers whenever routes change, and those restarts require carefully planned maintenance. In addition, a large number of static routing entries in configuration files is error prone. A better alternative would be a mod_proxy replacement that does not require restarting the web servers and that automatically discovers the availability of services, so that new services can be added and existing services withdrawn.
This post discusses a simple idea: introduce a proxy between the web servers and the application servers that dynamically routes requests from the web servers to the application servers, eliminating the need to restart the web servers.
Traditional Architecture
The traditional architecture, with a web server farm in the DMZ and application servers within the walled garden zone, provides a neat, scalable architecture for high-volume web services. In the traditional approach, web servers in the DMZ are configured as proxy servers to the application servers using web-server-specific modules or connectors, e.g. Apache mod_proxy.so for Apache Web Server, the Tomcat AJP connector for routing requests to a Tomcat server, and several variations of the same proxy concept.
Need for a Dynamic Proxy
Typically the proxy configuration is manual, and any change to the configuration requires a restart of the web server. While the manual configuration of routing can be automated using a variety of techniques, the need to restart the web servers for changes to take effect is a significant drawback for micro services, where there are a large number of applications handling large volumes of requests, with new services being added and existing services withdrawn. A desirable solution would be a proxy module that automatically picks up changes in the environment, so that those changes take effect without a web server restart. One such solution is a repository service in which services register themselves when available and de-register when withdrawn, and which the proxy searches for the availability and end point of each service. The repository could be a database, a network file system, ZooKeeper, or any other shared storage system. Apache Curator Framework provides extensions for storing and querying services and their end points in such a storage system. For that reason, the dynamic proxy discussed here is targeted at Apache Curator Framework.
Apache Curator Framework
Apache Curator Framework is a high-level API that greatly simplifies using ZooKeeper and adds many features that build on ZooKeeper. In addition, it provides extensions for Service Discovery and a Service Discovery Server.
Self Registration
A service hosted in a servlet container can use the container's life cycle methods to publish its availability in the servlet context initialized method and its unavailability in the servlet context destroyed method. Within the servlet context initialized method, the hostname and port bindings can be obtained using the servlet context and used to register the internal end point. With self registration, services can publish their end points and availability dynamically.
Service Discovery
Depending on how the services publish their end points, these end points could be accessible externally, internally, or both. A client queries for the service end point and then sends its request to the end point thus retrieved.
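The self registration and lookup steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory ServiceRegistry is a hypothetical stand-in for the shared repository (ZooKeeper accessed through Curator's service discovery extension in a real deployment), SelfRegisteringService mimics the servlet context initialized/destroyed hooks without depending on the servlet API, and the service name and address are made up.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory stand-in for the shared repository of services.
final class ServiceRegistry {
    // service name -> internal end points ("host:port") currently registered
    private final Map<String, CopyOnWriteArrayList<String>> endpoints = new ConcurrentHashMap<>();

    // Publish an end point; registering the same end point twice has no effect.
    void register(String service, String endpoint) {
        endpoints.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).addIfAbsent(endpoint);
    }

    // Withdraw an end point; unknown services or end points are ignored.
    void unregister(String service, String endpoint) {
        CopyOnWriteArrayList<String> list = endpoints.get(service);
        if (list != null) {
            list.remove(endpoint);
        }
    }

    // Return the first registered end point, or empty if the service is unavailable.
    Optional<String> lookup(String service) {
        CopyOnWriteArrayList<String> list = endpoints.get(service);
        return list == null ? Optional.empty() : list.stream().findFirst();
    }
}

// Mirrors the servlet life cycle hooks. A real listener would implement
// javax.servlet.ServletContextListener; that is left out here to stay JDK-only.
final class SelfRegisteringService {
    private final ServiceRegistry registry;
    private final String name;
    private final String endpoint;

    SelfRegisteringService(ServiceRegistry registry, String name, String endpoint) {
        this.registry = registry;
        this.name = name;
        this.endpoint = endpoint;
    }

    // On startup: publish availability.
    void contextInitialized() {
        registry.register(name, endpoint);
    }

    // On shutdown: publish unavailability.
    void contextDestroyed() {
        registry.unregister(name, endpoint);
    }
}

final class RegistryDemo {
    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        // "orders" and the address below are illustrative values only.
        SelfRegisteringService orders =
                new SelfRegisteringService(registry, "orders", "10.0.0.5:8080");

        orders.contextInitialized();
        System.out.println(registry.lookup("orders")); // prints Optional[10.0.0.5:8080]

        orders.contextDestroyed();
        System.out.println(registry.lookup("orders")); // prints Optional.empty
    }
}
```

A client that finds no end point simply treats the service as unavailable; nothing needs to be restarted for a registration or withdrawal to become visible.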
Issues with Accessing Services through Load Balancer
The published end points could be external or internal server names/IP addresses, which the load balancer is unaware of until it is reconfigured. While applications within the walled garden zone may be able to access these services using the internal addresses, reconfiguring the load balancer, the web server proxy, or both is mandatory to make these services available through the load balancer. In addition, publishing internal server names/IP addresses to external clients is not desirable: it could be a security threat, the internal server names might not be resolvable by external clients, or the IP addresses might not be accessible to them.
Services Aware Dynamic Proxy
Routing requests from the load balancer or web server proxy to a services-aware proxy server would eliminate the need to reconfigure the load balancer or web server proxy, and to restart the web servers where they are used as proxies. This dynamic proxy would be responsible for routing requests to the actual service end points using the internal hostname/IP address. Because the proxy can reach the services on their internal IP addresses, passing requests from the load balancer and web server through the proxy makes the services accessible externally.
This proxy will be responsible for intercepting each request and routing it to a functional end point using service discovery. Apache mod_proxy can be modified to route requests dynamically, or a new module can be created using the Apache HttpClient library that queries the service discovery server and routes the requests dynamically, eliminating the need to restart the web server. The code at use-netty-to-proxy-your-requests can be modified to create the dynamic proxy suggested here.
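The routing decision such a proxy makes for each request might look something like the sketch below. It is not taken from mod_proxy, HttpClient, or the Netty example. It assumes, purely for illustration, that the first path segment names the service, that the discovery query is supplied as a plain function from service name to a list of "host:port" end points, and that end points are chosen round-robin. The DynamicRouter name and the sample addresses are hypothetical, and actually forwarding the request is left out.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical router: resolves a request path to an internal target URI
// by querying service discovery on every request.
final class DynamicRouter {
    // Stand-in for the service discovery query: service name -> "host:port" list.
    private final Function<String, List<String>> discovery;
    private final AtomicInteger counter = new AtomicInteger();

    DynamicRouter(Function<String, List<String>> discovery) {
        this.discovery = discovery;
    }

    // Assumes requestPath has no query string and its first segment is the
    // service name, e.g. "/orders/v1/items" -> service "orders", rest "/v1/items".
    Optional<URI> route(String requestPath) {
        String trimmed = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        int slash = trimmed.indexOf('/');
        String service = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String rest = slash < 0 ? "/" : trimmed.substring(slash);

        List<String> endpoints = discovery.apply(service);
        if (endpoints == null || endpoints.isEmpty()) {
            return Optional.empty(); // no functional end point right now
        }
        // Round-robin over whatever end points are registered at this moment.
        int index = Math.floorMod(counter.getAndIncrement(), endpoints.size());
        return Optional.of(URI.create("http://" + endpoints.get(index) + rest));
    }
}

final class DynamicRouterDemo {
    public static void main(String[] args) {
        // A mutable map plays the role of the live service repository.
        Map<String, List<String>> live = new ConcurrentHashMap<>();
        DynamicRouter router = new DynamicRouter(name -> live.getOrDefault(name, List.of()));

        System.out.println(router.route("/orders/v1/items")); // nothing registered yet

        // A service appears: the router picks it up without any restart.
        live.put("orders", List.of("10.0.0.5:8080", "10.0.0.6:8080"));
        System.out.println(router.route("/orders/v1/items"));
        System.out.println(router.route("/orders/v1/items"));

        // The service is withdrawn: routing stops immediately.
        live.remove("orders");
        System.out.println(router.route("/orders/v1/items"));
    }
}
```

Because the lookup happens per request, additions and withdrawals in the repository change the routing without touching the load balancer or web server configuration. A production proxy would more likely cache the discovery results and refresh them on change notifications rather than query on every request.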
Summary
This post discussed the need for a dynamic proxy and proposed one that queries the service discovery server to route requests dynamically, which would simplify the deployment of services.