Tolerance is not a virtue that feels like it is growing in the world at the moment. This rings true on the internet more so than anywhere. Outages of web services, or websites going down is something that loses confidence in a business or organisation, so being able to offer services that are robust and highly tolerant of outage is increasingly a must. Web systems need to be able to handle sudden spikes in traffic, failure of servers, and anything else that can be thrown at them in order to still be able to serve customers reliably.
Here at Webmad we run a number of High Availability systems, so what we are going to do on this post, is to outline the basic concept behind them that we use to run PHP based web systems like WordPress and Moodle / Totara. We tend to use Amazon Web Services for setups like this as it has a tonne of tools that we are very familiar with that get the job done nicely, no matter how big the deployment.
Heres the graphic that outlines our usual setup:
So – lets go through the setup, what does what, and how we go about making it happen.
One of the key considerations of a high availability setup is for it to try to minimise any single points of failure within the system. Any points in the system where the failure of one component can mean an outage / failure, shouldn’t be acceptable – there should be redundancies to cater for outages etc etc. With this in mind, we select tools within the Amazon suite that factor for this.
First we start with the EFS (Elastic File System) service. This is selected, as one of the important things to bear in mind is that we can’t guarantee any page loads on the website will use the same web server each time. To rely on that would be tragic if the server the user was interacting with needed to be taken down / had a fault. By using a shared file system, replicated across multiple data centers and regions, uploads and data that needs to be persisted between all user facing servers can be shared effectively. Each server mounts the filesystem, and can access any files as needed. Our standard setups would only use shared filesystems for user contributed data, not for plugin or core system files. Shared network filesystems like EFS do not have the speed required for web systems, especially PHP systems, to include the multitude of files that typically get included into the system just to return one page load. By keeping these files on each servers EBS (Elastic Block Storage) based storage (equivalent to the servers hard drive) speed is optimal for a fast user experience. Typically user uploaded content does not need high performance, so using a network based filesystem is just fine.
The next service we make use of is the Relational Database Service (RDS) service. This service allows you to set up replicated database servers of any size you need, for mysql or postgresql based databases. They also have a service called Amazon Aurora which is a high efficiency cloud optimised mysql compatible service that allows for multiple replications over multiple data centers in multiple regions. These services allow you to scale your servers vertically (ie increase the power of the servers) and horizontally (more servers). Used with services like proxysql to spread load etc, you can get very flexible setups.
The core of many of our setups is to use a service called Elastic Beanstalk (EB). Elastic Beanstalk is a powerful set of services that allow you to operate and monitor a self healing, high availability setup. It sets up the load balancer used to route incoming web traffic to the web servers depending on load on each server etc, and also provides a firewall to restrict public access to just the open ports you need for your application to work. Elastic Beanstalk tracks how many virtual servers in the Elastic Compute Cloud (EC2) you will be running at any one time, and maintains this number of servers. It also allows you to define triggers to add additional servers depending on shared load across all servers, or any other triggers you define to add ro remove servers from the system.
One of the key considerations with Elastic Beanstalk is that you can only use server images from the one elastic beanstalk environment if you are looking to restore backups into the system. So – what we would normally do is fire up the elastic beanstalk environment, and then take an AMI image from one of the running servers. We would then create a new EC2 server from that image. This server will be used as a seed for the system. What I mean by a seed, is that we can make any changes to the system, or setup the application on this seed, and mount filesystems, and connect to databases etc, and then take an image of that seed once we are happy with it, and then within elastic beanstalk, update the base AMI id (the image that all servers started within the elastic beanstalk environment use) with the image from the seed.
The other advantage of running a seed server is that it can be used as a semi-staging server so you can test code changes before they are rolled out to full production, whilst still being in the production environment. The seed can also be used to run cron tasks for the system to keep cron tasks away from user facing servers so that extra load does not impact user experience. This is very useful for systems like Moodle / Totara that can run some rather large data collection / processing cron tasks. It is also handy to ensure that these tasks are only run on a single server, rather than all user facing nodes (servers) trying to run the same cron tasks at once.
With this setup, using elastic beanstalk to monitor server laoding and automatically cycle out replacement servers when anything goes wrong, or add and remove servers as needed to handle incoming load. There will always be a small period of time while new servers are launched where load may exceed capacity, but this can be minimised by having sensible early trigger levels for scaling, and suitably sized servers to handle typical load on the system. Ideally running at least 2 web servers is ideal.
To add capability to the system for speed optimisation or stability reasons, other fun things to try are to add AWS memcached or REDIS services into your application to cache session data or pre-compiled code in order to speed up operations. This is highly recommended for Moodle / Totara setups. You can also look to use tools like s3fs as alternatives to amazons EFS systems. This can be higher performing, but comes with additional risks with synchronisation settings. You can also investigate using rsync of files between shared filesystems and local (on-server system drives) in order to maintain optimised end user performance whilst maintaining the ability to update files across all servers relatively easily.
That’s a brief run-down of some of the high availability systems that the team at Webmad operate for various clients that need to factor for variable traffic loads without end user experience failure. If you’d like to get into more details, feel free to contact us to discuss your requirements, and how we can customise a system that will work for your needs.